Unsolved
Andrey_ALexeev
19 Posts
October 15th, 2010 04:00
Problem with Celerra NDMP over IP backup.
Hi All,
Hope I can get some help here. For the last 3 days I have had a problem with the NDMP backup of one of two directories.
Backup software is Symantec Backup Exec 2010; the last good backup was on 10 OCT 2010. Every backup since then has finished with this error:
Backup set #60 on storage media #20
NDMP Log Message: Medium error
NDMP Log Message: Write failed on archive volume 1
NDMP Log Message: server_archive: emctar vol 1, 1097662 files, 0 bytes read, 1344562659328 bytes written
NDMP Data Halted: Internal Error
NDMP Log Message: Mover: Data connection socket read error.
NDMP Mover Halted: Internal Error
================ NDMP ENVIRONMENT ================
FILESYSTEM=/Dept
LEVEL=0
DUMP_DATE=-1
UPDATE=Y
HIST=Y
SNAPSURE=Y
TYPE=dump
=============== NDMP ENVIRONMENT END ===============
I have a second file system, /Users, in this job with the same parameters, and that FS backs up OK without any errors.


Andrey_ALexeev
19 Posts
October 15th, 2010 04:00
The log is quite long; here is the part around the failure:
2010-10-15 06:41:26: NDMP: 4: Session 152 (thread ndmp152) tape service is on a remote side, CALLER_ADDRESS 0x11ef61f.
2010-10-15 06:41:26: NDMP: 4: Thread ndmp152 uses a set of Pax threads with handle 0.
2010-10-15 06:41:26: NDMP: 6: Thread ndmp152 filterDialect could not be set - prereqs not met [pax_cmd_thread::setFilterDialect]
2010-10-15 06:41:26: NDMP: 4: Thread ndmp152 fsSize of /automaticNDMPCkpts/automaticTempNDMPCkpt66-43-1287110480 is 2205971881984 (../pax_ndmp.cxx: 1184)
2010-10-15 06:41:26: ADMIN: 6: Command succeeded : serverMount ckpt ro /automaticNDMPCkpts/automaticTempNDMPCkpt66-43-1287110480 289=174 ro
2010-10-15 06:41:27: PAX: 6: Thread nasa00 dedup backup ratio threshold of /Dept is 90%
2010-10-15 06:41:27: DEDUPE: 6: Domain initialized for fsid 174
2010-10-15 07:12:30: LGDB: 6: Restricted Groups enforced for msk-fps63
2010-10-15 08:47:31: LGDB: 6: last message repeated 1 times
2010-10-15 09:27:28: NDMP: 3: Session 152 (thread nasw00) remoteWriteMsg network timeout! timeout =30
2010-10-15 09:27:28: NDMP: 3: Session 152 (thread nasw00) NdmpdSession::ndmpdApiWrite fails in local or remote wirte msg, moverAddressType=1, mp=0x262a9a8
2010-10-15 09:27:28: NDMP: 3: Thread bkup028 Medium error
2010-10-15 09:27:28: NDMP: 3: < LOG type: 2, msg_id: 0, entry: Medium error, hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2010-10-15 09:27:28: NDMP: 4: Thread bkup028 Write failed on archive volume 1
2010-10-15 09:27:28: NDMP: 4: < LOG type: 3, msg_id: 0, entry: Write failed on archive volume 1, hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 Backup root directory: /Dept
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 write count 193113817088
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 time used 9963 sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 write rate 18927 KB/Sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 read rate 0 KB/Sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 average file size: 2290KB
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 dir or 0 size file processed: 9583
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1B -- 8KB size file processed: 5462
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 8KB -- 16KB size file processed: 2010
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 16KB -- 32KB size file processed: 7837
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 32KB -- 64KB size file processed: 9515
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 64KB -- 1MB size file processed: 30032
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1MB -- 32MB size file processed: 16760
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 32MB -- 1GB size file processed: 1145
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1G more size file processed: 9
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 Dedup files processed: 25531
2010-10-15 09:27:28: NDMP: 6: Thread nasa00 server_archive: emctar vol 1, 82353 files, 0 bytes read, 193113817088 bytes written
2010-10-15 09:27:28: NDMP: 4: Got stat buffer with data pointer=NULL at 1557, abort=1
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 cifs_ea new count: 31000998, delete count: 31000998, freeListCount: 129
Rainer_EMC
Operator
8.6K Posts
October 15th, 2010 04:00
I would suggest looking at the Data Mover log via server_log to see whether there are any errors there at the time the backup fails.
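For example, assuming the file system lives on Data Mover server_2 (adjust the name to your environment), something along these lines from the Control Station:

# Dump the Data Mover log and filter for the backup-related facilities
server_log server_2 | grep -i -E 'ndmp|pax|dedupe'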
Rainer
Andrey_ALexeev
19 Posts
October 15th, 2010 06:00
We have a problem with dedupe on this FS, a problem with SavVol capacity. Maybe it also impacts the backup?
bergec
275 Posts
October 15th, 2010 10:00
Have you checked for bad tapes (media errors)?
Claude
umichklewis_ac7b91
300 Posts
October 15th, 2010 13:00
I'll also chime in - on a busy system (control station too busy, slow SATA disks, etc.), you may not be able to create checkpoints before the timeout expires. This can also cause the backup to fail.
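One way to test that theory is to time a manual checkpoint create on the problem file system while the system is busy. A minimal sketch, assuming the file system object is named Dept and SnapSure is licensed (the test_ckpt name is made up; verify the fs_ckpt syntax against your DART release):

# Time how long a manual checkpoint create takes on the problem FS
time fs_ckpt Dept -name test_ckpt -Create
# Remove the test checkpoint when done
nas_fs -delete test_ckpt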
Karl
Rainer_EMC
Operator
8.6K Posts
October 15th, 2010 13:00
Yes, if you are using NDMP with integrated checkpoints, then a savvol at capacity could cause the checkpoint to go inactive and the backup to fail.
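A quick way to check for that, assuming the file system object is named Dept (name taken from the log above):

# List the checkpoints of the problem FS and look for any shown as inactive
fs_ckpt Dept -list
# Show the file system size and savvol usage
nas_fs -size Dept
nas_fs -info Dept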
Rainer
Andrey_ALexeev
19 Posts
October 18th, 2010 00:00
Understood. What is the recommendation?
Also, I had another problem last weekend. Last Friday I simply created a snapshot of the problem FS and removed it from the main backup. During the weekend all my backup jobs finished OK, and when the NS120 was idle a new NDMP job with the problem FS was started. The job finished OK, but I got a notification saying something like: backup finished OK, but the system can't count and enumerate the files, so you see zero in the job size. When I opened the job I saw that all files with Cyrillic names were unrecognized; it looks like they are in a different code page! I never saw this before.
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 00:00
I would suggest opening a service request and letting customer service investigate.
Rainer
Andrey_ALexeev
19 Posts
October 18th, 2010 01:00
The case was opened last Friday before lunch; still no help ))
The only thing I have is EMC document 210954, which describes my problem with deduplication. I think the problem with deduplication and backup is a consequence; the root cause is the SavVol problem.
DanJost
190 Posts
October 18th, 2010 08:00
Was this set in the initial release or has this changed? Also, wouldn't this still contribute to savvol issues by using all of the available space, so that the next "snapshot" (be it replication, integrated SnapSure backup, etc.) could potentially need more space to be allocated?
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 08:00
Just to be clear: Celerra deduplication is specifically built not to cause a savvol extension; it pauses instead when it sees the savvol at a configurable full HWM.
Any storage allocation it might do in the savvol is very much the same as if you were changing or compressing data from the client side, and it will be freed once the checkpoints that need it expire.
The design idea was that data gets compressed as it ages, so after the initial data is compressed it only has to process an incremental amount of data.
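To see where deduplication currently stands on a file system, something like the following should work (output and option names vary by DART release, so treat this as a sketch; Dept is the assumed file system name):

# Show deduplication state and settings for the problem FS
fs_dedupe -info Dept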
Rainer
dynamox
Legend
20.4K Posts
October 18th, 2010 08:00
So what causes my SavVol to balloon the minute I enable dedupe?
DanJost
190 Posts
October 18th, 2010 08:00
I would have to concur with Rainer: if your savvol can't "extend" (or isn't configured to do so) and you are using integrated SnapSure in the backup, you'll have the potential for problems, and the NDMP failure could be a symptom of this. I would work through that first to see if it fixes your problem. Are all of your snapshots going belly up? If your dedupe process never finishes due to savvol space, then every time it kicks back in it may be filling the savvol back up, and you get all sorts of side issues. When the dedupe runs (in the older versions of dedupe, anyway) and compresses/dedupes a file, the change is captured in the snapshots. Others have grumbled about this "feature" extending the savvol: the deduped file savings are simply being transferred into a bloated savvol.
Dan
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 10:00
Deduplicating a file actually writes it as a new compressed file and deletes the old file (replaced by a stub), just the same as if you would do it from a client.
That alone doesn't do anything with the savvol.
Of course, when writing the new file there is a chance that blocks are re-used from recently deleted files that are still needed by active checkpoints.
Or blocks from the just-deleted uncompressed file are re-used by client activity or by the deduplication itself.
In these cases the old block contents need to be saved in the savvol.
If we didn't do that, checkpoints wouldn't work correctly.
Just imagine that you refresh a checkpoint, dedupe a file, then take another checkpoint.
You would want to be able to restore both "versions" through their respective checkpoints, right?
I think the resulting change in the directory most probably gets a block for that into the savvol as well.
I guess you could say that (as with other client activity and checkpoints) the "chances" of re-using a block that is still needed are higher the fuller the file system is.
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 10:00
If you are concerned about savvol usage, you can just change the HWM at which deduplication stops.
That will just spread the dedupe of the initial bulk of the data over a longer period of time (you could help that by increasing the options for which age of files are processed), as in the sketch below.
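A hypothetical sketch of that tuning (the -modify option names below are illustrative only; the real flags differ between DART releases, so check the fs_dedupe usage output first):

# Illustrative only: raise the file-age thresholds so each dedupe pass touches less data
fs_dedupe -modify Dept -access_time 60 -modification_time 60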
For new file systems my personal suggestion is to first complete the data migration and deduplication before turning on checkpoints.
And do reasonable savvol sizing. There is no free lunch: if you want to keep x days of checkpoints at a y% daily change rate, then you need roughly x times y% of the file system size as savvol space.
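As a worked example with assumed numbers (the 2 TB roughly matches the fsSize in the log above; the retention and change rate are purely illustrative):

# 2 TB file system, keep 7 days of checkpoints, 2% daily change rate:
# savvol ~= 2 TB x 7 x 0.02 = 280 GB, before any safety margin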