Unsolved
Andrey_ALexeev
19 Posts
October 15th, 2010 04:00
Problem with Celerra NDMP over IP backup.
Hi All,
Hope I can get some help here. For the last 3 days I have had a problem with the NDMP backup of one of two directories.
Backup software is Symantec Backup Exec 2010; the last good backup was on 10 OCT 2010. Every backup since then has finished with this error:
Backup set #60 on storage media #20
NDMP Log Message: Medium error
NDMP Log Message: Write failed on archive volume 1
NDMP Log Message: server_archive: emctar vol 1, 1097662 files, 0 bytes read, 1344562659328 bytes written
NDMP Data Halted: Internal Error
NDMP Log Message: Mover: Data connection socket read error.
NDMP Mover Halted: Internal Error
================ NDMP ENVIRONMENT ================
FILESYSTEM=/Dept
LEVEL=0
DUMP_DATE=-1
UPDATE=Y
HIST=Y
SNAPSURE=Y
TYPE=dump
=============== NDMP ENVIRONMENT END ===============
I have a second file system, /Users, in this job with the same parameters, and that FS backs up OK without any errors.


Andrey_ALexeev
19 Posts
October 15th, 2010 04:00
The log is quite long; here is the part around the failure:
2010-10-15 06:41:26: NDMP: 4: Session 152 (thread ndmp152) tape service is on a remote side, CALLER_ADDRESS 0x11ef61f.
2010-10-15 06:41:26: NDMP: 4: Thread ndmp152 uses a set of Pax threads with handle 0.
2010-10-15 06:41:26: NDMP: 6: Thread ndmp152 filterDialect could not be set - prereqs not met [pax_cmd_thread::setFilterDialect]
2010-10-15 06:41:26: NDMP: 4: Thread ndmp152 fsSize of /automaticNDMPCkpts/automaticTempNDMPCkpt66-43-1287110480 is 2205971881984 (../pax_ndmp.cxx: 1184)
2010-10-15 06:41:26: ADMIN: 6: Command succeeded : serverMount ckpt ro /automaticNDMPCkpts/automaticTempNDMPCkpt66-43-1287110480 289=174 ro
2010-10-15 06:41:27: PAX: 6: Thread nasa00 dedup backup ratio threshold of /Dept is 90%
2010-10-15 06:41:27: DEDUPE: 6: Domain initialized for fsid 174
2010-10-15 07:12:30: LGDB: 6: Restricted Groups enforced for msk-fps63
2010-10-15 08:47:31: LGDB: 6: last message repeated 1 times
2010-10-15 09:27:28: NDMP: 3: Session 152 (thread nasw00) remoteWriteMsg network timeout! timeout =30
2010-10-15 09:27:28: NDMP: 3: Session 152 (thread nasw00) NdmpdSession::ndmpdApiWrite fails in local or remote wirte msg, moverAddressType=1, mp=0x262a9a8
2010-10-15 09:27:28: NDMP: 3: Thread bkup028 Medium error
2010-10-15 09:27:28: NDMP: 3: < LOG type: 2, msg_id: 0, entry: Medium error, hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2010-10-15 09:27:28: NDMP: 4: Thread bkup028 Write failed on archive volume 1
2010-10-15 09:27:28: NDMP: 4: < LOG type: 3, msg_id: 0, entry: Write failed on archive volume 1, hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 Backup root directory: /Dept
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 write count 193113817088
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 time used 9963 sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 write rate 18927 KB/Sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 read rate 0 KB/Sec
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 average file size: 2290KB
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 dir or 0 size file processed: 9583
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1B -- 8KB size file processed: 5462
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 8KB -- 16KB size file processed: 2010
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 16KB -- 32KB size file processed: 7837
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 32KB -- 64KB size file processed: 9515
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 64KB -- 1MB size file processed: 30032
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1MB -- 32MB size file processed: 16760
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 32MB -- 1GB size file processed: 1145
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 1G more size file processed: 9
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 Dedup files processed: 25531
2010-10-15 09:27:28: NDMP: 6: Thread nasa00 server_archive: emctar vol 1, 82353 files, 0 bytes read, 193113817088 bytes written
2010-10-15 09:27:28: NDMP: 4: Got stat buffer with data pointer=NULL at 1557, abort=1
2010-10-15 09:27:28: NDMP: 4: Thread nasa00 cifs_ea new count: 31000998, delete count: 31000998, freeListCount: 129
Rainer_EMC
Operator
8.6K Posts
October 15th, 2010 04:00
I would suggest looking at the Data Mover log via server_log to see whether there are any errors there at the time the backup fails.
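For example, assuming the file system lives on Data Mover server_2 (adjust the name to your environment), something along these lines from the Control Station:

# Dump the Data Mover log and filter for the backup-related facilities
server_log server_2 | grep -i -E 'ndmp|pax|dedupe'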
Rainer
Andrey_ALexeev
19 Posts
October 15th, 2010 06:00
We have a problem with dedupe on this FS, a problem with SavVol capacity. Maybe it also impacts the backup?
bergec
275 Posts
October 15th, 2010 10:00
Have you checked for bad tapes (media errors)?
Claude
umichklewis_ac7b91
300 Posts
October 15th, 2010 13:00
I'll also chime in - on a busy system (control station too busy, slow SATA disks, etc.), you may not be able to create checkpoints before the timeout expires. This can also cause the backup to fail.
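One way to test that theory is to time a manual checkpoint create on the problem file system while the system is busy. A minimal sketch, assuming the file system object is named Dept and SnapSure is licensed (the test_ckpt name is made up; verify the fs_ckpt syntax against your DART release):

# Time how long a manual checkpoint create takes on the problem FS
time fs_ckpt Dept -name test_ckpt -Create
# Remove the test checkpoint when done
nas_fs -delete test_ckpt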
Karl
Rainer_EMC
Operator
8.6K Posts
October 15th, 2010 13:00
Yes, if you are using NDMP with integrated checkpoints, then a savvol at capacity could cause the checkpoint to go inactive and the backup to fail.
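A quick way to check for that, assuming the file system object is named Dept (name taken from the log above):

# List the checkpoints of the problem FS and look for any shown as inactive
fs_ckpt Dept -list
# Show the file system size and savvol usage
nas_fs -size Dept
nas_fs -info Dept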
Rainer
Andrey_ALexeev
19 Posts
October 18th, 2010 00:00
Understood. What is the recommendation?
Also, I had another problem last weekend. Last Friday I simply created a snapshot of the problem FS and removed it from the main backup. During the weekend all my backup jobs finished OK, and when the NS120 was idle a new NDMP job with the problem FS was started. The job finished OK, but I got a notification saying something like: backup finished OK, but the system can't count and enumerate the files, so you see zero in the job size. When I opened the job I saw that all files with Cyrillic names were unrecognized; it looks like they are in a different code page! I never saw this before.
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 00:00
I would suggest opening a service request and letting customer service investigate.
Rainer
Andrey_ALexeev
19 Posts
October 18th, 2010 01:00
The case was opened last Friday before lunch; still no help ))
The only thing I have is EMC document 210954, which describes my problem with deduplication. I think the problem with deduplication and backup is a consequence; the root cause is the SavVol problem.
DanJost
190 Posts
October 18th, 2010 08:00
Was this set in the initial release or has this changed? Also, wouldn't this still contribute to savvol issues by using all of the available space, so that the next "snapshot" (be it replication, integrated SnapSure backup, etc.) could potentially need more space to be allocated?
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 08:00
Just to be clear: Celerra deduplication is specifically built not to cause a savvol extension; it pauses instead when it sees the savvol at a configurable full HWM.
Any storage allocation it might do in the savvol is very much the same as if you were changing or compressing data from the client side, and it will be freed once the checkpoints that need it expire.
The design idea was that data gets compressed as it ages, so after the initial data is compressed it only has to process an incremental amount of data.
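To see where deduplication currently stands on a file system, something like the following should work (output and option names vary by DART release, so treat this as a sketch; Dept is the assumed file system name):

# Show deduplication state and settings for the problem FS
fs_dedupe -info Dept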
Rainer
dynamox
Legend
20.4K Posts
October 18th, 2010 08:00
So what causes my SavVol to balloon the minute I enable dedupe?
DanJost
190 Posts
October 18th, 2010 08:00
I would have to concur with Rainer: if your savvol can't "extend" (or isn't configured to do so) and you are using integrated SnapSure in the backup, you'll have the potential for problems, and the NDMP failure could be a symptom of this. I would work through that first to see if it fixes your problem. Are all of your snapshots going belly up? If your dedupe process never finishes due to savvol space, then every time it kicks back in it may be filling the savvol back up, and you get all sorts of side issues. When the dedupe runs (in the older versions of dedupe, anyway) and compresses/dedupes a file, the change is captured in the snapshots. Others have grumbled about this "feature" extending the savvol: the deduped file savings are simply being transferred into a bloated savvol.
Dan
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 10:00
Deduplicating a file actually writes it as a new compressed file and deletes the old file (replaced by a stub), just the same as if you would do it from a client.
That alone doesn't do anything with the savvol.
Of course, when writing the new file there is a chance that blocks are re-used from recently deleted files that are still needed by active checkpoints.
Or blocks from the just-deleted uncompressed file are re-used by client activity or by the deduplication itself.
In these cases the old block contents need to be saved in the savvol.
If we didn't do that, checkpoints wouldn't work correctly.
Just imagine that you refresh a checkpoint, dedupe a file, then take another checkpoint.
You would want to be able to restore both "versions" through their respective checkpoints, right?
I think the resulting change in the directory most probably gets a block for that into the savvol as well.
I guess you could say that (as with other client activity and checkpoints) the "chances" of re-using a block that is still needed are higher the fuller the file system is.
Rainer_EMC
Operator
8.6K Posts
October 18th, 2010 10:00
If you are concerned about savvol usage, you can just change the HWM at which deduplication stops.
That will just spread the dedupe of the initial bulk of the data over a longer period of time (you could help that by increasing the options for which age of files are processed), as in the sketch below.
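A hypothetical sketch of that tuning (the -modify option names below are illustrative only; the real flags differ between DART releases, so check the fs_dedupe usage output first):

# Illustrative only: raise the file-age thresholds so each dedupe pass touches less data
fs_dedupe -modify Dept -access_time 60 -modification_time 60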
For new file systems my personal suggestion is to first complete the data migration and deduplication before turning on checkpoints.
And do reasonable savvol sizing. There is no free lunch: if you want to keep x days of checkpoints at a y% daily change rate, then you need roughly x times y% of the file system size as savvol space.
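As a worked example with assumed numbers (the 2 TB roughly matches the fsSize in the log above; the retention and change rate are purely illustrative):

# 2 TB file system, keep 7 days of checkpoints, 2% daily change rate:
# savvol ~= 2 TB x 7 x 0.02 = 280 GB, before any safety margin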