NDMP fails for netapps backup

Question

Hi, we have a NetApp FAS960 filer with an NDMP connection to our IBM3584 jukebox. The problem is that the backup sometimes fails with the following messages:

03/27/06 14:30:02 nsrd: ndmp save notice: netapps:/vol/vorhees NDMP save failed on 'networker.datastorage.bii'
03/27/06 14:30:03 savegrp: command 'nsrndmp_save -T dump -s networker.datastorage.bii -c netapps -g demo -l full -LL -q -W 78 -N /vol/vorhees /vol/vorhees ' for client netapps exited with return code 1.
* netapps:/vol/vorhees 1 retry attempted
* netapps:/vol/vorhees nsrndmp_save: Performing backup to NDMP type of device
* netapps:/vol/vorhees nsrndmp_save: Opened the tape device : nrst1a
* netapps:/vol/vorhees nsrndmp_save: Performing DAR Backup..
* netapps:/vol/vorhees nsrndmp_save: netapps:/vol/vorhees NDMP save running on 'networker.datastorage.bii'
* netapps:/vol/vorhees nsrndmp_2fh: Reading file /nsr/tmp/FileIndex1344766155/fhfile
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: creating '/vol/vorhees/../snapshot_for_backup.22' snapshot.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Using Full Volume Dump
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Date of this level 0 dump: Mon Mar 27 14:11:06 2006.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Date of last level 0 dump: the epoch.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Dumping /vol/vorhees to NDMP connection
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: mapping (Pass I)[regular files]
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: mapping (Pass II)[directories]
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: estimated 884272977 KB.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: dumping (Pass III) [directories]
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Mon Mar 27 14:21:02 2006 : We have written 482790 KB.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: dumping (Pass IV) [regular files]
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Mon Mar 27 14:26:02 2006 : We have written 13787336 KB.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Mon Mar 27 14:31:02 2006 : We have written 26337497 KB.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Tape write failed.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: DUMP IS ABORTED
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: DUMP: Deleting '/vol/vorhees/../snapshot_for_backup.22' snapshot.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: Data server halted: Failed to connect to the mover.
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: Connection or IO Error.
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_save: Tape server halted: Error during the backup.
* netapps:/vol/vorhees nsrndmp_save: NDMP Service Log: MoveletOutput: Internal Error.
* netapps:/vol/vorhees netapps: /vol/vorhees level=full, 37 GB 00:23:59 868609 files
* netapps:/vol/vorhees nsrndmp_save: Save session closed with NW server successfully
* netapps:/vol/vorhees
* netapps:/vol/vorhees nsrndmp_2fh: Received the ABORT message
* netapps:/vol/vorhees nsrndmp_2fh: interrupted, exiting the File History merging...
* netapps:/vol/vorhees nsrndmp_save: backup failed.
* netapps:/vol/vorhees nsrndmp_2fh: nsrndmp_2fh aborted.

However, the backup sometimes does complete successfully, so I don't think there is a problem with the device mapping between the NetApp filer and the tape library.

On the NetApp side, we always see a 'scsi error reservation conflict' message. I wonder whether this has anything to do with the backup failing. Thanks

ylam1 · Answer

These tape drives are shared among other storage nodes and NDMP devices. We are using dynamic drive sharing licenses. These tapes are also zoned to other storage nodes and other NDMP devices.

ble1 · Answer

Have you checked the KB at now.netapp.com for this? I can see several results on the subject, but you will know better whether one of them matches your case. Opening a case with NetApp might also be a good idea. Do you see any drive-related error messages at the same time on the other nodes sharing the same drive?

ylam1 · Answer

I think I have checked the KB at now.netapp.com, but I don't remember finding anything useful. I will check again. The other storage nodes sharing the same tape drives are backing up fine, but they are not NDMP devices, just storage nodes.

ble1 · Answer

It might. The reason for the failure is 'DUMP: Tape write failed', and you will probably see more errors in daemon.log. I assume these tapes are dedicated to the filer. Are they also zoned to the filer only?

ble1 · Answer

That's fine, but do you see any error or warning messages on them around the time you get the error on the NetApp side? Sometimes a SAN event affecting a shared object causes an error to be logged on every machine using that object. The message there may differ from the one on the NetApp and may give you an additional clue for troubleshooting.

As you probably know, a SCSI reservation conflict implies that some other host has the device reserved. This raises the question of how reservation is set up among your hosts. (If you are using 7.3, then even NetWorker might be involved, as reservation was added as a device attribute in that release.)
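One way to do that comparison is to pull every entry logged near the failure time out of each host's log. The following is a minimal Python sketch, assuming the log lines start with an MM/DD/YY HH:MM:SS stamp like the excerpt in the question. The sample lines and the `entries_near` helper are hypothetical and only illustrate the idea. The window is deliberately wide because the clocks may not agree: in the question's log the filer stamps a progress line 14:31:02 while nsrd logged the failure at 14:30:02.

```python
from datetime import datetime, timedelta

# Timestamp layout taken from "03/27/06 14:30:02" in the question's log.
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LENGTH = 17  # len("03/27/06 14:30:02")


def entries_near(lines, failure_time, window_minutes=10):
    """Return (timestamp, message) pairs logged within the window around failure_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # line does not start with a timestamp; skip it
        if abs(stamp - failure_time) <= window:
            hits.append((stamp, line[STAMP_LENGTH:].strip()))
    return hits


# Hypothetical lines standing in for another storage node's log.
sample = [
    "03/27/06 13:02:11 nsrd: placeholder message well before the failure",
    "03/27/06 14:28:40 nsrd: placeholder message close to the failure",
    "continuation line without a timestamp",
]

failure = datetime.strptime("03/27/06 14:30:02", STAMP_FORMAT)
for stamp, message in entries_near(sample, failure):
    print(stamp, message)
```

In practice the `sample` list would be replaced by the lines read from each host's own log file, and the same failure time would be checked against every host that shares the drive.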

oiturbe · Answer

Sorry, I forgot to ask: is this a new install? Was it working before? Do you have multiple instances of the same NDMP client?

oiturbe · Answer

Which version of NetWorker do you have? Do you have the output of the nsrndmp_save command?

DavidHampson · Answer

Can you let us know what version of Networker you are using? Versions before 7.2.1 (if I remember rightly) must dedicate drives to NDMP; you cannot share them with regular backups.

Having said that, I've never tested this situation, so I am not sure what would happen if it were configured that way!

ylam1 · Answer

Hi, I am using NetWorker 7.2.1. We have other NDMP devices (EMC CNS) that share their drives with other storage nodes, and they are backing up fine. The problem is only with this NetApp filer. However, we have set up a test environment connecting the NetApp to another tape library, with NetWorker 7.3 installed. We use dynamic drive sharing on the devices in that tape library, and it has been backing up fine for a week. So I really suspect it has something to do with the NetWorker version. But the fact that the CNS is backing up fine on 7.2.1 leaves much to ponder.

ble1 · Answer

7.0.4 is the latest if you go by release date (it addresses some serious bugs in the DAR implementation), but 7.1 is the latest feature release. Keep in mind that the latest version officially supported by EMC is (or at least was a few weeks ago) 7.0.2. I had a case with 7.0.3, and the case got dropped because 7.0.3 was not on the compatibility list. Even though these are only patch releases, they didn't want to hear it. Their support queue is currently rather full, so they are trying to clear it as much as possible: if something is not on the compatibility list, it is denied support. I do remember that an RFE was created to get 7.1 supported.

ylam1 · Answer

Hi, we recently upgraded ONTAP to 7.1, hoping that it would solve the problem, but it still persists. I believe 7.1 is the latest version.

guenterH1 · Answer

Hi, could you please tell us which version of ONTAP you are using? There is an issue with tape drive handling in versions 7.0.1 and 7.0.2. The behavior changed between versions 6.5 and 7.0, making the handling of tape errors less flexible, which causes backups to abort. The robust handling used in 6.5 was re-implemented in ONTAP 7.0.3. Please contact NetApp support for details about this. Regards

guenterH1 · Answer

Yes, I can confirm that support is denied if the product in use is not listed in the compatibility guide.
Recently I also had a case with ONTAP 7.1 closed because it was not listed.
Further, there should be an RFE to get ONTAP 7.1 listed as a supported release. I think this is addressed in LGTpa85701, but I'm not sure.

tpoyraz · Answer

Hi there, did you manage to solve the problem, and if so, how? I would appreciate a reply, since we are experiencing exactly the same problem with NetWorker v7.2.2. Thanks & Regards,

jhope-bailie · Answer

It looks like the problem originates at the tape drive which fails after about 26 GB. You could verify this by running a normal dump directly on the NetApp. This would write the backup to the tape without any of the NDMP functionality being used. If this also fails, get the tape drive serviced or renew your media or both.

If not, you will need to look elsewhere. I think the other errors in your log are a consequence of the tape failure.
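The "about 26 GB" figure and the write rate up to the failure can be read off the DUMP progress lines in the log. Here is a minimal Python sketch that does the arithmetic. The three lines are copied from the question; the regular expression and the `parse_progress` helper are illustrative, and the conversions assume 1 KB = 1024 bytes, which the log itself does not state.

```python
import re
from datetime import datetime

# Progress lines copied from the log in the question.
progress_lines = [
    "DUMP: Mon Mar 27 14:21:02 2006 : We have written 482790 KB.",
    "DUMP: Mon Mar 27 14:26:02 2006 : We have written 13787336 KB.",
    "DUMP: Mon Mar 27 14:31:02 2006 : We have written 26337497 KB.",
]

PATTERN = re.compile(
    r"DUMP: (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) : We have written (\d+) KB\."
)


def parse_progress(lines):
    """Extract (timestamp, kilobytes written) from DUMP progress lines."""
    points = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            points.append((when, int(match.group(2))))
    return points


points = parse_progress(progress_lines)

# Average write rate between consecutive progress reports.
for (t0, kb0), (t1, kb1) in zip(points, points[1:]):
    seconds = (t1 - t0).total_seconds()
    rate_mb = (kb1 - kb0) / seconds / 1024
    print(f"{t0:%H:%M:%S} -> {t1:%H:%M:%S}: {rate_mb:.1f} MB/s")

# Amount reported by the last progress line before "Tape write failed".
last_kb = points[-1][1]
print(f"Last reported total: {last_kb / 1024 / 1024:.1f} GB")
```

Under that assumption the last report works out to roughly 25 GB (about 26 GB in decimal units), at a fairly steady rate of a little over 40 MB/s. Running the same calculation on the logs of several failed runs would show whether the failure always occurs near the same point on the tape.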

SCSI reservation is used by hosts to "lock" or "reserve" a specific device for the use of a specific host. This makes sense on a SAN when multiple hosts may have access to devices and you need to prevent clashes. Normally once a host has finished with a device, it releases the reservation. Sometimes a failure can cause the session to finish without the lock being cleared.

Are there multiple hosts making use of the drive(s) in the library, i.e. is it shared? In that case, you may need to manage the access to avoid conflicts.

If not, rebooting the library should clear the reservation.
Cheers,
John Hope-Bailie
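The reservation behaviour described above can be illustrated with a toy model. This is a simplified Python sketch, not the SCSI protocol itself: the `TapeDrive` class and the host name "storage-node" are hypothetical, and `reset()` merely stands in for whatever clears the reservation on the real device (for example the reboot suggested above).

```python
class ReservationConflict(Exception):
    """Raised when a host touches a drive that another host has reserved."""


class TapeDrive:
    def __init__(self, name):
        self.name = name
        self.reserved_by = None  # host currently holding the reservation, if any

    def reserve(self, host):
        if self.reserved_by not in (None, host):
            raise ReservationConflict(f"{self.name} is reserved by {self.reserved_by}")
        self.reserved_by = host

    def release(self, host):
        # Only the holder can release its own reservation.
        if self.reserved_by == host:
            self.reserved_by = None

    def write(self, host, data):
        # A write is refused only while a different host holds the reservation.
        if self.reserved_by not in (None, host):
            raise ReservationConflict(f"{self.name} is reserved by {self.reserved_by}")
        return len(data)

    def reset(self):
        # Stand-in for a reset or power cycle that drops the reservation.
        self.reserved_by = None


drive = TapeDrive("nrst1a")
drive.reserve("storage-node")      # one host reserves the drive...
# ...and its session ends abnormally, so release() is never called.

try:
    drive.write("netapps", b"dump data")   # the filer now hits a conflict
except ReservationConflict as err:
    print("conflict:", err)

drive.reset()                      # clearing the stale reservation
drive.reserve("netapps")
print("reserved by:", drive.reserved_by)
```

The point of the model is the middle step: as long as the first host never releases the drive, every other host sees a conflict, even though nothing is actually using the drive.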
