July 17th, 2013 12:00
Looking for some NDMP help. NetBackup error 99.
Hi all,
I am stumped and looking for some help or advice.
I configured NDMP backups on datamover 2 and datamover 3 of my VNX7500.
Datamover 2 has no issues, but datamover 3 recently started failing with a generic NetBackup error 99.
Below is the server log from datamover3
2013-07-17 13:28:35: DRIVERS: 6: FCDMTL 3 [16.0.3] Status buffer overrun: xid 1e2, msg 119570e0, address id 20000
2013-07-17 13:28:36: DRIVERS: 6: FCDMTL 3 [16.0.3] Status buffer overrun: xid 1f6, msg 119575e0, address id 20000
2013-07-17 13:28:36: CAM: 3: I/O Error: c48t0l2: CamStatus 0x84 ScsiStatus 0x02 Sense 0x05/0x20/0x00
2013-07-17 13:28:36: CAM: 3: camFlags 0x41 Addr 0x0x000f49da08 Len 0xf8 Resid 0x0
2013-07-17 13:28:36: CAM: 3: cdb: a2 60 00 10 00 00 00 00 00 f8 00 00
| SCSI ERR ON #4802 camstat 0x84 bstat InvalidOperation
| scsi stat.skey.ascq 0x02.05.2000 Check condition
| skey [ ] Illegal request
|____________________ ascq Invalid command operation code
2013-07-17 13:28:37: DRIVERS: 6: FCDMTL 3 [16.0.3] Status buffer overrun: xid 1fc, msg 11957760, address id 20000
2013-07-17 13:28:39: NDMP: 6: Active NDMP backup/restore streams: 1, system configured concurrent streams: 4, maximum concurrent sessions supported: 8.
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) < Backup type: dump >
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: TYPE Value: dump
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: FILESYSTEM Value: /root_vdm_2/symlive
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: PREFIX Value: /root_vdm_2/symlive
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: LEVEL Value: 1
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: HIST Value: y
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: UPDATE Value: y
2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) tape drive: c48t0l2, CALLER_ADDRESS 0x00014be50b.
2013-07-17 13:28:39: NDMP: 4: Thread ndmp014 uses a set of Pax threads with handle 0.
2013-07-17 13:28:39: NDMP: 6: Thread ndmp014 filterDialect could not be set - prereqs not met [pax_cmd_thread::setFilterDialect]
2013-07-17 13:28:40: NDMP: 6: Session 014 (thread nasa00) backup dir /root_vdm_2/symlive is not on read-only checkpoint.
2013-07-17 13:28:40: PAX: 6: Thread nasa00 dedup backup ratio threshold of /root_vdm_2/symlive is 90%
2013-07-17 13:30:36: NDMP: 6: Session 014 (thread ndmp014) < receive DATA_ABORT request >
2013-07-17 13:30:36: NDMP: 4: Got stat buffer with data pointer=NULL at 1576, abort=1
2013-07-17 13:30:36: NDMP: 4: Thread bkup009 Backup was aborted
2013-07-17 13:30:36: NDMP: 4: < LOG type: 3, msg_id: 0, entry: Backup was aborted, hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2013-07-17 13:30:36: NDMP: 4: Thread bkup011 Abort is called during readdir (dp=0x05731a3e08) for NDMP DUMP format file history
2013-07-17 13:30:36: NDMP: 4: Thread bkup057 Abort is called during readdir (dp=0x004a1bf208) for NDMP DUMP format file history
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 Backup root directory: /root_vdm_2/symlive
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 write count 13847691264
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 time used 116 sec
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 write rate 116023 KB/Sec
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 read rate 0 KB/Sec
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 average file size: 97KB
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 dir or 0 size file processed: 60362
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1B -- 8KB size file processed: 57024
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 8KB -- 16KB size file processed: 3328
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 16KB -- 32KB size file processed: 4245
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 32KB -- 64KB size file processed: 3262
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 64KB -- 1MB size file processed: 9733
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1MB -- 32MB size file processed: 1094
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 32MB -- 1GB size file processed: 33
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1G more size file processed: 0
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 Dedup files processed: 1
2013-07-17 13:30:36: NDMP: 6: Thread nasa00 server_archive: emctar vol 1, 139081 files, 0 bytes read, 13847691264 bytes written
2013-07-17 13:30:36: NDMP: 4: Got stat buffer with data pointer=NULL at 1576, abort=1
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 cifs_ea new count: 1810975, delete count: 1810975, freeListCount: 142
2013-07-17 13:30:36: NDMP: 3: Thread nasa00 Backup is aborted.
2013-07-17 13:30:36: NDMP: 3: < LOG type: 2, msg_id: 0, entry: Backup is aborted., hasAssociatedMsg: 0, associatedMsgSeq: 0 >
2013-07-17 13:30:36: PAX: 4: Thread nasa00 used these threads for this job: bkup000 bkup001 bkup002 bkup004 bkup054 bkup007 bkup012 bkup006 bkup010 bkup005 bkup017 bkup009 bkup011 bkup057 bkup015 bkup019 bkup016 bkup014 bkup060 bkup026 bkup021 bkup058 bkup023 bkup056 bkup055 bkup024 bkup059 bkup061 bkup063 bkup003 bkup008 bkup013 bkup018 bkup020 bkup022 bkup025 bkup062 bkup027 bkup028 bkup029 bkup030 bkup031 bkup032 bkup033 bkup034 bkup035 bkup036 bkup037 bkup038 bkup039 bkup040 bkup041 bkup042 bkup043 bkup044 bkup045 bkup046 bkup047 bkup048 bkup049 bkup050 bkup051 bkup052 bkup053
2013-07-17 13:30:36: NDMP: 6: Session 014 (thread ndmp014) < sent DATA_ABORT reply, error: NDMP_NO_ERR >
2013-07-17 13:30:37: NDMP: 6: Cleanup: Active NDMP backup/restore sessions back to: 0, system configured concurrent streams: 4, maximum concurrent sessions allowed: 8.
2013-07-17 13:34:00: SMB: 6:[odcvnx2107dm3_vdm] >DC=ADCORPDC1(10.97.108.57) R=11 T=1 ms S=0,0x1/-1
2013-07-17 13:34:52: CFS: 4: 10:[odcvnx2107dm3_vdm] The Windows event log Security of the vdm odcvnx2107dm3_vdm is full.
2013-07-17 13:44:53: CFS: 4: 10:[odcvnx2107dm3_vdm] The Windows event log Security of the vdm odcvnx2107dm3_vdm is full.
2013-07-17 13:54:54: CFS: 4: 10:[odcvnx2107dm3_vdm] The Windows event log Security of the vdm odcvnx2107dm3_vdm is full.
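The CAM error above reports `Sense 0x05/0x20/0x00`, which the data mover expands as "Illegal request" / "Invalid command operation code". As a minimal sketch, here is how that sense key / ASC / ASCQ triple can be decoded from a log line in Python. The lookup tables are only a tiny hand-picked subset of the standard SCSI (SPC) tables, not a complete list, and the function name is hypothetical.

```python
import re

# Tiny subset of the SCSI sense key table (SPC); not exhaustive.
SENSE_KEYS = {
    0x00: "No sense",
    0x02: "Not ready",
    0x03: "Medium error",
    0x04: "Hardware error",
    0x05: "Illegal request",
    0x06: "Unit attention",
}

# Tiny subset of the ASC/ASCQ table; not exhaustive.
ASC_ASCQ = {
    (0x20, 0x00): "Invalid command operation code",
    (0x24, 0x00): "Invalid field in CDB",
    (0x3A, 0x00): "Medium not present",
}

SENSE_RE = re.compile(r"Sense 0x([0-9a-fA-F]{2})/0x([0-9a-fA-F]{2})/0x([0-9a-fA-F]{2})")


def decode_sense(line):
    """Return (sense key text, ASC/ASCQ text) for a CAM log line, or None if absent."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(group, 16) for group in m.groups())
    return (
        SENSE_KEYS.get(key, f"Unknown sense key 0x{key:02x}"),
        ASC_ASCQ.get((asc, ascq), f"Unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}"),
    )


line = ("2013-07-17 13:28:36: CAM: 3: I/O Error: c48t0l2: CamStatus 0x84 "
        "ScsiStatus 0x02 Sense 0x05/0x20/0x00")
print(decode_sense(line))
```

An "Invalid command operation code" reply means the target (here the tape device c48t0l2) did not recognize the opcode in the CDB it was sent.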
I am looking for more details on what is causing these messages:
2013-07-17 13:30:36: NDMP: 3: Thread nasa00 Backup is aborted.
2013-07-17 13:30:36: NDMP: 3: < LOG type: 2, msg_id: 0, entry: Backup is aborted., hasAssociatedMsg: 0, associatedMsgSeq: 0 >
Thank you in advance for any help or direction to resolving the issue.
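The per-size counters in the nasa00 summary add up to the 139081 files reported on the server_archive line, and the 97KB average is just bytes written divided by that file count (directories and zero-size files included). These counters describe the work done before the DATA_ABORT arrived at 13:30:36. A small Python sketch, assuming the summary lines are pasted verbatim from the log above, that performs this cross-check:

```python
import re

# Summary lines copied from the data mover log above (thread nasa00).
SUMMARY = """\
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 write count 13847691264
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 dir or 0 size file processed: 60362
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1B -- 8KB size file processed: 57024
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 8KB -- 16KB size file processed: 3328
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 16KB -- 32KB size file processed: 4245
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 32KB -- 64KB size file processed: 3262
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 64KB -- 1MB size file processed: 9733
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1MB -- 32MB size file processed: 1094
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 32MB -- 1GB size file processed: 33
2013-07-17 13:30:36: NDMP: 4: Thread nasa00 1G more size file processed: 0
2013-07-17 13:30:36: NDMP: 6: Thread nasa00 server_archive: emctar vol 1, 139081 files, 0 bytes read, 13847691264 bytes written
"""

WRITE_RE = re.compile(r"write count (\d+)")
BUCKET_RE = re.compile(r"size file processed: (\d+)")
FILES_RE = re.compile(r"(\d+) files,")


def summarize(text):
    """Return (bytes written, sum of size buckets, archived file count, average KB)."""
    written = int(WRITE_RE.search(text).group(1))
    bucket_total = sum(int(n) for n in BUCKET_RE.findall(text))
    archived = int(FILES_RE.search(text).group(1))
    avg_kb = written / archived / 1024
    return written, bucket_total, archived, avg_kb


written, bucket_total, archived, avg_kb = summarize(SUMMARY)
print(f"buckets={bucket_total} archived={archived} average={avg_kb:.0f}KB")
```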


christopher_ime
July 17th, 2013 21:00
Please consider moving this question as-is (no need to recreate) to the proper forum for maximum visibility. Questions written to the users' own "Discussions" space don't get the same amount of attention and questions can go unanswered for a long time.
You can do so by selecting "Move" under ACTIONS in the upper-right, then searching for and selecting "VNX Support Forum", which is the most relevant forum for this question.
In the meantime, can you see if the following KB article is related:
emc300279: "NDMP Backups Failing in NetBackup"
Are you also getting the following error in the NetBackup logs?
ERR - Cannot write TIR data to media, NDMP return code 20.
That error suggests an issue with the choice of link aggregation. Does this apply to the active data mover on which the backups are failing?
tim.koopman
July 18th, 2013 08:00
Hi Christopher,
I checked out emc300279. I do not think it applies, but I am going to look into it further. I will let you know what I find.
Tim
tim.koopman
July 18th, 2013 09:00
Hi Karl,
Thank you for the advice. I do have an EMC support case open, but support gets busy and I wanted to see what input the user community might have.
All I have so far is that NDMP appears to be aborting, and I cannot figure out exactly why.
Tim
umichklewis
July 18th, 2013 09:00
If you haven't done so, you'll want to search the Symantec forums for the instructions to elevate the logging level in NetBackup. Open a case with EMC Support, and be sure to tell them which version of NBU you're running. Support can provide you with a set of commands (.server_config) to elevate the NDMP logging level on the data mover. You can then grab the consolidated NBU log file and compare it to the NDMP logs from the data mover, which will probably shed light on the problem.
Let us know if this helps!
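Comparing the two logs mostly means lining them up by time. A hedged Python sketch of that step: it assumes the data mover lines start with `YYYY-MM-DD HH:MM:SS:` as in the log in the first post and, purely as a hypothetical for illustration, that the NetBackup lines have been normalized to the same leading timestamp (real NBU debug logs use their own format, so the regex would need adapting). It also assumes both clocks are in sync and each log is already in time order, which `heapq.merge` requires.

```python
import heapq
import re
from datetime import datetime

# Leading timestamp as written by the data mover, e.g. "2013-07-17 13:30:36: NDMP: ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def stamped(lines, source):
    """Yield (timestamp, source, line) for every line with a parseable timestamp."""
    for line in lines:
        m = TS_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield ts, source, line.rstrip("\n")


def interleave(dm_lines, nbu_lines):
    """Merge two individually time-ordered logs into a single timeline."""
    return list(heapq.merge(stamped(dm_lines, "DM"), stamped(nbu_lines, "NBU")))


dm = [
    "2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) < Backup type: dump >",
    "2013-07-17 13:30:36: NDMP: 6: Session 014 (thread ndmp014) < receive DATA_ABORT request >",
]
# Hypothetical NetBackup line, already normalized to the data mover's timestamp format.
nbu = ["2013-07-17 13:30:35: hypothetical NetBackup debug line"]

for ts, src, text in interleave(dm, nbu):
    print(src, text)
```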
christopher_ime
July 19th, 2013 07:00
In the meantime, while you wait on support (as Mark stated, they will likely ask you to increase the debugging log level), one more comment: I don't think it is able to create a SnapSure checkpoint before taking the (NDMP) backup. This is suggested by the error message, and I believe the filesystem has reached the fs_dedupe (default) 90% savvol_threshold. Do you also see events in Unisphere showing that deduplication for that filesystem was aborted because SavVol usage had reached at least 90% of the filesystem's SavVol High Water Mark (which is also 90% by default)? That alone doesn't mean checkpoints cannot be created, but given the limited information you provided, it is possible.
I am assuming, of course, that you set the option to create and back up a SnapSure checkpoint as part of the backup policy for that filesystem. Generally speaking, if it is supported by the Data Management Application (check the "NAS NDMP Interoperability Matrix" and "Configuring NDMP Backups on VNX", and cross-reference with Symantec support documents), the backup policy is created with the setting (SET) SNAPSURE=Y. Only if the DMA does not support it can it instead be set globally on the data mover via the parameter NDMP.snapsure.
Can you verify that you did set your environment to take checkpoints first as part of the NDMP backup? If so, then can you run the following commands:
1) fs_ckpt -list -all
2) fs_dedupe -info
Finally, can you try to manually create a checkpoint to see if this may be the culprit?
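Taking the description above at face value (deduplication stops once SavVol usage reaches the SavVol threshold, 90% by default, of the SavVol High Water Mark, also 90% by default), the two percentages compound. A minimal Python sketch of that arithmetic; the helper name is hypothetical and it models the rule as described in this post, not EMC's implementation:

```python
def effective_savvol_limit(savvol_threshold_pct, hwm_pct):
    """SavVol usage (% of SavVol) at which dedup would stop, per the rule described above.

    Hypothetical helper: models the post's description, not EMC's implementation.
    """
    return savvol_threshold_pct * hwm_pct / 100


# Defaults quoted in the post: SavVol threshold 90%, High Water Mark 90%.
limit = effective_savvol_limit(90, 90)
print(f"dedup would stop at about {limit:.0f}% SavVol usage")
```

With the defaults, that works out to 0.90 x 90% = 81% of the SavVol.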
tim.koopman
July 19th, 2013 09:00
Hi Christopher,
I have not specifically set up checkpoints yet.
As you can see, the output from fs_ckpt does not show any checkpoints.
$ fs_ckpt symlive -list -all
id ckpt_name creation_time inuse fullmark total_savvol_used ckpt_usage_on_savvol
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.
id wckpt_name inuse fullmark total_savvol_used base ckpt_usage_on_savvol
$
$ fs_dedupe -info symlive
Id = 34
Name = symlive
Deduplication = On
Status = Idle
As of the last file system scan (Fri Jul 12 18:16:46 CDT 2013):
Files scanned = 2472278
Files deduped = 220134 (9% of total files)
File system capacity = 1613399 MB
Original data size = 1462223 MB (91% of current file system capacity)
Space saved = 229193 MB (16% of original data size)
File system parameters:
Case Sensitive = no
Duplicate Detection Method = sha1
Access Time = 15
Modification Time = 15
Minimum Size = 24 KB
Maximum Size = 8388608 MB
File Extension Exclude List =
Minimum Scan Interval = 7
SavVol Threshold = 90
Backup Data Threshold = 90
Cifs Compression Enabled = yes
Pathname Exclude List =
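The percentages in this fs_dedupe -info output can be recomputed from the raw figures, which confirms, for example, that the original data size is about 91% of the file system's capacity. A Python sketch, assuming the `Key = value` layout shown above; rounding to the nearest whole percent reproduces the reported 9%, 91% and 16%:

```python
import re

# Excerpt of the fs_dedupe -info output above.
OUTPUT = """\
Files scanned = 2472278
Files deduped = 220134 (9% of total files)
File system capacity = 1613399 MB
Original data size = 1462223 MB (91% of current file system capacity)
Space saved = 229193 MB (16% of original data size)
"""


def first_int(text, key):
    """Return the first integer that follows 'key =' at the start of a line."""
    m = re.search(rf"^{re.escape(key)} = (\d+)", text, re.MULTILINE)
    if m is None:
        raise KeyError(key)
    return int(m.group(1))


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


scanned = first_int(OUTPUT, "Files scanned")
deduped = first_int(OUTPUT, "Files deduped")
capacity = first_int(OUTPUT, "File system capacity")
original = first_int(OUTPUT, "Original data size")
saved = first_int(OUTPUT, "Space saved")

print(f"files deduped: {pct(deduped, scanned)}% of scanned files")
print(f"original data: {pct(original, capacity)}% of capacity")
print(f"space saved:   {pct(saved, original)}% of original data")
```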
I was able to manually create a checkpoint.
$ fs_ckpt symlive -Create
operation in progress (not interruptible)...id = 34
name = symlive
acl = 0
in_use = True
type = uxfs
worm = off
volume = v189
pool = Gold File Pool 45
member_of = root_avm_fs_group_42
rw_servers= server_3
ro_servers=
rw_vdms = odcvnx2107dm3_vdm
ro_vdms =
auto_ext = no,thin=no
deduplication = On
thin_storage = False
tiering_policy = Auto-Tier/Optimize Pool
compressed= False
mirrored = False
ckpts = symlive_ckpt1
stor_devs = APM00114602107-11B2,APM00114602107-11B3,APM00114602107-11B4,APM00114602107-11B5,APM00114602107-11B6
disks = d19,d32,d20,d33,d21
disk=d19 stor_dev=APM00114602107-11B2 addr=c0t2l6 server=server_3
disk=d19 stor_dev=APM00114602107-11B2 addr=c16t2l6 server=server_3
disk=d32 stor_dev=APM00114602107-11B3 addr=c16t2l7 server=server_3
disk=d32 stor_dev=APM00114602107-11B3 addr=c0t2l7 server=server_3
disk=d20 stor_dev=APM00114602107-11B4 addr=c0t2l8 server=server_3
disk=d20 stor_dev=APM00114602107-11B4 addr=c16t2l8 server=server_3
disk=d33 stor_dev=APM00114602107-11B5 addr=c16t2l9 server=server_3
disk=d33 stor_dev=APM00114602107-11B5 addr=c0t2l9 server=server_3
disk=d21 stor_dev=APM00114602107-11B6 addr=c0t2l10 server=server_3
disk=d21 stor_dev=APM00114602107-11B6 addr=c16t2l10 server=server_3
id = 36
name = symlive_ckpt1
acl = 0
in_use = True
type = ckpt
worm = off
volume = vp193
pool = Gold File Pool 45
member_of =
rw_servers=
ro_servers= server_3
rw_vdms =
ro_vdms = odcvnx2107dm3_vdm
checkpt_of= symlive Fri Jul 19 10:26:09 CDT 2013
deduplication = Off
thin_storage = False
tiering_policy = Auto-Tier/Optimize Pool
compressed= False
mirrored = False
used = 1%
full(mark)= 90%
stor_devs = APM00114602107-119E,APM00114602107-119F,APM00114602107-11A0,APM00114602107-11A1,APM00114602107-11A2
disks = d9,d22,d10,d23,d11
disk=d9 stor_dev=APM00114602107-119E addr=c0t1l2 server=server_3
disk=d9 stor_dev=APM00114602107-119E addr=c16t1l2 server=server_3
disk=d22 stor_dev=APM00114602107-119F addr=c16t1l3 server=server_3
disk=d22 stor_dev=APM00114602107-119F addr=c0t1l3 server=server_3
disk=d10 stor_dev=APM00114602107-11A0 addr=c0t1l4 server=server_3
disk=d10 stor_dev=APM00114602107-11A0 addr=c16t1l4 server=server_3
disk=d23 stor_dev=APM00114602107-11A1 addr=c16t1l5 server=server_3
disk=d23 stor_dev=APM00114602107-11A1 addr=c0t1l5 server=server_3
disk=d11 stor_dev=APM00114602107-11A2 addr=c0t1l6 server=server_3
disk=d11 stor_dev=APM00114602107-11A2 addr=c16t1l6 server=server_3
$ fs_ckpt symlive -list -all
id ckpt_name creation_time inuse fullmark total_savvol_used ckpt_usage_on_savvol
36 symlive_ckpt1 07/19/2013-10:26:09-CDT y 90% 1% 1%
Info 26306752329: The value of ckpt_usage_on_savvol for read-only checkpoints may not be consistent with the total_savvol_used.
id wckpt_name inuse fullmark total_savvol_used base ckpt_usage_on_savvol
$
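The checkpoint list above is whitespace-separated, so it is easy to check programmatically how far a checkpoint is from its fullmark. A minimal Python sketch, assuming every column value is a single token (true for this row, because the creation time is written without spaces); the function name is hypothetical:

```python
HEADER = "id ckpt_name creation_time inuse fullmark total_savvol_used ckpt_usage_on_savvol"
ROW = "36 symlive_ckpt1 07/19/2013-10:26:09-CDT y 90% 1% 1%"

PERCENT_COLUMNS = ("fullmark", "total_savvol_used", "ckpt_usage_on_savvol")


def parse_ckpt_row(header, row):
    """Map header names to the values of one fs_ckpt -list row."""
    names, values = header.split(), row.split()
    if len(names) != len(values):
        raise ValueError("row does not have exactly one token per column")
    fields = dict(zip(names, values))
    # Turn "90%" style values into integers.
    for key in PERCENT_COLUMNS:
        fields[key] = int(fields[key].rstrip("%"))
    return fields


ckpt = parse_ckpt_row(HEADER, ROW)
headroom = ckpt["fullmark"] - ckpt["total_savvol_used"]
print(f'{ckpt["ckpt_name"]}: {headroom} percentage points of SavVol below the fullmark')
```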
Tim
tim.koopman
July 19th, 2013 09:00
Hi Karl,
I did get the commands to enable debugging on the data mover from support. My assumption might be wrong, but since NDMP backups work fine on datamover 2 and fail on datamover 3, I have been focused on the datamover side and have not dug into the NetBackup logs. That might be the next step, depending on what support says about the datamover logs with debugging turned on.
Tim
christopher_ime
July 19th, 2013 09:00
Thanks for the info. Ignore me, I misinterpreted the logs.
tim.koopman
July 19th, 2013 11:00
Christopher,
No worries. I appreciate all the input I can get at this point.
tim.koopman
July 19th, 2013 14:00
Hi Karl,
The passwords should be the same, but I am not using Global NDMP passwords. Only one of my NetBackup media servers has the NDMP license.
The test below works from my media server without any issue, so I know it is not a password issue.
odcmedia1:/tmp # /usr/openv/volmgr/bin/tpautoconf -verify ndmp-vnx2107-dm3
Connecting to host "ndmp-vnx2107-dm3" as user "ndmp"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
host supports TEXT authentication
host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
host name "server_3"
os type "DartOS"
os version "EMC Celerra File Server.T.7.0.54.3"
host id "abc1997"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore
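To repeat the same check against several data movers, the tpautoconf -verify transcript can be reduced to a pass/fail summary. A hedged Python sketch that only parses text like the output above (it does not run tpautoconf itself); the phrases it looks for are copied from this output and may differ in other NetBackup versions, and the function name is hypothetical:

```python
import re


def summarize_verify(output):
    """Extract the negotiated NDMP version, host details and login result."""
    version = re.search(r"successful with NDMP protocol version (\d+)", output)
    host = re.search(r'host name "([^"]+)"', output)
    os_version = re.search(r'os version "([^"]+)"', output)
    return {
        "ndmp_version": int(version.group(1)) if version else None,
        "host": host.group(1) if host else None,
        "os_version": os_version.group(1) if os_version else None,
        "login_ok": "Login was successful" in output,
        "local_backup": "Host supports LOCAL backup/restore" in output,
        "three_way": "Host supports 3-way backup/restore" in output,
    }


# Excerpt of the verify output above.
SAMPLE = """\
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
Host info is:
  host name "server_3"
  os type "DartOS"
  os version "EMC Celerra File Server.T.7.0.54.3"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore
"""

print(summarize_verify(SAMPLE))
```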
Tim
umichklewis
July 19th, 2013 14:00
Sounds good to me. Out of curiosity, do you have different NDMP passwords on DM2 and DM3? Or are you using a Global NDMP password in NetBackup?
umichklewis
July 19th, 2013 14:00
Yup - the verify eliminates the password issue. So much for my shot in the dark!
tim.koopman
July 19th, 2013 15:00
Hi Karl,
No problem. I am tired of reloading; I have taken so many shots in the dark on this problem.
EMC support is escalating to a senior engineer, and I just sent off some very large logs to Symantec Support.
Tim
tim.koopman
July 23rd, 2013 06:00
All,
The Symantec technote below resolved my NDMP backup issues.
http://www.symantec.com/business/support/index?page=content&id=TECH192541
Note that the technote does not reference VNX, but changing my NDMP backup type from dump to tar worked around the issue. I was also getting the exact errors referenced in TECH192541 in my NetBackup debug log.
Note: The version of NetBackup I am running is 7.5.0.4
I would like to extend a thank you to everyone who commented on my post.
-Tim
Below is a cut-and-paste from TECH192541:
Solution
Review the NetBackup Hardware Compatibility list and confirm that the filer and backup method are supported.
For this specific filer and NetBackup version, NDMP backups of TYPE=dump are not supported due to the unexpected values in the file history. Change the NDMP backup to TYPE=tar to perform a successful backup.
Support for this new type of file history is currently planned for NetBackup 7.6
http://www.symantec.com/business/support/index?page=content&id=TECH192541
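Because the data mover logs the NDMP environment for each session (the "Name: ... Value: ..." lines in the first post), this combination can be spotted directly from the logs. A minimal Python sketch, assuming those log lines as input and a dotted NetBackup version string such as 7.5.0.4; the 7.6 cutoff comes only from the technote's statement that support was planned for NetBackup 7.6, and the function names are hypothetical:

```python
import re

ENV_RE = re.compile(r"Name: (\S+) Value: (.*)$")


def ndmp_env(log_lines):
    """Collect NDMP environment variables from data mover session log lines."""
    env = {}
    for line in log_lines:
        m = ENV_RE.search(line)
        if m:
            env[m.group(1)] = m.group(2).strip()
    return env


def version_tuple(version):
    """'7.5.0.4' -> (7, 5, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def dump_history_risk(env, nbu_version, planned_fix="7.6"):
    """True if TYPE=dump is used with a NetBackup older than the planned fix release."""
    return env.get("TYPE") == "dump" and version_tuple(nbu_version) < version_tuple(planned_fix)


lines = [
    "2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: TYPE Value: dump",
    "2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: LEVEL Value: 1",
    "2013-07-17 13:28:39: NDMP: 4: Session 014 (thread ndmp014) Name: HIST Value: y",
]
env = ndmp_env(lines)
print(env, dump_history_risk(env, "7.5.0.4"))
```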
tim.koopman
July 23rd, 2013 11:00
All,
The work around I provided might not be as good as I thought.
See the limitations of using NDMP with TYPE=tar, described in the technote with the title below:
Why it is not recommended to use “set type=tar” in an NDMP backup policy protecting an EMC networked storage device
http://www.symantec.com/business/support/index?page=content&id=TECH202412