I'm having a problem with one of our instances of Networker 8.x running on Windows 2008 R2. Specifically Networker is having some difficulty labeling spare tapes in our jukebox. This issue has exhausted our pool of spares and backups are currently non-functional. Naturally, I am quite concerned about getting this fixed and running normally again.
Looking at the management console, I can see all the tapes listed under the devices view. For each tape it shows the barcode, but the volume field shows all the spare tapes as unlabeled. Oddly, the tape volumes view shows something completely different. There, each tape looks to have been only partially labeled. I can see the pool designation and volume name, and each tape shows as appendable with 0KB written and 0% usage. Only some of the tapes show a corresponding barcode; most are missing it. We use the barcode as the tape label.
When networker tries to relabel these, it fails with the following error message:
"Duplicate volume name (%barcode). Select a new volume, or remove the original volume."
Also, in the tape devices view I am seeing what looks to be corrupted volume and barcode labels. Several (6) of my already full tapes are showing "E" as both the barcode and volume name. This is a bit alarming.
I found the following community document to resolve this kind of an issue: https://community.emc.com/docs/DOC-20084
Given the age of this document (2012), would it still be safe to follow for my version of NetWorker?
The NW behavior in this area has not changed at all over all these years.
In general, NW has no problems with duplicate labels - they are absolutely possible:
- NW will accept them if you import a duplicate volume from another data zone (NW server), thanks to its use of the internal volume ID
- However, NW will not allow you to label a tape with a volume name that already exists in the same data zone.
In your case, I think there is an issue with the media db. Info has been lost.
Of course, you may
- delete a volume from the media db (this will not touch the data at all)
- then scan the tape to get the media db repopulated with that tape's metadata.
Depending on your number of tapes this may take 'ages'.
But before you do that, may I suggest that you inventory the jukebox media with the option "Force load and verify labels" and try to re-sync the volume information this way.
Hi, thanks for the response.
OK, so I tried inventorying one of the partially labeled tapes using the "Force load and verify labels" option and received the following error:
"Moving (%pool) tape mtio failed, I/O error"
"No tape label found"
Interestingly, when I run "nsrjb" on the command line, it shows there is a tape in the slot and its barcode, but doesn't show the label or the pool assignment (this data is, however, displayed in the "tape volumes" view). Also, and I just noticed this, most of the new tapes with this problem are not properly showing the location in the "tape devices" view. It's as if NetWorker doesn't recognize they're in the jukebox.
There is an awful lot of inconsistency here.
Hard to say where to go next. I most likely would use this sequence.
- try to reset the jukebox (nsrjb -HEv)
- restart the NW/the NW server
- de-install/re-install the jukebox and do a full inventory. Especially if it is small.
How many tapes are involved?
"No tape label found". This either means that the tape is damaged, or there is a problem reading the tape.
When you just run nsrjb, it only displays what NetWorker thinks the jukebox contents are. This is based on information that was stored in the res database when the NetWorker jukebox resource was last updated. This is why its output may not match the current jukebox status.
"it shows there is a tape in the slot and it's barcode, but doesn't show the label or the pool assignment": This implies that NetWorker does not currently know which tape volume is actually in that slot.
Try to determine when this all started. Render the server's daemon.raw and look to see when these issues started. This will also give you an idea on what tapes in which slots are affected. You can then remove them from the jukebox, and replace them with new or recyclable tapes so that backups can continue.
If you need to determine which tapes in the jukebox are having load issues again, please run from command line:
To speed up the inventory process by using all the drives, in NMC go into the jukebox properties, and update the jukebox parallelism to equal the number of physical drives.
Of course it is still possible, that one drive in your jukebox does not work well or is defective.
If possible ...
- install NW on another server (Windows or UNIX/Linux - it does not matter); it will work for 30 days in eval mode
- attach a compatible tape drive to that server
This way you will avoid a potential jukebox problem.
Then insert the tape and run "scanner -m" to read the label.
Thanks for all the advice guys.
I also have a ticket open with EMC and, based on their advice, did the following:
Disabled a drive and manually loaded a tape (I have 5 drives in my Qualstar)
scanner -vvv \\.\Tape0
The command did not find a label or any records on the 20 or so tapes which appear partially labeled.
One tape however, did have a proper label and contained some records.
Then I ran mminfo -avot -q volid=volume_id -r volume,barcode
This returned nothing for any of the tapes I looked at (including the one with a proper label and some records).
Kind of looking like I will need to reset the jukebox. Given there are 104 slots, I'm guessing this might take a while to inventory everything. Probably manually clean the tape drives as well....
NOTE: You should load this jukebox with new tapes, and with new bar code labels, so that your daily backups can continue while you investigate this problem.
Symptoms to look for:
Even though the NetWorker media database does not have info for a volume, its history would still be in the daemon.raw(s). Pick a volume that mminfo has no info on, but you are sure it was used, then search for that volume in the daemon.raw. Render the log file first so that it is readable. eg...
In #4, change the log file to the ones that are in the logs directory, and create a unique output file for each raw file.
As I suggested earlier, use the following to inventory the jukebox and capture the output, so that it can tell you which slots had problems loading a tape.
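Once a raw log has been rendered to plain text, the search for a volume's history can be sketched in Python as below. This is only an illustration: the function names `find_volume_lines` and `search_rendered_log` are made up for this example, the path and volume name are whatever you pass in, and nothing here assumes a particular NetWorker log layout. It simply streams the file line by line, so a 500MB log is not loaded into memory at once.

```python
from typing import Iterable, Iterator, List, Tuple


def find_volume_lines(lines: Iterable[str], volume: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line that mentions the volume name.

    The match is a plain case-insensitive substring test; no log format is assumed.
    """
    needle = volume.lower()
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield number, line.rstrip("\r\n")


def search_rendered_log(path: str, volume: str, limit: int = 20) -> List[Tuple[int, str]]:
    """Stream a rendered (plain text) log and return the first `limit` matches."""
    matches: List[Tuple[int, str]] = []
    # errors="replace" keeps the scan going if the file contains odd bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for hit in find_volume_lines(handle, volume):
            matches.append(hit)
            if len(matches) >= limit:
                break
    return matches
```

The first match returned gives a rough idea of when that volume first appears in the log, which can then be compared across the affected tapes.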
Hi Wallace, I tried doing the reset as instructed, but this did not resolve the duplicate name issue. It also used only one of the 5 drives, even though parallelism was set to 4. I tried setting it to 5, but that did not make any difference.
This exercise did however correct the volume names for the tapes which showed up with a Volume name of "E", but the barcode listed in the media view still shows "E".
I'll run this again piped to text files as you suggest and look for which drives are having issues. I'm not sure about drive ordering issues as this system has been running fine for the last several years.
Based on advice from EMC, I checked each problem tape using the scanner -vvv \\.\Tape0 command and it showed no valid label or records. I then ran mminfo -avot -q volid=volume_id -r volume,barcode for these same tapes and it returned no information. I also ran nsrck -m, which did not return any output.
So I deleted these tapes out of the database using nsrmm -d <volume> , followed by nsrim -X. I then manually labeled one tape, which succeeded, so I left it alone to see if it would label the rest without trouble. Well, it labeled a few of them, but failed to properly label the rest. So, we're kinda back where I began.
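For anyone repeating this cleanup across many tapes, the sequence above can be laid out as a dry run before touching anything. The Python sketch below only builds and prints the command lines; it does not execute them. The helper name `cleanup_commands` and the volume names in the example are hypothetical, and the only commands used are the two mentioned above (nsrmm -d per volume, then a single nsrim -X).

```python
from typing import List


def cleanup_commands(volumes: List[str]) -> List[List[str]]:
    """Build the command lines for removing volumes from the media db.

    One "nsrmm -d <volume>" per volume, followed by one "nsrim -X".
    Nothing is executed; the caller decides what to do with the list.
    """
    commands = [["nsrmm", "-d", volume] for volume in volumes]
    if commands:
        # Run the media db cleanup once, after all deletions.
        commands.append(["nsrim", "-X"])
    return commands


if __name__ == "__main__":
    # Hypothetical volume names, for illustration only.
    for command in cleanup_commands(["VOL001", "VOL002"]):
        print(" ".join(command))
```

Printing the list first makes it easy to check the volume names against the tapes that scanner showed as unlabeled before anything is deleted.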
The system has not had its bootstrap restored and I didn't see any dirty shutdowns in the logs. However, we think that around the time the problems started (March, based on the logs), the system may have suffered a power failure, and this might have corrupted things.
I have not tried deleting out the jukebox and re-adding it. Perhaps that would help. However, I would want to document the configuration prior to doing that.
At this point I plan to pull out these problem tapes to get things moving again. I did extract the logs and looked them over a bit, but most of the messages don't mean much to me. These log files are pretty large (500MB)..which makes them a bit tough to work with. Can you recommend a good way to rotate these?
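In the meantime, one workaround for the file size is to split an already rendered log into smaller pieces that a text editor can open. The Python sketch below is just that, a workaround and not NetWorker's own log rotation; the function name `split_log`, the ".partNNN" suffix and the default chunk size are all arbitrary choices for this example.

```python
from typing import List


def split_log(path: str, lines_per_part: int = 100_000) -> List[str]:
    """Split a text log into numbered parts next to the original file.

    Returns the list of part paths. The original file is left untouched.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")
    parts: List[str] = []
    out = None
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as source:
            for index, line in enumerate(source):
                # Start a new part file every `lines_per_part` lines.
                if index % lines_per_part == 0:
                    if out is not None:
                        out.close()
                    part_path = f"{path}.part{len(parts) + 1:03d}"
                    out = open(part_path, "w", encoding="utf-8")
                    parts.append(part_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

The parts keep the original line order, so the earliest entries end up in the first part and the most recent ones in the last.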