June 19th, 2018 16:00

Duplicate Volume Names

Hello,

I'm having a problem with one of our instances of Networker 8.x running on Windows 2008 R2. Specifically Networker is having some difficulty labeling spare tapes in our jukebox. This issue has exhausted our pool of spares and backups are currently non-functional. Naturally, I am quite concerned about getting this fixed and running normally again.

Looking at the management console I can see all the tapes listed under the devices view. For each tape it shows the barcode; however, the volume field shows all the spare tapes as unlabeled. Oddly though, under the tape volumes view I am seeing something completely different. There, each tape looks to have been only partially labeled. I can see the pool designation and volume name, and each tape shows as appendable with 0KB and 0% usage. Only some of the tapes are showing a corresponding barcode; most are missing. We use the barcode as the tape label.

When networker tries to relabel these, it fails with the following error message:

"Duplicate volume name (%barcode). Select a new volume, or remove the original volume."

Also, in the tape devices view I am seeing what looks to be corrupted volume and barcode labels. Several (6) of my already full tapes are showing "E" as both the barcode and volume name. This is a bit alarming.

I found the following community document to resolve this kind of an issue: https://community.emc.com/docs/DOC-20084

Given the age of this document (2012), would it still be safe to follow for my version of Networker?

Thanks,

Michael

2.4K Posts

June 20th, 2018 10:00

The NW behavior in this area has not changed at all over all these years.

In general, NW has no problems with duplicate labels - they are absolutely possible:

  -  NW will accept them if you import a duplicate volume from another data zone (NW server), thanks to the use of the internal volume ID

  -  However, NW will not allow you to use the same volume name for labeling in the same data zone.

In your case, I think there is an issue with the media db. Info has been lost.

Of course, you may

  -  delete a volume from the media db (this will not touch the data at all)

  -  then scan the tape to get the media db repopulated with that tape's metadata.

Depending on your number of tapes this may take 'ages'.

But before you do that, may I suggest that you inventory the jukebox media with the option "Force load and verify labels" and try to re-sync the volume information this way.

263 Posts

June 20th, 2018 11:00

To prevent possible data loss:

  • never relabel a volume unless you are absolutely sure of what is on that volume
  • never delete the volume from the media database, unless you are absolutely sure of what is on that volume

Getting a duplicate volume message implies that the volume information that is stored on the volume itself is not the same information that the NetWorker media database has.  Information such as the volume id.  This then implies that there are two different volumes with the same volume name and/or bar-code, but not the same information as what we have recorded in the media database.

Similarly, there are many people called John Smith, but each of them would have a unique government id number.

When you load a volume into a drive and tell NetWorker to mount it, NetWorker reads the volume label and compares it with the info in the media database.  If the info matches, the mount completes successfully.  If it does not match, you get the duplicate volume message.

So the problem you face is trying to determine what is the status of these volumes.  Are they actually the same volume?  i.e. Is there really only one volume with name "x" and you can safely delete and relabel? *If you are sure* that the answer is yes, then  just delete the volume from the NetWorker media database, and then label.

Otherwise, if you are not sure, then you will need to catalog what is on each affected volume... and review what is saved on the volume itself before deciding if you can relabel.
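The name-versus-ID distinction can be pictured with a small model. This is only a sketch of the idea, not how NetWorker is actually implemented: the `TapeLabel` and `MediaDb` classes, their methods, and the returned status strings are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TapeLabel:
    """Simplified label as read from the tape itself (hypothetical)."""
    name: str    # volume name, here the barcode
    volid: str   # unique volume ID written when the tape was labeled


class MediaDb:
    """Toy stand-in for a media database, keyed by volume ID."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}  # volume ID -> volume name

    def add(self, name: str, volid: str) -> None:
        self._names[volid] = name

    def can_label(self, name: str) -> bool:
        # A name already recorded in this data zone cannot be reused.
        return name not in self._names.values()

    def check_mount(self, label: Optional[TapeLabel]) -> str:
        if label is None:
            return "no tape label found"
        if self._names.get(label.volid) == label.name:
            return "mounted"
        if label.name in self._names.values():
            # Same name on tape, but a different ID than the one recorded.
            return "duplicate volume name"
        return "unknown volume"
```

In this model, a blank tape whose barcode still has a leftover entry in the database fails `can_label`, which is roughly the situation described in the first post.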

1 Rookie

8 Posts

June 20th, 2018 13:00

Hi, thanks for the response.

OK, so I tried inventorying one of the partially labeled tapes using the "Force load and verify labels" option and received the following error:

"Moving (%pool) tape mtio failed, I/O error"

"No tape label found"

Interestingly, when I run "nsrjb" on the command line, it shows there is a tape in the slot and its barcode, but doesn't show the label or the pool assignment (this data is, however, displayed in the "tape volumes" view). Also, and I just noticed this: most of the new tapes with this problem are not properly showing the location in the "tape devices" view. It's as if Networker doesn't recognize they're in the jukebox.

There is an awful lot of inconsistency here.

Michael

2.4K Posts

June 21st, 2018 04:00

Hard to say where to go next. I most likely would use this sequence.

  -  try to reset the jukebox (nsrjb -HEv)

  -  restart the NW/the NW server

  -  de-install/re-install the jukebox and do a full inventory, especially if it is small.

How many tapes are involved?

263 Posts

June 21st, 2018 04:00

"No tape label found".  This either means that the tape is damaged, or there is a problem reading the tape.

  • The scanner command will tell you if it still finds data on the tape.  If there is, then you could still recover what is on that tape.
  • Assuming that it is a drive issue, try using a different tape drive.  Cleaning the drive could help too.
  • If you have more than one drive, is there any evidence of drive ordering issues?  eg: do you see any messages such as "serial number mismatch" ?  Did you have this problem in the past?  If so, then this could have caused your tape issue.

When you just run nsrjb, it only displays what NetWorker thinks the jukebox contents are.  This is based on information that was stored in the res database when the NetWorker jukebox resource was last updated.  This is why its output may not match the current jukebox status.

>  it shows there is a tape in the slot and it's barcode, but doesn't show the label or the pool assignment

This implies that NetWorker does not currently know which tape volume is actually in that slot.

Try to determine when this all started.  Render the server's daemon.raw and look to see when these issues started.  This will also give you an idea on what tapes in which slots are affected.  You can then remove them from the jukebox, and replace them with new or recyclable tapes so that backups can continue.

If you need to determine which tapes in the jukebox are having load issues again,  please run from command line:


  script /tmp/inventory.txt

  nsrjb -HEv

  nsrjb -Ipv

  nsrjb -v

  exit

To speed up the inventory process by using all the drives, in NMC go into the jukebox properties, and update the jukebox parallelism to equal the number of physical drives.

2.4K Posts

June 21st, 2018 19:00

Of course it is still possible, that one drive in your jukebox does not work well or is defective.

If possible ...

  -  install NW on another server (Windows or UNIX/Linux - it does not matter)

  -  where you have NW installed (it will work for 30 days in eval mode)

  -  along with a compatible tape drive

This way you will avoid a potential jukebox problem.

Then insert the tape and run "scanner -m" to read the label.

1 Rookie

8 Posts

June 22nd, 2018 11:00

Thanks for all the advice guys.

I also have a ticket open with EMC, and based on their advice I did the following:

Disabled a drive and manually loaded a tape (I have 5 drives in my Qualstar)

scanner -vvv \\.\Tape0

The command did not find a label or any records on the 20 or so tapes which appear partially labeled.

One tape however, did have a proper label and contained some records.

Then I ran mminfo -avot -q volid=volume_id -r volume,barcode

This returned nothing for any of the tapes I looked at (including the one with a proper label and some records).

Kind of looking like I will need to reset the jukebox. Given there are 104 slots, I'm guessing this might take a while to inventory everything. Probably manually clean the tape drives as well....

263 Posts

June 25th, 2018 07:00

NOTE: You should load this jukebox with new tapes, and with new bar code labels, so that your daily backups can continue while you investigate this problem.

Symptoms to look for:

  • was the NetWorker bootstrap recovered recently?
  • in the daemon.raw, are there any error messages generated by nsrmmdbd?  If so, when did these messages start?

Even though the NetWorker media database does not have info for a volume, its history would still be in the daemon.raw(s).  Pick a volume that mminfo has no info on, but you are sure it was used, then search for that volume in the daemon.raw.  Render the log file first so that it is readable.  eg...


  1. open command line in admin mode
  2. cd \(%nsr directory%)\logs
  3. dir daemon*.raw
  4. nsr_render_log daemon.raw > out1.txt 2>&1
  5. type out1.txt | find  /i  "(volume name)"

In #4, change the log file name to each of the raw files in the logs directory in turn, and create a unique output file for each raw file.
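Once there are several rendered files, step 5 has to be repeated for each of them. The following is a minimal sketch that does the same case-insensitive search across a list of rendered text files in one pass. The function name `find_in_logs` is made up for this example, and it assumes the rendered files are plain text; it reads line by line, so a large file does not need to fit in memory.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def find_in_logs(paths: Iterable[Path], needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each line containing needle.

    Matching ignores case, like `find /i`.
    """
    wanted = needle.lower()
    for path in paths:
        # errors="replace" keeps going if the file has odd characters.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if wanted in line.lower():
                    yield path.name, number, line.rstrip("\n")


# Example use (file names and volume name are placeholders):
# for name, number, line in find_in_logs([Path("out1.txt"), Path("out2.txt")], "(volume name)"):
#     print(f"{name}:{number}: {line}")
```

The first hit per file gives a rough idea of when a volume was first mentioned, assuming the log is in chronological order.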

As I suggested earlier, use the following to inventory the jukebox and capture the output, so that it can tell you which slots had problems loading tapes.

  1. To speed up the inventory process by using all the available drives, in NMC go into the jukebox properties, and update the jukebox parallelism to equal the number of physical drives.
  2. nsrjb -HEv >1reset.txt 2>&1
  3. nsrjb -Ipv >2slow-inventory.txt 2>&1
  4. nsrjb -v >3jukebox-list.txt
  5. Review the output files, and note the slots and bar codes having load problems.
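Reviewing the output files by eye can be slow with 100+ slots. As a rough aid, here is a sketch that pulls out the lines containing the error phrases already quoted in this thread. It deliberately does not try to parse slot numbers or bar codes, because the exact layout of the nsrjb output is not shown here; the function name and the phrase list are assumptions you would adjust to what your own files contain.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

# Phrases quoted in this thread; extend the tuple as needed.
ERROR_PHRASES = (
    "I/O error",
    "No tape label found",
    "Duplicate volume name",
    "serial number mismatch",
)


def collect_problem_lines(
    lines: Iterable[str], phrases: Sequence[str] = ERROR_PHRASES
) -> Dict[str, List[str]]:
    """Group the lines that mention a known error phrase (case-insensitive)."""
    found: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                found[phrase].append(line.strip())
    return dict(found)


# Example use with one of the output files from the steps above:
# with open("2slow-inventory.txt", errors="replace") as handle:
#     for phrase, hits in collect_problem_lines(handle).items():
#         print(phrase, len(hits))
#         for hit in hits:
#             print("   ", hit)
```

The matching lines still have to be read in context to see which slot and bar code they belong to.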

1 Rookie

8 Posts

June 25th, 2018 16:00

Hi Wallace, I tried doing the reset as instructed, but this did not resolve the duplicate name issue. It also only used one of the 5 drives, even though parallelism was set to 4. I tried setting it to 5, but that did not make any difference.

This exercise did however correct the volume names for the tapes which showed up with a Volume name of "E", but the barcode listed in the media view still shows "E".

I'll run this again piped to text files as you suggest and look for which drives are having issues. I'm not sure about drive ordering issues as this system has been running fine for the last several years.

Based on advice from EMC, I checked each problem tape using the scanner -vvv \\.\Tape0 command and it showed no valid label or records. I then ran mminfo -avot -q volid=volume_id -r volume, barcode for these same tapes and it returned no information. I also ran an nsrck -m which did not return any output.

So I deleted these tapes out of the database using nsrmm -d , followed by nsrim -X. I then manually labeled one tape, which succeeded, so I left it alone to see if it would label the rest without trouble. Well, it labeled a few of them, but failed to properly label the rest. So, we're kinda back where I began.

The system has not had its bootstrap restored and I didn't see any dirty shutdowns in the logs. However, we think around the time the problems started (March - based on the logs), the system may have suffered a power failure and this might have corrupted things.

I have not tried deleting out the jukebox and re-adding it. Perhaps that would help. However, I would want to document the configuration prior to doing that.

At this point I plan to pull out these problem tapes to get things moving again. I did extract the logs and looked them over a bit, but most of the messages don't mean much to me. These log files are pretty large (500MB), which makes them a bit tough to work with. Can you recommend a good way to rotate these?

Michael

1 Rookie

8 Posts

June 26th, 2018 15:00

This issue "might" be solved. Looking like a bad drive. I'll leave it disabled and monitor things for a while.

Thanks for all the help guys.

Michael

2.4K Posts

June 26th, 2018 16:00

To rotate the log files, you must set up the appropriate NSR log resource on the NW server.

Here is an example for NW 9.1.x/Windows:

C:\>nsradmin -p nsrexec
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin> . type: nsr log; name: daemon.raw
Current query set
nsradmin> p
                        type: NSR log;
               administrator: "group=Administrators,host=localhost",
                              "group=Administrators,host=#############################",
                              "isroot,host=##################################";
                       owner: NetWorker;
             maximum size MB: 88;
            maximum versions: 10;
        runtime rendered log: ;
    runtime rollover by size: Enabled;
    runtime rollover by time: ;
                        name: daemon.raw;
                    log path: "D:\\nsr\\logs\\daemon.raw";
nsradmin> update maximum size MB: 89
             maximum size MB: 89;
Update? p
Unrecognized answer ; assuming `no'.
nsradmin> update maximum size MB: 89
             maximum size MB: 89;
Update? y
updated resource id 10.0.128.36.0.0.0.0.173.87.12.78.0.0.0.0.##.##.##.##(6)
nsradmin> q
C:\>

Do not forget to restart ALL NW services/daemons at the end.

263 Posts

June 26th, 2018 20:00

The jukebox reset was to eject the tapes from all the drives and scan the slots to see if there was a tape in any slot.  This is similar to performing a power up initialization, where the jukebox scans to see what it has before becoming ready for use.

The jukebox parallelism sets how many physical tape drives would be used for a label or inventory operation.  So I am surprised the inventory command (nsrjb -Ipv) only used 1 drive if the value was set to 4.

>  showed up with a Volume name of "E"

I do not know what this means....can you give an example of one volume showing this?

Drive ordering issues could be one reason why you have missing tape labels.  You could have had them sometime in the past and might not have realized it.  One of the symptoms of this problem is NetWorker reporting "serial number mismatch".  You can try searching for this phrase in the log files.

nsrck -m usually does not return any output.

If scanner -v reported a "no data on tape" error, then I would say it was a new blank tape, or a damaged tape if you are sure it was already used.

mminfo -avot -q volid=volume_id -r volume, barcode...   no information just means that that volume ID is not currently in the media database.  Curious.  How did you know what volume ID to query for?  Do you really mean to say volume ID  or volume name?
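To make the difference between the two queries concrete, here is a small sketch that builds the mminfo argument list for either a volume name or a volume ID. The helper name `mminfo_args` is made up; it reuses the `-avot`, `-q` and `-r` options quoted above and assumes `volume`, `volid` and `barcode` are accepted as report attributes on your version, so check the mminfo documentation before relying on it. Note that the report list is passed as one argument with no space after the comma.

```python
from typing import List


def mminfo_args(value: str, by: str = "volume") -> List[str]:
    """Build an mminfo command line that queries by volume name or volume ID.

    by="volume" matches the volume name (here, the barcode);
    by="volid" matches the internal volume ID.
    """
    if by not in ("volume", "volid"):
        raise ValueError("by must be 'volume' or 'volid'")
    return ["mminfo", "-avot", "-q", f"{by}={value}", "-r", "volume,volid,barcode"]


# Example use (the volume name is a placeholder):
# import subprocess
# subprocess.run(mminfo_args("A00001"), check=False)
```

If the query by name returns a row but the query by the ID read from the tape does not, that points to the name being known under a different ID.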

>  ... but failed to properly label the rest.

I would look at the output to see why the label failed.

An abnormal shutdown could have caused the NetWorker databases to be corrupted.  If that happened to the media database, then that could have caused loss of backup, volume, and/or client information.

Definitely document how the jukebox currently looks before you delete the jukebox and drives and then scan and configure.  You should also stop NW and make an online backup copy of nsr\res too.

@bingo explained how to configure NetWorker to set the log file max size before NetWorker rolls over to a new one.

What is the service request # for this issue?
