Troubleshooting Tape Library Configuration Problems in NetWorker
Summary: This article helps support engineers and customers determine why a detected robot fails to configure successfully.
Symptoms
If library configuration worked without issue previously and suddenly encounters an issue, consider possible changes that may be impeding detection and configuration:
- Upgrade of firmware or driver of either robot or tape devices
- Addition, replacement, or removal of tape hardware or other library components
- Change of NetWorker software version or Operating System patches
- Any change to storage transport between host and robot
If the library has never worked, confirm that the hardware is supported in the NetWorker Hardware Compatibility Guide. Remember that a library can be partially functional; discovery alone does not guarantee usability or supportability.
- Failures using NetWorker Management Console to detect and configure jukebox
- Failures using jbconfig to detect and configure jukebox
- Failures using jbedit to modify a jukebox configuration
- Detection and configuration of a NetWorker tape library consists of two user phases:
- Device detection, property enumeration, and creation of 'Unconfigured' attributes
- Creation and association of NSR Jukebox and NSR Device tape drive objects
- Problems configuring a tape library that do not stem from detection or access issues typically point to an inconsistency in the detected resources of either the library or the drives:
- Drive serial numbers (as detected in drives or cached in robot)
- Conflicting devices already configured with the same driver handles
- Specific, internal SCSI command response problems
- Inconsistency between robot information and physical reality
- Automatic configuration with jbconfig is limited to the local host that the command is run on, and still requires serial number detection and filehandle matching
- jbconfig (Option 4) is a manual way to attempt overriding autodetection where these features are not supported or are having problems
- jbedit is a command-line tool that can be used to edit existing library configurations
Cause
Well-known causes of Library Configuration issues:
- How to remove Unconfigured Devices (Orange Wrenches) (A Dell Support account is required to view this article)
Consider the possible elements or factors that could affect NetWorker's ability to configure a tape library:
- Inability to detect and properly access robot or tape resources
- Robot drivers, firmware, or problems leading to inconsistent internal robotics information
- Robot features such as partitioning, which may confuse resource availability or identification
- Dynamic World Wide Naming, which deliberately masks drive WWNs and SNs
- Conflicting, pre-existing NetWorker configuration database resources
- Code defects after changing software versions
Resolution
In order to troubleshoot library configuration issues, after considering the last known changes (if any), proceed to troubleshoot by devolving the process to its primitive constituents and testing them individually.
All the required data is currently collected by NSRGet when run with the -o:d switch. See: NetWorker: How to Use the NSRGet NetWorker Data Collection Tool
Library Configuration: Preparation
- Naming Persistence: In order to ensure that the library configuration remains valid, the hosts accessing drives must ensure that device names are persistently bound and unchanging - this prevents the possibility of future drive ordering problems (see Troubleshooting Tape Library Drive Ordering Problems in NetWorker)
- For Windows, see: Implementing Tape Device Name Persistence for Windows (A Dell Support account is required to view this article)
- For Linux, see: Implementing Tape Device Name Persistence for Linux (A Dell Support account is required to view this article)
- Device Resource Cleanup: In the Devices section, ensure that any standalone tape devices which will be configured as library drives are deleted
- Scan for Devices: In the Devices section, right-click the Storage Nodes container, select Scan for Devices, and select All Nodes you want to scan.
Library Configuration: Components
- Drive Properties: NetWorker requires several pieces of information from a device in order to build its associations in the NSR Jukebox configuration object: The serial number and the device handle. These can be manually acquired using the following commands:
cdi_inq -f <tape drive driver handle> -v
inquire -lc
If the serial numbers reported by the inquire and cdi_inq commands do not match, this is generally evidence of Dynamic World Wide Naming.
- Robot Properties: Because the drives and robots are logically separate in their operations, the robot must associate each drive's serial number with the robot's corresponding element address in order to coordinate tape cartridge load operations with device read/write operations. To acquire these pairings:
sjisn <i.t.l or changer driver handle>
- NSR Storage Node: If configuring the library in the NetWorker Management Console user interface, the device detection process adds any discovered drives or robots to the Storage Node resources as 'Unconfigured' devices (orange wrench icons in the user interface). They cannot be deleted as they are not distinct resources, and will be replaced with usable resources after the configuration process.
nsrdb (the folder can be zipped while NetWorker is running)
dvdetect -dlv -D9 (when troubleshooting UI detection issues)
- NSR Jukebox: Once the 'Unconfigured' library is selected and 'Configure' is run in the user interface, the NSR Jukebox is built using the associations above: element:serial number:device handle, and the other library data collected from the robot such as slot, cartridge, and I/E port displacement.
nsrdb: The folder can be zipped while NetWorker is running
nsrjb: Provides a simpler, human-readable version of the library configuration
jbconfig: Can be used to manually configure a jukebox
jbedit: Can be used to edit an existing library configuration
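The element:serial number:device handle association described for the NSR Jukebox can be illustrated as a join on serial number. The following is only a sketch of that idea, not how NetWorker implements it. It assumes you have copied the values by hand into two text files: one with "<serial> <element address>" pairs taken from the sjisn output, and one with "<serial> <device handle>" pairs taken from the inquire output. The pair_elements function and both file layouts are hypothetical.

```shell
#!/bin/sh
# pair_elements: hypothetical helper, not part of NetWorker.
# Usage: pair_elements <robot_file> <drive_file>
#   robot_file: lines of "<serial> <element address>" (copied from sjisn)
#   drive_file: lines of "<serial> <device handle>"   (copied from inquire)
# Prints "<element> <serial> <handle>" for every serial found in both files.
pair_elements() {
    robot_sorted=$(mktemp) || return 2
    drive_sorted=$(mktemp) || return 2
    # join(1) requires both inputs to be sorted on the join field
    sort -k1,1 "$1" > "$robot_sorted"
    sort -k1,1 "$2" > "$drive_sorted"
    # Output order: element (robot field 2), serial (join field), handle
    join -o 1.2,0,2.2 "$robot_sorted" "$drive_sorted"
    rm -f "$robot_sorted" "$drive_sorted"
}
```

Any serial that appears in only one file produces no output line, which mirrors the situation where a drive cannot be associated with an element.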
Library Configuration: Inhibitors
The following are several basic tests to try once detection and access have already been confirmed. To confirm detection and access, see:
- Troubleshooting Tape Library Detection Problems in NetWorker (A Dell Support account is required to view this article)
- Troubleshooting Tape Library Access Problems in NetWorker (A Dell Support account is required to view this article)
- Checking or Deleting NSR Storage Node: There are several properties in the resource which may prevent proper detection and configuration of a jukebox, such as:
- Any of the Unconfig or List of fields
- Skip scsi targets field
- Any of the name or registration fields
The NSR Storage Node resource can be safely deleted by shutting down NetWorker and connecting to the resource database at the command line. Always back up the resource database first, both by creating a bootstrap backup and by creating a tar/.zip file of the nsrdb folder.
cd <nsr/res directory>
nsradmin -d nsrdb
del type: nsr storage node
(and answer yes to the storage node in question)
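The tar/.zip backup of the nsrdb folder recommended before this deletion could be taken with a sketch like the following. It assumes a UNIX-like host with tar available. backup_nsrdb is a hypothetical helper, and both directory paths must be supplied by you because they vary by installation. It does not replace the bootstrap backup.

```shell
#!/bin/sh
# backup_nsrdb: hypothetical helper, not part of NetWorker.
# Usage: backup_nsrdb <nsr/res directory> <destination directory>
# Creates a timestamped tar.gz of the nsrdb folder and prints its path.
backup_nsrdb() {
    res_dir=$1
    dest_dir=$2
    if [ ! -d "$res_dir/nsrdb" ]; then
        echo "nsrdb not found under $res_dir" >&2
        return 1
    fi
    archive="$dest_dir/nsrdb_$(date +%Y%m%d_%H%M%S).tar.gz"
    # -C keeps the archived paths relative to the res directory
    tar -czf "$archive" -C "$res_dir" nsrdb || return 1
    echo "$archive"
}
```

Confirm that the archive was created and can be listed (for example with tar -tzf) before deleting any resource.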
- Manually checking cdi_inq/inquire and sjisn/sjirjc resources: Automatic configuration of a tape library requires coordinating data from both the drives and the robot, and cross-validating some of these values. Check whether anomalies appear anywhere in the outputs:
sjirjc <changer address>
Confirm that the Number of Drives, Number of Import/Export Elements, and Number of Slots values are as expected.
sjisn <changer address>
Compare drive totals to inquire, sjirdtag, and sjirjc totals; compare serial numbers and model strings to inquire output.
sjirdtag <changer address>
Compare drive and slot totals to other output; look for pres_val=0 for drives to indicate problems.
cdi_inq -f <changer driver handle> -v
Compare serial number and model string to inquire and sjisn outputs.
If serial numbers cannot be detected, or the serial strings or drive counts mismatch, the configuration fails.
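The serial number comparison can be partly scripted. The following is a minimal sketch, assuming you have already copied the drive serial numbers from the inquire output and from the sjisn output into two plain text files, one serial per line. The compare_serials function and the file layout are hypothetical, and the sketch deliberately does not try to parse NetWorker command output itself.

```shell
#!/bin/sh
# compare_serials: hypothetical helper, not part of NetWorker.
# Usage: compare_serials <host_file> <robot_file>
#   host_file:  drive serials copied from inquire output, one per line
#   robot_file: drive serials copied from sjisn output, one per line
# Prints serials found in only one file; returns 1 if the sets differ.
compare_serials() {
    host_sorted=$(mktemp) || return 2
    robot_sorted=$(mktemp) || return 2
    # Strip whitespace and blank lines, then sort uniquely for comm(1)
    tr -d ' \t\r' < "$1" | sed '/^$/d' | sort -u > "$host_sorted"
    tr -d ' \t\r' < "$2" | sed '/^$/d' | sort -u > "$robot_sorted"
    only_host=$(comm -23 "$host_sorted" "$robot_sorted")
    only_robot=$(comm -13 "$host_sorted" "$robot_sorted")
    rm -f "$host_sorted" "$robot_sorted"
    status=0
    if [ -n "$only_host" ]; then
        echo "Seen by host only:"
        echo "$only_host"
        status=1
    fi
    if [ -n "$only_robot" ]; then
        echo "Reported by robot only:"
        echo "$only_robot"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "Serial numbers match"
    fi
    return "$status"
}
```

A serial listed under only one heading is a candidate for the mismatches described above, and is worth checking against the raw command output before drawing conclusions.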
- Hardware, firmware, or NetWorker code issues: If there are lower-level problems in any of the devices' reporting, or the code which handles them, you can enable debug with the following environment variables, and rerun the commands above (or NSRGet -o:d) to either check for clues, or prepare for escalation:
SJI_DEBUG=9 LUS_DEBUG=9 CDI_DEBUG=9 JBDEBUG=9 SCSI_DEBUG=9
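As a minimal sketch of one way to apply these settings from a UNIX or Linux shell, the wrapper below sets the variables for a single command only and appends that command's output to a log file. The run_with_debug function and the log file path are hypothetical. The sketch does not verify which binaries honor which variable.

```shell
#!/bin/sh
# run_with_debug: hypothetical wrapper, not a NetWorker command.
# Usage: run_with_debug <logfile> <command> [args...]
# Runs the command with the debug variables from this article set for
# that command only, appending its stdout and stderr to the log file.
run_with_debug() {
    logfile=$1
    shift
    env SJI_DEBUG=9 LUS_DEBUG=9 CDI_DEBUG=9 JBDEBUG=9 SCSI_DEBUG=9 \
        "$@" >> "$logfile" 2>&1
}

# Example (the changer address is a placeholder to fill in):
# run_with_debug /tmp/sjisn_debug.log sjisn <changer address>
```

Setting the variables per command, rather than exporting them in the session, avoids leaving debug enabled for later unrelated commands.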
Library configuration - jbconfig (automatic)
- If the library is failing to be detected using the normal UI mechanisms, try using the command jbconfig - this may operate at a semi-devolved level, but still provides almost the same results as the UI (and provides the ability to name the library, which is not present in regular UI configuration).
- Select Option 2 in the jbconfig dialogue to test autodetection and configuration. You are prompted for any shared device handling or NDMP devices. Remote hosts and NDMP are not handled automatically, so you must use the sjisn and inquire outputs to provide the host/handle pairings for each element.
Library configuration - jbconfig (manual)
- If jbconfig fails with Option 2, you can retry with Option 4. If the type of library does not appear in the list, simply use #54 (standard SCSI jukebox). This option requires all parameters to be entered manually:
- Library SCSI address or driver file handle, as returned by inquire on the robot control host
- Driver handle for each host:drive element pairing, as per robot-local sjisn output, as compared with an inquire output collected from each Storage Node sharing drives
- Model of drive(s) being configured in the jukebox
- If sjisn and inquire outputs do not reveal serial numbers, then the robot or drives may not support serial numbers; in this case the only remaining option is to empty the library, manually move a single tape cartridge to each drive in succession, and run mt -f <device handle> status until the correct local handle, per host, is found for that drive element. This is rare and unexpected in modern hardware.
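The handle-by-handle check with mt could be looped over a list of candidate handles, as in the sketch below. It assumes a UNIX-like host where mt is installed. probe_handles and the MT_CMD override are hypothetical, and the sketch prints the raw status output without interpreting it, because that output differs between platforms.

```shell
#!/bin/sh
# probe_handles: hypothetical helper, not part of NetWorker.
# Usage: probe_handles <device handle> [<device handle> ...]
# Runs "mt -f <handle> status" for each candidate handle and prints the
# raw result, so you can see which handle reports the loaded cartridge.
# MT_CMD may be set to use a different mt binary; it defaults to mt.
probe_handles() {
    for handle in "$@"; do
        echo "=== $handle ==="
        "${MT_CMD:-mt}" -f "$handle" status 2>&1 ||
            echo "(status command failed for $handle)"
    done
}
```

Run it once per drive element, after moving the single cartridge into that drive, and record which handle shows the tape as loaded on each host.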
If none of the above suggestions help and the evidence collected from the debug output suggests internal anomalies, engage support as appropriate for your Operating System or Library vendor. Otherwise, collect the debug output while attempting configuration and escalate the results within NetWorker Support to pursue the possibility of a code defect.