NetWorker: Media Database maintenance and troubleshooting

Summary: This article describes methods for identifying and resolving problems related to the media database, as well as best practices for maintaining and protecting it.


Symptoms

  • Failure to start services.
  • Failure of daily Server Protection > Server Backup workflow.
  • Incomplete or misleading information about save sets or clients returned by the mminfo command, or when querying or browsing save sets in NetWorker Management Console (NMC) or NetWorker Web User Interface (NWUI) administration.
  • Backup, recovery, or cloning issues caused by an inability to locate save sets, clients, or volumes.
  • Errors in the server daemon log or consoles related to the media database:
    nsrmmdbd WiSS code assertion error (st_nextrec: rec loop detected)
    nsrmmdbd error, ss_clone_ensure_clone_eligibility: assertion, invalid parameters or code segment
    nsrmmdbd XCHK ssid:saveset_short_ssid host:saveset_hostname name:saveset_name has a fragment with an invalid volid:saveset_volid
    nsrmmdbd NSR warning WiSS code assertion error (ST_readvdir: directory read failed)
    nsrmmdbd NSR critical Unexpected error reading long record directory: an invalid slot number
    nsrmmdbd NSR warning partial record error, ssid: saveset_short_ssid saveset_long_ssid flags:0x00010101 size:0 files:0 tm:datetime cloneid
    nsrmmdbd NSR notice media db must be scavenged
    nsrmmdbd NSR critical media db scavenge failed
    nsrmmdbd NSR warning Cannot scavenge path_to_mmvolume6 (Permission denied) - recover from backup media
    nsrmmdbd NSR warning Cannot scavenge path_to_mmvolume6 (unknown error code) - recover from backup media
    nsrmmdbd MDB warning can't fetch save set <saveset ID>
    nsrmmdbd MDB warning Unable to fetch child save set <saveset ID> for cover set <saveset ID>
  • Sudden loss of many save sets from the media database, or a sudden jump in available disk storage free space.
  • NetWorker failing to expire or delete save sets, leading to rapid storage consumption.
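To check a server for the media database errors listed above, the following is a minimal sketch that renders the daemon log and prints lines containing some of the message fragments quoted in this article. It assumes a Linux server where nsr_render_log is on the PATH and the raw log is /nsr/logs/daemon.raw; the function names and the choice of fragments are illustrative, not part of NetWorker.

```shell
#!/bin/sh
# Sketch: list rendered daemon log lines that match media database error
# fragments quoted in this article. Assumptions: nsr_render_log is on PATH and
# the raw log is /nsr/logs/daemon.raw (typical on Linux; adjust as needed).

# Fixed-string fragments taken from the symptom list (illustrative subset).
mmdb_patterns() {
    cat <<'EOF'
WiSS code assertion error
media db must be scavenged
media db scavenge failed
Cannot scavenge
can't fetch save set
Unable to fetch child save set
partial record error
has a fragment with an invalid volid
EOF
}

# Read log text on stdin and print only lines containing one of the fragments.
# Exit status follows grep: 0 if any line matched, 1 if none did.
filter_mmdb_errors() {
    patfile=$(mktemp) || return 2
    mmdb_patterns > "$patfile"
    grep -F -f "$patfile"
    rc=$?
    rm -f "$patfile"
    return $rc
}

# Render the raw daemon log (default path is an assumption) and filter it.
scan_daemon_log() {
    nsr_render_log "${1:-/nsr/logs/daemon.raw}" | filter_mmdb_errors
}
```

For example, `scan_daemon_log | tail -n 50` shows the most recent matches. An empty result does not prove the database is healthy; it only means none of these particular fragments were logged.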

Cause

Like any database, the media database can be damaged to varying degrees when anything interferes with its normal operation, such as:
  • Unexpected shutdown of nsrmmdbd process (core dump, system crash, reboot, or power loss).
  • Interrupted transaction (external security software interference or disk space depletion).
  • Logical internal issue (code bug or unhandled conditions).
  • Direct interference with media database files or save set files on NetWorker-managed storage.
The media database is best protected from damage with the following general practices:
  • If possible, use a separate, local disk partition for the nsr/mm folder; this helps protect against conditions such as disk space depletion by other processes. The partition should be at least 3x the size of the media database. A large media database today is about 10 GB, which calls for at least 30 GB, so a 100 GB partition leaves ample headroom for most installations.
  • Ensure that the Server Backup workflow is completed daily so that backups of the media database and critical disaster recovery resources (the Bootstrap) are available in the event of a disaster.
  • Verify the location of Bootstraps with the mminfo -B command periodically.
  • Never allow the NetWorker server's storage volumes to be accessed by another NetWorker server concurrently, as this can lead to data loss.
  • If antivirus software is installed on the NetWorker server, create exclusions for the /nsr directory to prevent the antivirus software from scanning, modifying, or removing NetWorker files. 
  • Do not manually delete files in NetWorker storage to try to free up space. NetWorker runs space reclaim routines daily; contact Support if these appear to be failing.
  • In general, for Datazone planning, keep data of the same type in the same pools for ease of maintenance; for example, keep vProxy save sets, file system save sets, and Oracle database save sets in separate pools.
  • Do not ignore messages related to media database errors - contact Support if you have concerns.
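Two of the practices above, the 3x partition sizing rule and periodic bootstrap verification, lend themselves to a scheduled check. The following is a minimal sketch, assuming POSIX du/df output and that the nsr/mm folder path is passed in; the function names and the dated output file are hypothetical conveniences, not NetWorker features.

```shell
#!/bin/sh
# Sketch: periodic checks for the practices above. Assumptions (adjust):
#  - the media database folder (e.g. the nsr/mm folder) is passed as $1;
#  - "at least 3x the media database size" is the sizing rule from this article;
#  - du -k / df -Pk report 1 KiB blocks in portable POSIX format.

# Print the size of a directory in KiB.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Print the total size (KiB) of the file system holding a path.
fs_total_kb() {
    df -Pk "$1" | awk 'NR==2 {print $2}'
}

# Succeed if the file system holding $1 is at least 3x the size of $1.
check_mm_partition() {
    mm_kb=$(dir_size_kb "$1") || return 2
    fs_kb=$(fs_total_kb "$1") || return 2
    need_kb=$((mm_kb * 3))
    echo "media db: ${mm_kb} KiB, partition: ${fs_kb} KiB, minimum (3x): ${need_kb} KiB"
    [ "$fs_kb" -ge "$need_kb" ]
}

# Save today's mminfo -B output to a dated file in $1 (default: current
# directory) and print the file name if the output was not empty.
record_bootstraps() {
    out="${1:-.}/bootstrap_$(date +%Y%m%d).txt"
    mminfo -B > "$out" 2>&1 && [ -s "$out" ] && echo "$out"
}
```

Run from cron or a scheduled task, `check_mm_partition /nsr/mm` (path assumed; use your installation's) returns non-zero when the partition is smaller than 3x the database, and the dated bootstrap files give a history of where bootstraps were written.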
Understand the relationship between NetWorker's media database and storage, and protect volumes with the Scan Needed flag
  • NetWorker runs the Expiration process daily as part of the Server Backup workflow. This job calculates retentions and dependencies and expires save sets which are past their retention and have no unexpired dependents. Once this completes, NetWorker attempts to delete all expired disk-volume save sets. Following this, the space reclaim operation runs for each volume, deleting save set files from disk media which do not have corresponding media database entries. This means if the media database becomes corrupt, or you recover the database to a prior point in time, valid data may be deleted.
  • If you suspect a problem with a disk volume, unmount it and mark it as Scan Needed so that valid data is not deleted. This also applies to volumes after the media database is recovered to a previous point in time, because valid save sets created after the recovery point may exist on disk with no entries in the recovered database.
  • Scan Needed allows normal backup, recovery, and cloning but prevents normal expiration and deletion, so use it only to protect volumes considered to be at risk, and remove it when returning to regular operations. Volumes must be unmounted to set or remove this flag. Volumes are commonly marked Scan Needed after NetWorker server disaster recovery (nsrdr) to prevent unwanted data loss.
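The ordering described above is why recovering the media database to an earlier point in time can cost valid data. The following is a conceptual model only, not NetWorker's implementation: given hypothetical text files listing the save set IDs found on a disk volume and the save set IDs the media database holds for it, it prints the on-disk save sets that a reclaim pass would treat as orphaned, and prints nothing when the volume is flagged Scan Needed.

```shell
#!/bin/sh
# Conceptual model (NOT NetWorker code) of space reclaim on one disk volume.
# Inputs are hypothetical text files with one save set ID per line:
#   $1 = save set IDs present on the volume
#   $2 = save set IDs recorded in the media database for that volume
#   $3 = "yes" if the volume is flagged Scan Needed, anything else otherwise
reclaim_candidates() {
    on_disk=$1 in_mdb=$2 scan_needed=${3:-no}
    # Scan Needed blocks normal expiration and deletion: nothing is removed.
    if [ "$scan_needed" = "yes" ]; then
        return 0
    fi
    a=$(mktemp) b=$(mktemp)
    sort -u "$on_disk" > "$a"
    sort -u "$in_mdb" > "$b"
    # IDs only in the on-disk list have no media database record; in this
    # model they are the files a reclaim pass would consider orphaned.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, if the database is recovered to yesterday, a save set written today exists on disk but not in the recovered database, so the model lists it as a deletion candidate unless the volume carries the Scan Needed flag.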

Resolution

There are several ways to attempt to verify and correct media database issues. Before attempting any of them, create reports before and after the procedure to assess its impact and see whether save sets, volumes, clients, or anything else have been removed.
At the command line, in a directory that will hold the output files, run the following commands to capture media database properties for comparison before and after the procedure:
  • mminfo -C > mminfo-C_pre.mmi
  • mminfo -X > mminfo-X_pre.mmi
  • mminfo -ar "volid,type,location,pool,volume,state,volflags,written,savesets" -q family=disk -xc, > mminfo-vol_pre.mmi
Once you complete the maintenance, re-run each to a separate file (for example *_post.mmi) and compare values.
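The pre/post reporting above can be wrapped so that both rounds use identical commands and file names. The following is a minimal sketch; the mminfo commands are the ones listed above, while the function names mm_reports and mm_compare are hypothetical. Some report values may change on their own between runs, so review any differences rather than treating every one as data loss.

```shell
#!/bin/sh
# Sketch: capture the three reports listed above with a suffix (pre or post),
# then compare each pre/post pair. Run in the directory that holds the output.

mm_reports() {
    if [ "$1" != "pre" ] && [ "$1" != "post" ]; then
        echo "usage: mm_reports pre|post" >&2
        return 2
    fi
    sfx=$1
    mminfo -C > "mminfo-C_${sfx}.mmi"
    mminfo -X > "mminfo-X_${sfx}.mmi"
    mminfo -ar "volid,type,location,pool,volume,state,volflags,written,savesets" \
        -q family=disk -xc, > "mminfo-vol_${sfx}.mmi"
}

# Report each pair as identical or show its diff; return 1 if any pair differs.
mm_compare() {
    rc=0
    for r in C X vol; do
        pre="mminfo-${r}_pre.mmi" post="mminfo-${r}_post.mmi"
        if cmp -s "$pre" "$post"; then
            echo "mminfo-${r}: identical"
        else
            echo "mminfo-${r}: DIFFERS"
            diff "$pre" "$post"
            rc=1
        fi
    done
    return $rc
}
```

Typical use is `mm_reports pre` before the maintenance, `mm_reports post` after it, then `mm_compare`.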

nsrim - Daily Server Protection

Each day the Server Protection > Server Backup workflow runs, and with it the Expiration action. The Expiration action runs nsrim, which is NetWorker's native maintenance utility. nsrim can also be run directly, but it may take anywhere from several minutes to several hours, depending on server load and media database size:
nsrim -X > nsrim.out 2>&1

Unless this process has been failing to run daily, running it manually is unlikely to change anything. Check the daemon log to confirm that nsrim completes daily.
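To confirm that nsrim is running, the following minimal sketch runs the nsrim command shown above and lists the most recent rendered daemon log lines that mention nsrim. It assumes a Linux server with the raw log at /nsr/logs/daemon.raw and does not assume any particular completion message text; the function names are hypothetical, and the nsrim exit status is shown only for convenience, so treat nsrim.out and the log as authoritative.

```shell
#!/bin/sh
# Sketch: run nsrim manually and review nsrim entries in the daemon log.
# Assumption: raw log path /nsr/logs/daemon.raw (typical on Linux).

# Run the command from this article, keeping all output in nsrim.out.
run_nsrim() {
    nsrim -X > nsrim.out 2>&1
    rc=$?
    echo "nsrim exit status: $rc (full output in nsrim.out)"
    return $rc
}

# Print the last N (default 20) rendered log lines mentioning nsrim.
recent_nsrim_log() {
    raw=${1:-/nsr/logs/daemon.raw}
    count=${2:-20}
    nsr_render_log "$raw" | grep -i 'nsrim' | tail -n "$count"
}
```

If `recent_nsrim_log` shows no nsrim activity for recent days, the daily Server Backup workflow is the first thing to investigate.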

Service restart

Restarting the NetWorker services forces various startup checks, which may expose problems through daemon log error messages and can potentially correct some of them. If database problems appear to be severe, then before halting services, ensure that there is adequate free space available and that bootstrap locations are known (mminfo -B output). Ideally, first run nsrmmdbasm -s nsr/mm/mmvolrel_path > mm.xdr to attempt to extract a current copy of the media database. Before restarting services, create a copy of the mmvolrel folder, as it may be required later for forensic or recovery purposes.
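The safeguards in the paragraph above can be scripted so that none is skipped under pressure. This is a minimal sketch: the mminfo and nsrmmdbasm commands come from this article, while the function names, the safeguard directory, and the "2x folder size" free-space margin are assumptions. Because the mmvolrel path varies by installation and operating system, it is always passed explicitly.

```shell
#!/bin/sh
# Sketch of the safeguards described above. The destination directory and
# the "2x folder size" free-space margin are assumptions, not Dell guidance.

# Run BEFORE halting services: $1 = full mmvolrel path, $2 = safeguard dir.
before_halt() {
    mmvolrel=$1 dest=$2
    if [ ! -d "$mmvolrel" ] || [ -z "$dest" ]; then
        echo "usage: before_halt /full/path/to/mmvolrel /safeguard/dir" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    # Room for a folder copy plus an extract (rough margin, assumption).
    need_kb=$(( $(du -sk "$mmvolrel" | awk '{print $1}') * 2 ))
    free_kb=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt "$need_kb" ]; then
        echo "not enough free space in $dest: ${free_kb} KiB < ${need_kb} KiB" >&2
        return 1
    fi
    # Record where the bootstraps are (mminfo -B output).
    mminfo -B > "$dest/bootstrap.txt" 2>&1
    # Attempt to extract a current copy of the media database.
    nsrmmdbasm -s "$mmvolrel" > "$dest/mm.xdr"
}

# Run AFTER services are stopped and BEFORE restarting them.
copy_mmvolrel() {
    cp -Rp "$1" "$2/mmvolrel.copy"
}
```

Splitting the work into two functions mirrors the timing in the text: the extract is taken while the database is still being served, and the folder copy is made only once services are down.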

Export and re-import of the media database

This process avoids a full disaster recovery by extracting only viable media database records and reimporting them to the server without stopping services. However, do this only when the server is idle, and never attempt it with jobs running. Use the full path in place of mmvolrel (the path may vary by installation or operating system).
  1. Before you begin, unmount all disk volumes and mark them as Scan Needed. If Auto Media Management is enabled for devices hosting disk volumes, disable it first. Tape volumes do not need this step.
  2. Run the mminfo commands described in the preamble to prepare your preliminary reports.
  3. Check the size of the media database mmvolrel folder and record it.
  4. Ensure that none of the nsrck, nsrim, or nsrmmdbasm processes are running. If there are any large, old, or not recently modified files in the mm parent folder with names like mm[alphanumerics], move or delete them if they are not locked by any process.
  5. Run the command to extract the media database: nsrmmdbasm -s mmvolrel > mm.xdr
  6. Compare the size of the new file to the size of the mmvolrel folder; it should be similar. If it is tiny (4 B or a handful of KB), the command failed. If it is significantly smaller, corrupt records may have been removed during the process.
  7. Prepare the server to recover its media database by setting the Server's state field to disaster recovery in NMC/NWUI or using nwadmin.
  8. Recover directly from the media database extract file using the nsrmmdbasm command again: nsrmmdbasm -r -2 < mm.xdr
  9. Once complete, run the same mminfo commands described in the preamble and compare the savesets and written values per volume, ensuring that all volumes are present; likewise, the mminfo -C values should be identical.
  10. If there are any disparities, take note and carefully consider how to proceed, and contact Support if you are not confident in the results you see:
    • For volumes that appear to be healthy (savesets and written values are consistent), you may remove the Scan Needed flag and mount the volumes; if no save sets appear to have been removed from a volume, there should be no danger of save set deletion.
    • For volumes that show fewer save sets or a lower written total, leave the Scan Needed flag in place and run scanner -i devicename to reintroduce save sets found on the volume that no longer have records. Once scanner has completed for each volume, check the save set count again and remove the Scan Needed flag. Remount the volume once you are confident that scanner has restored the save sets expected to be missing.
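Steps 3 to 6 of the procedure above involve checks that are easy to get wrong by eye. The following is a minimal sketch, assuming pgrep is available and using illustrative thresholds that are not Dell guidance: under 16 KiB counts as "tiny", and under half the folder size counts as "significantly smaller". The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of checks for steps 3 to 6. Thresholds are illustrative assumptions.

# Step 4: fail if any of the listed maintenance processes are running.
no_maintenance_running() {
    for p in nsrck nsrim nsrmmdbasm; do
        if pgrep -x "$p" > /dev/null 2>&1; then
            echo "$p is running; wait for it to finish" >&2
            return 1
        fi
    done
    return 0
}

# Steps 3, 5 and 6: compare the extract with the mmvolrel folder.
#   $1 = mmvolrel folder, $2 = extract file (for example mm.xdr)
# Returns 0 if sizes are similar, 1 if the extract is much smaller,
# 2 if the extract is tiny (export failed).
check_extract() {
    folder_kb=$(du -sk "$1" | awk '{print $1}')
    xdr_kb=$(( $(wc -c < "$2") / 1024 ))
    echo "mmvolrel: ${folder_kb} KiB, extract: ${xdr_kb} KiB"
    if [ "$xdr_kb" -lt 16 ]; then
        echo "FAILED: extract is tiny; the export did not complete"
        return 2
    elif [ $((xdr_kb * 2)) -lt "$folder_kb" ]; then
        echo "WARNING: extract is much smaller; corrupt records may have been dropped"
        return 1
    fi
    echo "OK: sizes are similar"
}
```

A return code of 2 means the import in step 8 should not be attempted; a return code of 1 is the case where the per-volume comparison in steps 9 and 10 matters most.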
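The per-volume comparison in steps 9 and 10 can be done by matching volumes between the pre and post volume reports. The following is a minimal sketch that assumes the comma-separated layout requested by the mminfo -r list in the preamble (volid first, written and savesets as fields 8 and 9) and that no field contains a comma; the function name is hypothetical. It lists volumes missing from the post report and volumes whose written or savesets values changed, which are the candidates for keeping Scan Needed and running scanner.

```shell
#!/bin/sh
# Sketch: compare pre/post per-volume reports keyed on volid (field 1).
#   $1 = pre report, $2 = post report
# Assumed layout: volid,type,location,pool,volume,state,volflags,written,savesets
# Returns 0 if every volume is present with unchanged values, 1 otherwise.
compare_volumes() {
    report=$(awk -F, '
        pass == 1 { w[$1] = $8; s[$1] = $9; next }
        {
            seen[$1] = 1
            if (!($1 in w)) next
            if ($8 != w[$1] || $9 != s[$1])
                printf "CHANGED %s written %s -> %s, savesets %s -> %s\n", $1, w[$1], $8, s[$1], $9
        }
        END { for (v in w) if (!(v in seen)) printf "MISSING %s\n", v }
    ' pass=1 "$1" pass=2 "$2") || return 2
    if [ -z "$report" ]; then
        echo "all volumes present with matching written and savesets values"
        return 0
    fi
    echo "$report"
    return 1
}
```

For example, `compare_volumes mminfo-vol_pre.mmi mminfo-vol_post.mmi`. Values are compared as raw strings, so any formatting change in the written column also shows up as CHANGED and should be checked by hand.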

nsrdr

The full disaster recovery performed by nsrdr recovers not only the media database but also other server elements, such as the resource database and jobs database. See the Server Disaster Recovery and Availability Best Practices Guide for your version before proceeding.
nsrdr expects the storage nodes to be online and reachable in order to complete.
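Because nsrdr needs the storage nodes, a quick reachability check beforehand can save a failed run. This is a minimal sketch, assuming netcat (nc) with -z and -w support is installed and that each storage node's nsrexecd listens on its default port 7937; the host names are placeholders and the function name is hypothetical. A successful TCP connection shows only that the port answers, not that the storage node is fully functional.

```shell
#!/bin/sh
# Sketch: check that each storage node answers on port 7937 (default nsrexecd
# port, assumed) before starting nsrdr. Requires nc with -z and -w support.
#   usage: check_storage_nodes sn1.example.com sn2.example.com
check_storage_nodes() {
    if [ "$#" -eq 0 ]; then
        echo "usage: check_storage_nodes host..." >&2
        return 2
    fi
    failed=0
    for sn in "$@"; do
        if nc -z -w 5 "$sn" 7937 > /dev/null 2>&1; then
            echo "OK      $sn"
        else
            echo "FAILED  $sn (port 7937 not reachable)"
            failed=1
        fi
    done
    return $failed
}
```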
 
NOTE: Always contact Support if there are any questions or concerns, as recovery of the media database can lead to data loss for disk volumes if Scan Needed flag is not used to protect file systems with viable save sets which may not have media database records due to corruption or recovery to a previous point in time.

Affected Products

NetWorker

Products

Data Backup & Protection Software, NetWorker Family
Article Properties
Article Number: 000223518
Article Type: Solution
Last Modified: 08 Apr 2024
Version:  1