NetWorker Troubleshooting Guide: Process Crashes and Core Dumps
Summary: Dell NetWorker Comprehensive Guide to Troubleshooting Process Crashes and Core Dumps
Symptoms
NetWorker Troubleshooting Guide: Process Crashes and Core Dumps
Cause
Resolution
NOTE: Before troubleshooting and diagnosing a core dump on your system, search the Dell support site for articles specific to the process which core dumped. In some scenarios, there is a known fix posted. If no fixes are identified, proceed with the steps outlined in this article. Each step provides instructions or a link to a document in order to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.
Step 1: Gathering Information - Problem Description
- Under what circumstances does the process crash? Is this behavior consistent?
- Did this work correctly before?
- At what times does the crash occur, and is there a trend in the observed behavior?
- Does the issue happen only at times of heavy load on the backup environment, or only with a particular type of backup or backup group?
- When did the issue first occur? What changed at that time?
- What is the scope of the issue (all clients or some clients, all backup targets or some)?
- What has been tried so far to fix the issue, and what conclusions were drawn from it?
Step 2: Gathering Information - Environment
Which NetWorker process is crashing or unresponsive, and on which machine (Server, Storage Node, or Client)?
- NetWorker server version and platform. See: NetWorker: Methods for Identifying NetWorker Software version
- Overview of the size and nature of the backup datazone
- Target media for these backups
Step 3: Supportability
- Using the online NetWorker Compatibility Guide (Requires Dell Support Account Sign-In), check that all components (NetWorker server, file system version, proxy, storage nodes, clients, target) are supported.
- Check that there is no underlying Operating System or hardware deficiency that would account for the process crashes (disk failures, disk full, network errors and so forth).
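One of the deficiencies named above, a full disk, can be checked quickly from the command line. The following is a minimal sketch, not a NetWorker tool: the function name `full_filesystems` and the 90 percent threshold are assumptions chosen for illustration, and the parsing assumes POSIX `df -P` output with mount points that contain no spaces.

```shell
#!/bin/sh
# Sketch only: full_filesystems and the 90% threshold are illustrative assumptions.

# Read `df -P` style output on standard input and print each mount point
# whose capacity is at or above the given percentage.
full_filesystems() {
    threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5                  # capacity column, for example "95%"
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6 " " $5
        }'
}

# Report file systems that are at least 90% full on this host.
df -P | full_filesystems 90
```

An empty result means no file system reached the threshold. Disk failures and network errors still need to be checked in the Operating System logs.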
Step 4: Best Practices
The NetWorker Performance Optimization Planning Guide lists software and hardware requirements that support an optimally tuned NetWorker environment. Review it to confirm that the best practices are being followed. This is especially relevant if the process crashes or becomes unresponsive at times of heaviest load. NetWorker documentation is available through: Support for NetWorker | Drivers & Downloads
Step 5: Component Isolation
How to find the root cause of the process crash depends on the behavior identified in Step 1. If the trigger is unknown, tests can be carried out to try to establish what is triggering the crash:
- Monitor system performance under heavy load
- Examine the Operating System log files around the time of the crashes for commonality in behavior.
  - Linux: /var/log/messages
  - Windows: System and Application Event Logs
- Review NetWorker logs to see what operations are occurring when the core dumps occur, and when:
  - Linux: /nsr/logs/daemon.raw
  - Windows (Default): C:\Program Files\EMC NetWorker\nsr\logs\daemon.raw
  - NetWorker: How to use nsr_render_log to render .raw log files
  - NetWorker host-specific processes are defined in: NetWorker Processes and Ports
- Find out what non-NetWorker operations run on this machine that could affect its behavior and whether their schedule correlates with the times of crashes.
- If the crash occurs consistently, change some parameters to try to narrow down the cause. For example, back up to a different target media, or back up different types of data from the same NetWorker client.
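The log review described in this step can be narrowed by matching log entries to the time a core file was written. The sketch below rests on several assumptions: the function name `log_lines_near_core` is hypothetical, GNU `date` is available (for `date -r FILE`), and the log uses traditional syslog timestamps such as "Jan  5 14:29:58". Logs in other timestamp formats need a different pattern.

```shell
#!/bin/sh
# Sketch only: log_lines_near_core is a hypothetical helper, not a NetWorker command.
# Assumes GNU date and traditional syslog timestamps ("Jan  5 14:29:58").

# Print the log lines written during the same hour as the core file.
log_lines_near_core() {
    core_file=$1
    log_file=$2
    # Modification time of the core file, formatted as a syslog prefix
    # truncated to the hour, for example "Jan  5 14:".
    stamp=$(LC_ALL=C date -r "$core_file" '+%b %e %H:')
    grep -- "^$stamp" "$log_file"
}

# Example call, where $core_file is a core file found on the affected host:
# log_lines_near_core "$core_file" /var/log/messages
```

Matching on the hour is deliberately coarse. A crash close to an hour boundary may have relevant entries in the adjacent hour, so widen the search if nothing useful appears.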
Step 6: Resolution
A core dump is a file that captures a process’s working memory at a specific moment, usually when the program terminates abnormally. A core dump helps identify why a process becomes unresponsive by revealing the functions executing and the data in use at the moment of failure.
- Check the /nsr/cores directory for recent core dumps of NetWorker processes in UNIX or Linux, or check the crash directory as defined in the Windows registry (see step 2).
- If there is none, check that the Operating System is set up to generate core dump files if there is a process crash. See Operating System Documentation for full details, but in brief, this involves changing ulimit -c and -f values in UNIX and Linux, and making a registry change in Windows.
- Operating system tools such as gdb (UNIX and Linux) and WinDbg (Windows) can be used to assess the core dump. Refer to OS vendor documentation on these functions.
- For NetWorker support review, see: NetWorker: How To Use pkgcore for generating core dump bundles
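On UNIX and Linux, the checks in this step can be sketched as a short script. This is an illustration under stated assumptions, not a supported procedure: the function names and the 7-day window are hypothetical, and the arguments to `backtrace_core` are placeholders for the executable that crashed and its core file.

```shell
#!/bin/sh
# Sketch only: function names and the 7-day window are illustrative assumptions.

# List regular files under a directory that were modified in the last N days.
recent_cores() {
    dir=$1
    days=$2
    [ -d "$dir" ] || return 0
    find "$dir" -type f -mtime "-$days"
}

# Explain a soft limit value as reported by ulimit.
describe_limit() {
    label=$1
    value=$2
    case "$value" in
        0)         echo "$label: 0 (a core file cannot be written)" ;;
        unlimited) echo "$label: unlimited" ;;
        *)         echo "$label: limited to $value blocks" ;;
    esac
}

# Print a backtrace of every thread from a core file, if gdb is installed.
# Both arguments are placeholders: the executable that crashed and its core file.
backtrace_core() {
    program=$1
    core_file=$2
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found" >&2; return 1; }
    gdb -batch -ex 'thread apply all bt' "$program" "$core_file"
}

recent_cores /nsr/cores 7
describe_limit "core file size (ulimit -c)" "$(ulimit -c)"
describe_limit "file size (ulimit -f)" "$(ulimit -f)"
```

The limits printed here are those of the current shell. A running process inherits its limits from the environment that started it, so the values that matter are the ones in effect where the NetWorker services are launched.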
Additional Information
When engaging NetWorker support on core dump related cases, provide the information collected while following this article, together with an NSRGET bundle and the pkgcore core dump bundles.