OneFS error: "Could not recover journal. Contact Customer Support Immediately"
Summary: Upon booting, a node fails to fully start up and a Journal error message is displayed.
Symptoms
Upon booting, a node fails to start up fully and displays the following error message: "Could not recover journal. Contact Isilon Customer Support Immediately."
One of the following messages might be displayed at the command-line interface during the node's boot sequence:
Bad version: 0
Unable to save journal (isi_dumpjournal -s returned 1)
Error output:
Writing v4/v5/v6 journal
type sector bytes
----- ---------- ----------
Bad version: 0
isi_dumpjournal: Bad v4 superblock sect 0x0
Unable to save journal /var/journal/journal:
Restore failed
!! The journal test has encountered a problem.
!! Could not recover journal. Contact Isilon Customer Support
!! immediately.
!! Type 'prompt' to drop to a shell.
Test Journal exited with error - Checking Isilon Journal integrity...
The journal state is internally consistent, but the journal state does not match the node and cluster state.
Journal : cluster guid 000000000000000000000000000000000000, device id 0.
Disk: cluster guid 0007430733ba99650a52911df09ba81ca8bc, device id 4.
Contact Isilon Customer Support immediately.
Please contact Isilon Customer Support at 1-877-2-ISILON.
Command Options:
1) Enter recovery shell
2) Continue booting
3) Reboot
DRAM journal is invalid.
Test Journal exited with error - Checking Isilon Journal integrity...
Attempting to save journal to default location
Warning: /etc/ifs/journal_bad exists. Saving bad journal.
OneFS is unmounted
Writing journal_backup_valid : backup in progress
Writing journal_backup_serial : unmounted backup at Sun Oct 20 08:36:31 2013
Saving journal to /var/journal/journal.gz
Unable to save journal (isi_dumpjournal -s returned 1)
Error output:
data 422768 8192
data 787792 8192
data 617904 8192
Bad flags: 0xe01d
isi_dumpjournal: Bad v4 txn block sect 0x1a57f
Unable to save journal /var/journal/journal:
DRAM journal is invalid, initiating recovery...
Attempting to save and restore journal to clear any ECC errors in unused DRAM blocks...
Restore failed
Could not recover journal. Contact Isilon Customer Support immediately.
Please contact Isilon Customer Support at 1-877-2-ISILON.
Command Options:
1) Enter recovery shell
2) Continue booting
3) Reboot
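When triaging a captured console log, the three message types above can be told apart by a few fixed strings. The following is a minimal Python sketch of that idea, not an Isilon tool: the function names and labels (`classify_boot_log`, `journal_identities`, `bad_version`, and so on) are hypothetical, and the patterns assume the log text matches the wording shown in this article.

```python
import re

# Signature strings are taken from the console messages in this article.
# The labels on the left are hypothetical names chosen for illustration.
SIGNATURES = [
    ("bad_version", re.compile(r"Bad version: 0")),
    ("state_mismatch",
     re.compile(r"journal state does not match the node and cluster state")),
    ("dram_invalid", re.compile(r"DRAM journal is invalid")),
]

# Matches the "Journal : cluster guid ..., device id N" and
# "Disk: cluster guid ..., device id N" pairs printed on a state mismatch.
IDENTITY = re.compile(
    r"(Journal|Disk)\s*:\s*cluster guid\s+([0-9a-f]+),\s*device id\s+(\d+)"
)


def classify_boot_log(text):
    """Return the labels of every known signature found in the log text."""
    return [label for label, pattern in SIGNATURES if pattern.search(text)]


def journal_identities(text):
    """Return {'journal': (guid, device_id), 'disk': (guid, device_id)}
    for whichever of the two identity entries appear in the log text."""
    return {
        m.group(1).lower(): (m.group(2), int(m.group(3)))
        for m in IDENTITY.finditer(text)
    }
```

A differing GUID or device ID between the `journal` and `disk` entries corresponds to the "journal state does not match the node and cluster state" case. The sketch only helps collect information for the support call; it does not repair anything.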
Cause
Errors were detected in the NVRAM journal, and the node was unable to restore or locate a journal backup. Journals track file system operations before they are committed to disk. During a normal shutdown or reboot of the node, the NVRAM contents are backed up to disk.
During a node reboot, OneFS validates the NVRAM journal. If the NVRAM journal does not contain valid data, the node attempts to restore the journal backup from the boot disk. If the backup is also invalid, the node stops booting and displays an error message.
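The boot-time check described above can be summarized as a small decision function. This is an illustrative Python sketch only, not OneFS code: `BootOutcome` and `journal_boot_check` are hypothetical names, and the sketch assumes that a valid backup restores successfully and lets the boot continue, which the description implies but does not state outright.

```python
from enum import Enum


class BootOutcome(Enum):
    NVRAM_OK = "NVRAM journal is valid; boot continues"
    RESTORED_FROM_BACKUP = "journal restored from boot-disk backup; boot continues"
    HALT = "Could not recover journal; boot stops with an error"


def journal_boot_check(nvram_valid: bool, backup_valid: bool) -> BootOutcome:
    """Model the order of checks: NVRAM first, then the boot-disk backup."""
    if nvram_valid:
        return BootOutcome.NVRAM_OK
    # NVRAM contents are invalid, so fall back to the backup on the boot disk.
    if backup_valid:
        return BootOutcome.RESTORED_FROM_BACKUP
    # Neither copy is usable: this is the condition this article covers.
    return BootOutcome.HALT
```

For example, `journal_boot_check(False, False)` returns `BootOutcome.HALT`, the state in which the node displays "Could not recover journal."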
Resolution
The resolution procedure depends on the OneFS protection level and the number of offline devices in the cluster. For more information about protection levels, see the "Data protection overview" section of the OneFS Administration Guide.
- If the number of offline devices in the cluster is below the node protection level, the node with the inconsistent journal must be SmartFailed from the cluster. Begin SmartFailing the node, and contact Dell Technical Support for additional assistance and information collection.
CAUTION: Do not reformat the node. Reformatting the node reduces the amount of information available to Dell Technical Support for diagnosis.
- If the number of offline devices in the cluster is above the node protection level, data may be unavailable or lost. Contact Dell Technical Support immediately.
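The two cases above can be expressed as a simple comparison. The Python sketch below is illustrative only: `recommended_action` is a hypothetical helper, and it assumes the protection level can be treated as an integer count of device failures the cluster tolerates. This article does not say what to do when the two numbers are equal, so the sketch flags that case instead of guessing.

```python
def recommended_action(offline_devices: int, protection_level: int) -> str:
    """Map the offline-device count and protection level to the action
    described in the Resolution section (sketch, not an official tool)."""
    if offline_devices < protection_level:
        return ("SmartFail the node and contact Dell Technical Support; "
                "do not reformat the node")
    if offline_devices > protection_level:
        return ("Data may be unavailable or lost; "
                "contact Dell Technical Support immediately")
    # Equal counts are not addressed in this article.
    return "Case not covered by this article; contact Dell Technical Support"
```

For instance, with one offline device and an assumed protection level of 2, `recommended_action(1, 2)` returns the SmartFail guidance.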