
Article Number: 000170183


IDPA: IDPA deployment for DP8800 fails at 63% with error "Failed: Configuring Protection Storage. Error: Failed to enable file system of data domain after reboot"

Summary: This issue occurs on IDPA DP8800 appliances running version 2.3, which ships with Data Domain version 6.2.0.5. A bug in this Data Domain version raises false-positive "storage processor has failed" alerts, which cause the Data Domain file system to go down. ...

Article Content


Symptoms



The ACM UI shows the following error for the failed deployment:
[Screenshot: ACM UI deployment error; the error text is quoted in the article title]

The diagnostic report shows the following error:
[Screenshot: diagnostic report error]

The Data Domain deployment reaches 98%, and the Data Domain then reboots as part of the workflow.

After this reboot, the Data Domain file system does not come up.

The Data Domain may report hardware errors regarding storage processors:
sysadmin@xxxxxxxxxxxx# alerts show current
Id      Post Time                  Severity   Class             Object        Message
-----   ------------------------   --------   ---------------   -----------   -------------------------------------------------------
p0-67   Mon Jan  6 12:15:07 2020   CRITICAL   HardwareFailure   Enclosure=1   EVT-ENVIRONMENT-00032: The storage processor has failed
-----   ------------------------   --------   ---------------   -----------   -------------------------------------------------------
There is 1 active alert.
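
When many alerts are active, it can help to pull the hardware failure entries out of the "alerts show current" output before reviewing them. The following Python sketch is not part of the Dell procedure. It assumes the column layout matches the sample above, and the names ALERT_RE, SAMPLE_ALERTS, parse_alerts, and hardware_failures are hypothetical.

```python
import re

# Layout inferred from the sample output above (assumption): an ID such as
# "p0-67", a timestamp, a severity, a class, then an optional object followed
# by a message that starts with an "EVT-" event code.
ALERT_RE = re.compile(
    r"^(?P<id>[a-z]\d+-\d+)\s+"
    r"(?P<time>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<severity>[A-Z]+)\s+"
    r"(?P<cls>\S+)\s+"
    r"(?P<rest>.*)$"
)

SAMPLE_ALERTS = """\
Id      Post Time                  Severity   Class             Object        Message
-----   ------------------------   --------   ---------------   -----------   ----------
p0-67   Mon Jan  6 12:15:07 2020   CRITICAL   HardwareFailure   Enclosure=1   EVT-ENVIRONMENT-00032: The storage processor has failed
There is 1 active alert.
"""


def parse_alerts(text):
    """Return one dict per alert row; header, separator, and summary lines are skipped."""
    alerts = []
    for line in text.splitlines():
        m = ALERT_RE.match(line.strip())
        if not m:
            continue
        # The message starts at the event code; anything before it is the object.
        obj, sep, msg = m.group("rest").partition("EVT-")
        alerts.append({
            "id": m.group("id"),
            "time": m.group("time"),
            "severity": m.group("severity"),
            "class": m.group("cls"),
            "object": obj.strip(),
            "message": (sep + msg).strip(),
        })
    return alerts


def hardware_failures(alerts):
    """Keep only the alerts whose class is HardwareFailure."""
    return [a for a in alerts if a["class"] == "HardwareFailure"]


if __name__ == "__main__":
    for alert in hardware_failures(parse_alerts(SAMPLE_ALERTS)):
        print(alert["id"], alert["object"], alert["message"])
```

The resulting IDs are the values to check against the alert list when clearing alerts later in the Resolution section.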

The ACM server.log shows the following error message:    
2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-util.SSHUtil:  STDOUT   : [The filesystem has encountered a problem.]
2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-util.SSHUtil:  STDERR   : []
2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-util.SSHUtil: Successfully executed remote command using SSH.
2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-ddadapter.ConfigDataDomainTask: Successfully executed: filesys status
2020-01-06 18:43:16,694 ERROR [pool-67-thread-3]-ddadapter.ConfigDataDomainTask: File system is not enabled or not running after rebooting
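
To confirm that a failed deployment matches this article, the ACM server.log can be searched for the ERROR line shown above. The Python sketch below is illustrative only. It assumes every log line follows the "timestamp LEVEL [thread]-source: message" layout of the excerpt, and the names LOG_RE, SAMPLE_LOG, and find_errors are hypothetical.

```python
import re

# Line layout inferred from the excerpt above (assumption):
#   2020-01-06 18:43:16,694 ERROR [pool-67-thread-3]-ddadapter.ConfigDataDomainTask: message
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]-(?P<source>[\w.]+):\s*(?P<msg>.*)$"
)

SAMPLE_LOG = [
    "2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-util.SSHUtil:  STDOUT   : [The filesystem has encountered a problem.]",
    "2020-01-06 18:43:16,694 INFO  [pool-67-thread-3]-ddadapter.ConfigDataDomainTask: Successfully executed: filesys status",
    "2020-01-06 18:43:16,694 ERROR [pool-67-thread-3]-ddadapter.ConfigDataDomainTask: File system is not enabled or not running after rebooting",
]


def find_errors(lines, source_suffix="ConfigDataDomainTask"):
    """Return (timestamp, message) pairs for ERROR lines logged by the given source."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("level") == "ERROR" and m.group("source").endswith(source_suffix):
            hits.append((m.group("ts"), m.group("msg")))
    return hits


if __name__ == "__main__":
    for ts, msg in find_errors(SAMPLE_LOG):
        print(ts, msg)
```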

Cause

Systems Affected: IDPA DP8800 
Affected IDPA Component: Data Domain

The Data Domain on the IDPA DP8800 appliance raises memory subsystem and storage processor (SP) failure alerts, which lead to a system reboot. In most instances, the DD9800 that is part of the IDPA DP8800 appliance produces a combination of hardware alert messages that can falsely indict multiple hardware components. These alerts tend to appear after system reboots. This article helps troubleshoot the issue when it occurs and describes a workaround that disables the background memory scrubbing function to prevent random reboots. A combination of the following hardware alerts is typical:
sysadmin@xxxxxx# alerts show current

Id       Post Time                  Severity   Class             Object                         Message
------   ------------------------   --------   ---------------   ----------------------------   -----------------------------------------------------------------------------------------
p0-279   Thu Mar 15 14:27:33 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=0             EVT-ENVIRONMENT-00029: I/O module has failed
p0-280   Thu Mar 15 14:27:52 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=1             EVT-ENVIRONMENT-00029: I/O module has failed
p0-281   Thu Mar 15 14:27:54 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=2             EVT-ENVIRONMENT-00029: I/O module has failed
p0-282   Thu Mar 15 14:27:55 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=3             EVT-ENVIRONMENT-00029: I/O module has failed
p0-283   Thu Mar 15 14:27:56 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=4             EVT-ENVIRONMENT-00029: I/O module has failed
p0-284   Thu Mar 15 14:27:57 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=5             EVT-ENVIRONMENT-00029: I/O module has failed
p0-285   Thu Mar 15 14:27:59 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=6             EVT-ENVIRONMENT-00029: I/O module has failed
p0-286   Thu Mar 15 14:28:00 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=7             EVT-ENVIRONMENT-00029: I/O module has failed
p0-287   Thu Mar 15 14:28:01 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=8             EVT-ENVIRONMENT-00029: I/O module has failed
p0-288   Thu Mar 15 14:28:02 2018   CRITICAL   HardwareFailure   Enclosure=1:Slot=10            EVT-ENVIRONMENT-00029: I/O module has failed
p0-289   Thu Mar 15 14:28:04 2018   CRITICAL   HardwareFailure   Enclosure=1:DIMM=MR4 DIMM A1   EVT-DIMM-00003: A memory card has failed
p0-290   Thu Mar 15 14:28:05 2018   CRITICAL   HardwareFailure   Enclosure=1:Riser=4            EVT-ENVIRONMENT-00044: Memory riser fault has been detected
m0-30    Thu Mar 15 06:48:33 2018   WARNING    Filesystem                                       EVT-GC-00002: Unable to start scheduled file system cleaning on Thu Mar 15 06:01:00 2018.
p0-154   Thu Mar 15 15:49:08 2017   INFO       Filesystem                                       EVT-FILESYS-00012: System rebooted
p0-318   Fri Mar 16 09:47:16 2018   CRITICAL   HardwareFailure   Enclosure=1:Riser=4            EVT-ENVIRONMENT-00044: Memory riser fault has been detected
p0-277   Thu Mar 15 12:04:47 2018   CRITICAL   HardwareFailure   Enclosure=1                    EVT-ENVIRONMENT-00032: The storage processor has failed "voltage is faulty"
------   ------------------------   --------   ---------------   ----------------------------   -----------------------------------------------------------------------------------------

The bios.txt log on Data Domain shows the following errors:    
   1 | 03/15/2018 | 05:33:39 | SMI Critical Interrupt Events Enter_SMI | SMI Critical Interrupt | Asserted | Used AUX Log (LSB 0x0) Used AUX Log (MSB 0x0)
   2 | 03/15/2018 | 05:33:41 | CPU Status Events CPU2_Status | CPU IERR | Asserted |  CPU External IERR
   3 | 03/15/2018 | 05:33:41 | Entering IERR Interrupt Events Enter_SMI | IERR Interrupt | Asserted | Used AUX Log (LSB 0x24) Used AUX Log (MSB 0x0)
   4 | 03/15/2018 | 05:33:42 | BMC Chassis Ctrl Events BMC_Chassis_Ctrl | Reset through BMC | Asserted
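
The sequence above (a CPU IERR followed by a reset through the BMC) is the pattern to look for in bios.txt. As a rough illustration, the Python sketch below splits the pipe-delimited entries and checks for that sequence. The field names (index, date, time, sensor, event, state, detail) are labels chosen for this example rather than names from the log, and parse_sel, reset_signature, and SAMPLE_SEL are hypothetical.

```python
SAMPLE_SEL = """\
   1 | 03/15/2018 | 05:33:39 | SMI Critical Interrupt Events Enter_SMI | SMI Critical Interrupt | Asserted | Used AUX Log (LSB 0x0) Used AUX Log (MSB 0x0)
   2 | 03/15/2018 | 05:33:41 | CPU Status Events CPU2_Status | CPU IERR | Asserted |  CPU External IERR
   3 | 03/15/2018 | 05:33:41 | Entering IERR Interrupt Events Enter_SMI | IERR Interrupt | Asserted | Used AUX Log (LSB 0x24) Used AUX Log (MSB 0x0)
   4 | 03/15/2018 | 05:33:42 | BMC Chassis Ctrl Events BMC_Chassis_Ctrl | Reset through BMC | Asserted
"""


def parse_sel(text):
    """Split each pipe-delimited entry into named fields; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        entries.append({
            "index": int(fields[0]),
            "date": fields[1],
            "time": fields[2],
            "sensor": fields[3],
            "event": fields[4],
            "state": fields[5],
            # The last column is absent on some entries (see entry 4 above).
            "detail": fields[6] if len(fields) > 6 else "",
        })
    return entries


def reset_signature(entries):
    """Return True when a CPU IERR entry is later followed by a BMC-initiated reset."""
    ierr_seen = False
    for entry in entries:
        if entry["event"] == "CPU IERR":
            ierr_seen = True
        elif ierr_seen and entry["event"] == "Reset through BMC":
            return True
    return False


if __name__ == "__main__":
    print(reset_signature(parse_sel(SAMPLE_SEL)))
```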

To troubleshoot this scenario, focus on the CPU Patrol Scrub function on the IDPA Protection Storage (Data Domain). Patrol Scrub can incorrectly report memory DIMMs as faulty and can also indict the wrong DIMM.

SP (storage processor) replacements, mech replacements, and mass memory replacements have all proven unnecessary for resolving this problem.

Resolution

Steps to resolve this issue on the IDPA Data Domain:

Method 1:
  1. Disable Patrol Scrub on the Data Domain system that is part of the IDPA (see the Workaround steps under Additional Information).
  2. Clear the indict list on the Data Domain:
se indict list
se indict remove <id>
  3. Clear the active Data Domain alerts related to hardware failures:
alerts show current
alerts clear <alert-id>
  4. Reboot the system and confirm that the Data Domain file system comes up.
  5. If the Data Domain file system comes up clean in step 4, click 'Retry' on the ACM UI to retry the deployment.

Method 2:    
  1. Upgrade to DDOS version 6.2.0.30, which contains the fix.
  2. Click 'Retry' on the ACM UI to retry the IDPA deployment.

Additional Information

This content is available in the following languages:

https://downloads.dell.com/TranslatedPDF/AR-SA_540157.pdf
https://downloads.dell.com/TranslatedPDF/DE_540157.pdf
https://downloads.dell.com/TranslatedPDF/ES_540157.pdf
https://downloads.dell.com/TranslatedPDF/ES-XL_540157.pdf
https://downloads.dell.com/TranslatedPDF/FR_540157.pdf
https://downloads.dell.com/TranslatedPDF/IT_540157.pdf
https://downloads.dell.com/TranslatedPDF/JA_540157.pdf
https://downloads.dell.com/TranslatedPDF/KO_540157.pdf
https://downloads.dell.com/TranslatedPDF/NL_540157.pdf
https://downloads.dell.com/TranslatedPDF/PT_540157.pdf
https://downloads.dell.com/TranslatedPDF/PT-BR_540157.pdf
https://downloads.dell.com/TranslatedPDF/RU_540157.pdf
https://downloads.dell.com/TranslatedPDF/SV_540157.pdf
https://downloads.dell.com/TranslatedPDF/ZH-CN_540157.pdf
https://downloads.dell.com/TranslatedPDF/ZH-TW_540157.pdf



Workaround:    
The DDOS directory /ddr/firmware/JUPITER contains a utility that flashes the DD9800 BIOS configuration settings.

Use the following command to dump the current BIOS settings into an ASCII file:

./SCELNX_64 /o /s ./ps_enabled.txt

This generates a new text file named ps_enabled.txt.

Open ps_enabled.txt in the vi text editor.

Search the file for the word "Patrol".
Note: The asterisk (*) next to the option [01]Enable means that the Patrol Scrub function is enabled.

                Setup Question  = Patrol Scrub
                      Options =*[01]Enable    // Move "*" to the desired Option
                               [00]Disable

*******************************

In the vi text editor, change the Patrol Scrub setting to Disable as shown below: delete the asterisk next to [01]Enable and enter an asterisk next to [00]Disable.

                Setup Question  = Patrol Scrub
                     Options =[01]Enable    // Move "*" to the desired Option
                             *[00]Disable

Write the changes to a new file named ps_disabled.txt and quit the editor (:wq ps_disabled.txt).
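
The manual edit above moves the asterisk from one option line to another. For readers who want to check the edit against a copy of the file off the appliance, the Python sketch below performs the same move on the text. It is an illustration, not a Dell-supplied tool. Only the three lines shown above are known from this article; the sketch assumes that any other "name = value" lines inside a Setup Question block must be left untouched, and select_option and OPTION_RE are hypothetical names.

```python
import re

# An option looks like "[01]Enable" and is prefixed with "*" when selected.
OPTION_RE = re.compile(r"\*?(\[[0-9A-Fa-f]+\])([^/\n]*)")


def select_option(text, question, wanted):
    """Return the text with "*" moved to the option named `wanted` under `question`."""
    out, in_block, found = [], False, False
    for line in text.splitlines():
        if re.match(r"\s*Setup Question\s*=", line):
            in_block = line.split("=", 1)[1].strip() == question
        elif in_block:
            head = line.split("[", 1)[0]
            # Assumption: only the "Options =" line and its continuation lines
            # carry the selection marker; other "name = value" lines are skipped.
            is_option_line = "=" not in head or head.strip().startswith("Options")
            m = OPTION_RE.search(line) if is_option_line else None
            if m:
                name = m.group(2).strip()
                mark = "*" if name == wanted else ""
                found = found or name == wanted
                line = line[:m.start()] + mark + m.group(1) + m.group(2) + line[m.end():]
        out.append(line)
    if not found:
        raise ValueError(f"option {wanted!r} not found under {question!r}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    enabled = (
        "                Setup Question  = Patrol Scrub\n"
        '                      Options =*[01]Enable    // Move "*" to the desired Option\n'
        "                               [00]Disable\n"
    )
    print(select_option(enabled, "Patrol Scrub", "Disable"))
```

Raising an error when the option is not found avoids silently writing a file in which no option is selected.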

*******************************

The folder /ddr/firmware/JUPITER should now contain two BIOS configuration files:

                    ps_enabled.txt
                    ps_disabled.txt

*******************************

Load the edited config file ps_disabled.txt into BIOS:    

./SCELNX_64 /i /s ./ps_disabled.txt

Note: The following warnings and errors may appear and can be ignored:

Example:

!!!! xxxxxxxx YOUR DATA IS IN DANGER !!!! # ./SCELNX_64 /i /s ./ps_disabled.txt

----------------------------------------------------------------------------
|                Copyright (c)2014 American Megatrends, Inc.               |
|                      AMISCE Utility. Ver 5.01.1073                       |
----------------------------------------------------------------------------

Warning in line 23600
Missing Current Setting "*"
WARNING : Length of string for control (User Name) not updated as the value/defaults specified in the script file doesn't reach the minimum range (1).
WARNING : Length of string for control (User Name) not updated as the value/defaults specified in the script file doesn't reach the minimum range (1).
WARNING : Length of string for control (User Name) not updated as the value/defaults specified in the script file doesn't reach the minimum range (1).
WARNING : Error in writing variable PNP0501_0_NV to NVRAM
WARNING : Error in writing variable SecureBootSetup to NVRAM

Import completed with some errors, see warnings given.
*******************************

Reboot the system with the DDOS command:

# system reboot

The reboot forces the new configuration settings into the BIOS.

*******************************

Once the system has rebooted, verify from the BASH shell that the Patrol Scrub setting has changed to disabled:

./SCELNX_64 /o /s ./bios_config.txt

Open 'bios_config.txt' in the vi text editor and verify that the Patrol Scrub setting kept the asterisk next to [00]Disable after the reboot.

                 Setup Question  = Patrol Scrub
                      Options =[01]Enable    // Move "*" to the desired Option
                              *[00]Disable
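
The same check can be done programmatically on a copy of bios_config.txt. The Python sketch below reports which option carries the asterisk for a given Setup Question. It assumes the file uses the layout shown above, and selected_option and STARRED_RE are hypothetical names.

```python
import re

# A selected option is an option entry such as "[00]Disable" prefixed with "*".
STARRED_RE = re.compile(r"\*\[[0-9A-Fa-f]+\]([^/\n]*)")


def selected_option(text, question):
    """Return the name of the starred option under `question`, or None if absent."""
    in_block = False
    for line in text.splitlines():
        if re.match(r"\s*Setup Question\s*=", line):
            in_block = line.split("=", 1)[1].strip() == question
        elif in_block:
            m = STARRED_RE.search(line)
            if m:
                return m.group(1).strip()
    return None


if __name__ == "__main__":
    dumped = (
        "                 Setup Question  = Patrol Scrub\n"
        '                      Options =[01]Enable    // Move "*" to the desired Option\n'
        "                              *[00]Disable\n"
    )
    print(selected_option(dumped, "Patrol Scrub"))
```

A result of "Disable" for "Patrol Scrub" corresponds to the expected state after the workaround.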

With this workaround applied, the Patrol Scrub function is permanently disabled. Future OS and BIOS versions will incorporate this change automatically.

Upgrading to a newer IDPA version is seamless and does not require removing this workaround.

Article Properties


Affected Product

Integrated Data Protection Appliance Family

Product

PowerProtect DP8800, PowerProtect Data Protection Software, Integrated Data Protection Appliance Family, PowerProtect Data Protection Hardware, Integrated Data Protection Appliance Software

Last Published Date

23 Dec 2020

Version

3

Article Type

Solution