PowerFlex 3.X: Node Deployment Fails With RED033 Error When Sending An iDRAC Reboot
Summary: During the PowerFlex Manager (PFxM) automated build of a Storage Only node (this can also happen to HCI and CO nodes), the OS is installed, but during the first reboot cycle of the node the build automation times out and fails to complete. The Deployment/Job goes into a failed state. ...
Symptoms
The PowerFlex Manager automated node build fails after the OS is installed. The iDRAC inventory works as expected and the OS image is installed, but the deployment then times out on the first reboot request, and the error is recorded in the PowerFlex Manager deployment exception.log file for the job ID.
Log Files
/opt/Dell/ASM/deployment/{serviceuuid}/deployment.log
/opt/Dell/ASM/deployment/{serviceuuid}/rackserver-xxxxxxx_exception.log
The key evidence is the RED033 message ID coming from iDRAC. The job fails to complete, which results in deployment failure.
Example:
Reboot Failed, job_until_time: TIME_NA, message_arguments: NA, message_id: RED033, name: Reboot3, percent_complete: 0]>
/opt/jruby/9.1.17.0/lib/ruby/gems/shared/gems/dell-asm-util-0.1.0/lib/asm/wsman.rb:1282:in `poll_lc_job'
/opt/jruby/9.1.17.0/lib/ruby/gems/shared/gems/dell-asm-util-0.1.0/lib/asm/wsman.rb:1154:in `reboot'
/opt/asm-deployer/lib/asm/provider/server/server.rb:1275:in `power_on!'
/opt/asm-deployer/lib/asm/type/server.rb:1541:in `boot_os_iso_installer'
/opt/asm-deployer/lib/asm/type/server.rb:1679:in `provision_server!'
/opt/asm-deployer/lib/asm/service_deployment.rb:5163:in `process_server_with_types'
/opt/asm-deployer/lib/asm/service_deployment.rb:2949:in `process_server'
/opt/asm-deployer/lib/asm/service_deployment.rb:1409:in `block in create_component_thread'
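To confirm the symptom across several jobs, the exception logs can be searched for the RED033 message ID. The following is a minimal Python sketch, not a Dell-provided tool. It assumes the "key: value" field layout seen in the single example above, and the names find_job_failures and scan_deployment_dir are hypothetical helpers introduced only for illustration.

```python
import re
from pathlib import Path

# Fields taken from the sample exception line; the "key: value" layout is an
# assumption based on that one example, not a documented log format.
FIELD_RE = re.compile(r"\b(message_id|name|percent_complete):\s*([^,\]\s]+)")


def find_job_failures(log_text, message_id="RED033"):
    """Return one dict of parsed fields per log line reporting message_id."""
    hits = []
    for line in log_text.splitlines():
        fields = dict(FIELD_RE.findall(line))
        if fields.get("message_id") == message_id:
            hits.append(fields)
    return hits


def scan_deployment_dir(deployment_dir, message_id="RED033"):
    """Scan every *_exception.log in one deployment directory (hypothetical helper)."""
    results = {}
    for path in sorted(Path(deployment_dir).glob("*_exception.log")):
        hits = find_job_failures(path.read_text(errors="replace"), message_id)
        if hits:
            results[str(path)] = hits
    return results
```

Calling scan_deployment_dir on the deployment directory for the affected service (the /opt/Dell/ASM/deployment/{serviceuuid} path listed under Log Files) would return a mapping from each matching exception log to the failed job fields, such as the job name and percent complete.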
When the node is in this state, it does not respond to POWER ON | POWER OFF commands over the console, and it does not respond to a physical push of the power-on button on the chassis.
Various attempts to upgrade and downgrade Firmware/BIOS/Lifecycle firmware produced the same results. An iDRAC reboot and a server power drain do not resolve the issue.
Impact
The server cannot be built and added to the PowerFlex Manager service.
Cause
iDRAC commands are not being processed correctly by the motherboard. iDRAC fails with the RED033 error because job execution from iDRAC to the motherboard/system board times out.
Resolution
This issue is related to a problem with the motherboard/system board. The corrective action is to work with PowerFlex support to open a work order for a replacement component.