Symptoms
The OEM update precheck fails with the following errors:
Oem Package Update
Perform Oem Extension update.
BeforeOEMUpdate
Tasks before OEMUpdate.
Test Firmware PreUpdate
Determine if OEM Firmware should be updated on physical nodes.
Firmware Check Health.
Check firmware health before update.
Click "Download Summary".
Error.json:
Type ‘FirmwareCheckHealth’ of Role ‘OEM’ raised an exception:\n\nError running Test-OEMFirmwareHealth: iDRAC: 10.224.190.3 has Critical Major faults, could not proceed with Firmware update….
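When several nodes are involved, it can help to pull the iDRAC address out of the exception text so you know which server to inspect first. The following is a minimal sketch, assuming the message always follows the wording of the single excerpt above; the helper name `parse_idrac_fault` and the regular expression are illustrative and are not part of any Azure Stack Hub or Dell tooling.

```python
import re

# Assumption: the message layout is inferred from the one Error.json excerpt
# in this article and may differ in other releases.
IDRAC_FAULT = re.compile(
    r"iDRAC:\s*(\d{1,3}(?:\.\d{1,3}){3})\s+has\s+(.+?)\s+faults"
)


def parse_idrac_fault(message):
    """Return (ip, severity) from a FirmwareCheckHealth message, or None.

    Hypothetical helper for illustration only.
    """
    match = IDRAC_FAULT.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample text copied from the Error.json excerpt (truncated tail omitted).
sample = (
    "Error running Test-OEMFirmwareHealth: iDRAC: 10.224.190.3 has "
    "Critical Major faults, could not proceed with Firmware update"
)

print(parse_idrac_fault(sample))
```

The returned address identifies the iDRAC to open; the message itself does not say which component is at fault, so the iDRAC log still has to be checked.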
Cause
Three of the four servers had memory errors.
Because these were reported as warnings, the customer assumed the update could proceed without resolving them first.
The iDRAC reported the following message for the affected DIMM:
The memory health monitor feature has detected a degradation in the DIMM installed in DIMM_A6. Reboot system to initiate self-heal process.
Resolution
To resolve this issue, clear the memory errors on each affected node before retrying the update:
- Drain the node with the iDRAC showing the failed memory.
- Once drained, run the following commands in a PEP PowerShell window:
  Get-VirtualDisk -CimSession "s-cluster"
  Get-VirtualDisk -CimSession "s-cluster" | Get-StorageJob
- Once all storage jobs complete, cold boot the server from the iDRAC.
- Verify that the DIMM has healed. The expected message is:
  The self-heal operation successfully completed at DIMM DIMM_A6.
- In the Azure Stack window, Start the node.
- In the Azure Stack window, Resume the node.
Integrated System for Microsoft Azure Stack Hub, Integrated System for Microsoft Azure Stack Hub 14G