Solved!

January 24th, 2023 11:00

PowerEdge R720 BIOS Update to 2.9.0 causes Windows reboot loop

System Model: PowerEdge R720
Operating System: Windows Server 2012 R2
BIOS Version: 1.6.0
iDRAC7 Firmware Version: 1.66.65 (Build 07)
Lifecycle Controller Firmware: 1.1.5.165

Start of following events: 2023-01-22 17:34

Used Lifecycle to execute Dell Update Package BIOS_8P8WX_WN32_2.9.0.EXE and received this Lifecycle error
SUP0525: Unable to verify the digital signature of the Update Package.

Used Boot Manager (F11) to update BIOS using R720-020900C.efi and received this Lifecycle log entry
PR36: Version change detected for BIOS firmware. Previous version:1.6.0, Current version:2.9.0
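For anyone who wants to confirm a version change from an exported Lifecycle log rather than by eye, here is a minimal Python sketch. It assumes the log text contains the PR36 message exactly as quoted above; the pattern and the function name `parse_version_change` are illustrative and not part of any Dell tool.

```python
import re

# Pattern built from the single PR36 sample quoted in this post.
# Spacing and wording on other firmware versions may differ (assumption).
PR36_PATTERN = re.compile(
    r"PR36: Version change detected for (?P<component>.+?) firmware\. "
    r"Previous version:\s*(?P<previous>[\w.]+), "
    r"Current version:\s*(?P<current>[\w.]+)"
)


def parse_version_change(line):
    """Return (component, previous, current), or None if the line is not a PR36 entry."""
    match = PR36_PATTERN.search(line)
    if match is None:
        return None
    return match.group("component"), match.group("previous"), match.group("current")


entry = ("PR36: Version change detected for BIOS firmware. "
         "Previous version:1.6.0, Current version:2.9.0")
print(parse_version_change(entry))
```

Running the same function over the log after a downgrade should show the versions swapped, which is a quick way to check that each flash actually took effect.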

On the subsequent reboot, the system completes POST without errors, but after the Windows logo appears briefly, the system reboots. This reboot loop repeats at the same point each time.

The Lifecycle log entry below is seen after every reboot attempt. This Lifecycle log entry only started appearing after the successful BIOS update.
VLT0304: CPU 1 M23 VDDQ PG voltage is outside of range.
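To check whether an entry such as VLT0304 really recurs on every boot attempt, one option is to tally message IDs from an exported log. The Python sketch below assumes each log line contains an ID followed by a colon, as in the entries quoted in this post; the real export format may differ, and the sample list (including the repeated VLT0304 line) is illustrative only.

```python
import re
from collections import Counter

# Message IDs quoted in this thread look like "VLT0304", "SUP0525", or "PR36":
# a few capital letters, then digits, then a colon (assumed format).
MESSAGE_ID = re.compile(r"\b([A-Z]{2,4}\d{2,4}):")


def count_message_ids(lines):
    """Count how often each message ID appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = MESSAGE_ID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample built from the messages quoted above.
sample = [
    "SUP0525: Unable to verify the digital signature of the Update Package.",
    "PR36: Version change detected for BIOS firmware. Previous version:1.6.0, Current version:2.9.0",
    "VLT0304: CPU 1 M23 VDDQ PG voltage is outside of range.",
    "VLT0304: CPU 1 M23 VDDQ PG voltage is outside of range.",
]
counts = count_message_ids(sample)
print(counts["VLT0304"])
```

Comparing the VLT0304 count with the number of reboot attempts gives a rough indication of whether the entry is tied to each boot.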

iDRAC System Summary shows all voltages (including both CPUs) as green checkmarks, so I suspect the Lifecycle event VLT0304 may be a false alarm.

I suspect the initial BIOS Update Package install failure may have left things in a bad state. This bad state still persists even after the subsequent Boot Manager BIOS update was successful. Later, I used Boot Manager to downgrade the BIOS back to 1.6.0, but the Windows reboot loop remained.

Then, I was able to (F8) boot into Windows Safe Mode with Networking. This is the point I am at now. Does anyone have any ideas?

3 Posts

February 9th, 2023 15:00

Through trial and error, I managed to find the cause of the reboot loop on my server.

When I updated the BIOS to 2.9.0, I suspect the System Profile option was reset to the default.
The default System Profile option is "Performance Per Watt Optimized (DAPC)", where DAPC stands for Dell Active Power Controller.

I changed the System Profile option to "Performance" and the reboot loop disappeared.
Also, the Lifecycle log message "VLT0304: CPU 1 M23 VDDQ PG voltage is outside of range." disappeared (I had rightly suspected this message was a false alert caused by the underlying root problem).

I have not confirmed this, but my theory is that DAPC monitors power consumption (in watts and voltage?) of system components in order to apply a System Profile of "Performance Per Watt Optimized (DAPC)". Either the monitoring revealed a true "voltage is outside of range" condition on CPU 1, or it is a false alert. In either case, the System Profile of "Performance Per Watt Optimized (DAPC)" appears to be the cause of the reboot loop problem.

Changing the BIOS System Profile option to "Performance" eliminates the reboot loop problem and the server starts normally.
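The same setting can in principle be checked and changed from the command line with racadm instead of the BIOS setup screen. The Python sketch below only builds and prints the commands (dry run by default). The attribute name `BIOS.SysProfileSettings.SysProfile`, the values `PerfOptimized` and `PerfPerWattOptimizedDapc`, and the job target `BIOS.Setup.1-1` are assumptions on my part for this iDRAC7 firmware; confirm them with `racadm get BIOS.SysProfileSettings` on your own system first. The helper names are hypothetical.

```python
import subprocess

# Assumed racadm attribute name and values; verify on your own iDRAC
# before running anything for real.
PROFILE_ATTRIBUTE = "BIOS.SysProfileSettings.SysProfile"
PROFILE_VALUES = {
    "Performance Per Watt Optimized (DAPC)": "PerfPerWattOptimizedDapc",
    "Performance": "PerfOptimized",
}


def build_profile_commands(profile_name):
    """Return the racadm commands to read, stage, and schedule a profile change."""
    value = PROFILE_VALUES[profile_name]
    return [
        ["racadm", "get", PROFILE_ATTRIBUTE],             # show the current profile
        ["racadm", "set", PROFILE_ATTRIBUTE, value],      # stage the new value
        ["racadm", "jobqueue", "create", "BIOS.Setup.1-1"],  # schedule the BIOS job
    ]


def run_commands(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


run_commands(build_profile_commands("Performance"))
```

A staged BIOS change like this would normally take effect only after the next reboot, so the dry-run output is best treated as a checklist to review rather than something to run blindly.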

(Though I solved this problem myself, I am grateful for the advice of others who replied to my plea for help. Thank you. I hope the details of my adverse event may be of use to others.)

Moderator

3.7K Posts

January 24th, 2023 20:00

Hello, Chris' answer here may help: https://dell.to/3HtVJ3H

4 Operator

3K Posts

January 24th, 2023 20:00

You can try booting the server in the Minimum to POST configuration to see whether any hardware is causing the issue. Minimum to POST is the configuration with the fewest components required to complete POST. Typically, it consists of PSU1, CPU1, a memory module in slot A1, and the default riser without expansion cards.

3 Posts

January 24th, 2023 22:00

Thank you for recommending the "Minimum to POST" test. I am a one-man IT department and am short on time, so I am looking for advice on what things I can rule out. For instance, I am able to boot into Windows Safe Mode (with Networking). And the reboot loop starts only after the Windows logo appears. And Windows OS attempts to boot only after POST. So, don't these observations confirm that POST is successful and I shouldn't have to perform a "Minimum to POST" test?

I am also curious: why did the BIOS downgrade back to the original version not get rid of the reboot loop?

In the meantime, I tried different BIOS versions, and the server is now at 2.7.0. I am still able to boot into Safe Mode (with Networking), where I ran "sfc /scannow" twice; the first run fixed some corrupted files, and the second was clean. After this, the Windows reboot loop still occurs.

I think I should be looking at boot (MBR) next? And perhaps trying a Windows "Clean Boot"?

I should mention that there is a RAID array with 12 TB of business-critical data on this server. And also that I have very little experience with servers. So removing a CPU for a test is daunting for me and I am very leery of doing anything that might risk data loss (by messing up the RAID array).
