Unsolved

2 Intern

 • 

143 Posts

January 21st, 2021 18:00

Issues with reboot after firmware or driver updates

This is specific to servers managed out-of-band via iDRAC. After an update that requires a reboot, if I select 'Graceful reboot without forced shutdown', the job times out, apparently unable to reboot the server. If I select 'Graceful reboot with forced shutdown', the servers log an unexpected shutdown.

What is the recommended method to allow servers to reboot cleanly without an unexpected shutdown?

4 Operator

 • 

3K Posts

January 21st, 2021 20:00

Please also share the server model and iDRAC firmware version.

4 Operator

 • 

3K Posts

January 21st, 2021 20:00

Which operating system do you have on the server? Sometimes we see that a graceful shutdown does not work properly if the OS is not in use and is in a locked state. This is due to the operating system configuration. Can you try a graceful shutdown from iDRAC when the OS screen is not in a locked state?

2 Intern

 • 

143 Posts

January 22nd, 2021 08:00

Thanks for the reply.

I've tested on three and they all behave the same:

PowerEdge 340 and 540 both running Server 2019 and latest available iDRAC FW 4.40.00.00.

PowerEdge 430 running Server 2012 R2 and latest available iDRAC FW 2.75.75.75.

None of these are or were left in a locked/logged on state.

I should mention that this was never an issue in OpenManage Essentials, but in that scenario the servers were all discovered and managed in-band, not via iDRAC.

2 Intern

 • 

131 Posts

January 23rd, 2021 15:00

https://www.dell.com/community/Dell-OpenManage-Enterprise/Reboot-after-update-no-graceful-reboot/m-p/7665853#M3044 

See the post linked above. I believe this is because the iDRAC isn't communicating with the OS in terms of credentials, due to the protocols being used. OM Essentials used SNMP/WMI to communicate with the OS, whereas iDRAC uses WSMAN. I believe they state to use SSH, but you would have to deploy and configure SSH on each Windows server.

One workaround, if you can call it that, is to pick the option to apply package updates but NOT reboot, and then reboot those servers manually after the driver/firmware install. That gets rid of some of the automated task function, though.

I've made many posts asking Dell Engineering to look at OM Enterprise and provide some of the same functionality we had in Dell OM Essentials (including alerts), but nothing so far. This product is NOWHERE near what Dell OM Essentials was (my 2 cents anyway).

Moderator

 • 

5.3K Posts

January 24th, 2021 19:00

Hi, how about we use in-band instead of OOB (out-of-band)?

We can reinstall OMSA and run the test again.

 

2 Intern

 • 

131 Posts

January 25th, 2021 07:00

That would require an OpenSSH client on the server.

2 Intern

 • 

143 Posts

January 25th, 2021 09:00

Yes, this worked perfectly in-band and with OMSA under OM Essentials.

However, as @Yellow Boy mentioned under OM Enterprise, we would also need ssh on all servers. In addition, the in-band functionality is limited, as those devices show up as 'Managed' as opposed to 'Managed with Alerts'. We also apparently lose the ability to update firmware with in-band management.

2 Intern

 • 

143 Posts

January 25th, 2021 17:00

I've had successful graceful reboots with two PowerEdge R740xd's using out-of-band/iDRAC.

Both have iDRAC 9 with firmware version 4.40.00.00. Both running Windows 2019. Both have OMSA installed, neither have ssh installed.

I'm not sure what's leading to the inconsistencies. Any other thoughts?

2 Intern

 • 

131 Posts

January 26th, 2021 06:00

@justin gray How is your discovery set up for that iDRAC? What options are checked?

Did you try this on a 12th or 13th gen server with an older OS?

2 Intern

 • 

143 Posts

January 26th, 2021 08:00

@Yellow Boy I'm using basic iDRAC discovery: Device Type = Server, enter IP address, choose 'Discover using WSMAN/Redfish (iDRAC, Server and/or Chassis)' and enter iDRAC credentials (non-domain).

None of these have ssh installed. OMSA is still installed on all of them. I've observed the following so far:

  • PowerEdge 340, Server 2019, iDRAC9 firmware v.4.40.00.00 = graceful reboot times out, forced reboot causes dirty shutdown.
  • PowerEdge 540, Server 2019, iDRAC9 firmware v.4.40.00.00 = graceful reboot times out, forced reboot causes dirty shutdown.
  • PowerEdge 430, Server 2012 R2, iDRAC8 firmware v.2.75.75.75 = graceful reboot times out, forced reboot causes dirty shutdown.
  • PowerEdge R740xd, Server 2019, iDRAC9 firmware v.4.40.00.00 = graceful reboot is successful on two servers tested so far.
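One way to take OME out of the equation would be to send the graceful request straight to the iDRAC over Redfish and watch whether the OS reacts. Below is a minimal Python sketch of that idea, not something I have run against these servers. It assumes the standard Redfish `ComputerSystem.Reset` action and the usual iDRAC system ID `System.Embedded.1`. The host address and credentials are placeholders, and `build_reset_request` is a made-up helper name. The `ResetType` values a given firmware accepts can differ, so check `ResetType@Redfish.AllowableValues` on the system resource first.

```python
import base64
import json
import urllib.request

# Assumption: the usual iDRAC system resource ID; confirm under /redfish/v1/Systems.
SYSTEM_PATH = "/redfish/v1/Systems/System.Embedded.1"


def build_reset_request(host, user, password, reset_type="GracefulRestart"):
    """Build (but do not send) a Redfish ComputerSystem.Reset POST request."""
    url = f"https://{host}{SYSTEM_PATH}/Actions/ComputerSystem.Reset"
    # HTTP Basic auth with local iDRAC credentials (hypothetical values below).
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"ResetType": reset_type}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder address and credentials, not real values.
    req = build_reset_request("192.0.2.10", "root", "changeme")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # Sending is left out on purpose; it would be urllib.request.urlopen(req, ...)
    # with an SSL context suited to the iDRAC certificate.
```

If a direct request like this also produces no shutdown events in Windows, the problem is more likely between the iDRAC and the OS than in the OME job itself.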

2 Intern

 • 

131 Posts

January 26th, 2021 12:00

Perhaps this is currently only allowed with 14th gen servers? My question is HOW this interacts with the OS to do a graceful shutdown on 14th gen but not the others out-of-band, since this is NOT using domain credentials but local iDRAC credentials.

2 Intern

 • 

143 Posts

January 27th, 2021 08:00

Great question. I'm curious to know, as well.

I was hoping someone from Dell with some technical knowledge would jump in and clear up some of these mysteries...

Moderator

 • 

2.9K Posts

January 28th, 2021 05:00

Hello,

 

OpenManage Enterprise 3.3.1 and older performs a power cycle rather than a graceful reboot after a specified timeout. You can use the "Stage for next server reboot" option to get around this, and I guess that workaround is the best way for now.

Normally, OME 3.4 and newer allows configuring a graceful reboot either with or without a forced shutdown, but I can see that a similar issue persists on newer versions. I've done a lot of investigating but haven't been able to find a direct solution or the root cause.

By the way, regarding OpenSSH, the OME 3.5 release notes draw attention to the following: https://dell.to/2Mv1ww0

Limitations

Only the OpenSSH is supported for the discovery and inventory collection of Windows-based servers and Hyper-Vs. Other SSH protocol implementations, like Cygwin SSH, are not supported. [157991]

 

Known issue

Description: In-band driver updates are only supported on Windows with OpenSSH. Driver updates on third party SSH hosted on Windows, such as the CygwinSSH, are not supported. [157887]

 

2 Intern

 • 

131 Posts

January 28th, 2021 08:00

@justin gray @Dell-ErmanO 
Erman,

We knew about the limitations, and there are many posts from customers stating how displeased they are with this design, as many customers can't install ssh clients on Windows servers, etc. We didn't have this issue in OM Essentials because it was supported. I believe many customers have asked for this functionality to be added to OM Enterprise.

To that point, it doesn't make sense that Justin's 740xd example was able to gracefully shut down and reboot, because he is not running an ssh client and tested in-band. So how is that explained?


2 Intern

 • 

143 Posts

January 29th, 2021 08:00

In my environment, I'm using the latest build of OMEnt. 3.5, clean/new install. All servers are still running OMSA, none are running SSH. Operations are performed via iDRAC/out-of-band.

I don't have a huge base to test, nor an abundance of time, but what I'm finding so far is that those that reboot gracefully pass the commands below. Those that time out never seem to have received or passed those commands. The question, of course, is why?

Here's what is logged:

Log Name: System
Source: Microsoft-Windows-Kernel-Power
Event ID: 109
Task Category: (103)
Level: Information
Keywords: (70368744177664),(1024),(4)
User: SYSTEM
Computer: xxxxxx
Description:
The kernel power manager has initiated a shutdown transition.

Shutdown Reason: Button or Lid
-------------------------------------------------------

Log Name: System
Source: User32
Event ID: 1074
Task Category: None
Level: Information
Keywords: Classic
User: SYSTEM
Computer: xxxxxx
Description:
The process C:\Windows\system32\winlogon.exe (xxxxxx) has initiated the power off of computer xxxxxx on behalf of user NT AUTHORITY\SYSTEM for the following reason: No title for this reason could be found
Reason Code: 0x500ff
Shutdown Type: power off
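To compare servers quickly, the check for these two events can be scripted. The "Button or Lid" reason suggests the OS saw the request as a power-button press, so a server that never logs Event ID 109 probably never received it. The Python sketch below assumes the System log has been copied out of Event Viewer as text in the same layout as above. The function names are my own, and treating "109 and 1074 both present" as the sign of a graceful request is only a heuristic based on what I've seen so far.

```python
import re

# Event IDs seen in the log above on servers that rebooted gracefully.
GRACEFUL_IDS = {109, 1074}


def shutdown_event_ids(log_text):
    """Return the set of Event IDs found in text copied from Event Viewer."""
    matches = re.findall(r"^Event ID:\s*(\d+)", log_text, re.MULTILINE)
    return {int(m) for m in matches}


def received_graceful_request(log_text):
    """True if both events from the graceful case (109 and 1074) are present."""
    return GRACEFUL_IDS <= shutdown_event_ids(log_text)


if __name__ == "__main__":
    # Trimmed sample in the same layout as the log entries above.
    sample = (
        "Log Name: System\n"
        "Source: Microsoft-Windows-Kernel-Power\n"
        "Event ID: 109\n"
        "Log Name: System\n"
        "Source: User32\n"
        "Event ID: 1074\n"
    )
    print(received_graceful_request(sample))
```

Running this against the log text from a server that timed out and from one that rebooted cleanly should show whether the difference is simply that the shutdown request never reached the OS.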
