Unsolved

3 Posts

1326

January 15th, 2019 02:00

R7425 Random OS Restarts (Epyc chipset)

We have an 8-node cluster running Windows Server 2016 Datacenter, configured for Hyper-V, that has been in place for the past year. During this time we have experienced random OS reboots across all nodes.
 
iDRAC simply reports an OEM S event, and the event logs record an unexpected shutdown.
 
Based on other user threads of a similar nature, the following changes have been made to the system profile, but to no avail.
 
CPU:                 Max Performance
Memory Frequency:    Max Performance
C1E:                 Disabled
C States:            Disabled
 
Server drivers and firmware are up to date.
Dell have analysed our logs and have been unable to identify any hardware issues.
 
Hyper-V is integral to our business, and we are desperate to resolve this and stabilise the cluster, to the point that we are considering buying new hardware and rebuilding.
 
Any suggestions would be gratefully received.

4 Operator

2.9K Posts

May 7th, 2019 12:00

With it affecting all 8 nodes, I wouldn't expect it to be a hardware failure. Have you tried running Microsoft's Best Practices Analyzer (BPA) to make sure your design conforms to their recommendations?
