
R7425 Random OS Restarts (Epyc chipset)

We have an 8-node cluster running Windows Server 2016 Datacenter, configured for Hyper-V, that has been in place for the past year. During this time we have experienced random OS reboots across all nodes.
iDRAC simply reports an OEM S event, and the event logs record an unexpected shutdown.
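One way to test whether the restarts share a cause is to line up the unexpected-shutdown times from every node and see whether they cluster together. The Python sketch below is an illustration, not a Dell or Microsoft tool. It assumes you have already pulled the relevant entries from each node's System log (for example Event ID 6008, which Windows logs after an unexpected shutdown) into simple (node, timestamp) pairs. The node names, sample times and the 10-minute window are made-up placeholders.

```python
from datetime import datetime, timedelta

# Hypothetical input: (node name, time of unexpected shutdown) pairs
# gathered from each node's System event log.
Event = tuple[str, datetime]


def group_restarts(events: list[Event], window: timedelta) -> list[list[Event]]:
    """Group shutdowns that occur within `window` of the previous one.

    A group spanning several nodes hints at a shared cause (power, cluster
    or hypervisor configuration) rather than a single faulty server.
    """
    groups: list[list[Event]] = []
    for node, ts in sorted(events, key=lambda e: e[1]):
        if groups and ts - groups[-1][-1][1] <= window:
            groups[-1].append((node, ts))  # close to the last event: same group
        else:
            groups.append([(node, ts)])    # otherwise start a new group
    return groups


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        ("node1", datetime(2020, 3, 2, 14, 5)),
        ("node4", datetime(2020, 3, 2, 14, 9)),
        ("node7", datetime(2020, 3, 9, 2, 30)),
    ]
    for group in group_restarts(sample, timedelta(minutes=10)):
        nodes = sorted({node for node, _ in group})
        print(group[0][1].isoformat(), len(nodes), "node(s):", ", ".join(nodes))
```

If most groups contain a single node, the restarts are independent events; if several nodes routinely go down together, that points away from individual hardware faults.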
Based on other users' threads describing similar problems, we have made the following changes to the system profile, but to no avail:
CPU:               Max Performance
Memory Frequency:  Max Performance
C1E:               Disabled
C States:          Disabled
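With eight nodes, it is worth confirming that this profile actually took effect on every server. A minimal sketch, assuming each node's settings have been collected into a plain dictionary (by hand or from an iDRAC export). The keys simply mirror the labels listed above; they are hypothetical and are not real BIOS attribute names.

```python
# Desired values, mirroring the System Profile changes listed above.
EXPECTED = {
    "CPU": "Max Performance",
    "Memory Frequency": "Max Performance",
    "C1E": "Disabled",
    "C States": "Disabled",
}


def profile_drift(node_settings: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return, per node, every setting whose value differs from EXPECTED."""
    drift: dict[str, dict[str, str]] = {}
    for node, settings in node_settings.items():
        diffs = {
            key: settings.get(key, "<missing>")
            for key, want in EXPECTED.items()
            if settings.get(key) != want
        }
        if diffs:  # only report nodes that deviate
            drift[node] = diffs
    return drift


if __name__ == "__main__":
    # Hypothetical collected settings for two nodes.
    nodes = {
        "node1": dict(EXPECTED),
        "node2": {**EXPECTED, "C States": "Enabled"},
    }
    print(profile_drift(nodes))
```

An empty result means every node matches the intended profile, which rules out a half-applied change as the reason it made no difference.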
Server drivers and firmware are up to date.
Dell have analysed our logs and have been unable to identify any hardware issues.
Hyper-V is integral to our business, and we are desperate to resolve this and stabilise the cluster, to the point that we are considering buying new hardware and rebuilding.
Any suggestions would be gratefully received.


Re: R7425 Random OS Restarts (Epyc chipset)

With it affecting all 8 nodes, I wouldn't expect it to be a hardware failure. Have you tried running Microsoft's Best Practices Analyzer (BPA) to make sure your design conforms to their recommendations?
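Once the BPA has been run on each node, the results can be exported from PowerShell (`Get-BpaResult` piped to `Export-Csv -NoTypeInformation`) and combined into one file, which makes it easy to spot findings that appear on every node. The Python sketch below assumes that combined CSV has `ComputerName`, `Severity` and `Title` columns; check the headers in your own export, because that column set is an assumption here.

```python
import csv
import io
from collections import defaultdict

# Severities worth reviewing first; BPA also reports informational results.
ACTIONABLE = {"Error", "Warning"}


def nodes_per_finding(csv_text: str) -> dict[str, set[str]]:
    """Map each actionable BPA finding title to the set of nodes reporting it.

    Assumes a CSV export with ComputerName, Severity and Title columns.
    A finding reported by every node suggests a cluster-wide design issue.
    """
    findings: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Severity"] in ACTIONABLE:
            findings[row["Title"]].add(row["ComputerName"])
    return dict(findings)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = (
        "ComputerName,Severity,Title\n"
        "node1,Warning,Example finding A\n"
        "node2,Warning,Example finding A\n"
        "node2,Information,Example finding B\n"
    )
    for title, nodes in sorted(nodes_per_finding(sample).items()):
        print(f"{len(nodes)} node(s): {title}")
```

Using a set per finding means repeated rows from the same node are counted once, so the number printed is how many distinct nodes report each issue.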
