I walked into our server room and saw that one of our ESXi servers had crashed. The screen showed
an "E2119 fatal SB MEM CRC" error (screenshot attached). I rebooted the server and it is running.
Does this screen indicate a problem with the memory cards and the CPU? If so, why was the server able to boot, and why is it running now?
I updated the BIOS by building a bootable USB thumb drive, copying the BIOS files to it, booting the server from it, and running the executables.
I then powered down, reseated all the cards, and rebooted.
The error disappeared.
I'm not sure how or why this occurred, or whether it will occur again.
I don't think an application would have caused this, would it?
The error is stating there is a Single Bit error causing the Cyclic Redundancy Check to fail. Let's verify some things to eliminate other possibilities. To start, what is the part number on the DIMMs you have installed in the server? Also, what revisions are the BIOS and ESM/DRAC firmware currently at? Are they up to date? I ask because a badly out-of-date BIOS can cause this issue as well, and I want to be sure to eliminate those variables.
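To illustrate why even one flipped bit makes a CRC check fail, here is a minimal Python sketch. It uses `zlib.crc32` purely as a stand-in; the CRC the server's memory subsystem actually computes is not described in this thread, and the `flip_bit` helper and the 64-byte payload are made up for the example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Illustrative payload only; not real memory contents.
payload = bytes(range(64))

# Checksum computed when the data was known to be good.
stored_crc = zlib.crc32(payload)

# Simulate a single-bit error somewhere in the data.
corrupted = flip_bit(payload, 137)

# The receiver recomputes the CRC and compares it with the stored one.
crc_matches = zlib.crc32(corrupted) == stored_crc
print(crc_matches)  # False: the single flipped bit is detected
```

A CRC only detects the corruption; it does not say which component caused it, which is why the troubleshooting steps in this thread rule out the BIOS and the DIMMs one at a time.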
Let me know.
I have not installed new RAM. This is the original RAM that came with the server when it was ordered, so I do not know the part number. The service tag may reveal the RAM part number.
The BIOS (v2.3.1) has not been updated for several years, but if that is the cause, why hasn't the error occurred before now? We have five of the same server, some of them at the same BIOS version, and they have not generated this error.
If you would, could you Private Message me the service tag so that I can verify the DIMMs installed?
Also, your BIOS is behind, and it predates a BIOS update flagged as Urgent. As for the other servers, there are multiple variables that can keep them from erroring, even with the same hardware installed, so I couldn't tell you why this specific 2950 errored when another didn't. Keeping the servers current is the way to minimize the chance of this happening, though. BIOS updates are released to correct issues that have been reported and discovered, so systems that are kept behind are more susceptible to errors than updated systems are.
Thanks. Let me know the service tag.
Thank you for the PM. Everything checks out. Would you
BIOS 2.7.0 (Current) -
Walk them up, and once updated, reseat the DIMMs and see whether the error remains. If it does, remove all the DIMMs except those in slots 1 and 2, then test the DIMMs in pairs in slots 1 and 2 and see whether the error repeats with any of them.
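The pair-by-pair DIMM test can be written out as a simple plan. This is only a sketch: the `pair_test_plan` function is hypothetical, and the eight module labels are an assumption, since the number of DIMMs installed in this server is not stated in the thread.

```python
from typing import List, Tuple


def pair_test_plan(dimms: List[str]) -> List[Tuple[str, str]]:
    """Group DIMMs into consecutive pairs; each pair is tested alone in slots 1 and 2."""
    if len(dimms) % 2 != 0:
        raise ValueError("DIMMs are tested in pairs, so an even count is expected")
    return [(dimms[i], dimms[i + 1]) for i in range(0, len(dimms), 2)]


# Hypothetical labels for eight installed modules.
installed = [f"DIMM{n}" for n in range(1, 9)]

plan = pair_test_plan(installed)
for step, (first, second) in enumerate(plan, start=1):
    print(f"Test {step}: install {first} in slot 1 and {second} in slot 2, "
          "boot, and watch for the E2119 error")
```

If the error shows up with only one of the pairs, that pair is the likely culprit; if it shows up with every pair, the DIMMs themselves are less likely to be the problem.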
So you can walk them up to current. When the controller is multiple updates back, it is good practice to walk it up through the intermediate versions rather than jump straight to current.
That's correct. With ESXi there isn't currently an upgrade package that runs within the OS, so the update has to be run outside of the OS. You can boot to the 32-bit Diagnostics, exit to the command prompt, and from there run the unpackaged version of the updates above to update the BIOS. You can find the 32-bit Diagnostics link below.