5 Posts

January 17th, 2019 02:00

MEMBIST failure after BIOS update

Hi all, we have a problem with an old (but good) R410. We replaced the CPUs with two Xeon E5606 processors and upgraded the RAM with 8 Hynix 8GB 2Rx4 PC3L-10600R DIMMs (registered ECC). Since the first boot after the hardware upgrade, we have had the error in the subject: "MEMBIST failure - the following DIMM has been disabled by BIOS". After that, we updated everything to the latest available versions: Lifecycle Controller, iDRAC, PERC, OS drivers, diagnostics, BIOS. No results. Any suggestions?

Moderator • 8.8K Posts

January 17th, 2019 05:00

SorreSte,

Would you clarify the part number of the DIMMs being used, as well as whether the error states a specific DIMM number? If it does, and that DIMM is swapped with another matching DIMM in the server, does the error change at all?

Let me know. 

5 Posts

January 17th, 2019 06:00

Hi Chris, thank you for the reply. If I'm reading the specs correctly, the DIMMs are all 99L0382-001.

Unfortunately the error is the same with every available DIMM configuration.

5 Posts

January 18th, 2019 01:00

I found the article below 

 

https://patrickhoban.wordpress.com/2011/03/25/2312/

 

How can I change the Memory Operating Mode from Optimized to Advanced ECC? Does the BIOS on my R410 support this option?

3 Posts

April 6th, 2019 02:00

Press F2 to enter the BIOS setup, choose the memory options, and then use the space bar to change the setting.
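
If OpenManage Server Administrator (OMSA) happens to be installed, you can also check the current memory-related BIOS settings from the OS before rebooting. A minimal sketch, assuming OMSA's omreport command is on the PATH and noting that the exact attribute labels vary by platform and BIOS revision:

# check_memory_mode.py -- minimal sketch; assumes Dell OMSA is installed
# and "omreport" is on the PATH. Attribute labels vary by platform/BIOS.
import subprocess

def show_memory_settings():
    # "omreport chassis biossetup" lists the BIOS setup attributes OMSA exposes.
    out = subprocess.run(
        ["omreport", "chassis", "biossetup"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if "memory" in line.lower():
            print(line.strip())

if __name__ == "__main__":
    show_memory_settings()

The change itself is still made in the F2 setup screen as described above; this just confirms which mode the system is currently in.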

14 Posts

August 21st, 2020 18:00

I've run into this situation on 2 different systems -- a T610 and a Dell Precision T7500. I even went so far as to replace the motherboard in the T610 -- same error, which cuts my memory from 192GB to 160GB (2 banks of 16GB are disabled). The error message also hints that the cause is configuration-related, by telling me to check the user documentation (guide or manual) for valid memory configurations. I ordered the system with registered ECC RAM from day one and have only upgraded memory to the same type; however, I notice now that it has fallen into "Optimizer mode", which I have seen described somewhere as possibly being synonymous with a slightly lowered capacity. The only memory setting besides Optimizer is interleaved, which lowers performance compared to local memory being used by the local processor in a NUMA configuration.

I noticed that another memory setting has to do with voltage: 1.5V or Auto. The memory in this computer is LV (1.35V, I believe). Over the course of this problem, the CPUs have been replaced and switched for testing, and memory modules have been swapped both with respect to which CPU they were paired with and positionally relative to their local CPU.

 

Right now, this seems very similar to a problem I had when I updated the iDRAC ROM. Early on, I had ordered this system with two 570W PSUs. Max wattage of the system had been about 285W (as measured by the on-board sensors read via IPMI). Later I added a 2nd processor, ordered from Dell's parts department, and saw power run as high as 325W.

That worked for about a year, until I upgraded the iDRAC ROM. It proceeded to force the system into a low-performance mode -- with system memory running at 800MHz instead of the normal 1333MHz and the CPU placed into a low-power mode.

I called support, who told me the 2-CPU configuration required me to upgrade my PSUs from 570W to 8-something (I think 840W), at my cost. I wasn't pleased, since daily (per-minute) power reports from the internal sensors showed a max usage of 325W, with regular usage being lower. Sample sensor output as recorded by my script(s):

200822-004543:P1,2⇒1.0×112,0.9×112=231W,F1-4(㎐)⇒34,34,32,32;TAmb⇒25℃/77℉)
200822-004612:P1,2⇒1.0×112,0.9×112=231W,F1-4(㎐)⇒38,38,38,38;TAmb⇒25℃/77℉)

showing amps and voltage for each PSU, total system power usage in watts (usually higher), fan speeds in Hz (while relayed in RPM, they only change by multiples of the Hz measurement), and the ambient temperature at 30-second intervals. Other temperatures have come and gone, but the power, fan speeds, and ambient temperature have been consistently present.
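
For anyone curious, a minimal sketch of a similar polling loop, assuming ipmitool is installed and can reach the local BMC; the keyword filters are placeholders, since sensor names differ between iDRAC generations:

# poll_sensors.py -- minimal sketch of an IPMI sensor polling loop.
# Assumes ipmitool is installed and the local BMC (iDRAC) is reachable.
import subprocess
import time
from datetime import datetime

# Placeholder filters; actual sensor names vary by iDRAC generation.
KEYWORDS = ("pwr", "watt", "fan", "ambient")

def poll_once():
    # "ipmitool sensor" prints one pipe-separated row per sensor.
    out = subprocess.run(
        ["ipmitool", "sensor"],
        capture_output=True, text=True, check=True,
    ).stdout
    stamp = datetime.now().strftime("%y%m%d-%H%M%S")
    for line in out.splitlines():
        name = line.split("|")[0].strip().lower()
        if any(k in name for k in KEYWORDS):
            print(f"{stamp}: {line.strip()}")

if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(30)  # the output above was logged at 30-second intervals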

The point being that, by measured values, the max had been 325W, nowhere near the 570W rating of the 'eco-PSUs', let alone showing a need for the 840W PSUs. It also wasn't possible to reinstall the lower version of the iDRAC ROM, so I had to get the 840W PSUs, which average about 25W more usage, constantly, compared to the 570W PSUs.

It was then that I learned ROM upgrades might enforce more rigid conditions on your system as well as disable system functionality. This has made me more than a little reluctant to upgrade any ROMs in the system, as it is similar to Microsoft's practice of updates removing OS functionality (on purpose, not the accidental disabling of Win10 user systems via the many blue screens Win10 users get).

----

I have another area of odd symptoms when trying to install SSDs to replace some of my internal rotating disks: orange blinking lights, among other things, even though testing on them showed no discernible problem other than the orange blinking light. The integrated adapter, a PERC 6/i, shows with the MegaRAID utility that both SSD Guard and SSD Disk Caching are enabled on the card, so I'm not sure why I'm getting the orange blinking light.

 

So how can I get my BIOS and/or iDRAC to allow setting the memory mode to ECC, which should get rid of the MEMBIST problem, and, for that matter, allow SSDs to be used by the PERC controller? I'm thinking the answer for the T610 might also apply to the T7500, as they are the same generation of parts.

Thanks!

 

Moderator • 4.1K Posts

August 24th, 2020 00:00

Hi,

 

-MEMBIST failure after BIOS update

 

  1. Install the latest BIOS for the platform. Major changes to memory timing and the Intel Memory Reference Code contained in that BIOS help eliminate failures. In addition, the latest BIOS allows us to find DIMMs that need to be replaced. In short, it can turn a system that might intermittently fail to complete POST into a system that correctly "flags" a specific failing DIMM.
  2. After updating to the latest BIOS, follow normal memory troubleshooting, starting with reseating the faulted DIMMs. If the failure returns on that DIMM/slot, swap the DIMM with another DIMM within the system. If the failure follows the DIMM, replace the affected DIMM or DIMMs per your warranty (a quick way to scan the System Event Log for memory-related entries is sketched below).
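
For step 2, here is a minimal sketch for scanning the System Event Log for memory-related entries after reseating or swapping DIMMs, assuming ipmitool is installed and can reach the local iDRAC/BMC; the keyword filter is illustrative:

# sel_memory_events.py -- minimal sketch; assumes ipmitool is installed
# and the local BMC (iDRAC) is reachable.
import subprocess

def memory_sel_entries():
    # "ipmitool sel elist" prints the System Event Log in list form.
    out = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        text = line.lower()
        if "memory" in text or "dimm" in text or "ecc" in text:
            print(line.strip())

if __name__ == "__main__":
    memory_sel_entries()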

-PERC 6/i + SSD showing an orange blinking light

We need to see the controller's TTY logs. Here's how to export them:

https://dell.to/2YpVGyX

A non-Dell SSD can cause an orange blinking light and a certification error in the log. Before anything else, make sure your firmware is up to date.
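
If it helps, a minimal sketch for capturing the TTY (firmware terminal) log with MegaCli, assuming the MegaCli binary is installed and on the PATH; the binary name varies by package (MegaCli, MegaCli64, or megacli):

# dump_tty_log.py -- minimal sketch; assumes the MegaCli utility is installed.
import subprocess

def dump_tty_log(path="perc_ttylog.txt"):
    # "-FwTermLog -Dsply -aALL" prints the firmware terminal (TTY) log
    # for all adapters the utility can see.
    out = subprocess.run(
        ["MegaCli", "-FwTermLog", "-Dsply", "-aALL"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(path, "w") as f:
        f.write(out)
    print(f"TTY log written to {path}")

if __name__ == "__main__":
    dump_tty_log()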

 

Please check out page 119 for memory configuration.

https://dell.to/2Yuu6AS

 

11G systems will display a failure message, and leave the memory unused, when DIMMs are installed in slots that should not be populated for Mirror or Advanced ECC memory mode. The following message is displayed when the wrong slots are populated for either of these two modes: "Memory Initialization Warning: Memory size may be reduced."
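
To compare the actual DIMM population against the valid configurations on page 119, a minimal sketch using dmidecode (assumes dmidecode is installed and the script is run as root):

# list_dimms.py -- minimal sketch; assumes dmidecode is installed and run as root.
import subprocess

def list_dimms():
    # DMI type 17 describes installed memory devices (one block per slot).
    out = subprocess.run(
        ["dmidecode", "--type", "17"],
        capture_output=True, text=True, check=True,
    ).stdout
    slot, size = None, None
    for raw in out.splitlines():
        line = raw.strip()
        if line.startswith("Locator:"):
            slot = line.split(":", 1)[1].strip()
        elif line.startswith("Size:"):
            size = line.split(":", 1)[1].strip()
        elif not line and slot is not None:
            # e.g. "DIMM_A1: 8192 MB" or "DIMM_B4: No Module Installed"
            print(f"{slot}: {size}")
            slot, size = None, None
    if slot is not None:
        print(f"{slot}: {size}")

if __name__ == "__main__":
    list_dimms()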

December 28th, 2022 23:00

Changed the CPU for another one and everything started working.
