Unsolved

4 Posts

1270

October 13th, 2021 12:00

R720 Replacing bad memory stick

We have an R720 with a single CPU (E5-2697 v2) and 8x16GB memory sticks.  One of the sticks is failing. It says "Single-bit failure error rate exceeded". So we need to replace it but can't find any 16GB memory that matches and is readily available.  So we're thinking of adding 8GB memory.  

Can we fill 6 slots with 8GB sticks and 6 slots with 16GB sticks?  I'm not seeing that as one of the listed configurations in the manual.

Thanks in advance for your help, 

John

6 Operator

3K Posts

October 13th, 2021 18:00

You can mix memory of different sizes. Please make sure you follow the guidelines in the link below.

https://www.dell.com/support/manuals/en-us/poweredge-r720/720720xdom/general-memory-module-installation-guidelines?guid=guid-0f97a40c-2a63-4dc8-a8b2-607006d75804&lang=en-us 

The link below lists a specific combination in which you install eight 16GB and two 8GB DIMMs (10 DIMMs, 144GB in total). I believe the combination you mentioned will also work. You can give it a try.

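System capacity: 144 GB | DIMM sizes: 16 GB and 8 GB | Number of DIMMs: 10 | Rank, organization, frequency: 2R, x4, 1333 MT/s | Slots populated: A1, A2, A3, A4, A5, A6, A7, A8, A9, A11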

https://www.dell.com/support/manuals/en-us/poweredge-r720/720720xdom/sample-memory-configurations?guid=guid-afe58651-3836-4641-8d7b-463afac05ea6&lang=en-us 
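As a quick sanity check on the arithmetic, the sample row quoted in this post (10 DIMMs of 16 GB and 8 GB totalling 144 GB works out to eight 16 GB plus two 8 GB) and the 6 + 6 layout from the question both come to 144 GB. Here is a minimal Python sketch; the function name is just illustrative, and equal capacity alone does not mean a given slot population is supported, so the guidelines linked above still apply.

```python
def total_capacity_gb(dimms):
    """Sum capacity for a mapping of DIMM size in GB -> number of DIMMs."""
    return sum(size * count for size, count in dimms.items())

# 10 DIMMs, matching the 144 GB sample row quoted in this post.
sample_row = {16: 8, 8: 2}
# The 6 x 16 GB + 6 x 8 GB layout asked about in the original question.
proposed = {16: 6, 8: 6}

print(total_capacity_gb(sample_row))
print(total_capacity_gb(proposed))
```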

Moderator

5.4K Posts

37 Points

October 13th, 2021 18:00

Shine's right. You technically can. Would it give you optimal performance? We would have reservations about that. Have a good one!

4 Posts

October 16th, 2021 10:00

Thanks Shine and Young.  We found some 16GB memory and it should arrive in the next few days. So I think we'll wait and see if the production server fails before trying the 6x16GB plus 6x8GB configuration.

4 Posts

October 25th, 2021 12:00

We replaced the bad memory stick today, but OpenManage is still saying it's faulty.  We have the same error we had with the original stick.  It says "Single-bit failure error rate exceeded" for DIMM_A5. 

The new memory stick snapped right in. It couldn't have been smoother. When the server booted up, it recognized the new memory, or at least that there was a change to the memory, and it gave a message about optimizing and restarting again. It did that. The BIOS said there was 128 GB, and we have 8x16GB, so it looked good. Task Manager's performance monitor also shows 128 GB now.  

Any suggestions?  

I'm attaching some screenshots: first Task Manager showing 128 GB, then the OpenManage screen for the bad stick, then an OpenManage screen for a good stick (so you can confirm it's the same memory specs). 

[Screenshot: Task Manager showing 128 GB]
[Screenshot: OpenManage, bad stick]
[Screenshot: OpenManage, good stick]

Moderator

4.7K Posts

25.5K Points

October 25th, 2021 13:00

Hello beauars21,

 

Have you checked whether you are on the latest BIOS, 2.9.0?

 

Does the DIMM meet the General Memory Module Installation Guidelines that ShineK posted?

https://dell.to/2ZldLS8

 

 

If you clear the System Event Log (SEL) in the DRAC does that clear the Single Bit error?

 

I'd also recommend running the built-in hardware diagnostics.

 

Press F11 at the Dell splash screen, then select Boot Manager -> System Utilities -> Launch Dell Diagnostics.  Note any messages and continue testing.

 

 

If you still have the SBE error, split the DIMMs in A1 and A5 to different slots:

Can you confirm you have slots  A1, A5, A2, A6, A3, A7 and A4, A8 populated?

Swap:

A1 with A2

A5 with A8

Clear the SEL log and run diags again to check results.
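The reasoning behind the swap can be sketched in a few lines of Python. This is only an illustration under assumed names: the DIMM labels (dimm1 to dimm8) and both helper functions are hypothetical, and the slot that reports the error after the swap is something you would read from the logs yourself.

```python
def apply_swaps(layout, swaps):
    """Return a new slot -> DIMM mapping after exchanging each pair of slots."""
    new_layout = dict(layout)
    for a, b in swaps:
        new_layout[a], new_layout[b] = new_layout[b], new_layout[a]
    return new_layout

def interpret(error_before, error_after, before, after):
    """Compare where the error was reported before and after the swap."""
    if error_after is None:
        return "no error after the swap: possibly a stale log entry or a seating issue"
    suspect_dimm = before[error_before]
    same_dimm = after[error_after] == suspect_dimm
    same_slot = error_after == error_before
    if same_dimm and same_slot:
        return "inconclusive: the reporting slot was not part of the swap"
    if same_dimm:
        return "error followed the DIMM: suspect the module"
    if same_slot:
        return "error stayed on the slot: suspect the slot, channel, or mainboard"
    return "inconclusive: error moved to an unrelated DIMM and slot"

# Hypothetical labels for the eight installed modules.
before = {f"A{i}": f"dimm{i}" for i in range(1, 9)}
after = apply_swaps(before, [("A1", "A2"), ("A5", "A8")])

print(interpret("A5", "A8", before, after))
print(interpret("A5", "A5", before, after))
```

If the error shows up on A8 after the swap, it travelled with the module that used to sit in A5. If it is still reported on A5, the slot or board side is the more likely suspect.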

 

Memory slot numbers:

https://dell.to/3vHuVpo

 

4 Posts

October 25th, 2021 14:00

We are on BIOS version 2.2 (from 2014) and SMBIOS version 2.7.  We'll get that updated. 

Yes, our memory setup meets the guidelines. 

The same error is appearing in the event viewer with a timestamp right after we booted up. I assume that's where OpenManage is getting its info? 

It's a production server, so we don't have much time to take it down and test things, but we'll try your plan to swap the sticks in A1 and A5 with those in A2 and A8.  Also, we bought 2 replacement DIMMs but only put one in.  Should we use the 2nd new DIMM too? 

Moderator

4.2K Posts

20.9K Points

October 25th, 2021 19:00

Hi @beauars21,

 

OpenManage gets its event logs from both the OS and the Lifecycle Controller. Try clearing the logs in iDRAC/LCC. The majority of memory errors will clear after a BIOS update, unless it's a mainboard slot issue. After you have updated the BIOS and cleared the logs, try swapping the memory as suggested and check whether the issue persists. Sometimes draining power can also help clear the error; try a hard reset: https://dell.to/3pCzYqb
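If you would rather clear the System Event Log from a script than from the iDRAC web interface, something like the following may work. It is a rough Python sketch that assumes the remote racadm utility is installed on the machine running it; the address and credentials are placeholders, and the helper names are made up for illustration. The getsel and clrsel subcommands view and clear the SEL; save a copy of the log before clearing it.

```python
import subprocess

def build_racadm_cmd(host, user, password, subcommand):
    """Build an argv list for remote racadm; nothing is executed here."""
    return ["racadm", "-r", host, "-u", user, "-p", password, subcommand]

def run(cmd):
    """Run a command and return its stdout; raises if it exits non-zero."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Placeholder values; substitute your own iDRAC address and credentials.
    host, user = "192.0.2.10", "root"
    # Print the commands with the password masked instead of running them.
    print(" ".join(build_racadm_cmd(host, user, "****", "getsel")))
    print(" ".join(build_racadm_cmd(host, user, "****", "clrsel")))
    # To execute for real, pass the actual password and call run(...), e.g.
    # run(build_racadm_cmd(host, user, real_password, "getsel"))
```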

 

Keep the second replacement DIMM as a spare, since one of the two is already installed. 

 

Ultimately, if you have done most of the troubleshooting suggested above and you still have the error, you might need a mainboard replacement, since this is a production server and you would not want any disruption.
