Unsolved

13 Posts

July 1st, 2013 19:00

R710 memory error after BIOS upgrade

R710 with 18 4GB sticks

So 72GB total

Was running fine for years

Updated SAS, DRAC, and BIOS firmware

BIOS went from 2.x.x to the latest, 6.3.0

Now on boot it shows "configuring memory" and then:

MEMTEST lane failure detected on DIMM B8
MEMBIST failure - The following DIMM has been disabled
by BIOS - DIMM B8

All in all, after the BIOS update I lost these DIMMs: B7, B8, B9, A8, A9

No memory errors had appeared in the system window or in the DRAC log prior to the update

13 Posts

July 1st, 2013 19:00

BIOS memory is set to Optimized mode. I did go into the BIOS and check that the setting had not changed, and I also did a remote DRAC chassis power off followed by a chassis power on.

3 Apprentice

1.1K Posts

July 2nd, 2013 10:00

Are you able to move the memory modules around to see whether the issue follows the module or stays with the memory slot?
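To make the swap test concrete, here is a minimal Python sketch of the decision logic. The function name `diagnose_swap` and its return labels are made up for illustration, and it assumes you move one suspect module to a different slot and then note which slots the BIOS flags on the next boot:

```python
def diagnose_swap(old_slot, new_slot, failing_after):
    """Interpret a single swap test.

    old_slot      -- slot the suspect module was in when it was flagged
    new_slot      -- slot the suspect module was moved to
    failing_after -- set of slots the BIOS flags after the move
    """
    old_fails = old_slot in failing_after
    new_fails = new_slot in failing_after

    if new_fails and not old_fails:
        # The error followed the module to its new slot.
        return "module"
    if old_fails and not new_fails:
        # The error stayed behind with the original slot.
        return "slot"
    if not old_fails and not new_fails:
        # Nothing is flagged any more; reseating may have cleared it,
        # or the fault is intermittent.
        return "cleared"
    # Both slots are flagged, so one swap is not enough to decide.
    return "both"
```

For example, if the module from B8 is moved to A1 and the BIOS then flags A1 but not B8, `diagnose_swap("B8", "A1", {"A1"})` returns `"module"`. A single swap is only a hint, so repeating the test a few times is probably wise.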

13 Posts

July 2nd, 2013 12:00

Not at the moment. That probably would have been step #1, but as it happened right after a BIOS update I thought I'd post here to see whether it's a known bug.

I think step one might just be to go into the BIOS, switch to Advanced ECC, power cycle, switch back to Optimized, power cycle again, and see if that tickles a bug.

I will eventually get to swapping the RAM; it just requires a little work, since there are live clients on that thing.

4 Operator

9.3K Posts

July 2nd, 2013 19:00

One possibility is that the older BIOS's memory testing wasn't quite as thorough, and the new BIOS is catching an issue that was already lingering there.

13 Posts

July 8th, 2013 15:00

Well this is interesting.

I came in to swap more memory around. This time I smacked "esc" on the initial BIOS memory testing page, and it didn't disable any slots.

So I thought what the heck, with all slots enabled I will let it boot from the Dell Diag disk and started running the Dell mpmemory diag - running against all 72GB

It has gotten through the first five tests (evlog, stress, integ1, integ2, WCMATS) without a single reported error and has just started running WCMch.

13 Posts

July 8th, 2013 18:00

Okay, after finally narrowing it down to one DIMM that made the error move whenever it was moved, I started swapping DIMMs around that one DIMM.

Now I get no errors on boot. Did a power cycle a few times and let it go through BIOS mem check

No errors or disabled slots.

Now going to let MpMemory diag loop all night long ... phew

Still think there is something about this BIOS update being overly picky.

13 Posts

July 9th, 2013 10:00

Been through almost 13 passes with no errors in mem diag

13 Posts

July 15th, 2013 15:00

Okay, I let it run for days on end (since my last post) with memdiag in loop mode.

It ran for about seven days testing all 72GB with no errors.

Then I power cycled, let it do the full memory configuration, and this pops up:

"lane failure A9, a7/a8/a9 should have same size dimms"

and disabled memory on me

Ugh.

1 Rookie

8 Posts

August 6th, 2013 16:00

Same here on an R610, but with just one failed DIMM (one 8GB stick out of 48GB total). Since our rig was fine before as well and just the BIOS update to 6.3.0 somehow 'broke' it, I too would suspect the new BIOS of being a bit flaky. (Looking at the release notes, they even claim to have changed something with regard to the memory reference code.)

But on the other hand, it's quite unlikely that such problems wouldn't have been noticed by someone else before now (the BIOS 6.3.0 release date seems to be 2012-08-15).

Is there any progress on this issue on your side?

TIA!
