Desktops General - Read Only

Last reply by 11-07-2016 Unsolved
2 Bronze
6354

T7500 Riser Card broken ?

Hi,

I have a T7500 server (or is it still considered a workstation?) which stopped one day with "uncorrectable memory error in Riser DIMM 3". It has 12x 4GB Hynix 2Rx8 PC3L-10600R modules. I tried reseating the memory without success (same error), and after a few changes to the memory configuration it turns out it runs just fine when I remove all riser memory and keep only the 6 motherboard modules. My understanding is that this is not such a great memory configuration, so this morning, after an overnight job which ran just fine, I removed DIMM 4, 5 and 6 from the MB and moved them to Riser DIMM 1, 2 and 3, which I believe is an acceptable configuration. A crash happened after about 10 min. with "Uncorrectable memory error has been previously detected in RISER DIMM 3". Since the memory module came from the MB, where it had worked well overnight, does this mean it is not the memory that is broken but perhaps the riser card? If so, can I plug a new one right in after I transfer the memory and the CPU from the old one? Would this work?

BTW, when this error occurs, the system offers F5 to run tests: the memory tests run for a while until they freeze without any additional error message. At this point I can do nothing but reboot the hard way.

Thanks,

Lothar

Replies (17)
10 Diamond
3295

"...has been previously detected in RISER DIMM 3"

I'd interpret that to mean it's referring to the error before you swapped the RAM around, and the system may be "afraid" to use any module in riser DIMM slot 3.

You might want to clear the error logs. Not exactly sure how that's done on this model, but maybe by clearing BIOS:

  1. Reboot and press F2 to open BIOS setup
  2. Copy down all current settings
  3. Power off and unplug
  4. Press/hold power button for ~15 sec
  5. Open case and remove motherboard battery
  6. Press/hold power button for ~30 sec
  7. Install RAM with 12 GB on motherboard and 12 GB on riser, in slots 1, 2, 3 on each
  8. Reinstall battery
  9. Close case and reboot
  10. If you get any error messages at boot, open BIOS setup and make sure all settings match what you copied

I'd try that before replacing the riser card. And if that solves the problem, you can order replacement RAM, if you need it. Hope you know which module was in riser slot 3! And you may have to buy more than one module so things match...

Ron

   Forum Member since 2004
   I'm not a Dell employee

2 Bronze
3295

Hi,

Given the short space I gave myself to pose my question, combined with a certain awkwardness in the way I express problems, there are a few things I need to correct.

This workstation came from our organization's surplus (so I did not place the original order), but it worked brilliantly running CentOS 6.7 for well over a year. One day, running a hefty parallel job, it crashed (stopped dead). When I rebooted, it stopped before booting with the message "An uncorrectable memory error had occurred in Riser DIMM 3" (I write this from memory). At that time it had 12 4GB Hynix memory modules installed, 6 on the MB and 6 on the riser. Every time I replaced this module and ran the very same job that crashed the machine before, I ended up getting the same message about faulty memory in Riser DIMM 3.

Curiously, I can still run the machine in the original configuration with a bunch of single-threaded jobs without any problems. Also, one of the first things I did was to run Memtest86, and it ran and ran without any problem whatsoever, again with the 'bad' memory installed. So I was puzzled as to what is going on.

After a while I tried different recommended and non-standard memory configurations with mixed success. What worked with the program that crashed the machine every time was to remove *all* riser memory and keep 24 GB on the MB. This is *not* a recommended configuration, as it is probably very slow compared to what the machine could do with 12 GB + 12 GB distributed over MB and riser. Maybe so. Slow, but it works. When I set up the 12+12 configuration, it crashes with the above message about Riser DIMM 3. This time I know the memory module came from a configuration that definitely worked, so I have no reason to believe it is broken.

So one question is, why did Memtest86 fail to detect this apparently bad memory chip or bad memory access?
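One way to cross-check this from the running OS is to read the kernel's EDAC error counters, which count corrected and uncorrected memory errors per memory controller. The following is only a sketch under assumptions: it assumes Python 3 is available, that an EDAC driver for this chipset is loaded on the CentOS install, and that the counters live under the usual sysfs path `/sys/devices/system/edac/mc`. The function name `read_edac_counts` is made up for illustration. If the directory is absent, the function simply returns an empty result.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} from an EDAC sysfs tree.

    "ce" is the corrected-error count, "ue" the uncorrected-error count.
    An empty dict means no EDAC directory was found (driver not loaded,
    or a different path on this system).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Running this before and after the parallel job could show whether corrected errors pile up on one controller under load, which a single-threaded memory test might not provoke.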

Before I thought of the riser board as the culprit, I strongly suspected the memory modules. I had decided to buy six new 8GB modules and put them into DIMM 1, 2 and 3 of both MB and riser, as this is a recommended memory configuration. Now, with the observation that 6 x 4GB modules in this configuration still don't seem to work, I am in doubt.

BTW, the BIOS seems to be very good at detecting changes and does not dwell on old error states. When I change the system memory configuration, it tells me that it found changes, but this is okay: it normally boots and runs. The job that crashes the machine typically runs for a few minutes, presumably until it uses all 16 threads, and then crashes.

Is there anything else I can try? One more thing: it seems that refurbished riser boards (H236F) are not that expensive (anymore). Is getting a refurbished one a bad idea?


Thanks.

Lothar

10 Diamond
3295

I'd still try clearing BIOS before buying another riser. Clearing BIOS is easy and it's free.

I guess it could be a slot failure on the riser. Have you tried using canned air to blow out any dirt that might have accumulated in that slot?

Only you can decide how next to proceed...

Ron


2 Bronze
3295

Hi,

Can you give me a hint as to which settings I have to write down? The only thing I remember changing was the boot device order; the rest I did not touch.

Yes, I did use compressed air to clean the sockets and the rest of the fans / heat sinks. No change in the end.

I do appreciate your time and advice!


Lothar

10 Diamond
3295

Copy all BIOS settings. I have no way to know which ones have been changed to something other than the defaults. And when you clear BIOS all of them will be reset to the defaults.

You might be able to take digital photos of the BIOS screens instead of writing the settings down, but make sure the flash is off, and the photos are readable before you clear BIOS.

Ron


10 Diamond
3295

Memory that isn't all from the exact same vendor and at the same speed will have issues.

Very unlikely the RISER is bad and not the RAM.

If you have RISERS you CANNOT have ANY ram on the motherboard.

This is not a valid configuration.

 


Report Unresolved Customer Service Issues
here

I do not work for Dell. I too am a user.

The forum is primarily user to user, with Dell employees moderating
Contact USA Technical Support


Get Support on Twitter @DellCaresPro


Diagnostics & Tools

10 Diamond
3295

"If you have RISERS you CANNOT have ANY ram on the motherboard. This is not a valid configuration."

Not correct...

Here's the table from the T7500 manual showing RAM configs on motherboard and riser for dual CPU systems:

You originally had 48 GB (12x4 GB) with all slots filled on both motherboard and riser. After removing RAM from the riser and redistributing the remaining 6x4 GB, you have 3 slots filled on each. Both are acceptable configs.

When you bought replacement RAM, was it Dual Rank (DR)? Single Rank (SR) RAM won't work.
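The arithmetic behind those two layouts can be tallied with a few lines of Python. This is only an illustrative sketch: the `summarize` helper and the slot lists are assumptions made up for the example (a 0 marks an empty slot), and it does not encode Dell's actual population rules from the manual.

```python
def summarize(layout):
    """layout maps a board name to a list of module sizes in GB (0 = empty slot)."""
    per_board_gb = {board: sum(slots) for board, slots in layout.items()}
    filled_slots = {
        board: sum(1 for size in slots if size > 0)
        for board, slots in layout.items()
    }
    return {
        "total_gb": sum(per_board_gb.values()),
        "per_board_gb": per_board_gb,
        "filled_slots": filled_slots,
    }


# All six slots filled on each board with 4 GB modules (the original setup).
original = {"motherboard": [4] * 6, "riser": [4] * 6}

# Slots 1-3 filled on each board, slots 4-6 empty (the reduced setup).
reduced = {"motherboard": [4, 4, 4, 0, 0, 0], "riser": [4, 4, 4, 0, 0, 0]}

if __name__ == "__main__":
    print(summarize(original))
    print(summarize(reduced))
```

The first layout totals 48 GB and the second 24 GB, split evenly between motherboard and riser in both cases.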

Ron


10 Diamond
3299

The chart is not allowing for risers and RAM on the motherboard.

I am correct.

We do need to differentiate the T7400 from the T7500.

There is no RISER #3 on a T7500. The T7500 max RAM is 192 GB, where the additional 6 slots are on the 2nd CPU riser card. RAM is not dual channel; it's triple channel on the T7500, i.e. the slots with white ears get RAM first. Slots 1, 2 and 3 are not physically in order. The banks are coded with the white-eared slots being first.



16 Sockets (4 Sockets (2 banks of 2) per Riser)  16 X 8 =128 MAX

8 Sockets (4 banks of 2) (standard) YOU CANNOT MIX THESE.

There are 16 slots with the risers installed which eliminates the 5 6 7 8 SLOTS on the motherboard.

The cooling fan for the ram is also different which would cause issues.

The risers are not ronco set and forget and have power cables as well as daughterboards.




 


 

 



2 Bronze
3299

Hi,

Thanks to all of the rockstar experts who are helping to debug this system. It seems that we ran into questions about the proper configuration of the system. It is a T7500 and it has a riser, I believe H236F. The riser does not interfere with the mainboard memory as you suggested. Maybe this is the case in one of the other systems?

I had initially 8 Sockets (white and black tab) on the Mother board filled with 4GB modules and the Riser CPU/Memory board (your 2nd picture is correct) filled with 6x 4GB modules (same type and specs). The last 3 pictures are unfamiliar to me and depict hardware that is not part of this configuration.

I have now done some tests: I flushed the BIOS as suggested in the first reply I got, reset all the necessary parameters in the BIOS (time, boot order, asset tag) and installed 6 modules (4GB each): 3 on the MB and 3 in the riser, in DIMM positions 1, 2 and 3 on each. I believe this is an acceptable memory configuration.

All jobs I can throw at it run well, except the one that crashed it the first time. It is surprising that just this one creates the problem. Odd.

As the last test I ran the said job and it froze. After restart, it stops, saying "Alert! Uncorrectable Memory Error has been previously detected in RISER DIMM 3". It offers F1, F2 and F5. F5 runs diagnostics, and when I do this it freezes at some point during the memory checks. No error message, nothing.

To me all this points more to an internal hardware error at Riser DIMM 3. Since refurbished riser boards are on the order of $120, it seems to me that I could replace the board without breaking the bank. Are there any concerns? Is it difficult to move the CPU?

Thanks, Lothar
