
Unsolved


31 Posts


March 17th, 2021 13:00

Problems with RAM in Dell 7920 - I suspect a motherboard issue

DIMMs are present in sockets 1 to 4 of the second CPU. Filling sockets 1-4 on the first CPU results in a non-working computer.

I have a Dell 7920 which came with a Silver 4110 8-core CPU and a pair of 8 GB RDIMMs. I wanted to expand the memory, but Dell wants a huge amount of money for RAM: about 3 times the cost of RAM from Kingston.

Since these CPUs have 6 memory channels, the manual says best performance is achieved with memory in sets of 6 DIMMs, so either 6, 12 or 24.
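A minimal sketch of the "sets of 6" rule, assuming an idealised fill order in which every channel gets one DIMM before any channel gets a second (this is not Dell's documented slot order, and the helper names are hypothetical). It shows why 6 or 12 DIMMs load all six channels of one CPU evenly, while 7 leaves one channel carrying two.

```python
from typing import List


def spread_dimms(n_dimms: int, channels: int = 6, slots_per_channel: int = 2) -> List[int]:
    """Place DIMMs round-robin across channels (one per channel before doubling up).

    Returns the number of DIMMs on each channel. Idealised model only,
    not the slot order from Dell's manual.
    """
    capacity = channels * slots_per_channel
    if not 0 <= n_dimms <= capacity:
        raise ValueError(f"{n_dimms} DIMMs do not fit in {capacity} slots")
    per_channel = [0] * channels
    for i in range(n_dimms):
        per_channel[i % channels] += 1
    return per_channel


def is_balanced(per_channel: List[int]) -> bool:
    """True when every channel carries the same number of DIMMs."""
    return len(set(per_channel)) == 1


for n in (6, 7, 12):
    layout = spread_dimms(n)
    print(n, layout, "balanced" if is_balanced(layout) else "unbalanced")
```

Running it prints the per-channel counts for 6, 7 and 12 DIMMs on one CPU; only 7 is reported as unbalanced.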

I have 7 pieces of Kingston KTD-PE429/32G 2933 MHz RDIMMs. All were bought new. They cost me more than the Dell workstation did! I'm fully aware that 7 DIMMs is not a supported configuration, but that is not my problem.

My basic problem is that if I put an RDIMM in either of two particular sockets among the 12 for CPU0, the machine will not power on. The power button flashes amber and white in the sequence "2 amber blinks followed by a short pause, 4 white blinks, long pause, then repeats", which the manual documents as "Memory/RAM failure". The problem is with DIMM socket 4 and one of either 5 or 6 (I can't recall which). So as I populate the system board, I find:

* DIMM 1 occupied - 32 GB seen. Computer works.

* DIMM 1 and 2 occupied - 64 GB seen. Computer works.

* DIMM 1, 2 and 3 occupied - 96 GB seen. Computer works.

* DIMM 1, 2, 3 and 4 occupied - the machine will not power on.

* DIMM 1, 2, 3, 4 and 5 occupied - the machine will not power on.

* DIMM 1, 2, 3, 4, 5 and 6 occupied - the machine will not power on.
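The population test above works like a bisection: each step adds one socket, so the first step that fails to power on points at the socket just added. A small sketch that encodes those results, assuming a single bad socket or channel is responsible (the helper names are hypothetical):

```python
from typing import Dict, Optional

# Results from the population test above (CPU0 only):
# key = highest socket filled (sockets 1..n all filled), value = did it power on?
observations: Dict[int, bool] = {1: True, 2: True, 3: True, 4: False, 5: False, 6: False}


def first_failing_socket(results: Dict[int, bool]) -> Optional[int]:
    """Return the socket whose addition first stops the machine powering on, or None."""
    for socket in sorted(results):
        if not results[socket]:
            return socket
    return None


def failures_are_monotonic(results: Dict[int, bool]) -> bool:
    """True if no configuration works after a smaller one has failed, which is
    what a single bad socket or channel would produce."""
    seen_failure = False
    for socket in sorted(results):
        if not results[socket]:
            seen_failure = True
        elif seen_failure:
            return False
    return True


print(first_failing_socket(observations))
print(failures_are_monotonic(observations))
```

Note what this cannot show: once socket 4 fails, every larger configuration also fails, so this sequence alone cannot reveal whether socket 5 or 6 is also affected. That needs tests which leave socket 4 empty.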

According to the manual, RAM sockets with white tabs (1-6) should be populated first. But filling all the white sockets does not work. If I fill a mixture of black and white sockets, I can get all the DIMMs to work, but since I can't use more than 4 memory channels, RAM performance suffers.

Apart from the motherboard, the fault could of course be with a CPU. But I don't think that's the case, as I also have a pair of Intel Xeon Platinum 8167M CPUs, and exactly the same issue occurs whether I have:

* Original Silver 4110 CPU (8-core, 2.1 GHz)

* 1 x 8167M CPU (24-core, 2.0 GHz)

* 2 x 8167M CPUs

The photograph shows the screen when the BIOS is used to inspect the system. Note that DIMM sockets 1, 2, 3 and 4 are filled for the 2nd CPU, but only 1, 2 and 3 for the first. Clearly the system sees all 7 DIMMs, but when I install them in the correct locations for CPU0, the machine will not function.

I have also done a few other tests:

* Updated the firmware to the latest version.

* Ran the RAM tests in the Dell firmware. They do not report any errors.

* Tested both the original Silver 4110 CPU and the Platinum 8167Ms using a tool I downloaded from the Intel website. It finds no problems with any of the CPUs.

* Ran a stress test using a prime-number testing program.

I've come to the conclusion that the system board is probably faulty, but unfortunately one needs 4 DIMMs to demonstrate that, and I don't have 4 DIMMs from Dell - only Kingston equivalents.

What are my options?


4 Operator


1.1K Posts

March 17th, 2021 13:00

From the 7920 Technical Guidebook:

7920 TOWER PROCESSORS—INTEL XEON SKYLAKE SCALABLE PROCESSOR FAMILY - SP

Note: Global Standard Products (GSP) are a subset of Dell’s relationship products that are managed for availability and synchronized transitions on a worldwide basis. They ensure the same platform is available for purchase globally. This allows customers to reduce the number of configurations managed on a worldwide basis, thereby reducing their costs. They also enable companies to implement global IT standards by locking in specific product configurations worldwide. The following GSP processors identified below will be made available to Dell customers.

Note: Processor numbers are not a measure of performance. Processor availability subject to change and may vary by region/country.

• 2666MHz DDR4 ECC RDIMM/LRDIMM memory will scale down to 2400MHz with Xeon Gold 51XX Series (excluding 5122) and Xeon Silver 41XX Series and down to 2133MHz with Xeon Bronze 31XX Series Processors.

• 2933MHz DDR4 ECC RDIMM memory is not supported with Xeon Skylake SP processors

I'm not sure, but maybe a proper test would be with 2666 MHz DIMMs?
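To see what the guidebook note implies for the suggested test, here is a sketch that encodes only the speed rules quoted above. The function name is hypothetical, and treating every other Skylake SP CPU as running memory at up to 2666 MT/s is an assumption, not something the quote says. Under these rules, even supported 2666 MHz DIMMs would run at 2400 MT/s with the Silver 4110.

```python
from typing import Optional

SKYLAKE_SP_MAX_MTS = 2666  # assumption: top DDR4 speed for Skylake SP CPUs not covered below


def effective_speed_mts(cpu: str, dimm_mts: int) -> Optional[int]:
    """Return the DDR4 speed (MT/s) a DIMM would run at, per the guidebook notes.

    `cpu` is a "<Tier> <Model>" string such as "Silver 4110".
    Returns None where the guidebook says the combination is not supported.
    """
    if dimm_mts >= 2933:
        # "2933MHz DDR4 ECC RDIMM memory is not supported with Xeon Skylake SP processors"
        return None
    tier, model = cpu.split(maxsplit=1)
    speed = min(dimm_mts, SKYLAKE_SP_MAX_MTS)
    if tier == "Bronze" and model.startswith("31"):
        return min(speed, 2133)
    if tier == "Silver" and model.startswith("41"):
        return min(speed, 2400)
    if tier == "Gold" and model.startswith("51") and model != "5122":
        return min(speed, 2400)
    return speed


for cpu in ("Silver 4110", "Gold 5122", "Platinum 8167M"):
    print(cpu, effective_speed_mts(cpu, 2666), effective_speed_mts(cpu, 2933))
```

"Not supported" here means not supported by Dell; the sketch does not predict whether a given 2933 MHz module will actually train and run.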

202 Posts

March 17th, 2021 19:00

> Dell wants a huge amount of money for RAM.

There is a reason for that. Dell holds stock of, or manufacturing contracts for, certified compatible RAM so that it can provide 24/7 support for years ahead.

The RAM you insert should be EXACTLY the same model you already have. Even a small difference in P/N (part number) may result in a different bank organization, and subsequent failures.

If you cannot analyze and find the exact organization, look for exactly the same P/N, which, if you are unlucky, may bring you back to the Dell store. But usually the maker is a big manufacturer like Samsung, Micron or Hynix, who also ships modules to other big vendors like HP, Apple or Lenovo, and these substitutes may fit.

31 Posts

March 18th, 2021 02:00

@Andy812 I have completely removed the original 8 GB RDIMMs. All the RAM modules are Kingston, with exactly the same part number; I am not trying to mix the original Dell RAM with third-party RAM. The original modules are 8 GB, but as you can see from the photograph I attached, the machine now has 7 x 32 GB in it.

According to the manual, 6 x 32 GB is a supported configuration with one CPU and one DIMM per channel. Using 6 DIMMs, I can get the machine to run with one CPU, but only if I leave two channels with no memory and put 2 DIMMs in each of two other channels. That gives me poor performance, as only 4 of the 6 memory channels are being used.

Dave
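The cost of that workaround can be estimated from theoretical peak bandwidth, since each DDR4 channel moves 8 bytes (64 data bits, excluding ECC) per transfer. A back-of-envelope sketch, assuming the 2400 MT/s that the guidebook gives for Silver 41XX CPUs; real throughput is lower than these peaks, and the function name is hypothetical.

```python
from typing import List

BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 data bits wide (ECC bits excluded)


def peak_bandwidth_gbs(channels_used: int, mts: int) -> float:
    """Theoretical peak bandwidth of one CPU's memory, in decimal GB/s."""
    return channels_used * mts * BYTES_PER_TRANSFER / 1000


# Workaround layout described above: DIMMs per channel for the 6 channels of one CPU.
layout: List[int] = [2, 2, 1, 1, 0, 0]
assert sum(layout) == 6
channels_used = sum(1 for dimms in layout if dimms > 0)

full = peak_bandwidth_gbs(6, 2400)  # one DIMM per channel, all 6 channels
degraded = peak_bandwidth_gbs(channels_used, 2400)
print(channels_used, full, degraded, round(degraded / full, 3))
```

With four of six channels populated, peak bandwidth drops from about 115 GB/s to about 77 GB/s, i.e. to two thirds.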

14 Posts

March 22nd, 2021 02:00

I don't get what you mean. Let's start with the CPUs. They can have 1, 2, 3 or 4 memory channels.

Usually, every channel supports 2 DIMMs. In the past there were boards with 3 DIMMs per channel, but they required too much drive current, so these designs were abandoned, for some reason even in buffered ECC configurations.

So a 3-channel CPU can take 6 memory modules, not 7.

A full interleaving scheme requires an even load on all channels, i.e. 3 or 6 equal modules for 3-channel CPUs, and 4 or 8 modules for 4-channel CPUs.

 

Intel CPUs support uneven channel loading, but the interleaving scheme may be degraded.

Dell boards do not let you choose the RAM organization scheme, nor do they tell you which scheme is used. (Supermicro boards let you tweak such things, by the way.)

 

Now, let's get back to the modules. When filling both slots of a channel, you are actually paralleling not 2 modules but 4 or 8 ranks, which should be equal even when using buffered/registered RAM; otherwise the controller will fail to access them.

https://docs.oracle.com/cd/E23411_01/html/E23412/z40001681431678.html#scrolltoc

 

Some boards may fail to access two quad-rank DIMMs attached to a single channel, so you should check this before buying.
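The rank rule can be written as a simple per-channel check. A sketch under stated assumptions: `max_ranks` is a platform limit you would have to look up for the 7920, so it is a parameter here, and the values used below are hypothetical, not Dell or Intel figures.

```python
from typing import List


def check_channel(ranks_per_dimm: List[int], max_ranks: int) -> List[str]:
    """Return problems with the DIMMs on one channel (empty list means it looks OK).

    ranks_per_dimm: rank count of each DIMM on the channel, e.g. [2, 2].
    max_ranks: hypothetical platform limit on total ranks per channel.
    """
    problems = []
    if len(set(ranks_per_dimm)) > 1:
        problems.append("mixed rank counts on one channel")
    total = sum(ranks_per_dimm)
    if total > max_ranks:
        problems.append(f"{total} ranks exceeds the limit of {max_ranks}")
    return problems


print(check_channel([2, 2], max_ranks=8))  # two dual-rank DIMMs
print(check_channel([4, 4], max_ranks=6))  # two quad-rank DIMMs on a stricter board
print(check_channel([1, 2], max_ranks=8))  # mismatched DIMMs
```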

202 Posts

March 22nd, 2021 19:00

P.S.

It seems yours is 6-channel, so you have 3 options: either 6 or 12 modules for top performance, or any other number for a degraded compatibility mode. Dell does not let you choose the mode.

31 Posts

March 23rd, 2021 11:00

The CPUs are indeed 6-channel. I can reproduce the problem with the following setup, which would be supported if the RAM were Dell, not Kingston:

* One original Intel Silver 4110 CPU supplied with the machine.

* 6 x 32 GB RDIMMs

Other unsupported configurations either work or don't, and in my opinion they strongly indicate that the motherboard is faulty.

One thing to note is that the supported memory configurations with one CPU are only 6 or 12 DIMMs if the DIMMs are 32 GB each. The use of 1, 2, 4 or some other small number of DIMMs is not supported with 32 GB DIMMs.

Things that work with one CPU:

1) 1 x 32 GB - as I wrote above, this is unsupported, but it does actually work.

2) 2 x 32 GB - as I wrote above, this is unsupported, but it does actually work.

3) 3 x 32 GB - as I wrote above, this is unsupported, but it does actually work.

4) 4 x 32 GB - if socket 4 and one other (either 5 or 6; I forget which) are left vacant. So I can use 4 memory channels on the original CPU, but only if I use the wrong sockets.

5) Up to 7 DIMMs (all I have) on the one CPU, as long as I put 2 DIMMs on some memory channels and no DIMMs on others.

I don't believe the problem is a CPU, since:

6) I have 3 CPUs, but the problem remains independent of what CPU(s) are fitted. (Only one CPU was supplied by Dell - the other two CPUs are Platinum 8167Ms that were not supplied by Dell).

7) I can't get the 4th memory channel of a CPU to work if the CPU is in the first CPU socket, but if two CPUs are fitted, the 4th memory channel works on the CPU placed in the 2nd CPU socket.

The problems I have:

8) Dell will not support the system with 6 x 32 GB Kingston DIMMs, but DIMMs from Dell are incredibly expensive.

Dealing with Dell over any problem is really frustrating: long waits on the phone, and being transferred from one person to the next in Indian call centres. To be honest, I have little confidence in Dell. I expect that if one has a common PC or a laptop, Dell tech support will get you out of problems. But on a complicated workstation like a 7920, the people on the other end of the phone line just don't have enough knowledge.

31 Posts

March 23rd, 2021 12:00

Just to make it clear, the Dell 7920 will not work if 6 DIMMs are put in the correct memory slots, so I can't put one DIMM per channel. However, the machine will work with 6, or even 7, DIMMs on the one CPU as long as I put some DIMMs in sockets they should not be in. That sub-optimal configuration gives me two DIMMs on some memory channels and none on two memory channels. Clearly that's not a good thing to do, but it does actually work.

I have a degree in electrical and electronic engineering and am a chartered engineer. I think I have enough knowledge to ascertain with a very high degree of certainty that this computer has a fault on the motherboard.

I can think of no other logical explanation for the following:

* I can't get the 4th memory channel of a CPU to work if the CPU is in the first socket, but the 4th memory channel works if the CPU is moved to the 2nd CPU socket.

* I can get 6, or even 7 DIMMs (all I have) to work with the first CPU as long as I put them in the wrong sockets.

This all indicates that the problem is not an incompatibility of the Kingston RAM, or a faulty CPU, but a fault on the Dell motherboard. Would anyone from Dell like to comment if my conclusion seems logical or not?
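The argument above can be checked mechanically: if a factor causes the fault, each value of that factor should always give the same outcome. A sketch that encodes the observations as stated in this thread (the labels are hypothetical; "CPU A" and "CPU B" stand for the two Platinum 8167Ms, and the posts do not say which one was moved):

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (CPU, board socket it sits in, does its 4th memory channel work?)
observations: List[Tuple[str, str, bool]] = [
    ("Silver 4110", "CPU0", False),
    ("Platinum 8167M (CPU A)", "CPU0", False),
    ("Platinum 8167M (CPU A)", "CPU1", True),  # same CPU moved to the 2nd socket
    ("Platinum 8167M (CPU B)", "CPU0", False),
]


def outcomes_by(column: int, rows: List[Tuple[str, str, bool]]) -> Dict[str, Set[bool]]:
    """Group the channel-4 outcomes by one column (0 = CPU, 1 = board socket)."""
    groups: Dict[str, Set[bool]] = defaultdict(set)
    for row in rows:
        groups[row[column]].add(row[2])
    return dict(groups)


def explains(column: int, rows: List[Tuple[str, str, bool]]) -> bool:
    """A factor explains the fault if each of its values always gives one outcome."""
    return all(len(outcomes) == 1 for outcomes in outcomes_by(column, rows).values())


cpu_explains = explains(0, observations)
socket_explains = explains(1, observations)
print(cpu_explains, socket_explains)
```

Here the outcome is consistent per board socket but not per CPU, which fits a fault on the board side; on its own it cannot distinguish a hardware fault from a firmware/BIOS problem tied to that socket.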

It is cheaper to buy a brand new Dell 7920 from Dell than to buy 6 x 32 GB DIMMs from them.

46 Posts

March 24th, 2021 08:00

A possible solution would be to find a local shop which works on a lot of Dell computers and pay them to test using Dell-branded memory. If the problem replicates with Dell-branded memory, then you have a motherboard issue.

31 Posts

March 24th, 2021 12:00

@hammarlund That's an idea. I just phoned one local PC shop, but they don't have any Dell RAM. The problem is that these are error-correcting RDIMMs, which are only really used in servers and high-end workstations, so it's not easy to find somewhere that has them.

I'm thinking of asking Dell what would happen if I purchased 2 x 8 GB RAM modules and they didn't work. Would they take the modules back and accept it is a motherboard fault? One concern is that the problem might not be evident with 8 GB memory modules but does show up with 32 GB memory modules. 32 GB modules are only supported in sets of 6, and buying 6 memory modules from Dell is a huge outlay.

31 Posts

March 27th, 2021 05:00

I am also very unimpressed with Dell. I should have learned my lesson when the Pentium 1 had a bug in its floating-point unit. Dell messed me around something rotten until a technician finally came out and replaced the CPU. I bought this Dell 7920 refurbished, but have discovered a motherboard fault. Dealing with incompetent people in call centres just gets on my nerves.

202 Posts

March 31st, 2021 20:00

The problem may be in the BIOS, since it does not let you choose the interleaving scheme. Several schemes are available for multichannel Xeon chips, and they are selectable on Supermicro boards. But even if you can choose the scheme, performance will be degraded with non-uniform channel population.

 

btw, did you check the board error logs?

202 Posts

June 8th, 2021 17:00

sorry, necroposted. please ignore

1 Rookie


4 Posts

February 7th, 2022 20:00

I don't know how old the post is or whether he found the solution. If he went to Dell support and searched the 7920 Tower documentation, he would find the memory map/chart that tells him exactly where to install the memory modules.

 
