DELL-Chris H
Moderator
•
9.7K Posts
0
August 23rd, 2012 07:00
Doug,
What you need to do is this:
Open a cmd prompt at the root of C:
Type dir /s dcicfg32.exe to locate the utility.
Go to that directory and run dcicfg32 command=clearmemfailures
Then open OMSA, go to the HW log, and clear the log.
Download and run this app to create 32-bit Diagnostic media - www.dell.com/.../poweredge-2900
Then reboot, boot to the 32-bit diags, and run Mp Memory on the server to see if it identifies the failed DIMM.
If it does, swap that DIMM with another matching DIMM in the server and retest to see whether the issue follows the DIMM or stays at the motherboard slot.
Let me know how it goes.
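
For anyone following along, the swap-and-retest reasoning in the last step can be written down as a small decision helper. This is only a sketch of the logic, not a Dell tool: the function name `interpret_swap` and the slot labels passed to it are made up for illustration, and you would fill in the slot named in the new log entry by hand after retesting.

```python
from typing import Optional


def interpret_swap(suspect_slot: str, partner_slot: str,
                   error_slot_after_swap: Optional[str]) -> str:
    """Interpret a DIMM swap test.

    suspect_slot: slot that originally logged the errors (e.g. "DIMM2").
    partner_slot: slot the suspect module was moved into (e.g. "DIMM4").
    error_slot_after_swap: slot named in the new error, or None if no
        error was logged after the swap.
    """
    if error_slot_after_swap is None:
        return "no error reproduced: keep monitoring the logs"
    if error_slot_after_swap == partner_slot:
        # The error moved together with the module.
        return "error followed the module: suspect the DIMM"
    if error_slot_after_swap == suspect_slot:
        # A different module now sits in this slot and it still fails.
        return "error stayed at the slot: suspect the motherboard slot"
    return "error appeared in an unrelated slot: inconclusive"


if __name__ == "__main__":
    print(interpret_swap("DIMM2", "DIMM4", "DIMM4"))
    print(interpret_swap("DIMM2", "DIMM4", "DIMM2"))
```

The helper only encodes the "follows the DIMM or stays at the slot" rule; it does not read the OMSA logs itself.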
Chaplain Doug
1 Rookie
•
89 Posts
0
August 23rd, 2012 08:00
On August 9, when the final error was logged, this entry was made in the Alert log:
Memory device status is critical Memory device location: DIMM2 Possible memory module event cause:Single bit warning error rate exceeded,Single bit failure error rate exceeded
Chaplain Doug
1 Rookie
•
89 Posts
0
August 23rd, 2012 08:00
I performed the first three steps successfully. Before clearing the HW log in OMSA I noted that there were two messages with ! and then a message with X (I guess alerts then error) on Aug 2, 8, and 9 that said:
Persistent correctable memory error rate has increased for a memory device at location DIMM2.
This seems to identify the DIMM location (they are DIMM 1 - DIMM 12 when I look at the memory array). HOWEVER, looking at the memory array in OMSA, DIMMs 1-8 show 2048MB and DIMMs 9-12 show 1024MB. Why would the system reconfigure the last four like this if the problem was with DIMM2?
Chaplain Doug
1 Rookie
•
89 Posts
0
August 23rd, 2012 08:00
Also, why a 32-bit diagnostic media when I am running 64-bit on this system?
theflash1932
9 Legend
•
16.3K Posts
0
August 23rd, 2012 08:00
"why a 32-bit diagnostic media when I am running 64-bit on this system"
32-bit Diagnostics is a bootable 32-bit application. 64-bit machines (except Itanium) are capable of running 32-bit or 64-bit software. Since it is a bootable application, the OS you have installed is irrelevant, as it interacts directly with the hardware without booting your OS. They could make a 64-bit Diagnostics, but there would be very little benefit from doing so.
DELL-Chris H
Moderator
•
9.7K Posts
0
August 23rd, 2012 09:00
The DIMM is probably causing issues within its channel, which will affect DIMM 6 as well as DIMM 10. Swap DIMM 2 with DIMM 4 and see if the issue moves or stays with the current location. If it stays, the motherboard slot may be failing.
The other explanation would be that memory sparing was somehow enabled in the BIOS; that would cause a 20GB/4GB split as well.
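
To sanity-check the sizes OMSA is reporting, it can help to tally them by size. The sketch below uses the figures quoted earlier in the thread (DIMMs 1-8 at 2048MB, DIMMs 9-12 at 1024MB); the dictionary and the helper name `tally_by_size` are just for illustration, and the code does not tell you whether those sizes are what is physically installed or the result of a BIOS setting.

```python
from collections import defaultdict

# Sizes as reported in the OMSA memory array in this thread (MB).
reported_mb = {f"DIMM{n}": 2048 for n in range(1, 9)}
reported_mb.update({f"DIMM{n}": 1024 for n in range(9, 13)})


def tally_by_size(sizes_mb):
    """Group slots by reported size and return {size_mb: [slot, ...]}."""
    groups = defaultdict(list)
    for slot, size in sizes_mb.items():
        groups[size].append(slot)
    return dict(groups)


if __name__ == "__main__":
    for size, slots in sorted(tally_by_size(reported_mb).items(), reverse=True):
        subtotal_gb = size * len(slots) / 1024
        print(f"{len(slots)} x {size}MB = {subtotal_gb:g}GB ({slots[0]}..{slots[-1]})")
    print(f"Total reported: {sum(reported_mb.values()) / 1024:g}GB")
```

Comparing this tally against the labels on the physical modules shows whether the four smaller entries are really 1GB parts or are being reported at a reduced size.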
Chaplain Doug
1 Rookie
•
89 Posts
0
August 23rd, 2012 09:00
When I get in the box, will it be easy to determine which slot is DIMM 2? Is it marked?
DELL-Chris H
Moderator
•
9.7K Posts
0
August 23rd, 2012 10:00
DIMM 2 is the fourth DIMM slot (2nd white slot) from the center of the board. DIMM 4 is the 3rd from the last (or the last white slot).