Unsolved

September 12th, 2015 09:00

PowerEdge M610 Collect Inventory problem

Hello,

I have a PowerEdge M610 in an M1000e chassis. I recently upgraded the memory, and since the M1000e didn't show the updated information, I enabled CSIOR (Collect System Inventory on Restart) in the legacy Lifecycle Controller configuration. Even so, the hardware inventory randomly reports some modules incorrectly as 0 MB. The iDRAC version is 3.75 and the CMC version is 5.10. At the system level the M610 always shows the correct total amount of memory, and all memory tests passed. It seems that something goes wrong while the hardware inventory is being collected.

In the M1000e CMC the reported total memory sometimes changes: 28GB, 16GB or 32GB.

When entering the Lifecycle Controller I get a PR11 message: "Message PR11: Part Replacement license is not present, replacement action(s) will not be performed". However, the replaced part named in the PR1 message that appears just before PR11 changes: sometimes it refers to the DIMM in slot A1, at the next reboot to slot B5, and so on.

Can someone help me with this issue?

Thanks in advance,
Davide

16 Posts

September 12th, 2015 17:00

Hello Daniel,

Thanks a lot for your fast reply. You understood exactly what's happening.
We have 8 x 4GB DIMMs in total. The system has dual Intel(R) Xeon(R) X5670 CPUs @ 2.93GHz, so the 4GB modules are installed in slots A1, A2, A4, A5 and B1, B2, B4, B5.

I have one doubt. The reseller sent us:
- 7x Hynix modules (part number 18KSF51272PDZ1G4D1)
- 1x Samsung module (part number M393B5273CH0-YH9)

We have done the same upgrade on another system using 8x Hynix modules and we didn't have any problem.

What looks strange is that the message from the Lifecycle Controller changes randomly, reporting a hardware replacement for a different slot each time. Do you think the one different DIMM could cause this problem? I noticed that the DIMMs look very similar (same number of chips, and the layout of the chips on the module is also very similar).

Thanks again,
Davide

Moderator

6.2K Posts

September 12th, 2015 17:00

Hello

I recently upgraded the memory, and since the M1000e didn't show the updated information, I enabled CSIOR (Collect System Inventory on Restart) in the legacy Lifecycle Controller configuration.

I'm not sure that I understand what exactly is going on. It sounds like you installed new memory and were having issues with it being properly detected so you enabled inventory collection. You then had issues with inventory collection properly reporting the memory. Is that the situation?

Is this validated memory? If you need assistance then please list the part or model numbers of the old and new DIMMs. Also, list how the DIMMs are populated.

Thanks

Moderator

6.2K Posts

September 12th, 2015 18:00

- 7x Hynix modules (part number 18KSF51272PDZ1G4D1)
- 1x Samsung module (part number M393B5273CH0-YH9)

I was able to locate the Samsung. It is a validated DIMM. We sell it under part # 9J5WF:

4GB, 1333, dual rank, registered, low voltage

I can't find anything on the Hynix. Can you look for another number on that DIMM so I can look at specs for it? Hynix part numbers usually start with HMTxxxxxxxxxx-xx

Thanks

16 Posts

September 12th, 2015 20:00

Hello Daniel,
I'm sorry, I made a mistake: it isn't Hynix. The memory reported by CMC with part number 18KSF51272PDZ1G4D1 is manufactured by Micron. Unfortunately, I couldn't find the specifications for this specific DIMM on the internet. On Monday I will go to our server farm, so I can physically check whether there is any other number on the Micron DIMMs, and I will update this thread.
Meanwhile, could you check whether the M610 memory compatibility matrix includes any Micron DIMMs?

Kind regards,
Davide

Moderator

6.2K Posts

September 13th, 2015 10:00

The memory reported by CMC with part number 18KSF51272PDZ1G4D1 is manufactured by Micron

I found it: MT18KSF51272PDZ-1G4D1

It is a validated DIMM with the same specs as your other DIMM. We sell it under the same part number as the Samsung DIMM, so they will function together.

The DIMMs are populated in the correct slots for independent channel mode. Check the system BIOS to make sure it is in independent channel mode. If it is in advanced ECC or any other mode then the DIMMs will need to be moved.

The issue is likely either that the population and mode do not match or that there is a faulty DIMM/slot. After you check the mode, check the hardware log for any memory errors. If any DIMMs are being disabled due to hardware faults, that can cause an invalid population that disables other DIMMs. You can also view the DIMM status from the iDRAC hardware>memory section.

Thanks

16 Posts

September 13th, 2015 18:00

Hello Daniel,
Thanks a lot for this update. I checked the memory configuration in the BIOS. The memory mode is set to "Optimizer Mode", which means independent channel mode, so the population is correct. From the iDRAC I can see that sometimes one or more DIMMs are listed as 0 MB. The DIMM reported as 0 MB changes randomly. As you can see in the attached screenshot 1_memory_M610.jpg, the DIMM in socket B4 is reported as 0 MB. Sometimes I see two or more of them reported as 0 MB, and the slots aren't always the same.
Based on your comments, I think it could be a hardware problem: a faulty DIMM or a faulty slot. Could you confirm?

Thanks for the professional support you gave us.

Kind regards,
Davide

1 Attachment

Moderator

6.2K Posts

September 16th, 2015 11:00

Are there any messages/errors in the hardware log? Which DIMM slots are incorrectly reported as 0MB?

16 Posts

September 17th, 2015 17:00

Hello Daniel,
I didn't find any errors or messages in the hardware log. The slots reported as 0MB change after a reboot: sometimes it reports A2, and after forcing a new inventory it changes to B1. Sometimes more than one module is recognized as 0MB (for example B5 and A1 at the same time). The only messages I get regarding the modules recognized as 0MB are from the Lifecycle Controller: a PR1 and a PR11.
I ran the Embedded System Diagnostics: two complete memory tests, and all the subtasks completed successfully. During POST the M610 always shows 32GB of installed memory; the problem seems to be only at the inventory level.

That seems strange. I think there is a faulty module or memory slot that causes problems when building the inventory, but I can't find evidence in the hardware log. Do you think a faulty module/slot can cause inventory problems?
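
To judge whether the 0MB reports follow a particular slot, the observations can be tallied per slot. The snippet below is only a minimal Python sketch: the list of runs is transcribed by hand from the posts in this thread, `tally_zero_slots` is a hypothetical helper name, and the idea that an even spread points away from a single faulty slot is an assumption, not a Dell diagnostic rule.

```python
from collections import Counter

# Slots reported as 0MB in separate inventory runs, copied by hand from this thread.
observations = [
    ["B4"],        # screenshot 1_memory_M610.jpg
    ["A2"],        # after a reboot
    ["B1"],        # after forcing a new inventory
    ["B5", "A1"],  # two modules reported as 0MB at the same time
]

def tally_zero_slots(runs):
    """Count how many inventory runs reported each slot as 0MB."""
    return Counter(slot for run in runs for slot in run)

counts = tally_zero_slots(observations)

# Slots that were reported as 0MB in more than one run.
repeat_offenders = sorted(slot for slot, n in counts.items() if n > 1)

print(dict(counts))
print(repeat_offenders)  # [] for the runs listed above: no slot repeats
```

With the runs listed so far no slot appears twice, so the tally does not single out any one DIMM or slot.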

Kind Regards,
Davide

Moderator

6.2K Posts

September 17th, 2015 18:00

During POST the M610 always shows 32GB of installed memory; the problem seems to be only at the inventory level.

I thought we cleared this up in my first post. I was under the impression that you installed new memory and it was not being properly detected. If all 32GB of memory is being properly detected by the system BIOS during POST and there are no error messages then the memory is detected and functioning properly.

The inventory is an iDRAC/LCC reporting issue. Make sure your LCC is at the latest revision. If you still have the issue after updating the LCC then reset the iDRAC to defaults, disconnect the blade from the chassis for about 30 seconds to drain flea power, and then power the system back on and check for any change. If there is still no change then reflash the iDRAC firmware.

Thanks

16 Posts

November 3rd, 2015 17:00

Hello Daniel,
I'm sorry for the delay. I confirm that the LCC version is 1.7.5.4,A00 and the iDRAC version is 3.75,A00, which I think are the latest available.
Today I went to our remote server farm where the blade is installed: I reset the iDRAC, disconnected the blade for 2 minutes, then reconnected it, and unfortunately one of the DIMMs is still recognized as 0MB.

I tried to:
- reflash the iDRAC firmware
- reflash the LCC using both the standard package and the Repair Package

Unfortunately the issue is still present.

Can the hardware inventory be collected only at boot time, or is there a live procedure to force an inventory collection from the operating system installed on the blade?

Thanks in Advance,
Davide

Moderator

6.2K Posts

November 4th, 2015 10:00

Can the hardware inventory be collected only at boot time, or is there a live procedure to force an inventory collection from the operating system installed on the blade?

Check the CMC and iDRAC hardware inventory. You need to narrow down where the problem is.

Based on the information you have provided so far, it sounds like the memory is detected and functioning without issue. The only issue appears to be that the memory is incorrectly reported as 0MB in the CMC hardware inventory. The CMC pulls its information from the iDRAC on the individual blades. You need to find out whether the iDRAC is reporting the memory incorrectly, the CMC is reporting it incorrectly, or both.
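
One way to do that comparison is to write down the per-slot DIMM size shown by each interface and check them side by side. The following is a minimal Python sketch, not a Dell tool: `compare_inventories` is a hypothetical helper, the sample readings are made up for illustration, and the values would have to be copied by hand from the iDRAC and CMC hardware inventory pages.

```python
EXPECTED_MB = 4096  # each installed DIMM is 4GB, per the population described in this thread

def compare_inventories(idrac, cmc, expected_mb=EXPECTED_MB):
    """Return {slot: verdict} for every slot where a source disagrees with the expected size."""
    report = {}
    for slot in sorted(set(idrac) | set(cmc)):
        idrac_ok = idrac.get(slot) == expected_mb
        cmc_ok = cmc.get(slot) == expected_mb
        if idrac_ok and cmc_ok:
            continue  # both interfaces agree with the expected size
        if not idrac_ok and not cmc_ok:
            report[slot] = "both"
        elif not idrac_ok:
            report[slot] = "iDRAC only"
        else:
            report[slot] = "CMC only"
    return report

# Hypothetical readings: the iDRAC shows every DIMM correctly, the CMC shows B4 as 0MB.
slots = ["A1", "A2", "A4", "A5", "B1", "B2", "B4", "B5"]
idrac_view = {slot: 4096 for slot in slots}
cmc_view = dict(idrac_view, B4=0)

print(compare_inventories(idrac_view, cmc_view))  # {'B4': 'CMC only'}
```

A slot flagged as "both" would suggest the wrong value originates on the blade side, while "CMC only" would suggest the chassis view is stale or wrong.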

Also, boot into the LCC and go into the hardware inventory section. Enable Collect System Inventory on Restart and restart the system. This should cause the LCC to pull a new inventory. If that does not resolve the issue then please state whether or not CSIOR was already enabled.

Thanks
