Unsolved

14 Posts

August 6th, 2020 17:00

Random UPS Shutdown During Memory Test (LCD error PWR1006 when running Windows Server)

Hi All,

We are low on budget and just went through a massive layoff. We have an ongoing project to add 10 new servers to replace the existing old R510 ones.

Having no budget, we bought 10 used R620s from eBay resellers. Each has dual Xeon E5-2680 2.7 GHz CPUs, for a total of 16 cores/32 threads, 64GB memory and 8 brand new Dell certified 600GB 10K drives. These R620 servers have no Dell warranty.

We updated all firmware to the latest on all servers. Each comes with dual redundant 750-watt PSUs. Each server connects to two different UPS units for redundancy. We have been using these two UPS units for about 4 years without problems (for this rack).

Here is what we found.

1)  Ran the Dell memory test. In under 2 weeks, both UPS units shut off in the middle of the night. We knew we got lemons from the eBay resellers. The question is which one(s). I have no idea which one(s) caused both UPS units to shut off.

2) Because the rack was also used by production servers, we had to buy two brand new UPS units to test these servers on a different rack.

3) Now, on a different rack with the two brand new UPS units, the servers don't shut off the UPS. The newer UPS units may be able to handle the power problems better.

4) We swapped the UPS units to test the used R620s with the two older ones.

5) No shutdown yet with the same memory tests.

6) We installed Windows Server 2012 and the LCD shows the error "PWR1006 System power exceeds capacity. System halted. Contact Support". It shuts down Windows Server.

It seems the problem may be the 750-watt PSUs. I checked the Dell doc at https://tinyurl.com/y49vsr58 and found that each R620 most likely requires the 1100W PSU. We are not sure whether the hardware itself can be repaired and deployed for production.

My question is: because we have been running hardware tests for months using these under-capacity PSUs, will that cause random hardware problems in the long run? I heard an under-capacity PSU could damage the logic board, CPU, etc. (especially if used for a long while). We might have to cut our losses and buy new servers.

Let me know what you think. 

Chris

Moderator

 • 

3.1K Posts

August 6th, 2020 23:00

Hi,

 

Frankly, in the long run you may face issues, as the system needs to draw more power than the PSU can provide.

 

Could you check in the BIOS what the power profile is under the power management settings? Also, have you looked at the PSU firmware to see whether a firmware update is available? https://dell.to/2PyGVFf

 

Let me know if it helps.

14 Posts

August 7th, 2020 10:00

Hi Joey,

I just took over the project last week (after someone else tried to figure out what's wrong with it for almost a year).

We have been running hardware tests (using the same Dell memory test to see if it will shut off the UPS again) for several months. If I swapped out all the 750W PSUs for 1100W PSUs, would these R620s be 100% stable and suitable for production? People have told me to just dump them and buy new servers to be safe.

Below are the answers to your questions.

The power setup is "Normal". I'm told we did run the Dell firmware update tool to update all available firmware. Also, I do have DSET if you would like to take a closer look.

Is there documentation explaining when an R620 requires the 1100W PSU? One of our production R620s (with a similar spec) does come with dual 1100W PSUs.

Anyway, below is a log from OMSA for one of the R620s. Is the reading below for real? If just one R620 draws 5353.6 A, 10 of them would kill the UPS. No wonder it shut off both UPS units that night.

System Peak Power
Peak Time = Wed Nov 27 21:07:32 2019
Peak Reading = 748 W ( 2553 BTU/hr )

System Peak Amperage
Peak Time = Wed Sep 04 12:35:51 2019
Peak Reading = 5353.6 A

Mon Jul 06 14:41:02 2020 The system halted because system power exceeds capacity.

 

Thanks

Moderator

 • 

3.3K Posts

August 7th, 2020 10:00

Hello ChrisLA,

 

Upgrading the PSUs is one option to try, per the recommended action for the error. I cannot guarantee this is 100% the problem. I'd recommend trying it on one server and, if it helps, doing it for the others that present this issue. Check the other recommendations also.

 

https://dell.to/3gGAe02

Page 126

PWR1006

Action :  Review system configuration, upgrade power supplies or reduce system power consumption.

 

 

Also please make sure your DRAC and BIOS are up to date:  https://dell.to/3a47ICS

 

 

You may also be interested in our Enterprise Infrastructure Planning Tool

https://dell.to/2PxYNjJ

14 Posts

August 7th, 2020 11:00

Thanks, Charles.

Both firmware versions are reasonably up to date: BIOS Version 2.7.0 (Release Date 05/23/2018) and iDRAC7 2.63.60.62 (Build 1). Our production R620s have run without problems for many years using older firmware with the exact same UPS. I'll update the firmware on each of the 10 R620 units again per your advice.

Replacing the dual 750W PSUs with dual 1100W PSUs sounds like a plan. However, it seems you are not certain this will make them 100% stable for production. Is there any possibility that these under-capacity dual 750W PSUs may have already caused damage that could end up as random hardware problems in the long run?

The reason I'm asking is that I would rather buy new servers than go too cheap on something this critical. They will be used as our new backup servers, file servers and ESXi hosts. They have to be 100% stable.

Thanks again.

Moderator

 • 

3.3K Posts

August 7th, 2020 12:00

Hello ChrisLA,

 

It wouldn't cause damage. If the PSUs don't have the needed voltage, then the server will not POST and will give the error stated.

 

You can test this by going to minimum to POST. If the error doesn't return in that configuration, then I would assume that the installed hardware exceeds the PSU capacity.


Minimum Components:

1 Power Supply
Control Panel with cable (to power system on)
CPU1 (minimum for troubleshooting)
1 DIMM for CPU 1
Motherboard
Risers


Updating and upgrading is probably your best bet.

Moderator

 • 

3.3K Posts

August 7th, 2020 13:00

Hello ChrisLA,

 

You can use the Enterprise Infrastructure Planning Tool to build out configuration for your environment. 
https://dell.to/2DmjOeT;
There is a tutorial video and user guide.

14 Posts

August 7th, 2020 13:00

Thanks again, Charles.

I did try https://dell.to/2PxYNjJ but it doesn't have older model (R620). This https://dell.to/2DmjOeT has errors.

14 Posts

August 7th, 2020 13:00

I'm going to try the installation instead of the web app.

14 Posts

August 7th, 2020 13:00

Hi Charles,

All of the R620s POST perfectly every time. None of them has problems at boot, and there are no POST errors. So they seem to have enough power to POST.

They do have random problems: either shutting off both of the UPS units on the rack (taking all servers on the rack down) or shutting off the server when running Windows Server. It's random and not always easy to repeat.

There seem to be certain conditions that trigger this. We have been booting off the Dell diagnostic CD to run memory tests. That's when it shut off the UPS units after under 2 weeks.

Is there a way to find out when an R620 requires dual 1100W PSUs instead of the 750W ones? There has to be documentation somewhere.

Thanks.

2.9K Posts

August 7th, 2020 14:00

The only version of the web app we have access to is the one they linked to previously. I'd look at the R630 with a similar PSU config as the closest analogue to the R620.

 

For determining whether the server itself is stable for production, I'd decouple it from the UPS for testing. If the UPS is shutting down, removing the server from it would prevent the UPS from being a concern in whether the server is healthy.

 

I do think that upgrading the PSUs may be a good idea, but not if the only time you're seeing it is during memory testing. However, increasing the maximum possible peak load of the PSU would also mean a higher maximum possible draw from the UPS. If the UPS is shutting down now, that would indicate to me that the UPS may be overloaded.
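
To put rough numbers on that, here is a minimal Python sketch of the worst-case rack draw. It rests on a simplified model that is my assumption, not something from Dell documentation: each server is capped at roughly one PSU's rating, PSU efficiency losses are ignored, the load splits evenly across online UPS units, and if one UPS drops out the survivor carries the whole rack. The function names are made up for illustration; the 10 servers, the 750 W and 1100 W ratings, and the 748 W peak reading come from this thread.

```python
def worst_case_rack_draw_w(n_servers: int, per_server_cap_w: float) -> float:
    """Upper bound on rack draw if every server peaks at once (assumed model)."""
    return n_servers * per_server_cap_w


def load_per_ups_w(total_w: float, ups_online: int) -> float:
    """Load on each online UPS, assuming the load splits evenly (assumption)."""
    if ups_online < 1:
        raise ValueError("at least one UPS must be online")
    return total_w / ups_online


N_SERVERS = 10  # from the thread

rack_observed = worst_case_rack_draw_w(N_SERVERS, 748)   # OMSA peak reading
rack_750 = worst_case_rack_draw_w(N_SERVERS, 750)        # current PSU rating
rack_1100 = worst_case_rack_draw_w(N_SERVERS, 1100)      # proposed PSU rating

for label, total in [("observed peak", rack_observed),
                     ("750 W cap", rack_750),
                     ("1100 W cap", rack_1100)]:
    print(f"{label}: total {total:.0f} W, "
          f"{load_per_ups_w(total, 2):.0f} W per UPS with both online, "
          f"{load_per_ups_w(total, 1):.0f} W on one UPS if the other drops")
```

Comparing those totals against the rated capacity of each UPS (not stated in this thread) would show whether one UPS can carry the rack alone.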

 

As for the reporting, I think it would be best to get a better look at that data. I don't believe 5353.6 amps is being supplied to your PSU, because (assuming a nominal 120 V) that would be 642,432 W, and that seems unlikely (the mains breaker in my house is 100 amps). I'll be on shift for about another hour, so if you would like to PM me, I can give you my email address and you can send me a SupportAssist collection for review. Joey will be coming online before long, as well. I would imagine that getting that collection reviewed would be helpful, regardless of any possible load issues.
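
For anyone who wants to redo that sanity check, here is a small Python sketch. It only applies P = V x I with the nominal 120 V assumed above; the helper names are mine, and the two readings are the ones from the OMSA log posted earlier.

```python
NOMINAL_VOLTS = 120.0  # assumption: nominal 120 V mains, as stated above


def amps_to_watts(amps: float, volts: float = NOMINAL_VOLTS) -> float:
    """Power implied by a current reading: P = V * I."""
    return amps * volts


def watts_to_amps(watts: float, volts: float = NOMINAL_VOLTS) -> float:
    """Current implied by a power reading: I = P / V."""
    return watts / volts


reported_peak_amps = 5353.6   # "System Peak Amperage" from the OMSA log
reported_peak_watts = 748.0   # "System Peak Power" from the OMSA log

implied_watts = amps_to_watts(reported_peak_amps)
implied_amps = watts_to_amps(reported_peak_watts)

print(f"{reported_peak_amps} A at {NOMINAL_VOLTS:.0f} V implies {implied_watts:,.0f} W")
print(f"{reported_peak_watts:.0f} W at {NOMINAL_VOLTS:.0f} V implies {implied_amps:.2f} A")
```

The two readings disagree by roughly three orders of magnitude, which is why the amperage figure looks like a reporting problem rather than a real draw.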

 

EDIT: For a simple test to determine if the PSUs are insufficient, you could temporarily make them non-redundant. This would allow each PSU to supply power without needing to provide anything as a backup. This may help you determine if a PSU expansion is necessary by just reallocating what you already have.
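
As a rough illustration of why that test is informative, here is a Python sketch of the power budget in each mode. It assumes a simple model where redundant (1+1) mode limits the system to what the PSUs can still deliver with one of them failed, and non-redundant mode lets it use the combined rating. That model and the function names are my assumptions, not Dell's published budgeting logic; the 748 W peak is the OMSA reading from earlier in the thread.

```python
def power_budget_w(psu_rating_w: float, psu_count: int, redundant: bool) -> float:
    """Usable budget under the assumed model.

    Redundant: the system must still run with one PSU failed.
    Non-redundant: all PSUs contribute their full rating.
    """
    usable = psu_count - 1 if redundant else psu_count
    return psu_rating_w * usable


def headroom_w(peak_w: float, budget_w: float) -> float:
    """Margin between the budget and the observed peak (negative = over)."""
    return budget_w - peak_w


PEAK_W = 748.0  # OMSA "System Peak Power" reading from this thread

configs = [
    ("2 x 750 W, redundant", power_budget_w(750, 2, redundant=True)),
    ("2 x 750 W, non-redundant", power_budget_w(750, 2, redundant=False)),
    ("2 x 1100 W, redundant", power_budget_w(1100, 2, redundant=True)),
]

for label, budget in configs:
    print(f"{label}: budget {budget:.0f} W, headroom {headroom_w(PEAK_W, budget):.0f} W")
```

Under this model the current redundant setup has almost no margin over the recorded peak, so if PWR1006 stops appearing in non-redundant mode, that points at PSU capacity rather than damaged hardware.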

14 Posts

August 7th, 2020 14:00

There is no R620 (I did see R420xr). Is there an older version of the web app with R620?

2.9K Posts

August 7th, 2020 15:00

I got your message, but it looks like security doesn't like the link. Most file sharing sites are blocked to us, unfortunately. Email would likely be the easiest way to send that file over, unless that's a non-option for you.

 

EDIT: Since the DSET is what's available, go ahead and send it over, but I'd be hesitant to make judgement calls in either direction based on it, just due to the amount of time it's been out of use. My concern may prove to be unfounded, but I mostly don't want to risk missing some piece of information that may be of use.

14 Posts

August 7th, 2020 15:00

Thanks, Dylan. I just sent you a link to download the DSET.

Regarding the UPS, we have been using the exact same APC UPS model for several years without problems on several other racks with similar numbers of PowerEdge servers. The issue is random (both UPS units shutting down during the memory test, and PWR1006 when running Windows Server) and cannot be easily repeated.

If I could repeat it, I would not be too worried: replace the dual 750W PSUs with dual 1100W PSUs and be done. But I can't be sure it's just the PSUs. I hope DSET will help.

If DSET is not good enough, I'll go to the office this weekend to get a SupportAssist collection.

14 Posts

August 7th, 2020 15:00

The zip file is password-protected and got rejected by most mail scanners. I had to use tinyurl to change the URL and it went through. It doesn't check all redirections.

14 Posts

August 7th, 2020 15:00

My email doesn't go through either. Let me try something else.
