
R720 GPU installed, PSU blinking amber

August 17th, 2021 15:00

From what I was able to find, installing GPUs requires the riser, the enablement kit, and the whole nine yards.

I got the 1100W PSUs, and I updated the BIOS and iDRAC to the latest versions.

Installed the GPU and used the riser power cable to connect it. When I apply power to the machine, there it is: a blinking amber light. If I disconnect power from the GPU, even without removing it, the system works just fine. Power back to the GPU? The machine won't even start. iDRAC is still accessible over the network, but that's about it; the server just refuses to boot, and nothing about the issue shows up in any logs.

When I go to power configuration, the system is capped at 461 watts for some reason. The input wattage is over 1200 W and the output (of the PSUs) is 1100 W. There is more than enough power for everything to work properly.

Why is power capped at 461 when I have two 1100W PSUs? 

I reset iDRAC to defaults twice; it is still capped at 461 W, and the machine refuses to boot only when GPU power is connected.
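If IPMI over LAN is enabled on the iDRAC, one way to cross-check the reported cap and current draw from outside the web UI is ipmitool's DCMI power subcommands. A minimal sketch; the host address and credentials below are placeholders, not values from this thread:

```python
# Hedged sketch: build the ipmitool command lines one could run against an
# iDRAC (with IPMI over LAN enabled) to inspect power draw and any active
# DCMI power limit. Host, user, and password are placeholder values.

def ipmitool_cmd(host, user, password, *subcmd):
    """Build an ipmitool argv for a remote IPMI-over-LAN query."""
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, *subcmd]

# Current power draw as the BMC reports it:
reading = ipmitool_cmd("192.0.2.10", "root", "calvin",
                       "dcmi", "power", "reading")

# Whether a DCMI power limit (cap) is active, and its value:
limit = ipmitool_cmd("192.0.2.10", "root", "calvin",
                     "dcmi", "power", "get_limit")

print(" ".join(reading))
print(" ".join(limit))
```

Running these against the iDRAC would show whether the 461 W figure is an advertised limit range or an actually enforced cap.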

August 18th, 2021 11:00

These are the actual cables out of my server. The long cable is the one that you probably have. The short Y-cable is the one that you need. The white plug on the long cable goes into the riser card. The Y-cable plugs into the Tesla K80. The R730 GPU cable meant for the K80 is essentially this configuration, without the mess of adapters or extra plugs.

[Photos: Cable 1.jpg, Cable 2.jpg, Cable 3.jpg]

August 17th, 2021 18:00

Hello Shine, 

Here is the power cap setting from the iDRAC menu:

[Screenshot: ammarsalman94_0-1629250089050.png]

 

I have two 1100W PSUs, tried in redundant mode, tried without it. Power cap policy is disabled. But if I do enable it, I cannot raise the cap above 461 Watts. 

Regarding the set up: 

I have the riser expansion and shrouds all installed according to the guidelines. I have 2x E5-2640 0 processors with a TDP of 95 W (well below the 115 W limit in the guideline). The system has 64 GB of RAM.

The power cables I used were the 09H6FV, which are designed for this particular purpose. The GPUs are Tesla K80s (both). I tried installing only one at a time: same exact issue.
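For what it's worth, a back-of-the-envelope power budget from the figures in this thread suggests capacity is not the constraint. The 300 W per-K80 figure is NVIDIA's rated board power; the 200 W base-system number is a loose assumption for board, RAM, fans, and drives:

```python
# Rough power-budget sanity check using the figures from this thread.
# BASE_SYSTEM is an assumed estimate, not a measured value.

PSU_WATTS       = 1100   # W, each PSU; two are installed
K80_BOARD_POWER = 300    # W per card (NVIDIA rated board power)
CPU_TDP         = 95     # W per E5-2640
BASE_SYSTEM     = 200    # W, assumed: board, RAM, fans, drives

total = 2 * K80_BOARD_POWER + 2 * CPU_TDP + BASE_SYSTEM
print(total)             # 990 W: within even a single 1100 W PSU
assert total <= PSU_WATTS
```

So even with both K80s at full load, the worst-case estimate fits inside one PSU, which points away from raw capacity and toward the cabling or cap configuration.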

I upgraded the BIOS, iDRAC, and Lifecycle Controller. I installed the latest versions directly from the Dell driver support website (I did not step through intermediate versions on the way to the latest).

Memory-mapped I/O above 4 GB is enabled in the BIOS, but I doubt that has any effect given I can't even boot the system.

If I remove GPUs, I can boot and use everything fine. 

4 Operator • 3K Posts

August 17th, 2021 18:00

Can you share a picture of the power cap you are referring to? You can also check the links below for details on GPU card installation in the R720 and ensure all prerequisites and connections are correct. Can you also let me know the model and part number of the GPU card you are installing in the server?

https://www.dell.com/support/manuals/en-us/poweredge-r720/720720xdom/gpu-card-installation-guidelines?guid=guid-268578e4-b96b-43ba-ac13-58c567cbb670&lang=en-us

https://www.dell.com/support/manuals/en-us/poweredge-r720/720720xdom/installing-a-gpu-card?guid=guid-244e9d21-fbb7-4246-9602-4d85e5cab508&lang=en-us 

4 Operator • 3K Posts

August 17th, 2021 20:00

The minimum and maximum power caps only come into the picture when you enable power capping. The system can use full power as long as the power cap is disabled.

The Tesla K80 is not a supported graphics card on the R720. That does not mean it won't work; we have seen users run unsupported cards in their servers without any issues. We just cannot guarantee that it will work, as it is not a validated card.

Can you check the POST code and the Lifecycle Log in iDRAC when you try to power on the server with the graphics card installed?

August 17th, 2021 20:00

Yes, I know the K80 is not on the official list, but I have seen many examples where two were installed and worked, as you mentioned.

There is no POST code (0x0), and the Lifecycle Log in iDRAC shows nothing. The system does not power on at all. iDRAC is still accessible somehow, but it shows the power status as OFF, and when I send the command to power on, it is simply ignored (same with pressing the power button).

August 17th, 2021 20:00

They're double-width, so they can only fit in slots 4 and 6, which is how I connected them.

August 17th, 2021 23:00

I tried once more, and this time there was an entry in the Lifecycle Controller log:

PSU0036

The error code appeared for both PSUs.

I don't understand what is causing this. There is more than enough power, and the cables I bought were brand new. This only happens when the GPUs are connected.

Moderator • 3K Posts

August 18th, 2021 01:00

Hi,

 

I'm Joey from enterprise social support, and I've been keeping an eye on your post about the GPU installation. It's great that we're on the same page that the K80 is not on the official list for the R720.

 

From the error codes you have provided, it seems to point to the GPU cable you have, the 9H6FV. I would suggest trying to obtain another one; yes, I know it's new, but for some unknown reason the cable might be faulty.

 

Last year, a user had the same question and did manage to get the card working on an R720. Let me tag them here to see if we can get an opinion on your issue.

 

Hi @OldSchoolAdm1n, would you be able to share your thoughts on the K80 installation from your post last year about the R720?

August 18th, 2021 08:00

I was wondering how 2 new cables could both be faulty at once, but I will try another cable nonetheless.

August 18th, 2021 09:00

Hi.  Sorry to hear about your troubles.

I think your issue has to do with the cables you are using. That part number is for a generic "GPU in a Dell R720" cable; it is made for K20s and K40s. There is no stock Dell R720 cable that will work with a Tesla K80.

To be clear: YOU NEED A K80 CABLE.  Cables for the K80 are different.  No other cable will work.

Amazon has some Y-type cables made specifically for Tesla K80's.

I have looked at the Tesla K80 cable made for the Dell R730.  That cable should work, but I have not yet tried it.

The point being, your cable must be wired for a K80. You cannot use cables made for 6+8-pin GPUs.
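As a sketch of why the cable type matters so much: the K80 takes an EPS/CPU-style 8-pin feed (the thread later confirms a CPU 8-pin cable works), and under the common simplified pin conventions, a PCIe 8-pin and an EPS 8-pin put +12 V and ground in essentially mirrored positions. A cable wired for one can short supply to ground in the other, which would plausibly trip both PSUs the moment GPU power is applied:

```python
# Illustrative, simplified pin maps for the two 8-pin connector families.
# Real connectors include sense pins and keying differences; this sketch
# only shows the 12V/GND mirroring that makes the cables non-interchangeable.

PCIE_8PIN = {1: "+12V", 2: "+12V", 3: "+12V",
             4: "GND",  5: "GND",  6: "GND", 7: "GND", 8: "GND"}

EPS_8PIN  = {1: "GND",  2: "GND",  3: "GND", 4: "GND",
             5: "+12V", 6: "+12V", 7: "+12V", 8: "+12V"}

# Count pin positions where the two connectors disagree:
mismatches = sum(PCIE_8PIN[p] != EPS_8PIN[p] for p in range(1, 9))
print(mismatches)  # nearly every pin that is +12V on one is GND on the other
```

With that many conflicting positions, a mis-wired adapter doesn't just fail to power the card; it presents a dead short, consistent with the PSU fault reported earlier in the thread.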

As far as the power capping issue, I have no idea at the moment, but it sounds like it needs to be addressed also.

 

August 18th, 2021 20:00

Thank you so much for the information. 

I had initially installed the K80 in my own PC using a CPU power cable (8-pin on both ends), and that was the only configuration I could get working there.

I tried that particular cable and, sure enough, the system booted and I no longer get the error.

However, the card is not recognized at all, not by the OS and not by iDRAC. I collected the SupportAssist data; here it is:

[Screenshot: ammarsalman94_1-1629342200104.png]

Any idea why it is not recognized? Are there BIOS settings on your system that might matter? I do have MMIO above 4 GB enabled.

August 18th, 2021 22:00

Glad it's booting. I didn't see where you mentioned the OS, but assuming it's Windows Server, did you look in Device Manager? If you've done nothing else, you should see either 2 or 4 "Unknown Video Device" entries (or something similar) under video devices. Seems to me that I had to install an Intel .inf before the K80s (and some other unknown devices) were recognized, then of course the Tesla drivers from NVIDIA. Most likely, if you don't see them there, they are hidden down at the bottom of the device tree under PCI devices, and you just don't recognize the PCI IDs since no name is listed.

August 18th, 2021 23:00

I've tried ESXi and Windows, and neither was able to see it. I also checked the device list from the setup menu (F2), and it wasn't there under PCIe devices. The list above is the full list from iDRAC, and I don't think it's there at all.

 

What is intel.inf? 

 

Either way, I really appreciate your help. I'll mark your previous reply as the solution but
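Since the card doesn't show up even in the F2 device list, one low-level check (from a Linux live USB, for example) is whether it enumerates on the PCIe bus at all: `lspci -nn` lists every device with its `[vendor:device]` ID, and NVIDIA's vendor ID is `10de`, driver or no driver. A small filter over that output; the sample lines below are illustrative, not taken from this server:

```python
# Hedged sketch: filter lspci -nn output for NVIDIA devices by vendor ID.
# The sample lines are made-up examples of typical lspci -nn output format.

def nvidia_devices(lspci_lines):
    """Return lspci -nn lines whose [vendor:device] tag has vendor 10de."""
    return [line for line in lspci_lines if "[10de:" in line.lower()]

sample = [
    "02:00.0 Ethernet controller [0200]: Intel Corporation I350 [8086:1521]",
    "42:00.0 3D controller [0302]: NVIDIA Corporation GK210GL [Tesla K80] [10de:102d]",
]

for line in nvidia_devices(sample):
    print(line)
```

If nothing with vendor `10de` appears even there, the card isn't enumerating at all, which points at the slot, riser, or power delivery rather than drivers.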

August 19th, 2021 00:00

I don't think ESXi sees GPUs directly, however I'm not 110% certain about that. If I remember right, ESXi just passes calls between the guest-OS VM and the GPU in the ESXi host machine (the R720). One set of GPUs could be shared among several virtual machines on the same host, but the VMs would still need device drivers configured to use them. NVIDIA also has GRID GPUs for VM-type environments, but they require licenses and other things that I have never worked with. If anyone knows for certain, please jump in.

The Intel.inf is a chipset device-information file. It is the file that basically tells Windows what the unrecognized devices in a machine are, so you can install device drivers for them.

When you said Windows, did you mean Server or Win 10? If you tried Win 10, I would recommend downloading a Windows Server evaluation directly from MS and starting over. K80s are considered datacenter hardware geared toward servers and high-end workstations. Win 10 leans more toward general-purpose GPUs for gaming, so I have no idea how well a K80 would be supported in a Win 10 environment. Since some of the stuff I run on the GPUs takes months to complete, I would cringe at the thought of trying it on a Windows desktop OS. What is your end goal once this machine is running?

Let's try Windows Server and see if we can get this thing going.

 
