June 17th, 2020 20:00

R730 and NVIDIA Tesla K80 - Not Detected?

So, I first struggled to get the K80 powered. I found the proper cable for the NVIDIA card, as it appears the pinout may be specific. Before that, the system would fail to boot and report an over-current condition. With the proper power cable for the card, the system now boots.

My problem NOW is that Windows Server 2016 just doesn't appear to see the card at all.  It doesn't show up in Device Manager, and when attempting to install the NVIDIA drivers, the installer immediately fails and says it can't find compatible hardware.  There's also no other hardware in Device Manager that would otherwise be the card just improperly classified, etc.  It's simply not being seen.

I've searched in the BIOS but I didn't see anything there that looked suspicious.  I've also enabled hardware inventory updates, but I don't think that has any impact here, either.  Any ideas?  Am I missing something that needs to be enabled?

I've installed the card on Riser 3 if that's meaningful.

Thanks!

7 Posts

June 18th, 2020 17:00

I was just able to solve this.

The solution was silly, but it worked: I simply reseated the card in the PCIe slot and also flipped the power cable around. I expect the cable had nothing to do with it, but I'm including it here anyway since it technically was a variable that changed. The card looked and felt firmly seated, but perhaps it wasn't fully?

In any case, when the machine booted, I noticed that it booted differently: the fans ran at full speed for a good 10 or 15 seconds during the boot sequence, which never happens. Once in Windows, Device Manager immediately showed two "3D Video Device" entries (the K80 has two GPUs on it). The NVIDIA installer then worked perfectly.

So, net result: the card was most likely just not fully seated. Thanks for the help nonetheless.

Moderator

790 Posts

June 18th, 2020 01:00

Hey ezmaass,

 

I saw your PM; nice to see that you found the correct cable.

What about the PSUs: are there two 1100 W PSUs in this system, as mentioned here (https://dell.to/3fDd7Ci)?

 

NOTE: When using the system with the Nvidia K80 GPU card, ensure that you install both PSUs with a minimum of 1100 W each and set the PSU configuration to non-redundant mode.

 

Best regards,
Stefan

7 Posts

June 18th, 2020 16:00

Hi Stefan,

Yes, the system has two 1100 W PSUs, and they are indeed configured in non-redundant mode. I suspect the system has proper power, since it previously reported an over-current condition (error 36) when the cable wasn't correct. Now it boots fine, but the card doesn't appear to be visible to the OS, which leads me to believe something is still wrong at the hardware level.

Are there any diagnostics I can look at to confirm or deny whether the system sees something plugged into the PCIe slot? Are there any issues with using Riser 3 instead of Riser 2? I selected Riser 3 because it looked better for airflow, but I understand both should work. Are there any BIOS settings I need to look at? I've read the R730 documentation on installing a GPU, and nothing is mentioned beyond the elementary basics and the aforementioned power supply requirements.

At this point, I'm trying to determine whether the problem is at the hardware level (the machine literally isn't talking to the card) or at the OS level (Windows Server 2016 needs to be adjusted to see the card). I've been leaning towards hardware/BIOS simply because I'd expect the OS to still discover the device in Device Manager, even if it were incorrectly classified or had no drivers.

Thanks for the help and any insights!

Moderator

790 Posts

June 19th, 2020 00:00

Hi ezmaass,

 

I'm so happy to read that everything is working fine now.

 

 

1 Message

January 23rd, 2023 03:00

The exact same issue happened to two of my GPU servers. I swapped the cable around the other way and reseated the cards, and they booted fine and have been perfect since.

5 Posts

July 13th, 2023 01:00

Hi, could you please tell us about your experience using the K80? I would like to buy one for image-segmentation deep learning models.

Moderator

3.4K Posts

July 13th, 2023 07:00

Hello,

What would you like to know in particular about this product?

Thanks

September 2nd, 2023 18:43

Hello,

Did you guys have any problems with the fans? After installing the GPU with the cable (part number N08NH), the fans average 77% at idle, with no load on the server. Is this normal?

Temperatures below:

CPU1 Temp 41 °C (105.8 °F)

CPU2 Temp 41 °C (105.8 °F)

System Board Exhaust Temp 34 °C (93.2 °F)

System Board Inlet Temp 30 °C (86.0 °F)

Kind regards,

Andrei

Moderator

2.1K Posts

04-09-2023 06:55 AM

Hello, this cable seems to be compatible with the R730. When the R730 senses a non-Dell PCIe card such as the NVIDIA Tesla, it boosts the fan speed to keep the card cool. This is how it is supposed to work. However, some users have found ways to turn off this fan acceleration or change the fan speed themselves using IPMI tools or RACADM commands. Some of these commands are listed below.

Before anyone asks: yes, I had to disable the third-party PCIe card 100% fan response. See below:

reference link: https://dell.to/45Brt0E

For IPMITOOL

1. Install IPMI tools:

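yum install OpenIPMI ipmitool  # RHEL/CentOS/Fedora (dnf on newer releases; very old releases named the package OpenIPMI-tools)

apt install openipmi ipmitool  # Debian/Ubuntu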

chkconfig ipmi on  # << optional for the task

service ipmi start  # << optional for the task

2. Query Dell's Third-Party PCIe card based default system fan response:

ipmitool raw 0x30 0xce 0x01 0x16 0x05 0x00 0x00 0x00

A response like the one below means Disabled:

16 05 00 00 00 05 00 01 00 00

A response like the one below means Enabled:

16 05 00 00 00 05 00 00 00 00

3. Set Third-Party PCIe Card Default Cooling Response Logic To Disabled:

ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00 

4. Set Third-Party PCIe Card Default Cooling Response Logic To Enabled:

ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00 
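The query and set steps above can be wrapped in two small helpers, shown here as a rough sketch. The function names are hypothetical and not part of ipmitool. The decoding rule is an assumption drawn only from the two sample responses in this thread: the eighth byte is `01` when the third-party response is Disabled and `00` when it is Enabled. The second helper only prints the raw command, so nothing is sent to the BMC until you run the printed line yourself.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of ipmitool.

# Decode the reply of the query command from step 2.
# Assumption (from the sample replies in this thread only):
# byte 8 is 01 for Disabled and 00 for Enabled.
decode_third_party_fan_response() {
    # Split the reply on whitespace into positional parameters.
    # Intentionally unquoted; the reply contains only hex bytes.
    set -- $1
    if [ "$#" -lt 8 ]; then
        echo "unknown"
        return 1
    fi
    case "$8" in
        01) echo "disabled" ;;
        00) echo "enabled" ;;
        *)  echo "unknown"; return 1 ;;
    esac
}

# Print (do not execute) the raw command from step 3 or step 4.
third_party_fan_command() {
    case "$1" in
        disabled) echo "ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00" ;;
        enabled)  echo "ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00" ;;
        *)        echo "usage: third_party_fan_command enabled|disabled" >&2
                  return 1 ;;
    esac
}

# Example use on a host with ipmitool installed (not run here):
#   decode_third_party_fan_response "$(ipmitool raw 0x30 0xce 0x01 0x16 0x05 0x00 0x00 0x00)"
```

Printing the command instead of running it keeps the sketch safe to source on any machine, and lets you check the bytes against steps 3 and 4 before changing the cooling behavior.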

Manually setting offsets via RACADM

SSH directly into your iDRAC:

Get SpeedOffset value:

racadm get system.thermalsettings.FanSpeedOffset

Set SpeedOffset value:

racadm set system.thermalsettings.FanSpeedOffset X

The X values are:

0 - Low Fan Speed

1 - High Fan Speed

2 - Medium Fan Speed

3 - Max Fan Speed

255 - None
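To avoid mixing up the numeric values listed above (note that High is 1 and Medium is 2), a small lookup helper may be useful. This is only a sketch: the function names are hypothetical, the mapping is copied from the list in this post, and the second helper merely prints the `racadm` line so you can paste it into your iDRAC SSH session.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of RACADM.

# Map a human-readable offset name to the numeric value listed in this post.
fan_speed_offset_value() {
    case "$1" in
        low)    echo 0 ;;
        high)   echo 1 ;;
        medium) echo 2 ;;
        max)    echo 3 ;;
        none)   echo 255 ;;
        *)      echo "unknown offset: $1" >&2
                return 1 ;;
    esac
}

# Print (do not execute) the racadm command for the named offset.
fan_speed_offset_command() {
    value=$(fan_speed_offset_value "$1") || return 1
    echo "racadm set system.thermalsettings.FanSpeedOffset $value"
}

# Example: fan_speed_offset_command none
```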

-- Works on 10/20/30 series. Too poor to test on 40 series.

 

DELL-Erman O

Social Media and Communities Professional

Dell Technologies | Enterprise Support Services

#IWork4Dell
