December 9th, 2020 18:00

Tesla K80's in Dell R720 Server?

Has anyone "actually physically tried" installing Tesla K80's in an R720 server?
If so, how did it turn out? I am interested in hearing both success and failure stories.
And yes, I know it's documented as unsupported. That is not the question.
I have seen a reference to a university that had several servers in this configuration as part of its school cluster.
I have R720's running with K20's that I want to upgrade. I have both K40's and K80's in my possession.
Looking for some "been there, tried that" enlightenment in reference to the K80's.

Thanks,
James

December 19th, 2020 06:00

Update:  Installed second Tesla K80 in the R720 and everything is working fine.

Everything is functioning correctly and operating as expected.

Dual K80's at 100% are obviously much warmer overall than dual K20's, but they are staying well within their operating temp specs as the fans are running at a higher rpm.  I was quite happy to find out the system fans are properly reactive to the K80 operating temp changes.  This was a point of major concern going into this project.

All in all, I am quite pleased with how this project has turned out, as it really changes the landscape going forward now that we know it is possible.

James
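
For anyone who wants to watch the temperatures described above while the cards are loaded, here is a minimal sketch in Python. It assumes nvidia-smi is on the PATH and that the driver supports the `--query-gpu` fields used below; the function names (`parse_gpu_csv`, `hottest`, `read_gpus`) are made up for this example and are not from the original post.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi (assumed to be supported by the installed driver).
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def hottest(rows):
    """Return the row with the highest reported core temperature."""
    return max(rows, key=lambda row: int(row["temperature.gpu"]))


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `read_gpus()` in a loop with a short sleep while the benchmark runs should show whether the core temperatures level off as the system fans ramp up. Each K80 board shows up as two GPUs, so two boards give four rows.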

Moderator

2.2K Posts

December 10th, 2020 00:00

Hi,

 

I want to share this link with you https://dell.to/3n3rZON as it is a similar topic.

 

The K80 works in the R730, but it may not work in the R720. I could not find the K80 in the spare parts list for the R720. Whichever video card you install, I would also like to draw your attention to the power supplies: please make sure they can meet the card's power requirements. https://dell.to/3gzrZDO

 

Let us know if it helps!

December 10th, 2020 02:00

Thanks for the reply Erman.

Yeah, the K80's aren't in the R720 parts list as they are unsupported.  They are on the R730 parts list since they are officially supported on that model, but I don't have any R730's at this time.  I'm good on power: as I stated, the R720's are currently running with dual K20's.  I just need something faster, which is why I have the K40's.  The K80's are spares, but hey, if I can get them to work, then they're going in.

As an update, I drilled down deeper in the university's computer cluster specs.  Deeper in the documentation the reference to R720's magically turns into R730's.  My guess is that the R720 reference in the cluster spec overview was either a typo or someone just mistakenly referenced the wrong model number.  I was bummed.

Still waiting to hear about actual attempts.  Surely I'm not the first person to have all of this equipment sitting on a bench and wonder whether it might actually work regardless of what the "official" docs indicate.

December 18th, 2020 01:00

So, the answer to my original question is:  YES

You can run an NVIDIA Tesla K80 in a Dell R720 Server.

I have an R720 on the bench running a single Tesla K80 at this time.  Everything appears to be functioning correctly and operating as normal.

I will attempt to install a second K80 after some tests complete.

I will update as I get more info.

James

1 Message

May 1st, 2021 00:00

Hi, 

Which OS are you using? I use Win10 and the GPU reports Code 12. I use one Dell R720 server. 

August 18th, 2021 23:00

Hi.  That server has Windows Server 2019 Datacenter on it.  I have no idea how Windows 10 would run on an R720 with GPU's.  Sorry for the late response, but I just saw your message.  Did you ever get your Code 12 error resolved?

4 Posts

August 19th, 2021 17:00

Thank you very much for bringing up this topic.

I will go ahead and buy an R720 and two Tesla K80s!

October 8th, 2021 17:00

What did you have to do to get the 720 to boot?
Mine sits at a single prompt after POST and never boots.

I verified that I enabled the above-4GB setting in the BIOS under the Integrated Devices > video area.

Is there something else I need to do?
I plugged the card into the x16 slot and powered it from the slot's GPU power connector with an 8-pin to 8-pin cable.

October 9th, 2021 10:00

Hi.  What happens if you leave the card in the machine, but unplug the cable to the GPU and try to boot it up?  Does it halt still, or does it continue to boot up normally?

4 Posts

October 17th, 2021 05:00

I successfully deployed the R720 system with two Tesla K80s. However, the two cards cannot reach their theoretical performance. For example, double-precision (FP64) throughput should be 1.37 TFLOPS for a single GPU, but the AIDA64 GPGPU Benchmark shows only 0.9 TFLOPS per GPU, or 3.6 TFLOPS for all 4 GPUs. If you have any idea how to solve this problem, please share it. Thank you.

P.S. I am using Windows Server 2016.

 

20210913_150425 - Copy.jpg

 

gpgpu.png
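
To put a number on the shortfall, the figures quoted in the post can be worked through directly. This is only arithmetic on the values given above (1.37 TFLOPS expected per GPU, 3.6 TFLOPS measured across 4 GPUs); the variable names are made up for the example.

```python
EXPECTED_PER_GPU_TFLOPS = 1.37  # theoretical figure quoted in the post
MEASURED_TOTAL_TFLOPS = 3.6     # AIDA64 total quoted in the post
GPU_COUNT = 4                   # two K80 boards, two GPUs per board

# Per-GPU measured throughput and its share of the expected figure.
measured_per_gpu = MEASURED_TOTAL_TFLOPS / GPU_COUNT
fraction_of_expected = measured_per_gpu / EXPECTED_PER_GPU_TFLOPS

print(f"{measured_per_gpu:.2f} TFLOPS per GPU, "
      f"{fraction_of_expected:.0%} of expected")
```

So each GPU is delivering roughly two thirds of the quoted theoretical figure, and the four GPUs scale evenly (4 x 0.9 = 3.6), which points at a per-GPU clock limit rather than one bad card or slot.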

 

October 17th, 2021 07:00

Some things for your consideration:

1.   How EXACTLY are you testing this?  Let us know and we can check our own machines to get some benchmarks.

2.   What is your ambient air inlet temperature of the server?

3.   What are the GPU core temperatures at 100% with nvidia-smi?

4.   Keep in mind that the system and the K80s have shutdown temps.  If the K80s start getting too hot, they will throttle down (slow themselves) to try to avoid reaching them.

5.   While I have never tried this, I have seen that you can "overclock" and "underclock" the K80s from the command line.  Might want to look up how to check this and see if yours have been changed from the default settings.

If I think of anything else I will add it later.
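
As a rough way to check points 3 to 5 in one go, the sketch below asks nvidia-smi for the current and maximum SM clocks and the active throttle-reason bitmask, then decodes the bitmask. It assumes the installed driver supports the `clocks_throttle_reasons.active` query field (older drivers may return "[Not Supported]"); the bit values are the NVML clocks-throttle-reason constants as I understand them, and the function names are made up for this example.

```python
import subprocess

# Bit values taken from NVML's clocks throttle reason constants (assumed
# to match the installed driver; verify against your NVML headers).
THROTTLE_REASONS = {
    0x01: "gpu_idle",
    0x02: "applications_clocks_setting",
    0x04: "sw_power_cap",
    0x08: "hw_slowdown",
    0x10: "sync_boost",
    0x20: "sw_thermal_slowdown",
    0x40: "hw_thermal_slowdown",
    0x80: "hw_power_brake_slowdown",
}


def decode_throttle(mask_text):
    """Decode a hex bitmask string such as '0x0000000000000004'."""
    try:
        mask = int(mask_text.strip(), 16)
    except ValueError:
        return ["unknown"]  # e.g. "[Not Supported]" on some drivers
    return [name for bit, name in THROTTLE_REASONS.items() if mask & bit]


def query_throttle():
    """Return (index, sm_clock_mhz, max_sm_clock_mhz, reasons) per GPU."""
    fields = "index,clocks.sm,clocks.max.sm,clocks_throttle_reasons.active"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + fields, "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    report = []
    for line in result.stdout.strip().splitlines():
        index, sm, sm_max, mask = [part.strip() for part in line.split(",")]
        report.append((index, sm, sm_max, decode_throttle(mask)))
    return report
```

Running `query_throttle()` while the benchmark is at 100% should show whether the SM clock is sitting below its maximum and, if so, whether the reason is thermal, a power cap, or an application-clock setting.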

October 17th, 2021 14:00

After being blank squares all day your images finally started displaying for me.  Kinda clears up how you are testing.  Now to sort out the results.

Moderator

3.7K Posts

October 17th, 2021 20:00

Hi, thanks for choosing Dell. I don't see an R720 configuration with two Tesla K80s in our guidelines. I'd have to say that is a significant modification from the original system. Sorry I can't be of help this time, but we can wait to see what other users in the community have experienced. Wish you a good one.

4 Posts

October 18th, 2021 06:00

1.   How EXACTLY are you testing this?  Let us know and we can check our own machines to get some benchmarks.

>> It is clear now, as you can see.

2.   What is your ambient air inlet temperature of the server?

>> 20 °C

3.   What are the GPU core temperatures at 100% with nvidia-smi?

>>

DzungPham_1-1634562937563.png

DzungPham_3-1634563365617.png

 

 

4.   Keep in mind that the system and the K80s have shutdown temps.  If the K80s start getting too hot, they will throttle down (slow themselves) to try to avoid reaching them.

>> The system stayed below the shutdown temp.

5.   While I have never tried this, I have seen that you can "overclock" and "underclock" the K80s from the command line.  Might want to look up how to check this and see if yours have been changed from the default settings.

>> I checked (nvidia-smi -q -i 0 -d CLOCK) and it shows "On" as the default setting. Based on what I found (for example here https://xstream.stanford.edu/specs/ ), it seems that the AIDA64 results reflect performance at base clocks, not GPU Boost clocks. So I wonder what the best way to benchmark FP64 performance is.

DzungPham_0-1634562840690.png

DzungPham_2-1634562992058.png
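
The base-clock explanation can be sanity-checked with a little arithmetic. The sketch below estimates peak FP64 throughput as (FP64 units) x (2 FLOPs per fused multiply-add) x (clock). The unit count (832 per GPU, i.e. 13 SMX x 64) and the 562 MHz base / 875 MHz boost clocks are assumptions taken from NVIDIA's published K80 specifications, not from this thread, so check them against what `nvidia-smi -q -d CLOCK` reports on your own cards.

```python
FP64_UNITS_PER_GPU = 832  # assumption: 13 SMX x 64 FP64 units per GPU on a K80
FLOPS_PER_FMA = 2         # one fused multiply-add counts as two FLOPs


def peak_fp64_tflops(clock_mhz, units=FP64_UNITS_PER_GPU):
    """Theoretical peak FP64 throughput of one GPU at the given clock."""
    return units * FLOPS_PER_FMA * clock_mhz * 1e6 / 1e12


def clock_for_tflops(tflops, units=FP64_UNITS_PER_GPU):
    """Clock in MHz that one GPU would need to reach the given peak."""
    return tflops * 1e12 / (units * FLOPS_PER_FMA) / 1e6


# Assumed base and boost clocks for the K80 (see the note above).
for label, mhz in [("base", 562), ("boost", 875)]:
    print(f"{label:>5} {mhz} MHz: {peak_fp64_tflops(mhz):.3f} TFLOPS per GPU")

print(f"1.37 TFLOPS would need about {clock_for_tflops(1.37):.0f} MHz")
```

Under those assumptions the base-clock peak comes out at about 0.94 TFLOPS per GPU, which is close to the 0.9 TFLOPS AIDA64 reported, while the quoted 1.37 TFLOPS would need a clock well above base. If the cards never leave base clock during the benchmark, nvidia-smi has options that may help: `-q -d SUPPORTED_CLOCKS` lists the available clock pairs and `-ac <memory,graphics>` sets application clocks. Treat that as something to try carefully, not a confirmed fix for this setup.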

 

 
