Unsolved

8 Posts

7559

March 10th, 2019 23:00

Nvidia T4 GPUs with R740 - System BIOS has halted

Hi,

I'm trying to add 4 x NVIDIA T4 GPUs to a brand-new R740 with 2 CPUs. The system accepts three cards, but as soon as I try to add the fourth card, the BIOS keeps crashing ("System BIOS has halted" message in iDRAC).

Existing card locations: PERC in slot #6; GPUs in slots #1, #4, #8 (working). I tried adding the fourth card to slot #7 or #2 with no success.

The BIOS is the latest version, 1.6.13. 2 x 1600W PSUs are installed, and the power usage reported in iDRAC is below 400W with all three cards.

I'd appreciate any ideas.

Thanks,

12 Elder

6.2K Posts

March 11th, 2019 14:00

Hello

There are specific riser configurations for GPU support. You can find information on supported riser configurations and slot priority in the system manual.

http://www.dell.com/support/

You can also check the hardware to see if more information is listed. I would also check thermals in the system to see if anything is overheating. You should be using performance fans and have the GPU shroud installed.

Thanks

8 Posts

March 12th, 2019 02:00

Hi,

I'm using riser config #6, which is the right one according to the manual.

No alerts are visible in iDRAC (even with all 4 cards installed); everything is green. It's just stuck in a loop and cannot proceed.

4 Apprentice

2.5K Posts

March 12th, 2019 07:00

What are you doing, a Bitcoin mining job?

With 4 GPU cards, why would any old R7xx server do that?

If it's mining, they sell motherboards just for that.

4x GPU was my only hint at the job or goal here. Not a file server at all, right?

Each card uses 70 watts.

This new GPU, which Nvidia designed for inference workloads in hyperscale data centers, leverages the same Turing microarchitecture as Nvidia's forthcoming GeForce RTX 20-series gaming graphics cards.

No inference load was stated, nor the OS you run.

The card is not a video card but a compute engine alone, and very powerful, but Intel can beat it.

I don't see how any riser card can do 140 watts on its own.

Oops, I just read my R7xx manual, and it cannot do that power level, not even close. Sorry.

I don't know how to make the BIOS happy, if that is even possible.

Each card is like a 75-watt incandescent lamp; hold your hand a safe distance from one, not touching, to feel that heat.

Now multiply that by 4. Yes, lots and lots of heat, and huge currents flowing in the riser and to the motherboard.

It was never designed for that job.
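The power arithmetic in this post can be written out as a rough sketch. It assumes the 70 W per-card figure quoted above and the 75 W that the PCIe CEM specification allows a x16 slot without an auxiliary power connector; the 1600 W value is one of the two PSUs from the first post. These are nameplate numbers, not measurements, and what a particular riser or chassis supports is a separate question answered by the system manual.

```python
# Rough power-budget arithmetic using figures quoted in this thread.
# Assumptions (nameplate values, not measurements):
#   - 70 W per T4, as quoted in the post above
#   - 75 W PCIe CEM limit for a x16 slot without auxiliary power
#   - 1600 W for a single PSU, from the first post

T4_BOARD_POWER_W = 70
PCIE_X16_SLOT_LIMIT_W = 75
PSU_CAPACITY_W = 1600


def gpu_power_budget(num_cards):
    """Return total GPU power and how it compares to the assumed limits."""
    total = num_cards * T4_BOARD_POWER_W
    return {
        "total_gpu_w": total,
        "per_slot_headroom_w": PCIE_X16_SLOT_LIMIT_W - T4_BOARD_POWER_W,
        "share_of_one_psu": total / PSU_CAPACITY_W,
    }


if __name__ == "__main__":
    for n in (3, 4):
        print(n, "cards:", gpu_power_budget(n))
```

Under these assumptions the fourth card adds 70 W to the GPU total, which is small next to the PSU capacity; the sketch says nothing about thermal limits or riser ratings.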

 

 

8 Posts

March 12th, 2019 08:00

Hi,

I'm building an inference server. I managed to run 2 cards on a single riser, with a total of 3 cards in the system, even when leaving one riser totally free and a x16 slot free, and the server works fine. The issue begins only when adding the 4th card. It's not a faulty card nor a faulty riser.

12 Elder

6.2K Posts

March 12th, 2019 09:00


@Beef23 wrote:
its not faulty card nor faulty riser

How did you come to that conclusion? Did you test the card in another slot, and did you get a card to function in the two slots where you are trying to add the 4th card?

8 Posts

March 13th, 2019 00:00

Yes,

I cycled the cards, cycled the slots, and tried every possible combination of locations and cards.

The end result is the same: 3 cards work, 4 do not.

I believe it has something to do with the BIOS.

The server should support up to 6 x P4, which has similar characteristics to the T4, so 4 should not be a technical issue.
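The slot and card cycling described above can be tracked as a small test matrix. This is only a sketch: it uses the slot numbers named in this thread (slot 6 is excluded because it holds the PERC), the function names are made up for illustration, and the recorded result is just the one working layout reported in the first post.

```python
from itertools import combinations

# Slots named in this thread; slot 6 holds the PERC, so it is left out.
# Other riser configurations may expose a different set of slots.
CANDIDATE_SLOTS = (1, 2, 4, 7, 8)


def slot_layouts(num_cards, slots=CANDIDATE_SLOTS):
    """Every way to place num_cards identical cards into the given slots."""
    return list(combinations(slots, num_cards))


def untested(layouts, results):
    """Layouts that have no recorded boot result yet."""
    return [layout for layout in layouts if layout not in results]


if __name__ == "__main__":
    # Hypothetical log: only the layout reported as working in the first post.
    results = {(1, 4, 8): "boots"}
    for layout in untested(slot_layouts(3), results):
        print("3 cards, not yet tried:", layout)
    for layout in slot_layouts(4):
        print("4 cards, to try:", layout)
```

Recording a result for each layout makes it easier to show that the failure follows the card count rather than a particular slot, card, or riser.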

4 Apprentice

2.5K Posts

March 13th, 2019 07:00

I have few facts on whether the T4 card has a ROM (firmware) extension in it (the data sheet is very weak),

but it shows 75 watts per card.

If it does, then the BIOS in this server has limits on that, allowing 1 to 3 OpROM extensions, or fewer?

The DRAC and the RAID controller all have extensions and are part of the total (max) count, I bet.

The risers are not rated for that current flow (watts): P = V x A.

BIOS limits and power limits. Nor is the card Dell certified, I bet.

Page 39 in my R7xx manual states, and I quote (highlighted in red):

"

System support for 25 W maximum power for the first two cards and 15 W for the third and fourth cards (lower power support on third and fourth cards due to system thermal limitations)
 Optional x16 riser to accommodate interface cards for external GPU boxes that supports a maximum power of 25W (use of riser reduces the number of PCI Express slots from four to three)"

 

This says even the most important thing fails: power. No power, no joy, in the world of electronics.

  • power lacking
  • BIOS extension limits (OProm counts)
  • and not certified by DELL, sorry.

12 Elder

6.2K Posts

March 13th, 2019 08:00

Yes, I show we have validated the system for P4, but I don't show we have validated it for T4. It is a validated configuration according to Nvidia.

It could be a compatibility issue or a configuration issue. Since the system was not ordered in this configuration the issue may be that a piece of hardware is missing or there is hardware in the system that is not supported with that many GPGPUs. I suggest reviewing all of the documentation to make sure the hardware requirements are being met.

2 Posts

June 21st, 2019 04:00

Hi all,

I have 2 of the same. The key here, I think, is ordering them with 1600W power supplies, 6 performance fans for the R740/740XD, and the "GPU Ready Configuration Cable Install Kit" and the "HS Install Kit,GPU Config,No cable", even if you do not use the power plugs for these cards. It has to be clear from the start of the order process with Dell that you intend to install these cards in the system. If done correctly, it will support up to 6 cards, but only 3 of these will get x16 PCIe slots.

For full blast on these, look at the 2U AMD servers from Dell EMC. Hopefully they will also make them available for the C4140 1U system; it offers 4 x16 PCIe slots and is only 1U.
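The parts list in this post can be turned into a simple checklist. This is a sketch only: the dictionary keys are hypothetical names chosen for illustration, and the required values are the ones listed above, not an official Dell requirement list.

```python
# Checklist built from the parts named in the post above.
# Key names are hypothetical; values come from this thread, not from Dell.
REQUIRED_ITEMS = {
    "psu_watts": 1600,
    "performance_fans": 6,
    "gpu_cable_install_kit": True,
    "gpu_heatsink_install_kit": True,
}


def missing_items(build):
    """Return the checklist keys that the given build does not satisfy."""
    missing = []
    for key, required in REQUIRED_ITEMS.items():
        have = build.get(key, 0)
        if isinstance(required, bool):
            # Kits are simply present or absent.
            if not have:
                missing.append(key)
        elif have < required:
            # Numeric items must meet or exceed the listed value.
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical build: right PSUs, but standard fan count and no GPU kits.
    print(missing_items({"psu_watts": 1600, "performance_fans": 4}))
```

A server that was not ordered as GPU-ready can be compared against the list this way before more cards are added.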

September 6th, 2019 09:00

Can you recommend a BIOS settings configuration for an R740xd with four or six T4 GPUs?
