
Unsolved


6 Posts


September 20th, 2019 07:00

R640 and Broadcom - Poor performance on SMB Direct (RDMA) configuration

Hello all.

So, I've got a couple of new Dell R640 servers with Broadcom NetXtreme E-Series adapters (BCM57416) for a new Hyper-V cluster, and I've been trying really hard to get SMB Direct working between them so I can get the best possible performance during Live Migration.

The thing is: all tests go green across the board, both Test-RDMA and Validate-DCB, as per screenshots below:
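For anyone wanting to reproduce the checks, they looked roughly like this (the interface index, remote IP, and diskspd path are placeholders from my environment, and the Validate-DCB invocation may vary by module version):

```powershell
# Microsoft's Test-RDMA script (from the Microsoft SDN GitHub repo) pushes
# traffic over the RDMA interface and verifies it actually uses RDMA.
.\Test-Rdma.ps1 -IfIndex 23 -IsRoCE $true -RemoteIpAddress 192.168.100.2 -PathToDiskspd C:\Tools\Diskspd

# Validate-DCB audits the host's DCB/RDMA configuration; -LaunchUI walks
# you through building a config file to test against.
Install-Module Validate-DCB -Force
Validate-DCB -LaunchUI
```

Both come back green for me across the board.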

[Screenshot: RDMA1.png]

But when trying to copy something between the servers I get REALLY slow throughput:

[Screenshot: Transfer Slow.png]

The result of Get-SmbMultichannelConnection shows that RDMA is being used.

As soon as I disable RDMA on the interface, transfer speed goes back to normal:

[Screenshot: Transfer Normal.png]
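For reference, toggling RDMA is just the stock NetAdapter cmdlet (the vNIC name here is an example from my setup):

```powershell
# Confirm whether SMB multichannel is running over RDMA-capable paths
Get-SmbMultichannelConnection
Get-NetAdapterRdma

# Disabling RDMA on the interface brings transfer speed back to normal
Disable-NetAdapterRdma -Name "LiveMigration-1326"

# Re-enabling it reproduces the slow behavior
Enable-NetAdapterRdma -Name "LiveMigration-1326"
```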

I've tried bypassing the switch and linking both interfaces directly, but to no avail. The results were the same.

So I guess my question is: has anyone gotten SMB Direct working successfully over Broadcom interfaces? I cannot for the life of me find anything wrong with my setup, but it just won't work.

Best regards,

Giovani

Moderator


6.2K Posts

September 20th, 2019 11:00

Hello

Here is an article on setting up RDMA. It may shed light on any configuration issues.

https://www.dell.com/support/article/how16693/

Thanks

6 Posts

September 23rd, 2019 08:00

Hey Daniel.

I have followed that guide and many others. As I pointed out, every test I run says RDMA is working, and in fact it is working, just not at the desired speed. But I have some more info.

On these servers I have a daughter card and a PCIe card, both with BCM57416 chips. When testing RDMA on the daughter card I get really slow speed, but for the PCIe card I have no issues, which confirms that the configuration is sound. Here are the results:

Transfer speed with the daughter card disabled - note that it skyrockets from 4 MB/s to 860 MB/s after disabling it:

[Screenshot: Transfer Speed - Daughter Card Disabled]

Transfer Speed with Daughter Card Enabled

[Screenshot: Transfer Speed - Daughter Card Enabled]

Physical NICs, one PCIe and one Daughter Card, same make and model

[Screenshot: Physical Interfaces]

Virtual Switch Mapping, LiveMigration-1325 mapped to SET1 - PCIe, and LiveMigration-1326 mapped to SET2 - Daughter Card.

[Screenshot: Virtual Interface to Physical Interface Mapping]
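For context, that affinity was set with the standard SET team-mapping cmdlet, roughly like this (the vNIC and physical adapter names are from my setup):

```powershell
# Pin each host vNIC to a specific physical member of its SET team,
# so each Live Migration network rides a known physical NIC
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "LiveMigration-1325" -PhysicalNetAdapterName "SET1 - PCIe"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "LiveMigration-1326" -PhysicalNetAdapterName "SET2 - Daughter Card"
```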

RDMA test results for LiveMigration-1326, bound to the daughter card adapter. Note the very low throughput:

[Screenshot: RDMA Test - Daughter Card]

RDMA test results for LiveMigration-1325, bound to the PCIe adapter. Note the much higher throughput:

[Screenshot: RDMA Test - PCIe Card]

The configuration is exactly the same for both servers and interfaces, so I suspect either a hardware issue or some hidden setting I'm overlooking.

Regards,

Giovani

Moderator


6.2K Posts

September 24th, 2019 10:00

I'm not seeing any obvious issues, and I'm not finding any known issues that exactly match what you are experiencing. There are known issues with virtual functions that may be related; those should be resolved in the latest release, so I suggest making sure you are running the latest driver and firmware.

6 Posts

September 25th, 2019 13:00

Hey Daniel. Thanks for replying.

Well, there were firmware and driver updates released last week that I have just applied to all servers, but still no change. I'm baffled; I cannot seem to find a way to make this work.
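To rule out a stale driver, I verified what's actually loaded after the update; a quick check along these lines (filtering to physical NICs):

```powershell
# Confirm the in-use driver version and date for the physical adapters
Get-NetAdapter -Physical | Format-List Name, InterfaceDescription, DriverVersionString, DriverDate
```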

It's worth saying that I have another couple of R640s with BCM57412 (SFP+) adapters showing the exact same behavior.

Regards,

Giovani

Moderator


6.2K Posts

September 27th, 2019 09:00

An enterprise engineer emailed me and said they had experienced this issue and found a workaround: setting RoCE to v1 allowed traffic to pass normally.

If that is the case then it sounds like a bug. The configuration would need to be verified as valid, and then the issue reproduced, in order to escalate it for a fix. I do not plan on trying to reproduce it myself; building out a test environment for this would be very time-consuming.

6 Posts

September 27th, 2019 13:00

Hey Daniel.

I guess we hit a bug then. With RoCEv1 everything works. Let me know if you need some evidence from me to get this up the chain. I'll be happy to provide anything you need.

[Screenshots: RDMA PCIe.png, RDMA Daughter Card.png, SMB Direct Interface 2.png, SMB Direct Interface 1.png]
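For anyone else hitting this: the exact advanced-property name for the RoCE version differs by vendor and driver release, so list the adapter's properties first and then set whichever one controls RoCE mode. Something along these lines (adapter name and property values here are examples - check what your driver actually exposes):

```powershell
# Find the RoCE-related advanced property exposed by this driver
Get-NetAdapterAdvancedProperty -Name "SLOT 2 Port 1" | Format-Table DisplayName, DisplayValue

# Force the adapter to RoCE v1 - this is the workaround that got
# traffic flowing normally on these Broadcom cards
Set-NetAdapterAdvancedProperty -Name "SLOT 2 Port 1" -DisplayName "RoCE Mode" -DisplayValue "RoCE v1"
```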

Regards,

Giovani

March 4th, 2020 15:00

Any tips on getting PFC working with my Broadcom 57412 daughter card? I have the same issue with RoCEv1 vs v2, but I cannot figure out why my PFC is false. I am using the latest 21.60 firmware and driver.

*Edit: I found that changing the DCBX Mode setting to CEE (Only) allows Test-RDMA.ps1 to pass with -IsRoCE $true on RoCEv1, without the PFC error I was receiving when DCBX Mode was set to IEEE (Only), and transfer speed is normal. However, Get-NetAdapterRdma still shows PFC as false.

[Screenshots: 2.JPG, 1.JPG]
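In case it helps anyone comparing configs, the PFC side I set up follows the usual converged-NIC pattern (priority 3 for SMB is the common convention, not something specific to this card):

```powershell
# Tag SMB Direct traffic (port 445) with 802.1p priority 3
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable PFC for priority 3 only
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply DCB/QoS on the adapter and ignore switch-pushed DCBX settings
Enable-NetAdapterQos -Name "SLOT 2 Port 1"
Set-NetQosDcbxSetting -Willing $false
```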

Thanks!

October 6th, 2020 11:00

Anyone make any progress on this issue?

I have the Broadcom 57416 and have much worse RDMA performance than non-RDMA. Switching to ROCEv1 did not fix it for me. My PFC also shows false, despite being configured.

Moderator


8.5K Posts

October 6th, 2020 13:00

Kungfujediss,

I would suggest calling in for support, as there are multiple things, from TSRs to configuration issues, that could be causing it.

1 Message

May 17th, 2022 14:00

For anyone who comes across this thread down the road:

We ran into this issue and spent a significant amount of time troubleshooting with both Dell and Microsoft. The root cause is that the Broadcom 57-series NICs do NOT properly support RDMA (RoCE). There is no fix. The cards will let you turn the feature on, but they cannot properly handle RDMA traffic, resulting in throughput measured in single-digit Mbps.


To make matters more complicated for troubleshooting, Microsoft's Test-RDMA script and Validate-DCB module will not show ANY errors for these cards, because RDMA traffic does pass correctly, just at very low speed. The usable speed is so low that the cards are completely unusable for production traffic.

This impacts all of the following cards (and likely others from Broadcom):

  • Broadcom 5720 Dual Port 1GbE Network LOM Mezz Card
  • Broadcom 57412 2 Port 10Gb SFP+ + 5720 2 Port 1Gb Base-T, rNDC
  • Broadcom 57416 Dual Port 10GbE SFP+ Network LOM Mezz Card
  • Broadcom 57414 Dual Port 25GbE SFP+ Network LOM Mezz Card