Rack Servers

Last reply by 05-17-2022 Unsolved

R640 and Broadcom - Poor performance on SMB Direct (RDMA) configuration

Hello all.

So, I've got a couple of new Dell R640 servers with Broadcom NetXtreme E-Series adapters (BCM57416) for a new Hyper-V cluster, and I've been trying really hard to get SMB Direct working between them so I can get the best possible performance during Live Migration.

The thing is: all tests go green across the board, both Test-RDMA and Validate-DCB, as per the screenshots below:

RDMA1.png
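For anyone wanting to reproduce the checks, this is roughly what I ran. Test-Rdma.ps1 is Microsoft's script from their SDN GitHub repository and Validate-DCB is a PowerShell Gallery module; the interface index, remote IP, and Diskspd path below are placeholders from my environment:

```powershell
# Adapters should report RDMA enabled (PFC/ETS matter for RoCE)
Get-NetAdapterRdma

# SMB should see the interfaces as RDMA-capable
Get-SmbClientNetworkInterface

# Microsoft's RDMA traffic test (index/IP/path are placeholders)
.\Test-Rdma.ps1 -IfIndex 14 -IsRoCE $true -RemoteIpAddress 192.168.25.11 -PathToDiskspd C:\Tools\Diskspd

# Validate-DCB checks the DCB/PFC configuration end to end
Install-Module Validate-DCB
Validate-DCB
```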

 
 

But when trying to copy something between the servers I get REALLY slow throughput:

Transfer Slow.png

 

The result for Get-SmbMultichannelConnection shows that RDMA is being used:

 

As soon as I disable RDMA on the interface, transfer speed goes back to normal:

 

Transfer Normal.png
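For reference, the toggle is just the standard cmdlets; the vNIC name below is an example from my setup:

```powershell
# Confirm whether active SMB connections are actually using RDMA
Get-SmbMultichannelConnection

# Disable RDMA on the slow vNIC; SMB falls back to plain TCP multichannel
Disable-NetAdapterRdma -Name "vEthernet (LiveMigration-1326)"

# Re-enable it to go back to SMB Direct
Enable-NetAdapterRdma -Name "vEthernet (LiveMigration-1326)"
```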

I've tried bypassing the switch and linking both interfaces directly, but to no avail. The results were the same.

So I guess my question is: has anyone gotten SMB Direct working successfully over Broadcom interfaces? I cannot for the life of me find anything wrong with my setup, but it just won't work.

Best regards,

Giovani

Replies (10)
Moderator

Hello

Here is an article on setting up RDMA. It may shed light on any configuration issues.

https://www.dell.com/support/article/how16693/

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer

Hey Daniel.

I have followed that guide and many others. As I pointed out, every test I run says RDMA is working, and in fact it is, just not at the desired speed. But I have some more info.

On these servers I have a daughter card and a PCIe card, both with BCM57416 chips. When testing RDMA on the daughter card I get really slow speed, but for the PCIe card I have no issues, which confirms that the configuration is sound. Here are the results:

Transfer speed with the daughter card disabled - notice that throughput skyrockets from 4 MB/s to 860 MB/s after disabling it.

Transfer Speed Daughter Card Disabled

Transfer Speed with Daughter Card Enabled

Transfer Speed Daughter Card Enabled

Physical NICs, one PCIe and one Daughter Card, same make and model

Physical Interfaces

Virtual Switch Mapping, LiveMigration-1325 mapped to SET1 - PCIe, and LiveMigration-1326 mapped to SET2 - Daughter Card.

Virtual Interface to Physical Interface Mapping
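The mapping above was created roughly as follows (the vNIC and physical adapter names are the ones from my screenshots):

```powershell
# Pin each Live Migration vNIC to a specific physical member of the SET switch
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "LiveMigration-1325" -PhysicalNetAdapterName "SET1 - PCIe"
Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "LiveMigration-1326" -PhysicalNetAdapterName "SET2 - Daughter Card"

# Verify the resulting mapping
Get-VMNetworkAdapterTeamMapping -ManagementOS
```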

RDMA test results for LiveMigration-1326, bound to the daughter card adapter. Notice the very low throughput.

RDMA Test Daughter Card

RDMA test results for LiveMigration-1325, bound to the PCIe adapter. Notice the much higher throughput.

RDMA Test PCIe Card

The configuration is exactly the same on both servers and interfaces, so I suspect either a hardware issue or some hidden setting I'm overlooking.

Regards,

Giovani

I'm not seeing any obvious issues. I am also not finding any known issues that are exactly the same as what you are experiencing. There are known issues with virtual functions that may be related. Those issues should be resolved with the latest driver and firmware, so I suggest making sure you are using the latest driver and firmware.

Daniel Mysinger
Dell EMC, Enterprise Engineer

Hey Daniel. Thanks for replying.

Well, firmware and driver updates were released last week, and I have just applied them to all servers, but still no change. I'm baffled by this; I cannot seem to find a way to make it work.

It's worth saying that I have another couple of R640 with BCM57412 (SFP+) with the exact same behavior.

Regards,

Giovani

An enterprise engineer emailed me and said they had experienced this issue and found a work-around. They were able to set RoCE to v1 and traffic was passed normally.

If that is the case, then it sounds like a bug. The configuration would need to be verified as valid, and then the issue would need to be reproduced to escalate it for a fix. I do not plan on trying to reproduce it myself; building out a test environment for this would be very time consuming.

Daniel Mysinger
Dell EMC, Enterprise Engineer

Hey Daniel.

I guess we hit a bug then. With RoCEv1 everything works. Let me know if you need some evidence from me to get this up the chain. I'll be happy to provide anything you need.
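For anyone else hitting this: the RoCE version is changed through the adapter's advanced properties. The exact display name and values are driver-specific (the strings below are illustrative, and the adapter name is an example), so list the properties first and use what your driver actually exposes:

```powershell
# List advanced properties to find which one controls the RoCE version
Get-NetAdapterAdvancedProperty -Name "SLOT 2 Port 1" |
    Format-Table DisplayName, DisplayValue

# Example: switch the adapter to RoCE v1. DisplayName/DisplayValue vary by
# vendor and driver version - substitute the values from the listing above.
Set-NetAdapterAdvancedProperty -Name "SLOT 2 Port 1" `
    -DisplayName "RoCE Mode" -DisplayValue "RoCE v1"
```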

RDMA PCIe.png
RDMA Daughter Card.png
SMB DIrect Intrface 2.png
SMB DIrect Intrface 1.png

Regards,

Giovani

Any tips on getting PFC working with my Broadcom 57412 daughter card? I have the same issue with RoCEv1 vs v2, but I cannot figure out why my PFC is false. I am using the latest 21.60 firmware and driver.

*Edit: I found that changing the DCBX Mode setting to CEE (Only) allows Test-Rdma.ps1 with IsRoCE $true to succeed using RoCEv1, without the PFC error I was receiving when DCBX Mode was set to IEEE (Only), and transfer speed is normal. However, Get-NetAdapterRdma still shows PFC as false.
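For anyone comparing notes, the host-side DCB/PFC configuration I'm validating against looks roughly like this. Priority 3 for SMB is the common convention and the adapter names are examples; this must match what the switch is configured for:

```powershell
# Have the host own the DCB configuration instead of accepting it from the switch
Set-NetQosDcbxSetting -Willing $false

# Tag SMB Direct traffic (TCP port 445) with priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable priority flow control only for priority 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply QoS/DCB on the physical adapters (names are examples)
Enable-NetAdapterQos -Name "SET1 - PCIe","SET2 - Daughter Card"

# Check whether the NIC now reports PFC as operational
Get-NetAdapterRdma
```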

2.JPG

1.JPG

Thanks!

Anyone make any progress on this issue?

I have the Broadcom 57416 and get much worse RDMA performance than non-RDMA. Switching to RoCEv1 did not fix it for me. My PFC also shows false, despite being configured.

Kungfujediss,

 

I would suggest calling in for support, as there are multiple potential causes, from findings in the TSR logs to configuration issues, that would need to be ruled out.


Chris Hawk
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#Iwork4Dell
