Unsolved

March 19th, 2017 22:00

Fabric mismatch with ConnectX-2 Dual-Port VPI mezzanine board

Hello,

I'm getting a fabric mismatch error on my M1000e. I have some Mellanox ConnectX-2 Dual-Port VPI (MCQH29B-XCC, DP/N: X24WC) mezzanine boards. Port one is connected to a Mellanox M4001F for QDR InfiniBand, and port two is connected to a Force10 MXL for 10GbE. It's my understanding that this card is supposed to be capable of supporting simultaneous QDR InfiniBand and 10GbE links.
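
For reference, here's roughly how I'm inspecting the card from the host side (a sketch of the commands, assuming lspci and the infiniband-diags tools are installed; exact output will vary):

    # confirm the mezz card enumerates as a dual-port ConnectX-2 device
    lspci | grep -i mellanox

    # per-port state, rate, and link layer as seen by the driver
    ibstat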

Moderator • 8.8K Posts

March 20th, 2017 14:00

Hopperkin,

The mismatch is normally caused by a mezzanine card being present without the matching I/O module installed in that fabric, or vice versa. Verify in the CMC whether the fabrics of this blade match those of the blades that are already installed and functioning correctly. There is likely a difference or ordering issue.
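
If you prefer the CMC command line over the GUI, something like the following should show the mapping (a sketch; confirm the exact subcommands against the RACADM reference for your CMC firmware version):

    # list the I/O modules installed in fabric slots A1/A2, B1/B2, C1/C2
    racadm getioinfo

    # list the mezzanine (daughter) cards each blade reports, per fabric
    racadm getdcinfo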

Let me know what you see.

1 Rookie • 64 Posts

March 21st, 2017 08:00

The fabric error went away after power cycling the M1000e. However, I still haven't been able to get traffic going across the wire. Earlier today I did manage to get the MLNX_OFED package installed on CentOS 7 and was able to reconfigure the port to Ethernet via connectx_port_config. For switches, I tried a Force10 MXL and an M8024-k, but ethtool shows that the link is down. I plugged it into the DHCP server and tried to get a lease, and that also failed; I haven't had time yet to explore whether the switches were configured properly. Could it be that the ConnectX-2 is talking XAUI? Since it's a Virtual Protocol Interconnect card, it would stand to reason that it could speak both KR and XAUI. My Intel X520 speaks both XAUI and KR, so supporting both is within the realm of possibility. The trouble is I can't find any documentation online that details which 10GbE spec this card is using.
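
For reference, here's roughly what I'm doing on the host (a sketch; the PCI address 0000:04:00.0 and the interface name are examples from my system and will differ elsewhere):

    # map each mlx4 device/port to its net interface (helper shipped with MLNX_OFED)
    ibdev2netdev

    # current protocol of each port on the mezz card (ib, eth, or auto)
    cat /sys/bus/pci/devices/0000:04:00.0/mlx4_port1
    cat /sys/bus/pci/devices/0000:04:00.0/mlx4_port2

    # switch port 2 to Ethernet (as far as I can tell, this is what connectx_port_config does)
    echo eth > /sys/bus/pci/devices/0000:04:00.0/mlx4_port2

    # then check the Ethernet link again
    ethtool enp4s0d1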

1 Rookie • 64 Posts

March 21st, 2017 14:00

I had it working about an hour ago, with an M4001F in C1 and an M8024-k in C2, and two ConnectX-2 QDR mezz boards connected to the two C sockets in my M910. It was working great. Then, after I confirmed that both the InfiniBand (port 1) and 10GbE (port 2) links were working, I added another M4001F in B1 and an MXL in B2, along with two more ConnectX-2 mezz boards in both B sockets, for a total of four ConnectX-2 mezz boards in the M910. I brought the systems back up and now nothing works, not even the C fabric that I had working previously. All the InfiniBand ports say link down and don't come up when I run opensm, and the 10GbE ports say link down and disabled.
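
In case someone spots something I'm missing, this is roughly how I'm checking the state of all four cards (a sketch; device and interface names will differ on other systems):

    # confirm all four mezz cards still enumerate on the PCIe bus
    lspci | grep -i mellanox

    # port state, rate, and link layer for every mlx4 device
    ibstat

    # Ethernet-side link state for each mlx4 net interface
    for i in $(ibdev2netdev | awk '{print $5}'); do echo "$i: $(ethtool $i | grep 'Link detected')"; done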

1 Rookie • 64 Posts

March 21st, 2017 22:00

I got it working! The performance is abysmal, though; what gives?

iperf on ib0 (Port 1 QDR, IPoIB) says 6.24 Gbps.

iperf on enp4s0d1 (Port 2 configured as 10GbE) says 930 Mbps.

I expected IPoIB to take a performance hit, but not by that much! Furthermore, the 10GbE performance is less than 1 Gbps. These are E7-8867L (M910) and X5670 (M710HD) processors; they should be able to keep up. The mezzanine sockets are PCIe 2.0 x8, which would be 32 Gbps. I ran ib_write_bw (RDMA) between two M710HDs with X5670 processors and I'm only getting 25 Gbps.
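
For anyone following along, these are the knobs I plan to look at next; a sketch, assuming the stock IPoIB sysfs interface and iperf2 (the interface names are mine and the server address is a placeholder):

    # IPoIB: datagram mode limits the MTU; connected mode allows an MTU up to 65520
    cat /sys/class/net/ib0/mode
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520

    # 10GbE side: make sure the link actually negotiated 10000Mb/s and not 1000Mb/s
    ethtool enp4s0d1 | grep -E 'Speed|Link detected'

    # rerun iperf with parallel streams to rule out a single-stream/TCP-window limit
    iperf -c <server_ip> -P 4 -t 30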

The ConnectX-2 mezzanine boards are running firmware 2.9.1000 and the M4001F switch is running 9.3.6000. All firmware on the M910, M710HD, and M1000e is at the latest version and all systems pass diagnostics.
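
For completeness, this is how I'm reading those versions from the host (a sketch; the PCI address is an example from my system):

    # firmware version as reported by the driver
    ibv_devinfo | grep fw_ver

    # query the adapter flash directly (mstflint package)
    mstflint -d 04:00.0 q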
