Isilon: Gen6 DE peer node pair (H400, A200, A2000) generates events and console messages indicating problems with their NTB link after a compute module replacement or move

Summary: A Gen6 DE peer node pair (H400, A200, A2000) can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'NTB link up/down' messages and link speed negotiation errors.


Symptoms

A Gen6 DE (H400, A200, A2000) peer node pair can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'link up/down' messages as well as link speed negotiation errors, for example:
 
mnv0: HW link down event
mnv0: HW link up event
mnv0: Failed to negotiate PCIe lane speed; expected 3 lane speed, got 2.
mnv0: transport link up
mnv0: peer up


Alternatively, the logs or console may simply show the NTB link as down and never coming back up. There appear to be multiple possible causes for these symptoms, some of which are still under investigation. If one or both affected nodes were recently moved into a different chassis slot, or had their compute module replaced, the issue and resolution documented in this KB may apply. This issue does not affect EP nodes (F800, H600, H500).
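When a long log excerpt needs to be scanned for the symptoms above, a minimal sketch such as the following can count the example messages. It assumes the relevant log or console text has already been collected and is piped in on stdin (where to find that text is not covered by this KB). The function name `summarize_ntb_log` is hypothetical, and the `mnv0` message patterns are copied from the sample output above; they may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: summarize NTB link symptoms in log text read from stdin.
# Patterns are taken verbatim from the example messages in this KB (mnv0 device).
summarize_ntb_log() {
    awk '
        /mnv0: HW link down event/                   { down++ }
        /mnv0: HW link up event/                     { up++ }
        /mnv0: Failed to negotiate PCIe lane speed/  { speed++ }
        # Uninitialized awk counters print as 0 with %d.
        END { printf "down=%d up=%d speed_errors=%d\n", down, up, speed }
    '
}

# Example usage (the file name is an assumption):
#   summarize_ntb_log < saved_console.log
```

A high count of paired down/up events, or any speed negotiation errors, matches the flapping pattern described in this KB; a link that is simply down shows no up events at all.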

Cause

Peer nodes communicate with each other through a special communication channel called the Non-Transparent Bridge (NTB), which is embedded in the chassis backplane. In normal operation, the two nodes in a peer pair must have different PPD values to be able to communicate with each other via the NTB. PPD values are assigned based on the node's slot ID in the chassis. This issue occurs when a node or compute module is moved into a different slot from the one it originally occupied: the node's BIOS retains the old slot ID instead of detecting the new one. As a result, the PPD value can be set incorrectly so that both peers end up with the same value, and this conflict prevents the nodes from establishing an NTB link.

Resolution

Run the following command on both nodes in the affected peer pair to verify whether the issue documented in this KB is applicable:

# sysctl dev.ntb_hw.0.debug_info.ppd

Each node will respond with either:

dev.ntb_hw.0.debug_info.ppd: 73
Or:
dev.ntb_hw.0.debug_info.ppd: 93

- If one node in the pair responds with 73 and the other responds with 93, the nodes are not affected by the issue documented in this KB. Contact EMC Isilon Technical Support for further assistance.

- If both nodes respond with the same value (both 73 or both 93), they are affected by the issue documented in this KB. The issue is resolved both in the updated node firmware included in Node Firmware Package 10.1.6 and later and in a code fix included in OneFS 8.1.0.4 and later. Either update on its own remediates the issue, but because both releases contain other important fixes, installing both is recommended.
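The decision rule above can be sketched as a small shell function. This is a minimal, hypothetical helper, not a Dell-provided tool: it assumes the operator has run the `sysctl` command on each node and passes the two output lines (or just the two numbers) to it. The names `ppd_value` and `classify_ppd` are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: extract the number from a line such as
#   dev.ntb_hw.0.debug_info.ppd: 73
# by stripping everything up to and including the last ": ".
ppd_value() {
    printf '%s\n' "${1##*: }"
}

# Hypothetical helper: apply this KB's rule to the two PPD values of a peer pair.
# Prints "affected", "not-affected", or "unexpected".
# Exit status: 0 = not affected, 1 = affected, 2 = value other than 73/93.
classify_ppd() {
    for v in "$1" "$2"; do
        case "$v" in
            73|93) ;;                              # only documented responses
            *) echo "unexpected"; return 2 ;;
        esac
    done
    if [ "$1" = "$2" ]; then
        echo "affected"                            # same PPD on both peers: conflict
        return 1
    fi
    echo "not-affected"                            # 73/93 split: contact Support
    return 0
}

# Example usage (variable names are assumptions):
#   classify_ppd "$(ppd_value "$node_a_line")" "$(ppd_value "$node_b_line")"
```

On FreeBSD-derived systems such as OneFS, `sysctl -n` should print only the value, in which case `ppd_value` is unnecessary and the numbers can be passed to `classify_ppd` directly.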

Affected Products

Isilon, Isilon Gen6
Article Properties
Article Number: 000056963
Article Type: Solution
Last Modified: 28 Jun 2023
Version:  6