Isilon: Gen6 DE peer node pair (H400, A200, A2000) generates events and console messages indicating problems with their NTB link after a compute module replacement or move
Summary: A Gen6 DE peer node pair (H400, A200, A2000) can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'NTB link up/down' messages, and link speed negotiation errors. ...
Symptoms
A Gen6 DE (H400, A200, A2000) peer node pair can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'link up/down' messages as well as link speed negotiation errors, for example:
mnv0: HW link down event
mnv0: HW link up event
mnv0: Failed to negotiate PCIe lane speed; expected 3 lane speed, got 2.
mnv0: transport link up
mnv0: peer up
Alternatively, the logs and/or console may simply show the NTB link as down and not coming up. There currently appear to be multiple possible causes for this issue, some of which are still under investigation. If one or both affected nodes were recently moved into a different chassis slot, or had their compute module replaced, the issue and resolution documented in this KB may apply. This issue does not affect EP nodes (F800, H600, H500).
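To get a quick sense of how often the link is flapping, the messages shown above can be counted from any log text. The following is a minimal sketch, not an official tool: the function name ntb_log_summary is hypothetical, the patterns are taken only from the sample messages in this article (including the mnv0 device name), and the log is read from standard input because this article does not state where these messages are stored.

```shell
#!/bin/sh
# Hypothetical helper: summarize NTB-related messages found in log text
# supplied on standard input. The patterns match only the sample
# messages quoted in this article; other wording will not be counted.
ntb_log_summary() {
    awk '
        /mnv0: HW link down event/                 { down++ }
        /mnv0: HW link up event/                   { up++ }
        /mnv0: Failed to negotiate PCIe lane speed/ { speed++ }
        END {
            # Uninitialized awk counters print as 0 with %d.
            printf "link_down=%d link_up=%d speed_errors=%d\n", down, up, speed
        }
    '
}
```

For example, if the messages are in a file you have collected from the node, run ntb_log_summary < that_file. Repeated down/up pairs or any speed negotiation errors match the symptoms described here, but they do not by themselves confirm the cause documented in this KB.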
Cause
Peer nodes communicate with each other through a dedicated channel called the Non-Transparent Bridge (NTB), which is embedded in the chassis backplane. In normal operation, the two nodes in a peer pair must have different PPD values to communicate over the NTB. PPD values are assigned based on the node's slot ID in the chassis. This issue occurs when a node or compute module is moved into a different slot from the one it originally occupied and the node's BIOS retains the old slot ID instead of detecting the new one. The PPD value may then be set incorrectly, and the resulting conflict prevents the nodes from establishing an NTB link.
Resolution
Run the following command on both nodes in the affected peer pair to verify whether the issue documented in this KB is applicable:
# sysctl dev.ntb_hw.0.debug_info.ppd
Each node will respond with either:
dev.ntb_hw.0.debug_info.ppd: 73
Or:
dev.ntb_hw.0.debug_info.ppd: 93
- If one node in a pair responds with 73 and the other responds with 93, these nodes are not currently affected by the issue documented in this KB. Please contact EMC Isilon Technical Support for further assistance.
- If both nodes respond with the same number, whether 73 or 93, they are affected by the issue documented in this KB. This issue is resolved in updated node firmware included in Node Firmware Package 10.1.6 and later, as well as in a code fix included in OneFS 8.1.0.4 and later. While each update individually will remediate the issue, both releases contain other important fixes, so it is recommended to install both.
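The comparison described above can be sketched as a small shell helper. This is an illustrative sketch only: the function names ppd_value and ppd_verdict are hypothetical, the script assumes the sysctl output has exactly the format shown in this article, and it treats the reported values as opaque numbers without interpreting them. You still need to run the sysctl command on each node yourself and supply both output lines.

```shell
#!/bin/sh
# Hypothetical helper: extract the value from a line such as
# "dev.ntb_hw.0.debug_info.ppd: 73" (the output format shown above).
# Prints nothing if the line does not match that format.
ppd_value() {
    printf '%s\n' "$1" |
        sed -n 's/^dev\.ntb_hw\.0\.debug_info\.ppd:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Hypothetical helper: compare the PPD lines collected from both peers.
#   "differ"  -> values differ; not the issue documented in this KB
#   "same"    -> values match; the nodes are affected by this issue
#   "unknown" -> at least one line could not be parsed
ppd_verdict() {
    a=$(ppd_value "$1")
    b=$(ppd_value "$2")
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "unknown"
    elif [ "$a" = "$b" ]; then
        echo "same"
    else
        echo "differ"
    fi
}
```

A typical use is to paste the line captured from each node as the two arguments, for example ppd_verdict "dev.ntb_hw.0.debug_info.ppd: 73" "dev.ntb_hw.0.debug_info.ppd: 73", which prints same. An unknown result means the output should be checked by hand.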
Affected Products
Isilon, Isilon Gen6
Article Properties
Article Number: 000056963
Article Type: Solution
Last Modified: 28 Jun 2023
Version: 6