Isilon: Gen6 DE peer node pair (H400, A200, A2000) generates events and console messages indicating problems with their NTB link after a compute module replacement or move

Summary: A Gen6 DE peer node pair (H400, A200, A2000) can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'NTB link up/down' messages, and link speed negotiation errors. ...


Symptoms

A Gen6 DE (H400, A200, A2000) peer node pair can sometimes start generating error messages indicating problems with their NTB link. Errors can include repeated 'link up/down' messages as well as link speed negotiation errors, for example:
 
mnv0: HW link down event
mnv0: HW link up event
mnv0: Failed to negotiate PCIe lane speed; expected 3 lane speed, got 2.
mnv0: transport link up
mnv0: peer up


Alternatively, the logs or console may simply show the NTB link as down, with no subsequent link-up event. There currently appear to be multiple possible causes for this issue, some of which are still under investigation. If one or both affected nodes were recently moved into a different chassis slot, or had their compute module replaced, the issue and resolution documented in this KB may apply. This issue does not affect EP nodes (F800, H600, H500).

Cause

Peer nodes communicate with each other through a dedicated communication channel called the Non-Transparent Bridge (NTB), which is embedded in the chassis backplane. In normal operation, the two nodes in a peer pair must have different PPD values to communicate with each other over the NTB. PPD values are assigned based on the node's slot ID in the chassis. This issue occurs when a node or compute module is moved into a different slot from the one it originally came from: the node's BIOS retains the old slot ID instead of detecting the new one. The PPD value may then be set incorrectly, and the resulting conflict prevents the two nodes from establishing an NTB link.

Resolution

Run the following command on both nodes in the affected peer pair to verify whether the issue documented in this KB is applicable:

# sysctl dev.ntb_hw.0.debug_info.ppd

Each node will respond with either:

dev.ntb_hw.0.debug_info.ppd: 73
Or:
dev.ntb_hw.0.debug_info.ppd: 93
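As a sketch of how the check above might be scripted, the following POSIX shell function extracts the numeric value from one line of the sysctl output. The function name `ppd_from_line` is hypothetical and not part of OneFS; it assumes the output has exactly the `name: value` form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): extract the numeric PPD value
# from one line of `sysctl dev.ntb_hw.0.debug_info.ppd` output,
# e.g. "dev.ntb_hw.0.debug_info.ppd: 73" -> "73".
ppd_from_line() {
    # Strip the longest prefix ending in ": ", leaving only the value.
    value=${1##*: }
    case $value in
        ''|*[!0-9]*)
            # Not a plain integer: the line was not in the expected format.
            echo "unexpected sysctl output: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$value"
}
```

On a node, it could be called as `ppd_from_line "$(sysctl dev.ntb_hw.0.debug_info.ppd)"`. OneFS is built on FreeBSD, whose `sysctl -n` prints only the value, which should make this parsing step unnecessary when running the command directly.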

- If one node in a pair responds with 73 and the other responds with 93, the issue documented in this KB does not apply to these nodes.
Please contact EMC Isilon Technical Support for further assistance.

- If both nodes respond with the same number, whether 73 or 93, they are affected by the issue documented in this KB. The issue is fixed both in the updated node firmware included in Node Firmware Package 10.1.6 and later, and by a code fix included in OneFS 8.1.0.4 and later. Either update alone remediates the issue, but both releases contain other important fixes, so installing both is recommended.
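The decision rule above can be expressed as a small POSIX shell function that takes the PPD value reported by each node. The name `classify_peer_pair` is hypothetical; the sketch only encodes the 73/93 comparison described in this KB and treats any other value as outside its scope rather than guessing.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): classify a peer node pair from
# the PPD values reported by `sysctl dev.ntb_hw.0.debug_info.ppd` on each node.
# Prints one of: not-affected, affected, unexpected.
classify_peer_pair() {
    for v in "$1" "$2"; do
        case $v in
            73|93) ;;                        # the two values documented in this KB
            *) echo unexpected; return 2 ;;  # anything else is outside this KB's scope
        esac
    done
    if [ "$1" = "$2" ]; then
        # Both nodes report the same PPD value: they conflict on the NTB.
        echo affected
        return 1
    fi
    # One node reports 73 and the other 93: this KB's issue does not apply.
    echo not-affected
    return 0
}
```

For example, `classify_peer_pair 73 73` prints `affected` and returns a nonzero status, so the function can be used directly in an `if` statement.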

Affected Products

Isilon, Isilon Gen6
Article Properties
Article Number: 000056963
Article Type: Solution
Last Modified: 28 June 2023
Version: 6