[Isilon] Gen6 node split from cluster after reboot

Summary: Gen6 nodes in the cluster log events reporting that they cannot communicate with the BMC, and the nodes may appear in a read-only (RO) state. Rebooting an affected node causes it to split from the cluster.


Symptoms

On the cluster, isi event may show the following error:

14.99844  03/14 13:38 C    17   203764         The Baseboard Management Controller (BMC) located in chassis xxxxxxxxxxx, slot 4 is not responding. This controller monitors hardware components such as batteries and power supplies. To ensure these hardware components continue to be monitored, service the BMC as soon as possible.
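When event output has been saved to a text file, the affected chassis and slot can be pulled out with a short script. The following is a minimal sketch in Python that relies only on the message wording quoted above; the function name and the idea of scanning a saved capture are illustrative assumptions, not part of OneFS.

```python
import re

# Pattern built from the event text quoted above.
# Group 1 captures the chassis identifier, group 2 the slot number.
BMC_EVENT = re.compile(
    r"Baseboard Management Controller \(BMC\) located in chassis (\S+), "
    r"slot (\d+) is not responding"
)


def find_unresponsive_bmcs(event_text):
    """Return (chassis, slot) pairs for each BMC-not-responding event found.

    event_text is assumed to be saved event output as a single string.
    """
    return [(m.group(1), int(m.group(2))) for m in BMC_EVENT.finditer(event_text)]
```

Running the function over a capture from each cluster gives a quick list of the slots that need attention before any reboot is attempted.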


If the node is rebooted while a serial connection is attached, the ePOST output shows the following:

Copyright (c) EMC Corporation , 2021
Disk Array Subsystem Controller
Model: Infinity Banshee: Isilon
DiagName: Extended POST
DiagRev: Rev 28.15
Build Date: Mon May 17 22:52:58 2021
BiosRev: 37.41
UEFIFWVolRev: Rev 03.43
FixedSERDESRev: Rev 08.00
BMCMainAppRev: 00.00
BMCSSPRev: 00.00
BMCEMCBBRev: 00.00
StartTime: 03/10/2022 22:10:45
SaSerialNo:


ABCDabcEaFaGHIabJabcKabcdefLabcdefMabcdefNOabPabcQRS
SPI_Buffer_Mgmt::Initialize(): Could not read NVRAM mem persistence struct
TUVWXYZAABBCCabDDabEEabFFabGGHHIIJJKKLLMMNNabOOPPabcQQRRSSTTUUVVWWXXYYZZAAABBBCCCDDD

************************************************************
*                 Extended POST Messages
************************************************************

WARNING:Failed to set fault/status code value:0x01000000 (offset:0x02A8)
INFORMATION:POST Start
WARNING:INIT: IPMI Error Reading Chassis Resume PROM (0x03E0)
WARNING:INIT: IPMI Error Reading SP Resume PROM (0x03E0)
WARNING:Failed to read Boot Options structure from Virtual EEPROM (0x03E0)
WARNING:FRU capability Register in SP resume is not set correctly, VRD components are not recognized
WARNING:Error reading SLIC status sensor or sensor scanning could be disabled for HBA0 card (0x03E0)
WARNING:Error reading SLIC status sensor or sensor scanning could be disabled for HBA1 card (0x03E0)
WARNING:Error reading SLIC status sensor or sensor scanning could be disabled for Disk Interface Card (0x03E0)
WARNING:Unable to read VEEPROM Shared Data Region (0x03E0)
WARNING:Failed to read Boot Options structure from Virtual EEPROM (0x03E0)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Couldn't read Chassis Status
WARNING:Couldn't read sensor or sensor scanning could be disabled (0x90)
WARNING:Couldn't read sensor or sensor scanning could be disabled (0x98)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Couldn't read CMD Sequencer board ID (0x03E0)
WARNING:CMD Sequencer: Failed to read Board ID. No +/-6CMD tolerance applied (BMC code: 0x00)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000025 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000026 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000027 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000003 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x0100002B (offset:0x02A8)
WARNING:Couldn't read Infinity DIB CMD board ID (0x03E0)
WARNING:Infinity DIB CMD: Failed to read Board ID. No +/-6CMD tolerance applied (BMC code: 0x00)
WARNING:Failed to set fault/status code value:0x01000015 (offset:0x02A8)
WARNING:PSA not present
WARNING:Failed to set fault/status code value:0x01000016 (offset:0x02A8)
WARNING:PSB not present
WARNING:Failed to set fault/status code value:0x01000017 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000025 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000025 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000026 (offset:0x02A8)
WARNING:M.2 SATA 0 FW FileSystem Inaccessible (0x00FF)
WARNING:Failed to set fault/status code value:0x01000027 (offset:0x02A8)
WARNING:Firmware Update Skipped Due To PSA Status
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x0100002B (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x0100002B (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x0100002B (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000002 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000015 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000016 (offset:0x02A8)
WARNING:Failed to set fault/status code value:0x01000017 (offset:0x02A8)
************************************************************


EndTime: 03/10/2022 22:10:58


.... Storage System Failure - Contact your Service Representative ...


ErrorCode: 0x000003E0
ErrorDesc:
FRU: Motherboard
Device: BMC
Description: IPMI Transport Protocol Not Found Error!
Rev: 28.15
Send_IPMI_CMD()
Failed to get BMC self test results
Self Test Results*
P/N:
S/N:
EndError:
ErrorTime: 03/10/2022 22:10:45


ErrorCode: 0x000003EA
ErrorDesc:
FRU: Motherboard
Device: NVRAM
Description: NVRAM IPMI Protocol Not Found Error!
Rev: 28.15
EMC NVRAM IPMI Protocol not found!
Initialize NVRAM*
P/N:
S/N:
EndError:
ErrorTime: 03/10/2022 22:10:45


ErrorCode: 0x000001BB
ErrorDesc:
FRU: Motherboard
Device: CMOS Bank 1
Description: BIOS SMI Handler is disabled Error!
Rev: 28.15
CMOS Init*
P/N:
S/N:
EndError:
ErrorTime: 03/10/2022 22:10:45


ErrorCode: 0x000003E0
ErrorDesc:
FRU: Motherboard
Device: LAN Management Port
Description: IPMI Transport Protocol Not Found Error!
Rev: 28.15
Send_IPMI_CMD()
Failed to get/verify BMC MAC address
Verify-Set BMC MAC Addr*
P/N:
S/N:
EndError:
ErrorTime: 03/10/2022 22:10:46


ErrorCode: 0x000003E0
ErrorDesc:
FRU: Motherboard
Device: BIOS ROM
Description: IPMI Transport Protocol Not Found Error!
Rev: 28.15
Send_IPMI_CMD()
Failed to set SSP power reset request(code: 0x01)
EndError:
ErrorTime: 03/10/2022 22:10:57


ErrorCode: 0x000003E0
ErrorDesc:
FRU: Motherboard
Device: M.2 SATA 1
Description: IPMI Transport Protocol Not Found Error!
Rev: 28.15
Send_IPMI_CMD()
Failed to get sensor reading
Sensor ID: 0x90
EndError:
ErrorTime: 03/10/2022 22:10:53



EMC Extended POST End: 03/10/2022 22:11:08
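Two things stand out in the capture above: every BMC revision field in the header reads 00.00, and several error blocks carry ErrorCode 0x000003E0 (IPMI Transport Protocol Not Found). A saved serial capture can be checked for both with a short script. This is a minimal sketch in Python; the function name and the returned keys are hypothetical, and treating these two signs as indicators of an unresponsive BMC is an inference from this one capture rather than a documented rule.

```python
import re


def summarize_epost(capture):
    """Summarize BMC-related signs in a saved ePOST console capture (a string)."""
    # Header fields such as "BMCMainAppRev: 00.00".
    bmc_revs = dict(
        re.findall(r"^\s*(BMC\w+Rev):\s*(\S+)\s*$", capture, re.MULTILINE)
    )
    # Error block headers such as "ErrorCode: 0x000003E0".
    codes = re.findall(
        r"^\s*ErrorCode:\s*(0x[0-9A-Fa-f]+)\s*$", capture, re.MULTILINE
    )
    # 0x3E0 is reported above as "IPMI Transport Protocol Not Found Error!".
    ipmi_errors = sum(1 for code in codes if int(code, 16) == 0x3E0)
    return {
        "bmc_revs": bmc_revs,
        "bmc_revs_all_zero": bool(bmc_revs)
        and all(rev == "00.00" for rev in bmc_revs.values()),
        "ipmi_transport_errors": ipmi_errors,
    }
```

A capture in which bmc_revs_all_zero is true and ipmi_transport_errors is nonzero resembles the output shown above and is worth comparing against this article before the node is rebooted again.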


The node continues to boot, but at the end of the boot process the following message appears:

Secondary backup method None is unknown
None backup is invalid.
Initializing with a new backup of current data.
Secondary backup method None is unknown
None backup does not exist.
Installation config method is PSF: Pulling PSI config files from backup
Secondary backup method None is unknown
Unable to pull files from None backup
Installation config method is PSF: Unable to pull secondary backup files
Installation config method is PSF: Copying PSI config files
PSF file path /mfg/psi/psf.json does not exist in primary backup

Failed to copy PSI config files: Chassis is missing a PSI receipt.
A receipt must be provided before the boot will be allowed to continue.
To prevent a DL situation, contact Dell EMC Customer Support immediately:
United States: 1 800 782 4362 (1 800 SVC 4EMC)
Canada: 1 800 543 4782 (1 800 543 4SVC)
Worldwide Country Code: 1 508 497 7901


Command Options:
1) Enter recovery shell
2) Continue booting
3) Reboot
option>

 

Cause

The BMC on the affected nodes stopped responding because of a UDP storm on the 1GbE network.

The issue affects only Gen6 nodes (A200/A2000/H400/H500/H600/H5600/F800/F810). It does not affect PowerScale or PowerScale Hybrid nodes.

Resolution

Workaround:
  1. Unplug the 1GbE interface from the affected nodes.
  2. Enter the recovery shell (option 1 at the Command Options prompt).
  3. Reboot the node.

Fix:
Apply Node Firmware Package (NFP) 11.6 or newer.
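When auditing several clusters, the "11.6 or newer" requirement can be checked with a numeric comparison rather than a string comparison, so that a version such as 11.10 is not mistaken for something older than 11.6. The sketch below is in Python and assumes that NFP versions are dotted numeric strings; the function name is hypothetical, and how the installed version is obtained is outside the scope of this example.

```python
def nfp_at_least(installed, minimum="11.6"):
    """Return True if the installed NFP version is at or above the minimum.

    Assumes versions are dotted numeric strings such as "11.6" or "11.10".
    """
    def parts(version):
        # "11.10" -> (11, 10), so components compare as numbers, not text.
        return tuple(int(p) for p in version.strip().split("."))

    return parts(installed) >= parts(minimum)
```

Nodes for which the function returns False are candidates for the firmware update described above.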

Affected Products

Isilon A200, Isilon A2000, Isilon F800, Isilon F810, Isilon Gen6, Isilon H400, Isilon H500, Isilon H5600, Isilon H600

Article Properties

Article Number: 000213534
Article Type: Solution
Last Modified: 02 Jan 2026
Version: 3