Unsolved
1 Rookie
•
4 Posts
July 28th, 2020 05:00
MD3420 failed controllers?
Hi there,
I have an MD3420 that no longer starts up. The unit has two controllers, both of which recently needed a new RTC battery; I have installed both.
Since the batteries were replaced, neither controller will boot. The messages I see on the serial console differ between the two controllers. I cannot continue, abort, or select any other option at the prompt; the controllers do not seem to respond to keystrokes. I am not sure whether that is a cable issue.
For controller 0 I am seeing:
-=<###>=- reading fw/active/System
8276978 bytes read
Adding 18180 symbols for standalone.
wrproxy:Instantiating /bd0 as rawFs, device = 0x10001
RTC Error: Real-time clock device is not working
RTC Warning: Battery for real-time clock and NVSRAM must be replaced
/bd16/fw/staged/: errno = 0x380003
usrSecurity: error adding default user '(null)' to the system (errno = 0x16)!
Instantiating /ram as rawFs, device = 0x80001
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x80001
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number 1c60000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number 1c60000
Label:" " ...
OK.
Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
3900 Real-Time Clock
01 RT Clock Tick ** Failed **
*** Error (0005): Real time clock/NVSRAM battery low
STATUS: Real time clock/NVSRAM battery low
*** Current Loop Count: 1 Date: 03/31/2014 Time: 05:36:39
*** Master Error Code: 2701-3900-01-0005
D)isplay R)etry C)ontinue M)onitor Shell A)bort ?
And for Controller 1 I am seeing:
-=<###>=- reading fw/active/System
8276978 bytes read
Adding 18180 symbols for standalone.
wrproxy:Instantiating /bd0 as rawFs, device = 0x10001
RTC Error: Real-time clock device is not working
RTC Warning: Battery for real-time clock and NVSRAM must be replaced
/bd16/fw/staged/: errno = 0x380003
Unexpected PCI bus width 0(8) for BDF[1,8,0]
Unexpected PCI bus speed 1(3) for BDF[1,8,0]
TRIGGERING LOCKDOWN.
PCI DEVICE 5 NOT FOUND: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x60400 ParentBridge=PLX PCI-E Bridge to Host Card
TRIGGERING LOCKDOWN.
PCI DEVICE NOT ACCESSIBLE: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x60400 ParentBridge=PLX PCI-E Bridge to Host Card
PCI DEVICE 0 NOT FOUND: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
TRIGGERING LOCKDOWN.
PCI DEVICE NOT ACCESSIBLE: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
PCI DEVICE NOT ACCESSIBLE: PLX PCI-E Bridge to Host Card SAS Host Port 0 Device=9 Function=0 ClassCode=0x60400 ParentBridge=Host Card PLX PCI-E Switch
PCI DEVICE 0 NOT FOUND: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
TRIGGERING LOCKDOWN.
PCI DEVICE NOT ACCESSIBLE: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
PCI DEVICE NOT ACCESSIBLE: PLX PCI-E Bridge to Host Card SAS Host Port 0 Device=9 Function=0 ClassCode=0x0 ParentBridge=Host Card PLX PCI-E Switch
PCI DEVICE NOT ACCESSIBLE: Host-side SAS Device=0 Function=0 ClassCode=0x10700 ParentBridge=PLX PCI-E Bridge to Host Card SAS Host Port 0
PCI DEVICE 0 NOT FOUND: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
TRIGGERING LOCKDOWN.
PCI DEVICE NOT ACCESSIBLE: Host Card PLX PCI-E Switch Device=0 Function=0 ClassCode=0x0 ParentBridge=PLX PCI-E Bridge to Host Card
PCI DEVICE NOT ACCESSIBLE: PLX PCI-E Bridge to Host Card SAS Host Port 0 Device=9 Function=0 ClassCode=0x0 ParentBridge=Host Card PLX PCI-E Switch
PCI DEVICE NOT ACCESSIBLE: Host-side SAS Device=0 Function=0 ClassCode=0x10700 ParentBridge=PLX PCI-E Bridge to Host Card SAS Host Port 0
03/07/14-02:05:05 (tSystem): NOTE: I2C transaction returned 0x8011e000
Error
usrSecurity: error adding default user '(null)' to the system (errno = 0x16)!
Instantiating /ram as rawFs, device = 0x50001
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x50001
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number 1ca0000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number 1ca0000
Label:" " ...
OK.
PLX device not found. Was expecting to find Device ID 8724
Could not determine ntbBridgeBase. Bailing out.
Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
3900 Real-Time Clock
01 RT Clock Tick ** Failed **
*** Error (0005): Real time clock/NVSRAM battery low
STATUS: Real time clock/NVSRAM battery low
*** Current Loop Count: 1 Date: 03/07/2014 Time: 02:05:07
*** Master Error Code: 2701-3900-01-0005
D)isplay R)etry C)ontinue M)onitor Shell A)bort ?
Does this mean that both controllers are now unusable? I also noticed that the fans will continuously run at full speed until I remove Controller 1.
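For anyone comparing the two captures, a minimal Python sketch along these lines can tally the lockdown triggers, PCI errors, and failed diagnostics in a saved console log. This is not a Dell tool: the function name `summarize_console_log` and the pattern list are assumptions, built only from the messages quoted above, so they are unlikely to cover every message the firmware can print.

```python
import re
from collections import Counter

# Patterns copied from the console output quoted in this thread.
# They are illustrative only, not an official or complete message list.
PATTERNS = {
    "lockdown_triggers": re.compile(r"TRIGGERING LOCKDOWN"),
    "pci_not_found": re.compile(r"PCI DEVICE \d+ NOT FOUND"),
    "pci_not_accessible": re.compile(r"PCI DEVICE NOT ACCESSIBLE"),
    "rtc_errors": re.compile(r"RTC Error"),
    "failed_diagnostics": re.compile(r"\*\* Failed \*\*"),
}
MASTER_CODE = re.compile(r"Master Error Code: ([0-9A-Fa-f]+(?:-[0-9A-Fa-f]+)*)")


def summarize_console_log(text):
    """Count known message types and collect any master error codes."""
    counts = Counter()
    codes = []
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = MASTER_CODE.search(line)
        if match:
            codes.append(match.group(1))
    return dict(counts), codes


if __name__ == "__main__":
    # A few lines taken from the Controller 1 capture above
    # (the PCI line is shortened for readability).
    sample = """\
RTC Error: Real-time clock device is not working
TRIGGERING LOCKDOWN.
PCI DEVICE 5 NOT FOUND: Host Card PLX PCI-E Switch Device=0 Function=0
01 RT Clock Tick ** Failed **
*** Master Error Code: 2701-3900-01-0005
"""
    counts, codes = summarize_console_log(sample)
    print(counts)
    print(codes)
```

Running it against each controller's saved log makes the difference easy to see: in the captures above, only Controller 1 prints the lockdown and PCI lines, while both report the same master error code.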



DellEMCSupport
631 Posts
0
July 28th, 2020 13:00
Hello brettfk,
When you replaced the controller batteries, did you replace them one at a time or both at the same time? If you boot your MD3420 with only one controller installed, in slot 0, does that controller boot? Can you capture the full boot process of the controller in slot 0?
brettfk
July 28th, 2020 15:00
Hi there,
I did them one at a time. Probably worth noting I suspected Controller 1 was suffering from hardware failure before replacing the battery anyway. For what it's worth I replaced the battery in Controller 1 before Controller 0.
The output that I pasted into my original post is the full boot process that I am seeing; I didn't leave anything out.
Also, I have tried to boot the system with only Controller 0 inserted, but am getting the same result regardless.
Thanks
DellEMCSupport
July 29th, 2020 08:00
Hello brettfk,
Do you happen to have a recent support bundle from your MD3420?
brettfk
July 29th, 2020 18:00
I do not. I think the last support bundle was collected back in January, and I'm not sure I still have it, but I will check later today and confirm.
DellEMCSupport
August 6th, 2020 10:00
Hello brettfk,
Sorry for the delay. I checked with a coworker, and it looks like your controllers might be in a lockdown state. Normally, when the RTC battery fails, we replace the controller, as other issues can occur. If your MD3420 is still under warranty, it is best to call support so that we can clear the lockdown and see what else is going on with the controllers.
brettfk
August 10th, 2020 04:00
Thanks. Unfortunately the unit is well out of warranty - is there any other way to clear the lockdown mode?
DellEMCSupport
August 10th, 2020 12:00
Hello,
https://dell.to/3irYQd9
I looked through the manual and found this entry on page 37. Based on the description, I don't believe you'll be able to get out of the lockdown state without hardware replacement.
"Controller Failure Conditions
Certain events can cause a RAID controller module to fail and/or shut down. Unrecoverable ECC memory or PCI errors, or critical physical conditions can cause lockdown. If your RAID storage array is configured for redundant access and cache mirroring, the surviving controller can normally recover without data loss or shutdown."