March 18th, 2020 11:00
MD3220 Failed Controller After Power Issues
Long story short, I've got an MD3220 with dual controllers. After a power issue, both went into lockdown, and I was able to get the controller in slot 0 back online through the serial shell. However, the controller in slot 1 is shown as removed in MDSM even though it is physically in the unit. I can connect to the slot 1 controller with the serial cable, but I don't know where to go from there to try to revive it.
While I was waiting for the serial cable to arrive, I acquired four used controllers and an enclosure. I'm not sure whether I need to wipe those before trying to use them, and I'm confident they have newer firmware than my current controller in slot 0.
Any advice?



FRussellTX
March 18th, 2020 11:00
I might add that the reason I'm anxious to get the second controller online is that controller 0 has disabled write caching, and I have two hosts accessing this enclosure at the same time. As these are SAS direct-connected hosts, the performance is abysmal.
DELL-Sam L
Moderator
March 18th, 2020 13:00
Hello FRussellTX,
Are you able to capture the boot of the controller in slot 1 so that we can see why it is not fully booting? To capture the boot, start your serial connection first, then insert the controller into the chassis and allow it to boot.
Please let us know if you have any other questions.
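For anyone following the capture step above, logging the serial session to a file makes it easier to paste the boot output afterwards. What follows is a minimal sketch in Python, not a Dell tool. The function only needs a binary stream with `readline()`, so it runs here against an in-memory stand-in. The real-port usage in the trailing comment assumes the third-party pyserial package, and the port name and 115200 baud rate shown there are assumptions that this thread does not confirm.

```python
import io
from datetime import datetime


def capture_boot(stream, log, max_lines=10000):
    """Copy lines from a binary stream into a text log, one timestamp per line.

    `stream` is any object whose readline() returns bytes (a serial port
    opened with pyserial behaves this way). Returns the number of lines saved.
    """
    count = 0
    while count < max_lines:
        raw = stream.readline()
        if not raw:  # empty read: end of data or serial timeout
            break
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        log.write(f"{datetime.now().isoformat(timespec='seconds')} {text}\n")
        count += 1
    return count


# Stand-in for the serial port so the sketch runs without hardware.
fake_port = io.BytesIO(
    b"Reset, Power-Up Diagnostics - Loop 1 of 1\r\n"
    b"Controller has been locked down due to PCI errors:\r\n"
)
boot_log = io.StringIO()
saved = capture_boot(fake_port, boot_log)

# With real hardware (assumed settings, check the controller documentation):
#   import serial  # pyserial
#   with serial.Serial("/dev/ttyUSB0", 115200, timeout=30) as port, \
#           open("slot1_boot.log", "w") as f:
#       capture_boot(port, f)
```

Start the script before seating the controller so that the first lines of the boot are not missed.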
DELL-Sam L
Moderator
March 18th, 2020 15:00
Hello FRussellTX,
Reseating the non-working controller will not hurt your MD3220 as it is not being seen right now anyway. We need to see the boot of the controller to see if it is in lockdown, bootloop, or dead. Also, what is the current version of firmware on your MD3220?
Please let us know if you have any other questions.
FRussellTX
March 21st, 2020 15:00
Here is the firmware information. I'm trying to get the serial logging to post but this forum keeps blocking me.
Firmware version: 07.84.47.60
Appware version: 07.84.47.60
Bootware version: 07.84.47.60
NVSRAM version: N26X0-784890-004
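Since the original question mentions spare controllers that probably carry newer firmware, it can help to compare version strings before mixing controllers. The sketch below is only an illustration: it assumes the dotted fields compare numerically from left to right, which this thread does not confirm, and the `spare` value is hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '07.84.47.60' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(candidate, installed):
    """True if `candidate` is strictly newer than `installed`.

    Assumes plain left-to-right numeric ordering of the dotted fields.
    """
    return parse_version(candidate) > parse_version(installed)


installed = "07.84.47.60"  # reported above for the working controller
spare = "08.00.00.00"      # hypothetical value, not taken from this thread

print(is_newer(spare, installed))
print(is_newer(installed, installed))
```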
FRussellTX
March 21st, 2020 15:00
I can't seem to get any of my replies past the spam filtering on this forum. I even went back and tried to edit a previous post and add comments there, and now that post is missing. How in the world do I get you the information that you're asking for?
During the boot-up of the failed controller, I did see the following:
Controller has been locked down due to PCI errors
FRussellTX
March 21st, 2020 15:00
-=<###>=-
Instantiating /ram as rawFs, device = 0x1
Formatting /ram for DOSFS
Instantiating /ram as rawFs, device = 0x1
Formatting...Retrieved old volume params with %38 confidence:
Volume Parameters: FAT type: FAT32, sectors per cluster 0
0 FAT copies, 0 clusters, 0 sectors per FAT
Sectors reserved 0, hidden 0, FAT sectors 0
Root dir entries 0, sysId (null) , serial number 10000
Label:" " ...
Disk with 1024 sectors of 512 bytes will be formatted with:
Volume Parameters: FAT type: FAT12, sectors per cluster 1
2 FAT copies, 1010 clusters, 3 sectors per FAT
Sectors reserved 1, hidden 0, FAT sectors 6
Root dir entries 112, sysId VXDOS12 , serial number 10000
Label:" " ...
Instantiating /ram as rawFs, device = 0x1
OK.
RTC Error: Real-time clock device is not working
Adding 13888 symbols for standalone.
Length: 0x13c Bytes
Version ver03.0A
Reset, Power-Up Diagnostics - Loop 1 of 1
3600 Processor DRAM
01 Data lines Passed
02 Address lines Passed
3300 NVSRAM
01 Data lines Passed
4410 Ethernet 82574 1
01 Register read Passed
02 Register address lines Passed
6D40 Bobcat
02 Flash Test Passed
3700 PLB SRAM
01 Data lines Passed
02 Address lines Passed
6D50 LSISAS2008 IOC 1
01 Register Read Test Passed
02 Register Address Lines Test Passed
03 Register Data Lines Test Passed
3900 Real-Time Clock
01 RT Clock Tick Passed
Diagnostic Manager exited normally.
Controller has been locked down due to PCI errors:
================= EXCEPTION LOG =================
Serial number: 39R00J8
Entry count: 12
Wrap-arounds: 0
First entry time:
Current date/time: MAR-21-2020 09:08:18 PM
---- Log Entry #0 DEC-18-2013 05:13:42 PM ----
WARNING: Reset by alternate controller
---- Log Entry #1 DEC-18-2013 05:16:03 PM ----
WARNING: Reset by alternate controller
---- Log Entry #2 DEC-18-2013 05:21:16 PM ----
WARNING: Reset by alternate controller
---- Log Entry #3 DEC-18-2013 09:40:36 PM ----
WARNING: Reset by alternate controller
---- Log Entry #4 DEC-18-2013 09:42:31 PM ----
WARNING: Reset by alternate controller
---- Log Entry #5 JUN-09-2019 10:51:59 AM ----
Root Complex TLP header[0] 34000000
Root Complex TLP header[1] 00000022
Root Complex TLP header[2] 00000000
Root Complex TLP header[3] 00000000
---- Log Entry #6 MAR-16-2020 03:06:56 PM ----
ERROR: PLX NTB Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #7 MAR-16-2020 03:06:56 PM ----
ERROR: PLX NTB Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #8 MAR-16-2020 03:09:16 PM ----
ERROR: PLX NTB Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #9 MAR-16-2020 03:09:16 PM ----
ERROR: PLX NTB Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
---- Log Entry #10 MAR-16-2020 03:11:11 PM ----
ERROR: PLX NTB Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 4 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 5 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 6 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #11 MAR-16-2020 03:11:11 PM ----
ERROR: PLX NTB Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
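A capture this long is easier to read once it is grouped by log entry and by port. The following is a rough sketch in Python that does that for the exception log above. The two regular expressions are assumptions derived only from the lines in this capture, not from any documented log format, and the `summarize` helper is a hypothetical name.

```python
import re
from collections import Counter

# Patterns inferred from this capture only (assumptions, not a documented format).
ENTRY_RE = re.compile(r"^-+ Log Entry #(\d+) (.+?) -+$")
PORT_RE = re.compile(r"^ERROR: PLX NTB Port (\S+) (.+)$")


def summarize(lines):
    """Return (entries, errors_per_port) for an exception log.

    entries maps entry number -> (timestamp string, list of message lines).
    errors_per_port counts ERROR lines per PLX NTB port label.
    """
    entries = {}
    per_port = Counter()
    current = None
    for line in lines:
        line = line.strip()
        m = ENTRY_RE.match(line)
        if m:
            current = int(m.group(1))
            entries[current] = (m.group(2), [])
            continue
        if current is None or not line:
            continue  # header lines before the first entry, or blanks
        entries[current][1].append(line)
        p = PORT_RE.match(line)
        if p:
            per_port[p.group(1)] += 1
    return entries, per_port


sample = """\
---- Log Entry #4 DEC-18-2013 09:42:31 PM ----
WARNING: Reset by alternate controller
---- Log Entry #6 MAR-16-2020 03:06:56 PM ----
ERROR: PLX NTB Port 0 Bad TLP Count 1572864 exceeds threshold 16
ERROR: PLX NTB Port 0 Bad DLLP Count 1073743408 exceeds threshold 16
ERROR: PLX NTB Port 2/6 Rx Err Count 24 exceeds threshold 16
---- Log Entry #7 MAR-16-2020 03:06:56 PM ----
ERROR: PLX NTB Port 0 ECC correctable error threshold exceeded reg 0xf1a val 0x18
""".splitlines()

entries, per_port = summarize(sample)
for number, (stamp, messages) in sorted(entries.items()):
    print(f"#{number} {stamp}: {len(messages)} line(s)")
print(dict(per_port))
```

Run against the full capture, this makes it quick to see that the PCI errors all date from MAR-16-2020, while the earlier entries are resets from 2013 and a single 2019 entry.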
DELL-Sam L
Moderator
March 23rd, 2020 16:00
Hello FRussellTX,
Looking at your serial capture, the controller is in lockdown due to PCI errors. You will need to clear the lockdown and then upgrade your firmware. Here is the command to clear the lockdown:
clearHardwareLockdown
You need to upgrade your firmware on your MD3220 so that you can stop receiving the error. This issue is resolved in newer firmware.
Please let us know if you have any other questions.
FRussellTX
March 24th, 2020 18:00
That did the trick! Thanks for the help on this one. This MD3220 and the two R720s on top of it worked really well for the last 6 years - but I'll be moving on to new gear in the next couple of weeks.