Unsolved
February 5th, 2020 02:00
MD3620f SAS not responding
Hello,
We ran into a very strange situation with a storage system based on an MD3620f with an MD1200 connected to it.
We installed a second MD1200 (cabled to the first MD1200 that was already in the infrastructure), and so far so good. After cabling, the second MD1200 was set as part of the Disk Pool, but its capacity was not shown anywhere. Thinking the newly added device might be faulty, we powered it off. At that moment everything went down.
Since then we can't manage the storage from MD Launcher (although we can ping the management IPs).
We can also connect to the storage via the console cable, but nothing we have tried has worked.
This is what we see over the PuTTY console connection:
Send <BREAK> for Service Interface or baud rate change
02/04/20-11:21:53 (tRAID): SOD Sequence is Normal, 0 on controller A
02/04/20-11:21:54 (tRAID): NOTE: Turning on tray summary fault LED
02/04/20-11:21:54 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:2
02/04/20-11:21:54 (tRAID): NOTE: Installed Protocols:
02/04/20-11:21:54 (tRAID): NOTE: Required Protocols:
02/04/20-11:21:54 (tRAID): NOTE: loading flash file: Fibre
02/04/20-11:21:56 (tRAID): NOTE: DSM: Current revision 7
02/04/20-11:21:56 (tRAID): NOTE: SYMBOL: SYMbolAPI registered.
02/04/20-11:21:57 (tRAID): NOTE: RCBBitmapManager total RPA size = 1778384896
02/04/20-11:21:57 (tRAID): NOTE: init: ioc: 0, PLVersion: 11-075-20-00
02/04/20-11:21:58 (tRAID): WARN: MLM: Failed creating m_MelNvsramLock
02/04/20-11:21:58 (tRAID): WARN: MLM: Failed creating m_MelEventListLock
02/04/20-11:21:58 (tRAID): NOTE: CrushMemoryPoolMgr: platform and memory (CPU MemSize:2048), adjusted allocating size to 262144 for CStripes
02/04/20-11:21:59 (tRAID): SOD: Instantiation Phase Complete
02/04/20-11:21:59 (IOSched): NOTE: SAS Expander Added: expDevHandle:x11 enclHandle:x2 numPhys:37 port:2 ioc:0 channel:0 numEntries:22
02/04/20-11:21:59 (IOSched): NOTE: SAS Expander Added: expDevHandle:x12 enclHandle:x3 numPhys:38 port:2 ioc:0 channel:0 numEntries:22
02/04/20-11:21:59 (IOSched): NOTE: SAS Expander Added: expDevHandle:x14 enclHandle:x4 numPhys:38 port:2 ioc:0 channel:0 numEntries:22
02/04/20-11:22:00 (tSasExpChk): NOTE: Local Expander Firmware Version: 00.00.80.10
02/04/20-11:22:00 (IOSched): NOTE: discoveredEncl: trayId:1 slotCount:12 eli:500c04f2a5f39a00
02/04/20-11:22:00 (IOSched): NOTE: discoveredEncl: trayId:2 slotCount:12 eli:500c04f221532b00
02/04/20-11:22:06 (tSasDiscCom): NOTE: SAS: Initial Discovery Complete Time: 34 seconds since last power on/reset, 10 seconds since sas instantiated
02/04/20-11:22:29 (tRAID): WARN: Failed to open Inter-Controller Communication Channels!
02/04/20-11:22:29 (tRAID): NOTE: LockMgr Role is Master
02/04/20-11:22:31 (tHckReset): NOTE: HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start
02/04/20-11:22:59 (tRAID): NOTE: WWN baseName 0002f01f-afd7453d (valid==>SoftRst)
02/04/20-11:23:00 (tRAID): SOD: Pre-Initialization Phase Complete
02/04/20-11:23:00 (utlTimer): NOTE: fcnChannelReport ==> -2 -3 ~4 -5
02/04/20-11:23:05 (utlTimer): NOTE: fcnChannelReport ==> ~2 -3 ~4 -5
02/04/20-11:23:05 (IOSched): WARN: Extended Link Down is over on channel 3 - lasted 6 seconds
02/04/20-11:23:06 (utlTimer): NOTE: fcnChannelReport ==> ~2 ~3 ~4 =5
02/04/20-11:23:30 (utlTimer): NOTE: fcnChannelReport ==> ~2 ~3 4 =5
02/04/20-11:23:35 (utlTimer): NOTE: fcnChannelReport ==> 2 ~3 4 =5
02/04/20-11:23:36 (utlTimer): NOTE: fcnChannelReport ==> 2 3 4 =5
02/04/20-11:23:41 (tRAID): NOTE: ACS: Icon ping to alternate failed: -2, resp: 0
02/04/20-11:23:41 (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Mode: 0, Status: 0
02/04/20-11:23:41 (tRAID): WARN: ACS: autoCodeSync(): Skipped since alt not communicating.
02/04/20-11:23:42 (tRAID): SOD: Code Synchronization Initialization Phase Complete
02/04/20-11:23:43 (NvpsPersistentSyncM): NOTE: NVSRAM Persistent Storage updated successfully
02/04/20-11:23:43 (tRAID): NOTE: SAFE: Process new features
02/04/20-11:23:43 (tRAID): NOTE: SAFE: Process legacy features
02/04/20-11:23:45 (tRAID): NOTE: IOManager::restoreData - m_DataSize:0x2800000, m_StartAddress:0x0
02/04/20-11:23:46 (tRAID): NOTE: ncb::IOManager::restoreData - Successful
02/04/20-11:23:46 (tRAID): NOTE: OBB Restore From Flash Completed
02/04/20-11:23:46 (bdbmBGTask): ERROR: sendFreezeStateToAlternate: Exception IconSendInfeasibleException Error
02/04/20-11:23:47 (tRAID): NOTE: SPM acquireObjects exception: IconSendInfeasibleException Error
02/04/20-11:23:47 (tRAID): NOTE: sas: SOD Bad Check-in: IconSendInfeasibleException Error
02/04/20-11:23:47 (tRAID): NOTE: fcn: SOD Bad Check-in: IconSendInfeasibleException Error
02/04/20-11:23:47 (tRAID): NOTE: ion: SOD Bad Check-in: IconSendInfeasibleException Error
02/04/20-11:23:48 (tRAID): WARN: Unable to intialize mirror device
02/04/20-11:23:48 (tRAID): NOTE: CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff
02/04/20-11:23:49 (tRAID): WARN: CCM: initialize() cache reclaim by alt in progress -- Suspending SOD
02/04/20-11:24:00 (utlTimer): WARN: Extended Link Down Timeout on channel 5
02/04/20-11:25:31 (tHckReset): NOTE: HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start
02/04/20-11:25:31 (tHckReset): NOTE: HealthCheckManager: Notify Event 6 Ctl_Not_Running
02/04/20-11:25:31 (tHckReset): NOTE: sas: Peering Disabled (Health Check) Event: 6
02/04/20-11:25:31 (tHckReset): NOTE: fcn: Peering Disabled (Health Check) Event: 6
02/04/20-11:25:31 (cmgrEvent): WARN: Alt Ctl Reboot:
Reboot CompID: 0x407
Reboot reason: 0x6
Reboot reason extra: 0x0
02/04/20-11:25:31 (cmgrEvent): NOTE: holding alt ctl in reset
02/04/20-11:25:32 (cmgrEvent): NOTE: HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start
Any help please?
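For anyone triaging a console dump like the one above, a small script can pull out the WARN and ERROR entries so they are easier to scan. This is only a minimal sketch: the line layout (timestamp, task name in parentheses, optional NOTE/WARN/ERROR level) is inferred from the log posted here, and the names `parse_console_log` and `problems` are hypothetical helpers, not part of any Dell tool.

```python
import re
from collections import Counter

# Layout inferred from the posted log, e.g.:
# 02/04/20-11:21:58 (tRAID): WARN: MLM: Failed creating m_MelNvsramLock
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) "
    r"\((?P<task>[^)]+)\): "
    r"(?:(?P<level>NOTE|WARN|ERROR): )?"
    r"(?P<message>.*)$"
)


def parse_console_log(text):
    """Return one dict per line that looks like a controller log entry."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def problems(entries):
    """Keep only entries whose level is WARN or ERROR."""
    return [e for e in entries if e["level"] in ("WARN", "ERROR")]


# Sample lines copied from the log in this thread.
sample = """\
02/04/20-11:21:54 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:2
02/04/20-11:21:58 (tRAID): WARN: MLM: Failed creating m_MelNvsramLock
02/04/20-11:21:59 (tRAID): SOD: Instantiation Phase Complete
02/04/20-11:22:29 (tRAID): WARN: Failed to open Inter-Controller Communication Channels!
02/04/20-11:23:46 (bdbmBGTask): ERROR: sendFreezeStateToAlternate: Exception IconSendInfeasibleException Error
Reboot reason: 0x6
"""

entries = parse_console_log(sample)
flagged = problems(entries)
level_counts = Counter(e["level"] for e in entries)

for e in flagged:
    print(e["stamp"], e["task"], e["level"], e["message"])
```

Lines without a level tag (such as the `SOD:` phase markers) are kept with `level` set to `None`, and continuation lines such as `Reboot reason: 0x6` are skipped because they carry no timestamp.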



emullaraj
February 5th, 2020 05:00
Hello Community,
We have a strange situation with a Dell infrastructure.
Primary storage: a Dell MD3620f with a first MD1200 attached via SAS (cascade connection), and a second MD1200 connected via SAS to the first MD1200.
When we connected the second MD1200, the MD software showed the new HDDs and suggested adding these disks to the same Disk Pool (strange, because these disks differ in size and speed).
We accepted and waited for 3 days, but the capacity was never shown in the MD software.
Thinking that our new MD1200 might have a problem, we shut down the second MD1200, and then BOOM: the cluster went down and we can't connect to the MD3620f via the MD software anymore.
We are trying to understand what happened, but over the console connection all we can see is the controller restarting time after time.
Does anyone have any idea?
DellEMCSupport
February 5th, 2020 12:00
Hello emullaraj,
You never want to power off your MD3620f while doing an expansion until the expansion has completed. When you powered off your MD3620 & MD1200s, did you leave them powered off for 3 minutes? Looking at the controller output that you posted, one controller is booting and the second is stuck in a boot loop. Does your MD3620 still have a support contract?
Please let us know if you have any other questions.
emullaraj
February 6th, 2020 02:00
Hello Dell Team,
Unfortunately the MD3620f is no longer under support, but the new MD1200 is under warranty.
We left it for 3 days after the expansion (5 HDDs, 6 TB each), but we didn't understand why the MD3620f put them in the first Disk Pool. Also, after 3 days we still didn't see the new virtual volume available.
Yes, we left it shut down for more than 3 minutes.
Is there any way to get the second controller out of the boot loop?
DellEMCSupport
February 6th, 2020 14:00
Hello emullaraj,
Can I get you to gather a support bundle from your working controller? Also, can I get you to capture the boot of your second controller so we can confirm which boot loop it is stuck in? I will send you a private message so that you can send me the logs.
Please let us know if you have any other questions.
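When capturing the boot of the second controller, it can help to track the `SODRebootLoop- Limit:5 Cnt:2` line that appears in the console output posted earlier. The sketch below reads that counter out of a saved console capture. Note the assumptions: the reading of `Cnt` as a consecutive-reboot counter and `Limit` as its threshold is a guess from the field names and is not confirmed in this thread, and the function names are hypothetical.

```python
import re

# Pattern taken from the line "SODRebootLoop- Limit:5 Cnt:2" in the posted log.
LOOP_RE = re.compile(r"SODRebootLoop-\s*Limit:(\d+)\s+Cnt:(\d+)")


def reboot_loop_counts(console_text):
    """Return (limit, count) pairs, one per SODRebootLoop line, in order."""
    return [(int(limit), int(count)) for limit, count in LOOP_RE.findall(console_text)]


def looks_like_boot_loop(pairs):
    """Heuristic (assumption): the counter rises on every boot or reaches the limit."""
    if not pairs:
        return False
    counts = [count for _, count in pairs]
    rising = len(counts) > 1 and all(b > a for a, b in zip(counts, counts[1:]))
    at_limit = any(count >= limit for limit, count in pairs)
    return rising or at_limit


# First line is from the thread; the next two are hypothetical later boots.
capture = """\
02/04/20-11:21:54 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:2
02/04/20-11:29:10 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:3
02/04/20-11:36:27 (tRAID): NOTE: SODRebootLoop- Limit:5 Cnt:4
"""

pairs = reboot_loop_counts(capture)
print(pairs, looks_like_boot_loop(pairs))
```

A single boot with a low counter is not flagged by this heuristic; the capture needs to span several restarts before the trend means anything.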
emullaraj
February 11th, 2020 01:00
Hello,
I need to share the support bundle from the working controller with you.
In fact, since the matter was urgent, the only thing that resolved the situation was: sysWipeAllConfigData.
After this the controller was in lockdown mode and was brought up with the command: lemClearLockdown.
After this we were able to connect the MD software to the controller through the management port.
So we lost all data and configuration, but at least we have one controller up and running. The firmware is also very old, but I guess updating it is the last thing to do.
Now we are stuck on the second controller, which appears to be stuck in a boot loop.
The commands sysWipeAllConfigData and lemClearLockdown are not recognized by this controller.
If you could write to me, I can send the support bundle collection by email.
I need to determine whether the second controller should be replaced or not.