March 8th, 2016 18:00
MD3000i problems with both controllers after power failure
After a power failure, our out-of-warranty MD3000i is not working properly. I think that both controllers are having problems. I can ping both management IPs, and vmkping at least 1 of the 4 iSCSI IPs. If I connect with only 1 ESXi server, I am able to connect to the LUN to get the data off, but very slowly, e.g. 16 GB took 4 days.
Two weeks before the power outage, I changed the MD3000i iSCSI ports to use jumbo frames. This was working properly until the power outage.
I can get to the MDSM, but I am not able to reach iSCSI > Configure iSCSI Host Ports. I can click the section, but it will not launch the iSCSI Host Ports configuration page. I want to try to disable jumbo frames.
Questions:
Is there a way to just use 1 controller until I get a replacement?
What is the problem with controller 0? Can it be fixed? Is it related to the jumbo frames setting, and can jumbo frames be disabled from the serial interface?
#--- Controller 0 Serial Output BEGIN -------------------------------------------------
-=<###>=-
Attaching interface lo0... done
Adding 9768 symbols for standalone.
Error
03/06/16-01:14:24 (GMT) (tRootTask): NOTE: I2C transaction returned 0x0423fe00
WARNING: Reset by alternate controller
Current date: 03/05/16 time: 16:29:45
Send for Service Interface or baud rate change
03/06/16-01:14:41 (GMT) (tRAID): NOTE: SOD Sequence is Normal, 0
03/06/16-01:14:41 (GMT) (tRAID): NOTE: SOD: removed SAS host from index 0
03/06/16-01:14:41 (GMT) (tRAID): NOTE: In iscsiIOQLIscsiInitDq. iscsiIoFstrBase = 0x0
03/06/16-01:14:41 (GMT) (tRAID): NOTE: Turning on tray summary fault LED
03/06/16-01:14:42 (GMT) (tNetCfgInit): NOTE: Network Ready
esmc0: Link change detected, LinkDown may take a long time to detect
03/06/16-01:14:43 (GMT) (tRAID): NOTE: SYMBOL: SYMbolAPI registered.
0x36d600 (tNetTask): esmc0: LinkUp event
03/06/16-01:14:46 (GMT) (tRAID): NOTE: Initiating Drive channel: ioc:0 bringup
03/06/16-01:14:49 (GMT) (tRAID): NOTE: IOC Firmware Version: 00-24-63-00
03/06/16-01:14:58 (GMT) (tSasEvtWkr): NOTE: sasIocPhyUp: chan:0 phy:0 prevNumActivePhys:2 numActivePhys:2
03/06/16-01:14:58 (GMT) (tSasEvtWkr): NOTE: sasIocPhyUp: chan:0 phy:1 prevNumActivePhys:2 numActivePhys:2
03/06/16-01:14:59 (GMT) (tSasEvtWkr): NOTE: sasIocPhyUp: chan:1 phy:2 prevNumActivePhys:2 numActivePhys:2
03/06/16-01:14:59 (GMT) (tSasEvtWkr): NOTE: sasIocPhyUp: chan:1 phy:3 prevNumActivePhys:2 numActivePhys:2
03/06/16-01:14:59 (GMT) (tSasCfg014): NOTE: Alt Controller path up - chan:0 phy:18 itn:1
03/06/16-01:14:59 (GMT) (tSasCfg021): NOTE: Alt Controller path up - chan:1 phy:16 itn:2
03/06/16-01:15:08 (GMT) (tRAID): NOTE: IonMgr: Drive Interface Enabled
03/06/16-01:15:09 (GMT) (tRAID): NOTE: SOD: Instantiation Phase Complete
03/06/16-01:15:09 (GMT) (tRAID): NOTE: Inter-Controller Communication Channels Opened
03/06/16-01:15:09 (GMT) (tSasDiscCom): NOTE: SAS Discovery complete task spawned
03/06/16-01:15:09 (GMT) (IOSched): NOTE: New Initiator: 1 - channel: 1,devHandle: x27, SAS Address: 5848f694f6e72a00
03/06/16-01:15:09 (GMT) (tRAID): NOTE: LockMgr Role is Slave
03/06/16-01:15:09 (GMT) (sasCheckExpanderSet): NOTE: Expander Firmware Version: 0116-e05c
03/06/16-01:15:09 (GMT) (sasCheckExpanderSet): NOTE: Expander SAS address: Hi = x50026b94 Low = x614f0710
03/06/16-01:15:09 (GMT) (tRAID): NOTE: spmEarlyData: Using cached data
03/06/16-01:15:14 (GMT) (tSasDiscCom): WARN: SAS: Initial Discovery Complete Time: 30 seconds
03/06/16-01:15:14 (GMT) (tRAID): NOTE: WWN baseName 00040026-b9614f07 (valid==>SigMatch)
03/06/16-01:15:14 (GMT) (tRAID): NOTE: ionEnableHostInterfaces is waiting for a channel to become ready
03/06/16-01:15:14 (GMT) (tRAID): NOTE: ionEnableHostInterfaces waited 800ms for a channel to become ready
03/06/16-01:15:14 (GMT) (tRAID): NOTE: IonMgr: Host Interface Enabled
03/06/16-01:15:14 (GMT) (tRAID): NOTE: SOD: Pre-Initialization Phase Complete
03/06/16-01:15:27 (GMT) (tRAID): NOTE: ACS: autoCodeSync(): Process start. Comm Mode: 0, Status: 1
03/06/16-01:15:28 (GMT) (tRAID): NOTE: SOD: Code Synchronization Initialization Phase Complete
03/06/16-01:15:29 (GMT) (NvpsPersistentSyncM): NOTE: NVSRAM Persistent Storage updated successfully
03/06/16-01:15:29 (GMT) (tRAID): NOTE: USM Mgr initialization complete with 0 records.
03/06/16-01:15:29 (GMT) (tRAID): NOTE: EDR - recieved 1 small records
03/06/16-01:15:29 (GMT) (tRAID): NOTE: EDR - recieved 0 large records
03/06/16-01:15:30 (GMT) (tRAID): NOTE: Acquire 0.024 secs
03/06/16-01:15:31 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:15:58 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:15:58 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:15:59 (GMT) (tRAID): NOTE: Qlogic coredump file written to '9J6GTL1:/tmp/QLogic_Coredump_port_0_9J6GTL1',rc 204E50, expected 204E50
03/06/16-01:15:59 (GMT) (tRAID): WARN: Qlogic coredump file write failed.fclose returned -1
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:15:59 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:16:01 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:16:28 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:16:28 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:16:29 (GMT) (tRAID): NOTE: Qlogic coredump file written to '9J6GTL1:/tmp/QLogic_Coredump_port_0_9J6GTL1',rc 204E50, expected 204E50
03/06/16-01:16:29 (GMT) (tRAID): WARN: Qlogic coredump file write failed.fclose returned -1
03/06/16-01:16:29 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:16:29 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:16:29 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:16:29 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:16:30 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:16:57 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:16:57 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:16:58 (GMT) (tRAID): NOTE: Qlogic coredump file written to '9J6GTL1:/tmp/QLogic_Coredump_port_0_9J6GTL1',rc 204E50, expected 204E50
03/06/16-01:16:58 (GMT) (tRAID): WARN: Qlogic coredump file write failed.fclose returned -1
03/06/16-01:16:58 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:16:58 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:16:58 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:16:58 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:16:59 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:17:26 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:17:26 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:17:27 (GMT) (tRAID): NOTE: Qlogic coredump file written to '9J6GTL1:/tmp/QLogic_Coredump_port_0_9J6GTL1',rc 204E50, expected 204E50
03/06/16-01:17:27 (GMT) (tRAID): WARN: Qlogic coredump file write failed.fclose returned -1
03/06/16-01:17:27 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:17:27 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:17:27 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:17:27 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:17:28 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:17:55 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:17:55 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:17:56 (GMT) (tRAID): NOTE: Qlogic coredump file written to '9J6GTL1:/tmp/QLogic_Coredump_port_0_9J6GTL1',rc 204E50, expected 204E50
03/06/16-01:17:56 (GMT) (tRAID): WARN: Qlogic coredump file write failed.fclose returned -1
03/06/16-01:17:56 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:17:56 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:17:56 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:17:56 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:17:57 (GMT) (tRAID): WARN: QLStartAdapter: ControllerErrorCount exceeds threshold.
03/06/16-01:17:57 (GMT) (tRAID): ERROR: QLInitializeDevice: QLStartAdapter failed
03/06/16-01:17:57 (GMT) (tRAID): ERROR: QLAddDevice: controller/device/chip initialization failed.
03/06/16-01:17:57 (GMT) (tRAID): ERROR: qlgEnableHostInterface: QLInitializeDevice failed.
03/06/16-01:17:57 (GMT) (tRAID): NOTE: ********************************************************************************
03/06/16-01:17:57 (GMT) (tRAID): NOTE: QLogic Target Applicat
-=<###>=-
#--- Controller 0 Serial Output END-------------------------------------------------
The Controller 1 serial output has been attached as a file. It seems to show the dreaded memory parity error.


DELL-Sam L
Moderator
March 15th, 2016 10:00
Hello labatman,
Looking at the serial capture you posted, controller 0 (the one the output came from) has failed and needs to be replaced. When you see the following errors, the controller cannot recover from the fatal error and must be replaced.
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLProcessSystemError: Restart RISC
03/06/16-01:15:59 (GMT) (tRAID): ERROR: QLGetFwState: MBOX_CMD_GET_FW_STATE failed. Stat f000
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLRebootTimer: Status after Get FW State 4543
03/06/16-01:15:59 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:16:01 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:16:28 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:16:28 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd =
Looking at controller 1, I do see the RAM error. That error can mean either that the controller has failed or that there is an issue with the backplane of your MD3000i.
What I would do is get another controller and try it in slot 0 to see if it boots normally. If it doesn’t, try the replacement controller in slot 1 and see if it boots normally there. If the replacement controller still fails to boot, then you will need to replace the backplane of your MD3000i.
Please let us know if you have any other questions.
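
For anyone comparing their own serial capture against the one in this thread, the pattern quoted above (a QLStartFw download, a mailbox command timeout, a failed QLGetFwState, repeated until the ControllerErrorCount threshold message) can be counted with a short script. This is only a rough sketch: the marker strings are copied from the log in this thread, while the function names, the labels, and the pass/fail heuristic are made up for illustration and are not a Dell diagnostic.

```python
from collections import Counter

# Marker substrings copied from the serial capture in this thread.
# The short labels on the left are hypothetical names for this sketch.
PATTERNS = {
    "fw_download": "QLStartFw: Downloading Driver's FW image",
    "mailbox_timeout": "QLMailboxCommand: command completion timeout",
    "fw_state_failed": "QLRebootTimer: QLGetFwState failed",
    "gave_up": "QLStartAdapter: ControllerErrorCount exceeds threshold",
}


def summarize_capture(text):
    """Count how many log lines contain each marker string."""
    counts = Counter()
    for line in text.splitlines():
        for label, marker in PATTERNS.items():
            if marker in line:
                counts[label] += 1
    return counts


def looks_like_thread_signature(counts):
    """Assumed heuristic: every firmware download attempt timed out
    and the driver eventually stopped retrying."""
    return (
        counts["gave_up"] > 0
        and counts["fw_download"] > 0
        and counts["mailbox_timeout"] >= counts["fw_download"]
    )


# One retry cycle plus the final threshold message, taken from the log above.
SAMPLE = """\
03/06/16-01:17:28 (GMT) (tRAID): NOTE: QLStartFw: Downloading Driver's FW image 03.00.01.47 from 031fb740 4c0c8 bytes , result 0
03/06/16-01:17:55 (GMT) (tRAID): WARN: QLMailboxCommand: Cmd = 0069, completion timeout
03/06/16-01:17:55 (GMT) (tRAID): WARN: QLMailboxCommand: command completion timeout, cmd = 0x69
03/06/16-01:17:56 (GMT) (tRAID): NOTE: QLRebootTimer: QLGetFwState failed
03/06/16-01:17:57 (GMT) (tRAID): WARN: QLStartAdapter: ControllerErrorCount exceeds threshold.
"""

counts = summarize_capture(SAMPLE)
print(dict(counts), looks_like_thread_signature(counts))
```

To check a real capture, read the saved log file into a string and pass it to `summarize_capture` in place of `SAMPLE`. The controller 0 output in the original post shows five download attempts before the threshold message. A match only means the log resembles the one discussed here; it does not by itself tell you whether the controller or the backplane is at fault.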