Unresolved
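Urgent help: CX300 disk failure, replaced three drives and the array still cannot recognize the disk
A simple question for everyone: on a very old CX300, remote management shows a disk fault.
From this view, disk 0-1-4 appears to have failed. The original hot spare was 0-1-14, which should have taken over by now. However, the collected logs tell a different story: they contain no disk-failure information at all.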
DETAILED DISK INFORMATION [ArrayInfo Script]
**********************************************************************************************
Raid Drive Cur Max
Disk Group Capacity Intfc Vendor Drive Type Serial Number Part Number TLA Number Firmware State Replacing Speed Speed
0.0.0 14 300GB FC FUJITSU Allegro 9LE DM56807343 118032513 005048597 1905 Enabled --
0.0.1 14 300GB FC FUJITSU Allegro 9LE DM5680733G 118032513 005048597 1905 Enabled --
0.0.2 14 300GB FC FUJITSU Allegro 9LE DM568072VM 118032513 005048597 1905 Enabled --
0.0.3 14 300GB FC SEAGATE TimberlandRC 3RH0CC6P 118032567 005048751 C001 Enabled --
0.0.4 14 300GB FC FUJITSU Allegro 9LE DM56604THK 118032513 005048597 1905 Enabled --
0.0.5 14 300GB FC FUJITSU Allegro 9LE DM56806ENP 118032513 005048597 1905 Enabled --
0.0.6 14 300GB FC FUJITSU Allegro 9LE DM56A084AA 118032513 005048597 1905 Enabled --
0.0.7 14 300GB FC SEAGATE TimberlandRC 3RH0LYF1 118032567 005048751 C003 Enabled --
0.0.8 15 300GB FC FUJITSU Allegro 9LE DM56705KSL 118032513 005048597 1905 Enabled --
0.0.9 15 300GB FC FUJITSU Allegro 9LE DM56A07YKH 118032513 005048597 1905 Enabled --
0.0.10 15 300GB FC FUJITSU Allegro 9LE DM56A07Y1C 118032513 005048597 1905 Enabled --
0.0.11 15 300GB FC FUJITSU Allegro 9LE DM56A07Y7J 118032513 005048597 1905 Enabled --
0.0.12 15 300GB FC SEAGATE EagleRP 6SQ007Y5 118032662 005048953 RC05 Enabled --
0.0.13 15 300GB FC FUJITSU Allegro 9LE DM56A07WL1 118032513 005048597 1905 Enabled --
0.0.14 15 300GB FC SEAGATE TimberlandRC 3RH0SX6E 118032567 005048751 C003 Enabled --
0.1.0 2 300GB FC SEAGATE TimberlandRC 3RH0SDLW 118032567 005048751 C003 Enabled --
0.1.1 2 300GB FC SEAGATE TimberlandRC 3RH0KFB4 118032567 005048751 C003 Enabled --
0.1.2 2 300GB FC SEAGATE TimberlandRC 3RH0SELT 118032567 005048751 C003 Enabled --
0.1.3 2 300GB FC SEAGATE TimberlandRC 3RH0SEDG 118032567 005048751 C003 Enabled --
0.1.4 2 300GB FC SEAGATE TimberlandRC 3RH0SEQT 118032567 005048751 C003 Enabled --
0.1.5 3 300GB FC SEAGATE TimberlandRC 3RH0RY6N 118032567 005048751 C003 Enabled --
0.1.6 3 300GB FC SEAGATE TimberlandRC 3RH0SEXR 118032567 005048751 C003 Enabled --
0.1.7 3 300GB FC SEAGATE TimberlandRC 3RH0SEXW 118032567 005048751 C003 Enabled --
0.1.8 3 300GB FC SEAGATE TimberlandRC 3RH0RR1Y 118032567 005048751 C003 Enabled --
0.1.9 3 300GB FC SEAGATE TimberlandRC 3RH0S406 118032567 005048751 C003 Enabled --
0.1.10 4 300GB FC SEAGATE TimberlandRC 3RH0S5DY 118032567 005048751 C003 Enabled --
0.1.11 4 300GB FC SEAGATE TimberlandRC 3RH0S58Z 118032567 005048751 C003 Enabled --
0.1.12 4 300GB FC SEAGATE TimberlandRC 3RH0S5NP 118032567 005048751 C003 Enabled --
0.1.13 4 300GB FC SEAGATE TimberlandRC 3RH0N9Z5 118032567 005048751 C003 Enabled --
0.1.14 238 300GB FC SEAGATE TimberlandRC 3RH0SEVJ 118032567 005048751 C003 HS Ready Inactive
The log analysis does, however, show the following errors:
[ 23700 lines deleted ]
B 11/30/13 23:48:09 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 11/30/13 23:48:31 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 11/30/13 23:48:37 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:01:17 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:06:20 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:16:30 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:18:33 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:19:21 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:22:24 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:22:33 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:23:18 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:24:28 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:33:26 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:34:29 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:38:22 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:47:39 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:47:45 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:48:04 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:49:33 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
B 12/01/13 00:49:38 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22
I will upload the logs later. Could someone please help explain this? Thanks.
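The excerpt above is dominated by recovered (on-drive ECC) soft media errors, event code 6a0, on Bus0 Enc1 Dsk4. As a rough way to see how quickly they accumulate, here is a minimal Python sketch that buckets such events per disk and per hour. It assumes the plain event-log layout shown in this post (SP, MM/DD/YY date, time, BusN EncN DskN location, hex event code, description); the regex and the function name `soft_media_errors_per_hour` are hypothetical and not part of any EMC tool.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern for event-log lines like the ones quoted above:
# "B 11/30/13 23:48:09 Bus0 Enc1 Dsk4 6a0 Disk soft media error [...] 0 0 22"
LINE_RE = re.compile(
    r"^(?P<sp>[AB]) (?P<date>\d\d/\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d) "
    r"(?P<loc>Bus\d+ Enc\d+ Dsk\d+) (?P<code>[0-9a-f]+) (?P<desc>.*)$"
)


def soft_media_errors_per_hour(lines, code="6a0"):
    """Count events with the given code (6a0 = soft media error) per (disk, hour)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["code"] != code:
            continue  # not a disk event, or a different event code
        ts = datetime.strptime(f"{m['date']} {m['time']}", "%m/%d/%y %H:%M:%S")
        counts[(m["loc"], ts.replace(minute=0, second=0))] += 1
    return counts


SAMPLE = [
    "B 11/30/13 23:48:09 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22",
    "B 11/30/13 23:48:31 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22",
    "B 12/01/13 00:01:17 Bus0 Enc1 Dsk4 6a0 Disk soft media error [Recovered error (on-drive ECC)] 0 0 22",
    "B 12/05/13 03:18:40 Bus0 Enc1 Dsk4 78b Drive physically removed from slot 0 0 0",
]

for (loc, hour), n in sorted(soft_media_errors_per_hour(SAMPLE).items()):
    print(loc, hour.strftime("%m/%d/%y %H:00"), n)
```

Fed the full 23,000-line log instead of this sample, a steadily rising hourly count on one slot would be worth noting before concluding anything about the drive itself.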
Chao_Ma
December 3, 2013 01:00
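Could this be caused by the two SPs being out of sync? You could try Roger's suggestion. Also, are any LUNs bound on this disk? I wonder whether a disk with no bound LUNs would show this state.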
Roger_Wu
December 3, 2013 01:00
Occasionally what the Navisphere GUI shows does not match the actual state of the array. You could try restarting the Management Server:
How to restart the Navisphere Management Server
qihua1
December 3, 2013 01:00
Yes, but I have not been on site yet, so I came here first to hear everyone's opinions.
qihua1
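December 9, 2013 01:00
Hi all,
On site I confirmed that disk 0-1-4 shows an amber fault light, and reseating it made no difference. The disk reports a capacity of 0 and its state is Removed.
After restarting the Management Server on each of the two SPs, I collected the logs and found: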
Replace failed disk or call your Service Provide
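After installing a new disk with PN 005048751, it still shows an amber light, even though the same disk tests fine in another CX300. In the management interface the disk is still shown as Removed.
Going through the logs carefully, before and after the disk replacement: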
A 12/05/13 03:18:14 SP A 6c3 BE Fibre Loop Operational 0 0 0
A 12/05/13 03:18:14 ntmirror 71240015 UnitState change for disk 1 (1 3) ENABLED to Q-DEG [READY->LOGOUT].
A 12/05/13 03:18:14 ntmirror 71240015 UnitState change for disk 3 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].
A 12/05/13 03:18:14 ntmirror 71240015 UnitState change for disk 2 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].
A 12/05/13 03:18:14 ntmirror 71240015 UnitState change for disk 0 (1 3) ENABLED to Q-DEG [READY->LOGOUT].
A 12/05/13 03:18:15 ntmirror 71240015 UnitState change for disk 3 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].
A 12/05/13 03:18:16 ntmirror 71240015 UnitState change for disk 2 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].
A 12/05/13 03:18:17 ntmirror 71240015 UnitState change for disk 1 (3 1) Q-DEG to ENABLED [LOGOUT->READY].
A 12/05/13 03:18:18 ntmirror 71240015 UnitState change for disk 0 (3 1) Q-DEG to ENABLED [LOGOUT->READY].
B 12/05/13 03:18:40 Bus0 Enc1 Dsk4 78b Drive physically removed from slot 0 0 0
B 12/05/13 03:18:40 SP B 6c3 BE Fibre Loop Operational 0 0 0
B 12/05/13 03:18:40 ntmirror 71240015 UnitState change for disk 1 (1 3) ENABLED to Q-DEG [READY->LOGOUT].
B 12/05/13 03:18:40 ntmirror 71240015 UnitState change for disk 3 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].
B 12/05/13 03:18:40 ntmirror 71240015 UnitState change for disk 2 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].
B 12/05/13 03:18:40 ntmirror 71240015 UnitState change for disk 0 (1 3) ENABLED to Q-DEG [READY->LOGOUT].
B 12/05/13 03:18:41 ntmirror 71240015 UnitState change for disk 3 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].
B 12/05/13 03:18:41 ntmirror 71240015 UnitState change for disk 0 (3 1) Q-DEG to ENABLED [LOGOUT->READY].
The Triiage tool analysis reports a large number of errors:
Disk Hard Soft PFA& Abort Remap Xfer Tmout Par Bad Inval Recon Recov
Drive Rg Type Media Media Hdwr ByDev Errs Errs Errs ity Blks Sects Sects ByDrv
0.0.0 14 r5 0 0 0 0 0 0 0 0 0 0 2 0
0.0.1 14 r5 0 0 0 0 0 0 0 0 0 0 1 0
0.0.3 14 r5 2 1999 2 0 0 0 29 0 1999 0 853 18967
0.0.5 14 r5 0 0 0 0 0 0 0 0 0 0 0 2
0.0.9 15 r5 0 2 0 0 0 0 0 0 2 0 1 16
0.0.12 15 r5 0 319 0 0 0 0 0 0 319 0 119 741
0.1.0 2 r5 0 0 0 0 0 0 0 0 0 0 0 7
0.1.2 2 r5 0 0 0 0 0 0 0 0 0 0 0 738
0.1.3 2 r5 0 0 0 0 0 0 0 0 0 0 0 3
0.1.4 2 r5 0 4 1 0 0 0 0 0 4 0 2 23357
0.1.8 3 r5 0 0 0 0 0 0 0 0 0 0 0 3
0.1.10 4 r5 0 0 0 0 0 0 2 0 0 0 0 0
0.1.11 4 r5 0 0 0 0 0 0 2 0 0 0 0 0
0.1.12 4 r5 0 0 0 0 0 0 2 0 0 0 0 1
0.1.13 4 r5 0 0 0 0 0 0 2 0 0 0 0 0
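All this is to ask: why does a newly installed disk still show as Removed? Can EMC arrays have the same problem as other storage systems, where a RAID group is "punctured", i.e. has bad-block sectors, so that replacing the disk does not help?
I would appreciate any suggestions. I will upload the logs tonight; the network is too slow right now. Many thanks.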
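To make the Triiage error table above easier to compare, here is a minimal Python sketch that parses its rows and lists the drives reporting hard media errors or bad blocks, which is the data the puncture question hinges on. The column names are an assumption: they are reconstructed by stacking the two header lines word by word (for example "Hard" over "Media", "Bad" over "Blks"). `parse_triage_rows` and `flag_media_problems` are hypothetical helpers, not Triiage features.

```python
# Column names after "Drive Rg Type", reconstructed by stacking the two header
# lines of the Triiage table (assumption, e.g. "Bad" over "Blks" -> bad_blks).
COLUMNS = [
    "hard_media", "soft_media", "pfa_hdwr", "abort_by_dev", "remap_errs",
    "xfer_errs", "tmout_errs", "parity", "bad_blks", "inval_sects",
    "recon_sects", "recov_by_drv",
]


def parse_triage_rows(text):
    """Parse rows like '0.0.3 14 r5 2 1999 ...' into {drive: {column: value}}."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip header lines: wrong token count or non-numeric counters.
        if len(parts) != 3 + len(COLUMNS) or not all(p.isdigit() for p in parts[3:]):
            continue
        drive, rg, raid_type = parts[:3]
        counters = dict(zip(COLUMNS, (int(p) for p in parts[3:])))
        rows[drive] = {"rg": int(rg), "type": raid_type, **counters}
    return rows


def flag_media_problems(rows):
    """Drives with hard media errors or bad blocks, worst first."""
    flagged = [d for d, r in rows.items() if r["hard_media"] > 0 or r["bad_blks"] > 0]
    return sorted(flagged, key=lambda d: (rows[d]["hard_media"], rows[d]["bad_blks"]), reverse=True)


SAMPLE = """
Disk Hard Soft PFA& Abort Remap Xfer Tmout Par Bad Inval Recon Recov
Drive Rg Type Media Media Hdwr ByDev Errs Errs Errs ity Blks Sects Sects ByDrv
0.0.3 14 r5 2 1999 2 0 0 0 29 0 1999 0 853 18967
0.0.9 15 r5 0 2 0 0 0 0 0 0 2 0 1 16
0.0.12 15 r5 0 319 0 0 0 0 0 0 319 0 119 741
0.1.2 2 r5 0 0 0 0 0 0 0 0 0 0 0 738
0.1.4 2 r5 0 4 1 0 0 0 0 0 4 0 2 23357
"""

rows = parse_triage_rows(SAMPLE)
for drive in flag_media_problems(rows):
    r = rows[drive]
    print(drive, "RG", r["rg"], "hard_media", r["hard_media"], "bad_blks", r["bad_blks"])
```

Under the assumed column mapping, 0.0.3 (RAID group 14) ranks first with 2 hard media errors and 1999 bad blocks, and 0.1.4 (RAID group 2) also appears with 4 bad blocks. This only ranks the counters; it does not by itself prove that any stripe is punctured.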
qihua1
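December 10, 2013 08:00
Could anyone give me some advice? I have now replaced three disks. Each one blinks amber once inserted, and I cannot see its capacity, serial number, or any other information; it stays in the same Removed state as before. Here are the latest logs:
2 attachments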
CK200071800355_SPB_2013-12-10_05-41-35_1c7090_data.zip
CK200071800355_SPA_2013-12-10_05-12-21_1c701f_data.zip
qihua1
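December 10, 2013 19:00
The logs show 0-1-3 being removed; that was a colleague's mistake. Disk 0-1-3 is now back to normal, but 0-1-4, the disk that failed originally, is still Removed. Can anyone help analyze this?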
Roger_Wu
December 10, 2013 20:00
I looked at the logs. There is a fairly serious error which, according to KB emc264420, can only be escalated to the Engineering team.
Partition Needs Rebuild Status FAILED 1 Escalate to Sustaining
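However, the CX300 went EOL long ago, so EMC 800 support will probably not accept your case. If all else fails: since disk 0.1.3 may have been pulled by mistake, first wait for its rebuild to finish, then collect the logs again and check whether this error is still reported.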
- - 2 3 2 ST2 RAID-5 N - 1.0 TB RW- SP-B DEG 0.1.0 0.1.1 0.1.2 0.1.3 (REB:12%) 0.1.4 (S:0.1.14)
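If you are sure the three disks are good, then the enclosure or the LCC could be faulty. If the system were under warranty, I would have a CE come on site with a spare enclosure and LCC, but for a CX300...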
qihua1
December 10, 2013 22:00
Thanks for the suggestion, Roger.
One more question: the rebuild of disk 3 should be complete by now:
A 12/10/13 08:47:24 Bus0 Enc1 Dsk3 67d All rebuilds for a FRU have completed 0 ffffffff ffffffff
B 12/10/13 08:47:31 Bus0 Enc1 Dsk3 604 CRU Unit Rebuild Complete 0 ffff0003 2ffff
B 12/10/13 08:47:31 Bus0 Enc1 Dsk3 67d All rebuilds for a FRU have completed 0 ffffffff ffffffff
Partition Needs Rebuild Status PASSED
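The system is out of warranty; I suspect EMC would not take this on now even as a paid service?
You mentioned bringing an enclosure and an LCC on site. At the time I only suspected the enclosure backplane. Would replacing the whole enclosure affect the data?
Do you mean I should consider replacing the entire enclosure? And what is the main concern with the LCC?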
Roger_Wu
December 10, 2013 23:00
The disk slots are part of the enclosure. I also suspect the LCC because there are quite a few 798 events in the logs. See KB emc136308, "Troubleshooting a bad slot in a CLARiiON enclosure":
It may be a drive issue but this has occurred even after drive replacement, which indicates a possible bad slot. Since the port bypass circuit is also controlled by the LCC, verify if there is any issues with the LCC.
So, if possible, bring both on site together, although a faulty enclosure slot is the more likely cause.
Line 9614: A 12/05/13 03:00:20 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9626: A 12/05/13 03:00:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9669: A 12/05/13 03:02:18 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9679: B 12/05/13 03:02:34 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9718: A 12/05/13 03:12:25 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9728: B 12/05/13 03:12:47 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9740: B 12/05/13 03:13:06 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9750: A 12/05/13 03:15:54 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9771: B 12/05/13 03:16:34 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9783: B 12/05/13 03:16:56 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9793: A 12/05/13 03:17:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 9825: B 12/05/13 03:18:54 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 9984: A 12/06/13 05:49:15 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10006: A 12/06/13 05:49:31 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10026: A 12/06/13 05:49:42 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10083: B 12/06/13 05:59:14 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10106: B 12/06/13 05:59:36 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10143: A 12/06/13 06:12:26 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10186: A 12/06/13 06:15:23 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10195: B 12/06/13 06:15:25 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10343: A 12/10/13 04:52:42 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10351: A 12/10/13 04:52:46 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10365: B 12/10/13 04:52:51 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10372: A 12/10/13 04:52:52 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10373: B 12/10/13 04:52:52 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10408: B 12/10/13 04:53:01 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10429: A 12/10/13 04:53:12 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10440: B 12/10/13 04:53:18 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10471: A 12/10/13 04:53:41 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10532: A 12/10/13 04:54:51 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10542: A 12/10/13 04:54:55 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10553: B 12/10/13 04:54:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10564: B 12/10/13 04:55:02 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10576: A 12/10/13 04:55:07 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10587: B 12/10/13 04:55:12 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10597: A 12/10/13 04:56:16 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10619: A 12/10/13 04:56:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10639: A 12/10/13 05:00:08 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10650: B 12/10/13 05:00:15 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10691: B 12/10/13 05:00:54 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10703: A 12/10/13 05:00:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10714: B 12/10/13 05:01:06 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10724: A 12/10/13 05:01:18 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10777: B 12/10/13 05:17:45 Bus0 Enc1 Dsk3 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10809: B 12/10/13 05:17:54 Bus0 Enc1 Dsk3 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10898: B 12/10/13 05:44:49 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10910: B 12/10/13 05:45:05 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10920: B 12/10/13 05:45:12 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
Line 10932: B 12/10/13 05:45:34 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0
Line 10942: B 12/10/13 05:45:53 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0
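Roger's bad-slot argument rests on the 798 (Drive Port Bypass Circuit Status changed) events piling up on Dsk4, reported by both SPs, even after the drive was replaced. Here is a minimal Python sketch that tallies those events per slot and per SP from grep output in the format quoted above. The optional `Line NNNN:` prefix handling, the regex, and `count_events` are assumptions for illustration, and the trailing `1 0 0` / `0 0 0` fields are deliberately not interpreted.

```python
import re
from collections import Counter

# Hypothetical pattern for the grep hits quoted above, with an optional
# "Line NNNN: " prefix, e.g.
# "Line 9614: A 12/05/13 03:00:20 Bus0 Enc1 Dsk4 798 The Drive Port Bypass ..."
EVENT_RE = re.compile(
    r"^(?:Line \d+: )?(?P<sp>[AB]) \S+ \S+ "
    r"(?P<loc>Bus\d+ Enc\d+ Dsk\d+) (?P<code>[0-9a-f]+) "
)


def count_events(lines, code="798"):
    """Tally events with the given hex code per (slot location, SP)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m and m["code"] == code:
            counts[(m["loc"], m["sp"])] += 1
    return counts


SAMPLE = [
    "Line 9614: A 12/05/13 03:00:20 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0",
    "Line 9626: A 12/05/13 03:00:59 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 0 0 0",
    "Line 9679: B 12/05/13 03:02:34 Bus0 Enc1 Dsk4 798 The Drive Port Bypass Circuit Status changed. 1 0 0",
    "Line 10777: B 12/10/13 05:17:45 Bus0 Enc1 Dsk3 798 The Drive Port Bypass Circuit Status changed. 1 0 0",
    "B 12/05/13 03:18:40 Bus0 Enc1 Dsk4 78b Drive physically removed from slot 0 0 0",
]

for (loc, sp), n in sorted(count_events(SAMPLE).items()):
    print(f"{loc} SP {sp}: {n}")
```

Run over the full grep output, a count concentrated on one slot and reported by both SP A and SP B would be consistent with the slot or LCC suspicion in KB emc136308 rather than with any single replacement drive.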