Unresolved

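December 3, 2013, 01:00

Urgent help: CX300 disk failure, and three replacement drives in a row are not recognized

I have what should be a simple question. On a very old CX300, remote management shows one failed disk: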

1.jpg

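From this view, disk 0-1-4 appears to have failed, and the original hot spare, 0-1-14, should have taken over by now. However, the collected logs tell a different story: they contain no disk failure information at all.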

DETAILED DISK INFORMATION                                                   [ArrayInfo Script]

**********************************************************************************************

         Raid          Drive                                                                                                            Cur   Max

Disk    Group Capacity Intfc   Vendor     Drive Type     Serial Number    Part Number  TLA Number    Firmware   State       Replacing Speed Speed

0.0.0      14    300GB FC      FUJITSU    Allegro 9LE    DM56807343       118032513    005048597     1905       Enabled     --                 

0.0.1      14    300GB FC      FUJITSU    Allegro 9LE    DM5680733G       118032513    005048597     1905       Enabled     --                 

0.0.2      14    300GB FC      FUJITSU    Allegro 9LE    DM568072VM       118032513    005048597     1905       Enabled     --                 

0.0.3      14    300GB FC      SEAGATE    TimberlandRC   3RH0CC6P         118032567    005048751     C001       Enabled     --                 

0.0.4      14    300GB FC      FUJITSU    Allegro 9LE    DM56604THK       118032513    005048597     1905       Enabled     --                 

0.0.5      14    300GB FC      FUJITSU    Allegro 9LE    DM56806ENP       118032513    005048597     1905       Enabled     --                 

0.0.6      14    300GB FC      FUJITSU    Allegro 9LE    DM56A084AA       118032513    005048597     1905       Enabled     --                 

0.0.7      14    300GB FC      SEAGATE    TimberlandRC   3RH0LYF1         118032567    005048751     C003       Enabled     --                 

0.0.8      15    300GB FC      FUJITSU    Allegro 9LE    DM56705KSL       118032513    005048597     1905       Enabled     --                 

0.0.9      15    300GB FC      FUJITSU    Allegro 9LE    DM56A07YKH       118032513    005048597     1905       Enabled     --                 

0.0.10     15    300GB FC      FUJITSU    Allegro 9LE    DM56A07Y1C       118032513    005048597     1905       Enabled     --                 

0.0.11     15    300GB FC      FUJITSU    Allegro 9LE    DM56A07Y7J       118032513    005048597     1905       Enabled     --                 

0.0.12     15    300GB FC      SEAGATE    EagleRP        6SQ007Y5         118032662    005048953     RC05       Enabled     --                 

0.0.13     15    300GB FC      FUJITSU    Allegro 9LE    DM56A07WL1       118032513    005048597     1905       Enabled     --                 

0.0.14     15    300GB FC      SEAGATE    TimberlandRC   3RH0SX6E         118032567    005048751     C003       Enabled     --                 

0.1.0       2    300GB FC      SEAGATE    TimberlandRC   3RH0SDLW         118032567    005048751     C003       Enabled     --                 

0.1.1       2    300GB FC      SEAGATE    TimberlandRC   3RH0KFB4         118032567    005048751     C003       Enabled     --                 

0.1.2       2    300GB FC      SEAGATE    TimberlandRC   3RH0SELT         118032567    005048751     C003       Enabled     --                 

0.1.3       2    300GB FC      SEAGATE    TimberlandRC   3RH0SEDG         118032567    005048751     C003       Enabled     --                 

0.1.4       2    300GB FC      SEAGATE    TimberlandRC   3RH0SEQT         118032567    005048751     C003       Enabled     --                 

0.1.5       3    300GB FC      SEAGATE    TimberlandRC   3RH0RY6N         118032567    005048751     C003       Enabled     --                 

0.1.6       3    300GB FC      SEAGATE    TimberlandRC   3RH0SEXR         118032567    005048751     C003       Enabled     --                 

0.1.7       3    300GB FC      SEAGATE    TimberlandRC   3RH0SEXW         118032567    005048751     C003       Enabled     --                 

0.1.8       3    300GB FC      SEAGATE    TimberlandRC   3RH0RR1Y         118032567    005048751     C003       Enabled     --                 

0.1.9       3    300GB FC      SEAGATE    TimberlandRC   3RH0S406         118032567    005048751     C003       Enabled     --                 

0.1.10      4    300GB FC      SEAGATE    TimberlandRC   3RH0S5DY         118032567    005048751     C003       Enabled     --                 

0.1.11      4    300GB FC      SEAGATE    TimberlandRC   3RH0S58Z         118032567    005048751     C003       Enabled     --                 

0.1.12      4    300GB FC      SEAGATE    TimberlandRC   3RH0S5NP         118032567    005048751     C003       Enabled     --                 

0.1.13      4    300GB FC      SEAGATE    TimberlandRC   3RH0N9Z5         118032567    005048751     C003       Enabled     --                 

0.1.14    238    300GB FC      SEAGATE    TimberlandRC   3RH0SEVJ         118032567    005048751     C003       HS Ready    Inactive   

The log analysis does, however, show the following errors:

[ 23700 lines deleted ]

B       11/30/13 23:48:09 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       11/30/13 23:48:31 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       11/30/13 23:48:37 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:01:17 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:06:20 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:16:30 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:18:33 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:19:21 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:22:24 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:22:33 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:23:18 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:24:28 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:33:26 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:34:29 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:38:22 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:47:39 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:47:45 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:48:04 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:49:33 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

B       12/01/13 00:49:38 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22

I will upload the logs later. Could someone help explain what is going on? Thanks.
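The soft media errors quoted above can be tallied per drive slot rather than read by eye. The following is a minimal sketch, assuming each event line has the layout shown in the excerpt (SP letter, date, time, `BusN EncN DskN`, hex event code, message). The regular expression and the function name `count_events_by_disk` are illustrative and not part of any EMC tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# <SP> <mm/dd/yy> <hh:mm:ss> Bus<n> Enc<n> Dsk<n> <hex code> <message ...>
EVENT_RE = re.compile(
    r"^(?P<sp>[AB])\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Bus(?P<bus>\d+)\s+Enc(?P<enc>\d+)\s+Dsk(?P<dsk>\d+)\s+"
    r"(?P<code>[0-9a-fA-F]+)\s+(?P<text>.*)$"
)

def count_events_by_disk(lines, code):
    """Count events with the given hex code, keyed by 'bus_enc_dsk'."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m and m.group("code").lower() == code.lower():
            slot = "{}_{}_{}".format(m.group("bus"), m.group("enc"), m.group("dsk"))
            counts[slot] += 1
    return counts

sample = [
    "B       11/30/13 23:48:09 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22",
    "B       12/01/13 00:01:17 Bus0 Enc1 Dsk4        6a0 Disk soft media error [Recovered error (on-drive ECC)] 0    0        22",
    # Lines that are not tied to a drive slot (no BusN EncN DskN) are skipped.
    "A       12/05/13 03:18:14 SP A                6c3 BE Fibre Loop Operational                0    0        0",
]
print(count_events_by_disk(sample, "6a0"))
```

Run against the full log, this would show whether the 6a0 events are confined to slot 0_1_4 or spread across other drives as well.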

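146 messages

December 3, 2013, 01:00

Could this be caused by the SPs being out of sync with each other? Roger's suggestion is worth trying. Also, is there a LUN bound on this disk? I am not sure whether a disk with no LUN bound would show up in this state.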

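4K messages

December 3, 2013, 01:00

Occasionally the Navisphere GUI shows something that does not match the actual state. You could try restarting the Management Server and see whether that helps:

How to restart the Navisphere Management Server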


196 messages

December 3, 2013, 01:00

Yes. I have not been on site yet, though, so I wanted to get everyone's opinions here first.

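196 messages

December 9, 2013, 01:00

Hello everyone,

On site, the amber light on disk 0-1-4 is indeed lit, and reseating the drive makes no difference. The disk shows a capacity of 0 and a state of removed.

After restarting the Management Server on each of the two SPs, the collected logs show: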

Critical   Disk 0_1_14   Hardware Faults/FCO   Hot Spare is in use.   Replace failed disk or call your Service Provider.
Critical   Disk 0_1_4   Hardware Faults/FCO   Disk is not enabled.   Replace failed disk or call your Service Provider.

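After installing a new drive with PN 005048751, the amber light stayed on. The drive tests fine in another CX300, but the management interface still shows it as removed.

Looking closely through the collected logs, before and after the drive replacement: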

A       12/05/13 03:18:14 SP A                6c3 BE Fibre Loop Operational                0    0        0

A       12/05/13 03:18:14 ntmirror       71240015 UnitState change for disk 1 (1 3) ENABLED to Q-DEG [READY->LOGOUT].

A       12/05/13 03:18:14 ntmirror       71240015 UnitState change for disk 3 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].

A       12/05/13 03:18:14 ntmirror       71240015 UnitState change for disk 2 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].

A       12/05/13 03:18:14 ntmirror       71240015 UnitState change for disk 0 (1 3) ENABLED to Q-DEG [READY->LOGOUT].

A       12/05/13 03:18:15 ntmirror       71240015 UnitState change for disk 3 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].

A       12/05/13 03:18:16 ntmirror       71240015 UnitState change for disk 2 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].

A       12/05/13 03:18:17 ntmirror       71240015 UnitState change for disk 1 (3 1) Q-DEG to ENABLED [LOGOUT->READY].

A       12/05/13 03:18:18 ntmirror       71240015 UnitState change for disk 0 (3 1) Q-DEG to ENABLED [LOGOUT->READY].

B       12/05/13 03:18:40 Bus0 Enc1 Dsk4      78b Drive physically removed from slot       0    0        0

B       12/05/13 03:18:40 SP B                6c3 BE Fibre Loop Operational                0    0        0

B       12/05/13 03:18:40 ntmirror       71240015 UnitState change for disk 1 (1 3) ENABLED to Q-DEG [READY->LOGOUT].

B       12/05/13 03:18:40 ntmirror       71240015 UnitState change for disk 3 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].

B       12/05/13 03:18:40 ntmirror       71240015 UnitState change for disk 2 (3 7) Q-DEG to Q-SHUTDN [READY->LOGOUT].

B       12/05/13 03:18:40 ntmirror       71240015 UnitState change for disk 0 (1 3) ENABLED to Q-DEG [READY->LOGOUT].

B       12/05/13 03:18:41 ntmirror       71240015 UnitState change for disk 3 (7 3) Q-SHUTDN to Q-DEG [LOGOUT->READY].

B       12/05/13 03:18:41 ntmirror       71240015 UnitState change for disk 0 (3 1) Q-DEG to ENABLED [LOGOUT->READY].

The triage tool analysis reports a large number of errors:

Disk                Hard  Soft  PFA& Abort Remap  Xfer Tmout   Par   Bad Inval Recon Recov

Drive     Rg Type  Media Media  Hdwr ByDev  Errs  Errs  Errs   ity  Blks Sects Sects ByDrv

0.0.0     14 r5        0     0     0     0     0     0     0     0     0     0     2     0

0.0.1     14 r5        0     0     0     0     0     0     0     0     0     0     1     0

0.0.3     14 r5        2  1999     2     0     0     0    29     0  1999     0   853 18967

0.0.5     14 r5        0     0     0     0     0     0     0     0     0     0     0     2

0.0.9     15 r5        0     2     0     0     0     0     0     0     2     0     1    16

0.0.12    15 r5        0   319     0     0     0     0     0     0   319     0   119   741

0.1.0      2 r5        0     0     0     0     0     0     0     0     0     0     0     7

0.1.2      2 r5        0     0     0     0     0     0     0     0     0     0     0   738

0.1.3      2 r5        0     0     0     0     0     0     0     0     0     0     0     3

0.1.4      2 r5        0     4     1     0     0     0     0     0     4     0     2 23357

0.1.8      3 r5        0     0     0     0     0     0     0     0     0     0     0     3

0.1.10     4 r5        0     0     0     0     0     0     2     0     0     0     0     0

0.1.11     4 r5        0     0     0     0     0     0     2     0     0     0     0     0

0.1.12     4 r5        0     0     0     0     0     0     2     0     0     0     0     1

0.1.13     4 r5        0     0     0     0     0     0     2     0     0     0     0     0

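My point in all of this is that I want to understand why a newly replaced drive still shows as removed. Does EMC have the same issue as other storage arrays, where a RAID group has a punctured stripe, that is, bad-block sectors, and replacing the drive does not help in that case?

I look forward to your suggestions. I will upload the logs tonight, because the connection is too slow right now. Thanks a lot.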
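To see at a glance which RAID groups contain drives with bad blocks, the triage table above can be parsed into per-drive counters. This is a rough sketch: the column names in `COLUMNS` are my own reading of the two-line header, and both function names are hypothetical. It only summarizes the numbers already posted and does not by itself show whether bad blocks are what keeps the slot in the removed state.

```python
from collections import defaultdict

# Column names are an assumed reading of the two-line header in the triage output.
COLUMNS = [
    "hard_media", "soft_media", "pfa_hdwr", "abort_by_dev", "remap_errs",
    "xfer_errs", "tmout_errs", "parity", "bad_blks", "inval_sects",
    "recon_sects", "recov_by_drv",
]

def parse_triage_rows(lines):
    """Parse rows of the form '<drive> <rg> <type> <12 counters>' into dicts."""
    rows = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3 + len(COLUMNS):
            continue
        try:
            rg = int(fields[1])
            values = [int(v) for v in fields[3:]]
        except ValueError:
            continue  # header line or malformed row
        row = dict(zip(COLUMNS, values))
        row["rg"] = rg
        row["type"] = fields[2]
        rows[fields[0]] = row
    return rows

def bad_blocks_by_raid_group(rows):
    """Return {raid_group: {drive: bad_blks}} for drives with bad_blks > 0."""
    groups = defaultdict(dict)
    for drive, row in rows.items():
        if row["bad_blks"] > 0:
            groups[row["rg"]][drive] = row["bad_blks"]
    return dict(groups)

sample = [
    "Drive     Rg Type  Media Media  Hdwr ByDev  Errs  Errs  Errs   ity  Blks Sects Sects ByDrv",
    "0.0.3     14 r5        2  1999     2     0     0     0    29     0  1999     0   853 18967",
    "0.1.3      2 r5        0     0     0     0     0     0     0     0     0     0     0     3",
    "0.1.4      2 r5        0     4     1     0     0     0     0     0     4     0     2 23357",
]
print(bad_blocks_by_raid_group(parse_triage_rows(sample)))
```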

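196 messages

December 10, 2013, 08:00

Could anyone offer some advice? I have now replaced three drives. Every one of them blinks amber once inserted, and neither the capacity nor details such as the drive SN are visible. The slot is still in the original removed state. Here are the latest logs.

2 attachments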

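196 messages

December 10, 2013, 19:00

The logs show 0-1-3 being removed. That was a colleague's mistake, and disk 0-1-3 is now back to normal. However, 0-1-4, the disk that failed originally, is still removed. Can anyone help analyze this?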

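4K messages

December 10, 2013, 20:00

I looked at the logs. There is a fairly serious error which, according to KB emc264420, can only be handled by escalating to the Engineering team.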

Partition Needs Rebuild Status   FAILED   1    Escalate to Sustaining


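However, the CX300 went EOL long ago, so the EMC 800 support line probably will not accept your case. If nothing else works, since Disk 0.1.3 may have been pulled by mistake, wait for its rebuild to finish first, then collect the logs again and see whether this error is still reported.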

    -    -    2    3    2      ST2  RAID-5  N  -    1.0 TB  RW-  SP-B      DEG  0.1.0  0.1.1  0.1.2  0.1.3 (REB:12%) 0.1.4 (S:0.1.14)

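If you have confirmed that the three drives are fine, then the disk enclosure or the LCC could be faulty. If the array were under warranty, I would have a CE go on site with a replacement enclosure and LCC, but for a CX300...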


196 messages

December 10, 2013, 22:00

Thanks for the suggestions, Roger.

One more question. The rebuild of disk 3 should now be complete:

A 12/10/13 08:47:24 Bus0 Enc1 Dsk3 67d All rebuilds for a FRU have completed 0 ffffffff ffffffff

B 12/10/13 08:47:31 Bus0 Enc1 Dsk3 604 CRU Unit Rebuild Complete 0 ffff0003 2ffff

B 12/10/13 08:47:31 Bus0 Enc1 Dsk3 67d All rebuilds for a FRU have completed 0 ffffffff ffffffff

Partition Needs Rebuild Status PASSED

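The machine is out of warranty. I suspect EMC would not take it on now even as a paid service?

You mentioned bringing an enclosure and an LCC on site. I had only suspected the enclosure backplane. Would replacing the entire enclosure affect the data?

Are you saying the whole enclosure should be replaced? And what problem in particular makes you suspect the LCC?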

4K messages

December 10, 2013, 23:00

The drive slots are part of the enclosure. I suspect the LCC because there are quite a few 798 events in the logs. See KB emc136308, "Troubleshooting a bad slot in a CLARiiON enclosure":

It may be a drive issue but this has occurred even after drive replacement, which indicates a possible bad slot. Since the port bypass circuit is also controlled by the LCC, verify if there is any issues with the LCC.

So if circumstances allow, I would bring both on site together, although a faulty enclosure slot is the more likely cause.

Line 9614: A      12/05/13 03:00:20 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9626: A      12/05/13 03:00:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9669: A      12/05/13 03:02:18 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9679: B      12/05/13 03:02:34 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9718: A      12/05/13 03:12:25 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9728: B      12/05/13 03:12:47 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9740: B      12/05/13 03:13:06 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9750: A      12/05/13 03:15:54 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9771: B      12/05/13 03:16:34 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9783: B      12/05/13 03:16:56 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9793: A      12/05/13 03:17:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 9825: B      12/05/13 03:18:54 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 9984: A      12/06/13 05:49:15 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10006: A      12/06/13 05:49:31 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10026: A      12/06/13 05:49:42 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10083: B      12/06/13 05:59:14 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10106: B      12/06/13 05:59:36 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10143: A      12/06/13 06:12:26 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10186: A      12/06/13 06:15:23 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10195: B      12/06/13 06:15:25 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10343: A      12/10/13 04:52:42 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10351: A      12/10/13 04:52:46 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10365: B      12/10/13 04:52:51 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10372: A      12/10/13 04:52:52 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10373: B      12/10/13 04:52:52 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10408: B      12/10/13 04:53:01 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10429: A      12/10/13 04:53:12 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10440: B      12/10/13 04:53:18 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10471: A      12/10/13 04:53:41 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10532: A      12/10/13 04:54:51 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10542: A      12/10/13 04:54:55 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10553: B      12/10/13 04:54:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10564: B      12/10/13 04:55:02 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10576: A      12/10/13 04:55:07 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10587: B      12/10/13 04:55:12 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10597: A      12/10/13 04:56:16 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10619: A      12/10/13 04:56:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10639: A      12/10/13 05:00:08 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10650: B      12/10/13 05:00:15 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10691: B      12/10/13 05:00:54 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10703: A      12/10/13 05:00:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10714: B      12/10/13 05:01:06 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10724: A      12/10/13 05:01:18 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10777: B      12/10/13 05:17:45 Bus0 Enc1 Dsk3        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10809: B      12/10/13 05:17:54 Bus0 Enc1 Dsk3        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10898: B      12/10/13 05:44:49 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10910: B      12/10/13 05:45:05 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10920: B      12/10/13 05:45:12 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0

Line 10932: B      12/10/13 05:45:34 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0

Line 10942: B      12/10/13 05:45:53 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0
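The point that the 798 events are numerous and concentrated on one slot can be checked by grouping them per slot and per day. Below is a small sketch, assuming the grep-style `Line N:` prefix shown above is optional and the rest of each line follows the same SP event layout. `BYPASS_RE` and `bypass_events_per_slot` are illustrative names, not part of any EMC tooling.

```python
import re
from collections import defaultdict

# Matches event 798 lines, with or without the leading "Line N:" search prefix.
BYPASS_RE = re.compile(
    r"^(?:Line\s+\d+:\s+)?(?P<sp>[AB])\s+(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"\d{2}:\d{2}:\d{2}\s+"
    r"Bus(?P<bus>\d+)\s+Enc(?P<enc>\d+)\s+Dsk(?P<dsk>\d+)\s+798\s"
)

def bypass_events_per_slot(lines):
    """Return {slot: {date: count}} for Drive Port Bypass Circuit (798) events."""
    result = defaultdict(lambda: defaultdict(int))
    for line in lines:
        m = BYPASS_RE.match(line.strip())
        if m:
            slot = "{}_{}_{}".format(m["bus"], m["enc"], m["dsk"])
            result[slot][m["date"]] += 1
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {slot: dict(days) for slot, days in result.items()}

sample = [
    "Line 9614: A      12/05/13 03:00:20 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 1    0        0",
    "Line 9626: A      12/05/13 03:00:59 Bus0 Enc1 Dsk4        798 The Drive Port Bypass Circuit Status changed. 0    0        0",
    "Line 10777: B      12/10/13 05:17:45 Bus0 Enc1 Dsk3        798 The Drive Port Bypass Circuit Status changed. 1    0        0",
]
print(bypass_events_per_slot(sample))
```

If nearly all of the events land on one slot on the days when drives were swapped, as the excerpt suggests for Dsk4, that fits the bad-slot theory. The Dsk3 entries on 12/10/13 would correspond to the drive that was pulled by mistake.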
