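Roger_Wu
4K messages
0
January 16, 2014 18:00
The CX300 has already reached EOSL (end of service life), so a call to the 800 support line will most likely not be accepted...
I took a look at your logs, and the disk enclosures appear to be the problem: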
Line 7813: B 01/14/14 16:09:52 Bus0 Enc1 a25 Enclosure state change [ok->failed] 14 0 0
Line 7814: B 01/14/14 16:09:52 Bus0 Enc1 Dsk0 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7817: B 01/14/14 16:09:52 Bus0 Enc1 Dsk1 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7818: B 01/14/14 16:09:52 Bus0 Enc1 Dsk2 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7819: B 01/14/14 16:09:52 Bus0 Enc1 Dsk3 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7820: B 01/14/14 16:09:52 Bus0 Enc1 Dsk4 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7821: B 01/14/14 16:09:52 Bus0 Enc1 Dsk5 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7822: B 01/14/14 16:09:52 Bus0 Enc1 Dsk6 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7823: B 01/14/14 16:09:52 Bus0 Enc1 Dsk7 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7824: B 01/14/14 16:09:52 Bus0 Enc1 Dsk8 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7825: B 01/14/14 16:09:52 Bus0 Enc1 DskA a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7826: B 01/14/14 16:09:52 Bus0 Enc1 DskB a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
Line 7827: B 01/14/14 16:09:52 Bus0 Enc1 DskC a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c
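You can try following emc78506, but Enc 0_1, 0_2, and 0_3 all have problems (all of the errors are reported by SPB, so rebooting the SP might well fix it); only 0_0 is healthy. If rebooting the SP does not help, check the cabling again or simply swap the cables. If new cables make no difference either, the enclosure itself is at fault, so back up your data and migrate it to a healthy array as soon as you can. The CX300 was released in 2005, so it is nearly nine years old by now.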
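For anyone who wants to summarize a longer excerpt than the one quoted above, here is a minimal Python sketch. It assumes the line layout seen in this thread (an optional "Line N:" prefix from the search tool, a letter that I take to be the SP that logged the event, then date, time, Bus/Enc, an optional Dsk field, an event code, and a bracketed detail). The regular expression and the `summarize` helper are my own illustration, not part of any EMC tool, and other FLARE releases may format these lines differently.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpt in this thread; not an official format spec.
LINE_RE = re.compile(
    r"^(?:Line \d+:\s*)?"                      # optional prefix added by the search tool
    r"(?P<sp>[AB])\s+"                         # assumed: SP that logged the event
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Bus(?P<bus>\d+)\s+Enc(?P<enc>\d+)\s+"
    r"(?:Dsk(?P<disk>[0-9A-Fa-f]+)\s+)?"       # disk slot, absent on enclosure events
    r"(?P<code>[0-9a-f]+)\s+"                  # event code, e.g. a25 or a07
    r"(?P<text>.+?)\s+\[(?P<detail>[^\]]*)\]"  # message text and bracketed detail
)

def summarize(lines):
    """Group enclosure state changes and powered-down disks by bus_enclosure."""
    summary = defaultdict(lambda: {"sp": set(), "state_changes": [], "disks_down": []})
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        entry = summary[f"{m['bus']}_{m['enc']}"]
        entry["sp"].add(m["sp"])
        if m["disk"] is None and m["text"] == "Enclosure state change":
            entry["state_changes"].append(m["detail"])
        elif m["disk"] is not None and m["text"] == "CRU Powered Down":
            entry["disks_down"].append(m["disk"])
    return dict(summary)

# Three lines copied from the excerpt above.
SAMPLE = [
    "Line 7813: B 01/14/14 16:09:52 Bus0 Enc1 a25 Enclosure state change [ok->failed] 14 0 0",
    "Line 7814: B 01/14/14 16:09:52 Bus0 Enc1 Dsk0 a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c",
    "Line 7825: B 01/14/14 16:09:52 Bus0 Enc1 DskA a07 CRU Powered Down [CM: Killed by CM EnclosureFailed] 0 ffff0006 920c",
]

if __name__ == "__main__":
    for enclosure, info in summarize(SAMPLE).items():
        print(enclosure, sorted(info["sp"]), info["state_changes"], info["disks_down"])
```

Run against both SP logs, a summary like this makes it easy to see whether only one SP is logging the enclosure failures.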
Enclosure status as seen from SPA:
DAE2 Bus 0 Enclosure 1 *FAULT*
(Bus 0 Enclosure 1 : Failed - )
Bus 0 Enclosure 1 Fan A State: Present
Bus 0 Enclosure 1 Fan B State: Present
Bus 0 Enclosure 1 Power A State: Present
Bus 0 Enclosure 1 Power B State: Present
Bus 0 Enclosure 1 LCC A State: Present
Bus 0 Enclosure 1 LCC B State: Present
Bus 0 Enclosure 1 LCC A Revision: 6.69
Bus 0 Enclosure 1 LCC B Revision: 6.69
Bus 0 Enclosure 1 LCC A Serial #: FCNBD080116669
Bus 0 Enclosure 1 LCC B Serial #: FCNBD082815479
DAE2 Bus 0 Enclosure 2 *FAULT*
(Bus 0 Enclosure 2 : Failed - )
Bus 0 Enclosure 2 Fan A State: Present
Bus 0 Enclosure 2 Fan B State: Present
Bus 0 Enclosure 2 Power A State: Present
Bus 0 Enclosure 2 Power B State: Present
Bus 0 Enclosure 2 LCC A State: Present
Bus 0 Enclosure 2 LCC B State: Present
Bus 0 Enclosure 2 LCC A Revision: 6.69
Bus 0 Enclosure 2 LCC B Revision: 6.69
Bus 0 Enclosure 2 LCC A Serial #: FCNBD064303767
Bus 0 Enclosure 2 LCC B Serial #: FCNBD064303887
DAE2 Bus 0 Enclosure 3 *FAULT*
(Bus 0 Enclosure 3 : Failed - )
Bus 0 Enclosure 3 Fan A State: Present
Bus 0 Enclosure 3 Fan B State: Present
Bus 0 Enclosure 3 Power A State: Present
Bus 0 Enclosure 3 Power B State: Present
Bus 0 Enclosure 3 LCC A State: Present
Bus 0 Enclosure 3 LCC B State: Present
Bus 0 Enclosure 3 LCC A Revision: 6.69
Bus 0 Enclosure 3 LCC B Revision: 6.69
Bus 0 Enclosure 3 LCC A Serial #: FCNBD064414515
Bus 0 Enclosure 3 LCC B Serial #: FCNBD064414463
Enclosure status as seen from SPB:
DAE2P Bus 0 Enclosure 1 *FAULT*
(Bus 0 Enclosure 1 : Failed - )
DAE2P Bus 0 Enclosure 2 *FAULT*
(Bus 0 Enclosure 2 : Failed - )
DAE2P Bus 0 Enclosure 3 *FAULT*
(Bus 0 Enclosure 3 : Failed - )
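The two listings above differ in a way that is easy to miss by eye: SPA reports the enclosures as DAE2 with full fan, power, and LCC details, while SPB reports DAE2P and no component lines at all. A rough Python sketch for comparing the two views is below. It assumes the text layout shown in this thread; `parse_enclosures` and `compare_views` are hypothetical helper names of mine, and the sample strings are trimmed copies of the output above.

```python
import re

# Header such as "DAE2 Bus 0 Enclosure 1 *FAULT*" (layout assumed from this thread).
HEADER_RE = re.compile(
    r"^(?P<type>\S+)\s+Bus (?P<bus>\d+) Enclosure (?P<enc>\d+)(?P<fault>\s+\*FAULT\*)?\s*$"
)
# Component line such as "Bus 0 Enclosure 1 LCC A Revision: 6.69".
FIELD_RE = re.compile(
    r"^Bus (?P<bus>\d+) Enclosure (?P<enc>\d+) (?P<name>.+?):\s*(?P<value>.*)$"
)

def parse_enclosures(text):
    """Return {"bus_enc": {"type", "fault", "fields"}} for one SP's listing."""
    enclosures = {}
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            key = f"{header['bus']}_{header['enc']}"
            enclosures[key] = {
                "type": header["type"],
                "fault": header["fault"] is not None,
                "fields": {},
            }
            continue
        field = FIELD_RE.match(line)
        if field:
            key = f"{field['bus']}_{field['enc']}"
            if key in enclosures:
                enclosures[key]["fields"][field["name"]] = field["value"]
    return enclosures

def compare_views(spa, spb):
    """List the differences between the two SPs' views, per enclosure."""
    report = {}
    for key in sorted(set(spa) | set(spb)):
        a, b = spa.get(key), spb.get(key)
        notes = []
        if a is None or b is None:
            notes.append("missing from one SP")
        else:
            if a["type"] != b["type"]:
                notes.append(f"type differs: {a['type']} vs {b['type']}")
            if bool(a["fields"]) != bool(b["fields"]):
                notes.append("component details present on only one SP")
        report[key] = notes
    return report

# Trimmed samples taken from the listings above.
SPA_TEXT = """
DAE2 Bus 0 Enclosure 1 *FAULT*
(Bus 0 Enclosure 1 : Failed - )
Bus 0 Enclosure 1 LCC A State: Present
Bus 0 Enclosure 1 LCC A Revision: 6.69
"""
SPB_TEXT = """
DAE2P Bus 0 Enclosure 1 *FAULT*
(Bus 0 Enclosure 1 : Failed - )
"""

if __name__ == "__main__":
    print(compare_views(parse_enclosures(SPA_TEXT), parse_enclosures(SPB_TEXT)))
```

The sketch only reports differences; what a type mismatch or missing detail means on a given array still has to be judged from the logs.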
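qihua1
196 messages
0
January 16, 2014 05:00
I went on site today and had a look. All four of these disks show amber lights. On the back of the expansion enclosures, the power supplies, LCCs, and so on all look normal. In the management interface SPA manages everything normally, but SPB cannot see any information about the expansion enclosures.
First I reseated all three faulted hot spares and replaced the failed disk 0-1-9 with a new one. The amber lights turned to normal green, but in the management interface the disk stays in the powering up state, the hot spares stay in the T rebuilding state, and no data has been copied back to the replaced 0-1-9 disk.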
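qihua1
196 messages
0
January 16, 2014 05:00
I am posting today's logs below. If anyone has time, please take a look. Thank you.
I did some digging myself, and emc78506 describes something similar. But I do not understand what is going on with the display problem on my SPB. Does rebooting the SP really help? We cannot perform that operation lightly.
The Navisphere Management Server has already been restarted on each SP in turn, with no effect.
Waiting online for replies. Thank you.
2 attachments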
CK200073000492_SPA_2014-01-16_06-09-19_1db047_data.zip
CK200073000492_SPB_2014-01-16_06-23-11_1c322d_data.zip
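Yanhong1
1.6K messages
0
January 16, 2014 06:00
OP, it is quite late now. If this is urgent, I suggest calling the 800 number for after-sales support directly, although at this hour you will probably have to work through it in English with an engineer in Europe.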
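qihua1
196 messages
0
January 21, 2014 04:00
Roger, thank you for your helpful reply.
I am getting ready to reboot the SP. One question: if the firmware version is below 2.6, say around 2.1, there is no reboot option for the SP in the GUI, and the reboot command on the command line does not work either. Is there any other way to reboot the SP, or is pulling and reseating the SP the only option?
Also, to be safe, do I need to manually move all LUNs over to SPA before rebooting SPB? The hosts currently have four redundant paths.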
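qihua1
196 messages
0
January 21, 2014 05:00
Oh, yes, it is version 2.19. There is no Reboot option in the menu, and the reboot command does not work either.
I have never moved LUNs before. How long does it take to move a LUN? Is it quick?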
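Roger_Wu
4K messages
0
January 21, 2014 05:00
What are 2.6 and 2.1 versions of? The FLARE release on a CX300 should be 02.26, 02.24, or 02.19.
There is no Reboot option when you right-click the SP? Could it be 02.19? I have honestly never used a release that old...
If you really have to pull and reseat the SP, moving the LUNs off it first is of course safer. You can also stop as much host I/O as possible and even disable the write cache. Back up critical data before you start. The array already has faults, and a power cycle may well surface more problems...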
Roger_Wu
4K messages
0
January 21, 2014 17:00
It is just a LUN trespass: it changes the owner of the LUN, and it is quite fast.
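To illustrate why a trespass is quick, here is a toy Python model of LUN ownership. It is only a sketch of the idea that a trespass changes which SP currently owns a LUN while the data stays on the same disks; the `Lun` class, the `trespass_all` helper, and the LUN numbers are hypothetical and are not a Navisphere API.

```python
from dataclasses import dataclass

@dataclass
class Lun:
    number: int
    default_owner: str  # SP that normally owns the LUN, "SPA" or "SPB"
    current_owner: str  # SP that serves I/O for the LUN right now

def trespass_all(luns, from_sp, to_sp):
    """Hand every LUN currently owned by from_sp over to to_sp.

    Only the ownership field changes; nothing is copied, which is the
    point of the model. Returns the numbers of the LUNs that moved.
    """
    moved = []
    for lun in luns:
        if lun.current_owner == from_sp:
            lun.current_owner = to_sp
            moved.append(lun.number)
    return moved

# Hypothetical layout: one LUN on SPA, two on SPB.
luns = [
    Lun(0, default_owner="SPA", current_owner="SPA"),
    Lun(1, default_owner="SPB", current_owner="SPB"),
    Lun(2, default_owner="SPB", current_owner="SPB"),
]

if __name__ == "__main__":
    print(trespass_all(luns, "SPB", "SPA"))
    print(luns)
```

In this model the default owner is left untouched, so the LUNs can be handed back to SPB once it is healthy again.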
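qihua1
196 messages
0
January 23, 2014 21:00
Thank you all for your help.
After SPB was rebooted, the communication problem between SPB and the expansion enclosures was resolved.
After SPA was rebooted and the disks stuck in the powering up state were reseated, the disks were recognized normally and are now in the enabled state.
Everything is resolved.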
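lbseraph
53 messages
1
January 27, 2014 06:00
I saw this late. Judging from the description above, the most likely cause is a problem with the B-side back-end loop. The normal order would be to first reseat LCC B on DAE 0_1 and its cable. If that brings the B-side loop back up, reseating the hot spares is all that is left to do (if either side of a back-end loop fails, every hot spare on that loop reports a fault). If that still does not work, try rebooting SP B. Following these steps should also make the SP A reboot unnecessary. Just my personal opinion, for reference only.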