May 1, 2013, 20:00

[Share] Troubleshooting continuous reboots on DD140, DD160, DD610, DD620, and DD630
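Purpose

This article describes how to troubleshoot a DD system that fails to boot or is stuck in a reboot loop, possibly because of a disk failure.

Applies to

DD140, DD160, DD610, DD620, DD630

All software versions earlier than 5.1.3.0

Cause

A disk failure has occurred.

Resolution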

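If DD OS fails to boot, restart the system in single-user mode and review the dmesg log for disk errors.

Look for the panic alert message.

Verify that the panic was caused by the loss of dg0: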

​PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: Can't tolerate essential dgrp dg0 going down​
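
If you have the console or dmesg output saved as text, the check above can be scripted. The following is only a rough sketch: the function name essential_dgrp_lost is made up for illustration, and the pattern assumes the panic message always has the same layout as the single sample line shown in this post.

```python
import re

# Pattern modelled on the one sample line in this post; other DD OS
# releases may word the message differently (assumption).
PANIC_RE = re.compile(
    r"PANIC: (?P<file>\S+):(?P<line>\d+) (?P<func>\w+):+ "
    r"Can't tolerate essential dgrp (?P<dgrp>\w+) going down"
)

def essential_dgrp_lost(log_text):
    """Return the disk group named in the panic message, or None if absent."""
    match = PANIC_RE.search(log_text)
    return match.group("dgrp") if match else None

sample = (
    "PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: "
    "Can't tolerate essential dgrp dg0 going down"
)

if __name__ == "__main__":
    print(essential_dgrp_lost(sample))
```

A return value of dg0 matches the case described here; None means the panic has some other cause and this procedure may not apply.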

Review kern.info for signs of command timeouts:

(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876311.967427] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876321.960529] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876331.953569] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876331.953642] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876341.946682] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876341.946755] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876351.939721] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876351.939794] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876361.932876] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876361.932949] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(E4)[1321876371.925895] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876371.925968] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0, 5, 0, 0, 0].
(U0)(MSG-KERN-00018):[1321876378.494639] Kernel panic - not syncing: Watchdog pre-timeout

Identify the problem disk. In the example above, the disk is 1:0:0:0 (scsi1, Channel 0 / Target 0 / LUN 0), which maps to disk 1.10 [sdm].
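
For anyone who wants to pull the host:channel:target:LUN address out of a long kern.info by script rather than by eye, here is a minimal sketch. The function name count_command_timeouts is hypothetical, and the regular expression assumes the timeout entries look like the sample in this post, with the chnl/tgt/lun text either on the same line as "Command timeout" or on the line that follows it.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample in this post:
# "scsi<H>: Command timeout ..." followed (same or next line) by
# "chnl/tgt/lun <C>/<T>/<L>".
TIMEOUT_RE = re.compile(
    r"scsi(\d+): Command timeout.*?chnl/tgt/lun\s+(\d+)/(\d+)/(\d+)",
    re.DOTALL,
)

def count_command_timeouts(log_text):
    """Return a Counter mapping 'H:C:T:L' addresses to timeout counts."""
    counts = Counter()
    for host, chnl, tgt, lun in TIMEOUT_RE.findall(log_text):
        counts[f"{host}:{chnl}:{tgt}:{lun}"] += 1
    return counts

sample = """\
(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
"""

if __name__ == "__main__":
    # Addresses with the most timeouts are listed first.
    for address, hits in count_command_timeouts(sample).most_common():
        print(address, hits)
```

The address with the highest count is the one to look up in the next step. Invisible characters pasted from a web page can break the match, so run this against the raw log file where possible.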

Search kern.info for the last occurrence of 1:0:0:0:

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.927071] scsi 1:0:0:0 Vendor: WDC Model: WD1002FBYS-02A6B Rev: 03.0
Aug 20 12:37:18 fgnnbudd001 kernel: (E4)[ 16994582.927661] scsi 1:0:0:0 SerialNo: MVSATA
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928067] scsi 1:0:0:0 Type: Direct-Access ANSI SCSI revision: 05
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928124] scsi 1:0:0:0: Direct-Access WDC WD1002FBYS-02A6B 03.0 PQ: 0 ANSI: 5
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928613] sd 1:0:0:0: [sdm] 1953525168 512-byte hardware sectors (1000205 MB)
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928647] sd 1:0:0:0: [sdm] Write Protect is off
Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928666] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928696] sd 1:0:0:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928789] sd 1:0:0:0: [sdm] 1953525168 512-byte hardware sectors (1000205 MB)
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928815] sd 1:0:0:0: [sdm] Write Protect is off
Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928834] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928864] sd 1:0:0:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
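
The "last occurrence" lookup can also be done with a few lines of script. This is a sketch only: last_device_for_address is a hypothetical helper, and it assumes the kernel prints the device name in square brackets right after "sd <address>:", as it does in the excerpt in this post.

```python
import re

def last_device_for_address(log_lines, address):
    """Return the device name from the last 'sd <address>: [name]' line, or None."""
    pattern = re.compile(r"\bsd " + re.escape(address) + r": \[(\w+)\]")
    device = None
    for line in log_lines:
        match = pattern.search(line)
        if match:
            # Keep overwriting so the final value comes from the last match.
            device = match.group(1)
    return device

sample_lines = [
    "Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928647] "
    "sd 1:0:0:0: [sdm] Write Protect is off",
    "Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928834] "
    "sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00",
]

if __name__ == "__main__":
    print(last_device_for_address(sample_lines, "1:0:0:0"))
```

Taking the last match matters because device names can be reassigned across reboots, so an older entry for the same address may name a different device.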

From the autosupport:

1.10 WDC_WD1002FBYS-02A6B0_WD-WMATV9296306 dm-1 254:16 sdm 8:192 sg10 1:0:0:0 hu: Active ONLINE
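
To cross-check the SCSI address against the autosupport disk listing, something like the following could work. It is a sketch under assumptions: parse_disk_row is a made-up name, and the column layout is inferred from the single row quoted in this post (disk ID first, with an sdX device name and a four-part H:C:T:L address somewhere in the row), not from any documented autosupport format.

```python
import re

# Four colon-separated numbers, e.g. 1:0:0:0 (two-part values such as
# 254:16 are major:minor device numbers and are deliberately not matched).
SCSI_ADDR_RE = re.compile(r"^\d+:\d+:\d+:\d+$")
DEVICE_RE = re.compile(r"^sd[a-z]+$")

def parse_disk_row(row):
    """Extract disk ID, sdX device and H:C:T:L address from one listing row."""
    tokens = row.split()
    if not tokens:
        return None
    address = next((t for t in tokens if SCSI_ADDR_RE.match(t)), None)
    device = next((t for t in tokens if DEVICE_RE.match(t)), None)
    return {"disk_id": tokens[0], "device": device, "scsi_address": address}

sample_row = (
    "1.10 WDC_WD1002FBYS-02A6B0_WD-WMATV9296306 dm-1 254:16 sdm 8:192 "
    "sg10 1:0:0:0 hu: Active ONLINE"
)

if __name__ == "__main__":
    print(parse_disk_row(sample_row))
```

If the scsi_address and device values agree with what kern.info reported, the disk_id field (1.10 in this example) is the slot to reseat or replace.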

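Once you have identified the disk that is reporting command timeouts, reseat the disk, restart the DDR, and monitor the boot process. If the system still fails to boot, replace the disk.

This behavior is caused by bug #58936, which is a driver issue. Upgrade to DD OS 5.1.3.0 or later to get the driver fix.

Note: If the problem persists after you perform the steps in this article, contact your contracted support provider, collect an autosupport, upload a support bundle (SUB), and open a support case.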
