[Sharing] Troubleshooting continuous reboots on DD140, 160, 610, 620, 630
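Purpose
This article describes how to troubleshoot a DD system that fails to boot, or is stuck in a boot loop, possibly because of a disk failure.
Applies to
DD140, 160, 610, 620, 630
All software versions prior to 5.1.3.0
Cause
A disk failure has occurred.
Solution
If DD OS fails to boot, reboot into single-user mode and review the dmesg log for disk errors.
Look for the panic alert message.
Verify that the panic was caused by the loss of dg0: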
PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: Can't tolerate essential dgrp dg0 going down
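As an illustration only, the check for this panic can be scripted once the log text has been copied off the system. The sketch below is plain Python, not a DD OS tool; the function name `lost_disk_group` is hypothetical, and the pattern is based solely on the sample line above (the source file and line number are deliberately not matched, since they may differ between releases).

```python
import re

# Pattern derived from the single sample panic line in this post.
# The "dd_dgrp.c:1052" part is skipped on purpose (assumed to vary by release).
PANIC_RE = re.compile(r"PANIC:.*Can't tolerate essential dgrp (\w+) going down")

def lost_disk_group(log_text):
    """Return the disk group named in the first matching panic line, or None."""
    match = PANIC_RE.search(log_text)
    return match.group(1) if match else None

sample = ("PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: "
          "Can't tolerate essential dgrp dg0 going down")
print(lost_disk_group(sample))
```

If the function returns `dg0`, the panic matches the case described in this post.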
Review kern.info for signs of command timeouts:
(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876311.967427] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876321.960529] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876331.953569] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876331.953642] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876341.946682] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876341.946755] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876351.939721] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876351.939794] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876361.932876] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876361.932949] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(E4)[1321876371.925895] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876371.925968] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7,
0, 0, 5, 0, 0, 0].
(U0)(MSG-KERN-00018):[1321876378.494639] Kernel panic - not syncing: Watchdog
pre-timeout
Identify the problem disk. In the example above, the disk is 1:0:0:0 (scsi1, Channel 0 / Target 0 / LUN 0),
which maps to disk 1.10 [sdm].
Search kern.info for the last occurrence of 1:0:0:0:
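The step from the timeout messages to a SCSI address (host:channel:target:lun) could be sketched roughly as follows. This is a minimal Python example that assumes the log layout matches the excerpt above, including the wrapped "chnl/tgt/lun" continuation line; `count_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches "scsiN: Command timeout (...) (...)" followed, possibly on the next
# line, by "chnl/tgt/lun C/T/L". Layout assumed from the excerpt in this post.
TIMEOUT_RE = re.compile(
    r"scsi(\d+): Command timeout \([0-9a-f]+\) \([^)]*\)\s+"
    r"chnl/tgt/lun (\d+)/(\d+)/(\d+)"
)

def count_timeouts(log_text):
    """Count command timeouts per host:channel:target:lun address."""
    counts = Counter()
    for host, chnl, tgt, lun in TIMEOUT_RE.findall(log_text):
        counts[f"{host}:{chnl}:{tgt}:{lun}"] += 1
    return counts

sample = """\
(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
"""
print(count_timeouts(sample).most_common(1))
```

The address with the most timeouts is the first candidate to look up in kern.info.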
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.927071] scsi 1:0:0:0 Vendor: WDC Model:
WD1002FBYS-02A6B Rev: 03.0
Aug 20 12:37:18 fgnnbudd001 kernel: (E4)[ 16994582.927661] scsi 1:0:0:0 SerialNo: MVSATA
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928067] scsi 1:0:0:0 Type: Direct-Access
ANSI SCSI revision: 05
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928124] scsi 1:0:0:0: Direct-Access WDC
WD1002FBYS-02A6B 03.0 PQ: 0 ANSI: 5
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928613] sd 1:0:0:0: [sdm] 1953525168 512-byte
hardware sectors (1000205 MB)
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928647] sd 1:0:0:0: [sdm] Write Protect is off
Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928666] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928696] sd 1:0:0:0: [sdm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928789] sd 1:0:0:0: [sdm] 1953525168 512-byte
hardware sectors (1000205 MB)
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928815] sd 1:0:0:0: [sdm] Write Protect is off
Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928834] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928864] sd 1:0:0:0: [sdm] Write cache: enabled,
read cache: enabled, supports DPO and FUA
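To pull the device name out of lines like these, something along the following lines may help. It is a sketch under the assumption that the kernel prints the device as "sd H:C:T:L: [name]", as in the excerpt; `device_for` is a hypothetical name, and the last match is used because the post asks for the last occurrence.

```python
import re

def device_for(address, log_text):
    """Return the device name from the last "sd <address>: [name]" line, or None."""
    pattern = re.compile(r"sd " + re.escape(address) + r": \[(\w+)\]")
    names = pattern.findall(log_text)
    return names[-1] if names else None

sample = (
    "Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[ 16994582.928647] "
    "sd 1:0:0:0: [sdm] Write Protect is off\n"
    "Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[ 16994582.928666] "
    "sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00\n"
)
print(device_for("1:0:0:0", sample))
```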
From the autosupport:
1.10 WDC_WD1002FBYS-02A6B0_WD-WMATV9296306 dm-1 254:16 sdm 8:192 sg10 1:0:0:0 hu:
Active ONLINE
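The lookup from SCSI address to disk ID in the autosupport could be sketched like this. The column layout is an assumption taken from the one sample row above (treated as a single unwrapped line, with the disk ID in the first column); `disk_id_for` is a hypothetical helper and not part of any Data Domain tooling.

```python
def disk_id_for(address, autosupport_text):
    """Return the first column of the row that contains the SCSI address."""
    for line in autosupport_text.splitlines():
        tokens = line.split()
        # Exact token match, so "1:0:0:0" does not match "11:0:0:0".
        if address in tokens:
            return tokens[0]
    return None

# Sample row from this post, joined back into one line (assumed layout).
sample = ("1.10 WDC_WD1002FBYS-02A6B0_WD-WMATV9296306 dm-1 254:16 "
          "sdm 8:192 sg10 1:0:0:0 hu: Active ONLINE")
print(disk_id_for("1:0:0:0", sample))
```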
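Once the disk with the command timeouts has been identified, reseat the disk, reboot the DDR, and monitor the boot process. If the system still fails to boot, replace the disk.
This is caused by bug #58936, which is a driver issue. Upgrade to DD OS 5.1.3.0 or later to fix the driver.
Note: If the problem persists after performing the steps in this article, contact your contracted support provider, collect an AutoSupport, upload a support bundle (SUB), and create a support case.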