May 1, 2013 20:00

[Share] Troubleshooting continuous reboots on DD140, 160, 610, 620, 630

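Purpose
This article describes how to recover a DD system that fails to boot, or is stuck in a boot loop, possibly because of a disk failure.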


Applies to
DD140, 160, 610, 620, 630
All software versions earlier than 5.1.3.0

Cause
A disk fault has occurred.

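Solution

If DD OS fails to boot, reboot into single-user mode and review the dmesg log for disk errors.

Look for the panic alert message.

Verify that the panic was caused by the loss of dg0: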

PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: Can't tolerate essential dgrp dg0 going down
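If you have the log saved as text, the check above can be scripted. This is only a minimal sketch: the pattern is modeled on the single example line in this post, and the function name `lost_disk_group` is made up for illustration, so the source file name and line number in a real panic message may differ.

```python
import re

# Pattern modeled on the example panic line above (an assumption, not an
# official message format); the file:line prefix is allowed to vary.
PANIC_RE = re.compile(r"PANIC: .*Can't tolerate essential dgrp (\w+) going down")

def lost_disk_group(log_text):
    """Return the disk group named in the panic message, or None if absent."""
    match = PANIC_RE.search(log_text)
    return match.group(1) if match else None

sample = "PANIC: dd_dgrp.c:1052 dd_dgrp_stop:: Can't tolerate essential dgrp dg0 going down"
print(lost_disk_group(sample))  # prints dg0
```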

Review kern.info for signs of command timeouts:

(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876311.967427] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876321.960529] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876331.953569] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876331.953642] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876341.946682] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876341.946755] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876351.939721] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876351.939794] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876361.932876] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876361.932949] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(E4)[1321876371.925895] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry

(E4)[1321876371.925968] scsi cmd timed out on target 0: cdb[2a, 0,39,51, 59,a7, 0, 0,  5, 0, 0, 0].

(U0)(MSG-KERN-00018):[1321876378.494639] Kernel panic - not syncing: Watchdog pre-timeout

Identify the problem disk. In the example above, the disk is 1:0:0:0 (scsi1, channel 0 / target 0 / LUN 0), which corresponds to disk 1.10 [sdm].
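Building the host:channel:target:LUN address from the timeout messages can be sketched in a few lines of Python. This is a rough illustration, not a supported tool: the regular expression is written only against the sample lines in this post, the function name `count_timeouts` is made up, and the match tolerates a line break between the "Command timeout" part and the "chnl/tgt/lun" part because log excerpts are often wrapped.

```python
import re
from collections import Counter

# scsi<host>: Command timeout (...) (...) chnl/tgt/lun <chnl>/<tgt>/<lun>
# \s+ between the two halves lets the match span a wrapped log line.
TIMEOUT_RE = re.compile(
    r"scsi(\d+): Command timeout \([0-9a-f]+\) \([\d/]+\)\s+"
    r"chnl/tgt/lun (\d+)/(\d+)/(\d+)"
)

def count_timeouts(log_text):
    """Count command timeouts per host:channel:target:LUN address."""
    counts = Counter()
    for host, chnl, tgt, lun in TIMEOUT_RE.findall(log_text):
        counts[f"{host}:{chnl}:{tgt}:{lun}"] += 1
    return counts

sample = """\
(E4)[1321876311.967354] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000)
chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
(E4)[1321876321.960456] scsi1: Command timeout (ffff810084ad47c0) (0/5/10000) chnl/tgt/lun 0/0/0 cdb 0x2a:395159a7:0005 Retry
"""
print(count_timeouts(sample).most_common(1))  # prints [('1:0:0:0', 2)]
```

The address with the highest count is the first candidate to look up in kern.info.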

Search kern.info for the last occurrence of 1:0:0:0:

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.927071]   scsi 1:0:0:0 Vendor: WDC       Model: WD1002FBYS-02A6B  Rev: 03.0

Aug 20 12:37:18 fgnnbudd001 kernel: (E4)[  16994582.927661]   scsi 1:0:0:0 SerialNo: MVSATA

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928067]   scsi 1:0:0:0 Type: Direct-Access ANSI SCSI revision: 05

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928124] scsi 1:0:0:0: Direct-Access     WDC WD1002FBYS-02A6B 03.0 PQ: 0 ANSI: 5

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928613] sd 1:0:0:0: [sdm] 1953525168 512-byte hardware sectors (1000205 MB)

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928647] sd 1:0:0:0: [sdm] Write Protect is off

Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[  16994582.928666] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928696] sd 1:0:0:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928789] sd 1:0:0:0: [sdm] 1953525168 512-byte hardware sectors (1000205 MB)

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928815] sd 1:0:0:0: [sdm] Write Protect is off

Aug 20 12:37:18 fgnnbudd001 kernel: (E7)[  16994582.928834] sd 1:0:0:0: [sdm] Mode Sense: 17 00 10 00

Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928864] sd 1:0:0:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
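To pull the device name out of a large kern.info without scrolling, something like the following may help. It is a sketch under the assumption that the device is announced in lines of the form "sd <address>: [name]", as in the excerpt above; `last_device_name` is a hypothetical helper name.

```python
import re

def last_device_name(log_text, hctl):
    """Return the sd device name from the last 'sd <hctl>: [name]' line, or None."""
    pattern = re.compile(r"sd " + re.escape(hctl) + r": \[(\w+)\]")
    names = pattern.findall(log_text)
    # The last match reflects the most recent discovery of the device.
    return names[-1] if names else None

sample = """\
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928613] sd 1:0:0:0: [sdm] 1953525168 512-byte hardware sectors (1000205 MB)
Aug 20 12:37:18 fgnnbudd001 kernel: (E5)[  16994582.928647] sd 1:0:0:0: [sdm] Write Protect is off
"""
print(last_device_name(sample, "1:0:0:0"))  # prints sdm
```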

From the autosupport:

1.10  WDC_WD1002FBYS-02A6B0_WD-WMATV9296306  dm-1   254:16   sdm     8:192  sg10   1:0:0:0    hu: Active  ONLINE
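Looking up the disk ID for a SCSI address in the autosupport text can be sketched like this. The column layout is not documented in this post, so the sketch only assumes what the example row shows: the disk ID (1.10) is the first whitespace-separated field and the SCSI address appears as a field of its own on the same line. `disk_for_hctl` is a made-up name.

```python
def disk_for_hctl(table_text, hctl):
    """Return the first field of the row that contains hctl as a field, or None."""
    for line in table_text.splitlines():
        fields = line.split()
        if hctl in fields:
            return fields[0]
    return None

sample = (
    "1.10  WDC_WD1002FBYS-02A6B0_WD-WMATV9296306  dm-1   254:16   "
    "sdm     8:192  sg10   1:0:0:0    hu: Active  ONLINE"
)
print(disk_for_hctl(sample, "1:0:0:0"))  # prints 1.10
```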

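Once you have identified the disk reporting the command timeouts, reseat the disk, reboot the DDR, and monitor the boot process. If the system still fails to boot, replace the disk.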

This is caused by bug #58936, which is a driver issue. Upgrade to DD OS 5.1.3.0 or later to get the fixed driver.
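If you track several systems, a quick version comparison shows which ones are still below the fixed release. This is a minimal sketch that assumes the version is a dotted numeric string, optionally followed by a "-build" suffix; the exact format reported by your system may differ, and `needs_upgrade` and the versions in the usage lines are hypothetical.

```python
def needs_upgrade(version, fixed_in="5.1.3.0"):
    """Return True if version is lower than the release that carries the fix."""
    def parse(text):
        # Drop an optional "-build" suffix (assumed format), then compare numerically.
        return tuple(int(part) for part in text.split("-")[0].split("."))
    return parse(version) < parse(fixed_in)

print(needs_upgrade("5.1.2.4"))  # prints True
print(needs_upgrade("5.1.3.0"))  # prints False
```

Comparing integer tuples avoids the string-comparison trap where "5.10" would sort before "5.9".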

    

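Note: If the problem persists after you have performed the steps in this article, contact your contracted support provider, collect an AutoSupport, upload a support bundle (SUB), and create a support case.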
