Unsolved

DG

October 22nd, 2023 20:57

MD3260: failed thin virtual disk with no hint as to what the problem is, help!

Hello,
We are really hoping for some hints, as otherwise we face a 230TB recovery.
We have two MD3260 enclosures with 4 thin virtual disks that are combined into a single 4x60TB LVM volume with XFS on top; in practice it is 3x60TB + 1x58TB, with 3TB available.
We recently approached 230TB used and hit a critical problem with a flood of hardware errors. SMcli reports that the smaller thin virtual disk has failed, but gives no hint why, and all hardware components (disks and controller) appear to be OK:
SMcli -n aspera_md_1 -c 'show storagearray healthStatus;'
Performing syntax check...
Syntax check complete.
Executing script...
The following failures have been found:
Thin Virtual Disk Failed
Storage array: aspera_md_1
Disk pool: Disk_Pool_1
Thin Virtual Disk: gaia_virtual_p2
Status: Failed
On the OS side, the kernel log shows plenty of I/O errors:
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:10 2023] blk_update_request: 60 callbacks suppressed
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev sdg, sector 4055288
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:10 2023] sd 9:0:0:5: [sdg] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 23 fa f8 00 00 00 08 00 00
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev sdg, sector 19135224
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev dm-3, sector 4055288
[Sun Oct 15 05:24:10 2023] blk_update_request: critical target error, dev dm-3, sector 19135224
[Sun Oct 15 05:24:10 2023] XFS (dm-7): metadata I/O error in "xfs_buf_iodone_callback_error" at daddr 0x7bb2f8 len 8 error 121
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 01 00 00 00 01 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 23622319105
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#3 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 02 00 00 00 01 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 23622319106
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev dm-3, sector 23622319105
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev dm-3, sector 23622319106
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 4055288
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 23 fa f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] blk_update_request: critical target error, dev sdg, sector 19135224
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#2 CDB: Write(16) 8a 00 00 00 00 05 7f ff fc 01 00 00 00 02 00 00
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 Sense Key : Hardware Error [current]
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 <<vendor>>ASC=0x84 ASCQ=0x0
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#0 CDB: Write(16) 8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Sun Oct 15 05:24:11 2023] sd 9:0:0:5: [sdg] tag#1 Sense Key : Hardware Error [current]
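As a cross-check on the log above, the failing addresses can be read straight out of the CDBs. This is a minimal sketch, assuming the standard WRITE(16) layout (opcode 0x8a, an 8-byte big-endian LBA in bytes 2 to 9, a 4-byte transfer length in bytes 10 to 13) and 512-byte logical blocks, so that the LBA equals the kernel's sector number. The helper name `decode_write16` is made up for illustration.

```python
def decode_write16(cdb_hex: str) -> dict:
    """Decode the LBA and transfer length from a WRITE(16) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2..9
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10..13
    }

# CDBs copied verbatim from the kernel log in this post.
CDBS = [
    "8a 00 00 00 00 00 00 3d e0 f8 00 00 00 08 00 00",
    "8a 00 00 00 00 00 01 23 fa f8 00 00 00 08 00 00",
    "8a 00 00 00 00 05 7f ff fc 01 00 00 00 01 00 00",
]

for cdb in CDBS:
    fields = decode_write16(cdb)
    print(f"LBA {fields['lba']:>12}  blocks {fields['blocks']}")
```

The decoded LBAs (4055288, 19135224 and 23622319105) are the same sector numbers the `blk_update_request` lines report for sdg. The rejected writes are therefore spread across widely separated regions of the same LUN rather than clustered in one spot.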
It seems that the VD size is 60TB, but the actual size of the thin VD is 58TB.
We tried several things:
1. Increasing the size of the repository associated with this thin VD, but it errors out (we have one spare that could be used for this). SMClient shows no details with the error.
2. Running the repository consistency check, which produced the errors below:
SMcli -n aspera_md_1 -c 'check virtualDisk [gaia_virtual_p2] repositoryConsistency file="/tmp/p2_consistency.txt";'
Performing syntax check...
Syntax check complete.
Executing script...
Script execution complete.
SMcli completed successfully.
[root@gaiaftp01: disks]$ less /tmp/p2_consistency.txt
ThinRepo 0xc002 State:FAILED RollbackState:NONE
Image 5(xc002) validation starting
Error: Dir2CN:0x000000440bbb(Current) Bad signature
Error: Dir2CN:0x000000440bbb(Current) Bad Level Found:0x47 Expected:0x32
Error: Dir2CN:0x000000440bbb(Current) Bad Location DirA:0x5a3c806b33570a1d DirB:0xdb8e591145f3a3a2
Error: Dir2CN:0x000000440bbb(Current) Cluster out of bounds DataCN:0x67fa78c873cc Offset:0x30
Dumping Dir Level 2 (Current) CN:x000000440bbb LBA:x0002205dd80
0x0000 | ...].{m8..3Gr... | 7F94AB5D 987B6D38 DF1B3347 72CCF7E0 |
0x0010 | ..W3k.<Z...E.Y.. | 1D0A5733 6B803C5A A2A3F345 11598EDB |
0x0020 | ...*R....}....h. | E0A4D72A 52CEB290 B87DB292 E6C9689C |
0x0030 | .s.x.g{.O@]..pIk | CC73C878 FA677BCE 4F405DA5 9670496B |
0x0040 | .^.n....1wY..... | BC5E076E FC8DAA10 31775915 B20308B8 |
0x0050 | .7j.(i.4...yme.. | 99376AF2 28698234 84E0F879 6D65F112 |
0x0060 | ..Uf].Eo.V.....e | DFED5566 5DED456F C356D485 10C19565 |
0x0070 | ...Y..yK.*...i.. | F4939759 0181794B 9E2AB1E6 F069EFE9 |
0x0080 | N..&........T..f | 4E8BCE26 04010A91 819DDEB8 54DF9B66 |
0x0090 | (<...>...m...@.f | 283CFA09 FB3E961D 976D9CCB 1040DB66 |
0x00a0 | ~........;...... | 7EDC07E8 07ECE2E9 C43BEDFE FBBFD2B1 |
0x00b0 | ......&.uo5.M5m. | 8DCBEDDA F1F62692 756F35FE 4D356D8A |
0x00c0 | .....G.....2.... | E5F49EE1 F347C5F5 BC9ED632 E8C5B815 |
0x00d0 | H6{.lv...J..<.]. | 48367B18 6C76DD19 B84AA887 3CA05DA1 |
0x00e0 | ....Bc`....5V].. | 83D7D00D 42636018 C89CAF35 565DDD1F |
0x00f0 | ....X..B.F/.V..1 | 9DA4A4F5 581AFE42 F1462FB9 56A1CB31 |
0x0100 | '.{"...E..).."0. | 27EC7B22 7FD3C945 1FC229A0 0322301D |
0x0110 | ..CF..(...N..Wx. | 041A4346 D18D2894 CAFA4EC1 8B577817 |
0x0120 | 3..5Z$.!.j#JSf.} | 33FCCA35 5A240821 9A6A234A 5366E27D |
0x0130 | .......j..-..... | A49A1FBE ED0BA56A A6E32DB2 D4AD07E9 |
0x0140 | .I..z;..'.,..Zr. | DD49B38A 7A3B97B7 27972CA3 0B5A7204 |
0x0150 | .^t..)...6..n.'. | F45E74D4 8229180C 0A3682C3 6E0B27B2 |
0x0160 | .A.%3G..*AD..;.) | 8A41B125 3347B995 2A4144F8 C23BCF29 |
0x0170 | ..-.....*...N.:. | 1F8C2DBB B2078718 2AB3EBEA 4EC63AF1 |
0x0180 | .N.~...-y..`.5.. | 8D4E017E 82C2852D 79DDEB60 C535D4EA |
0x0190 | ,.....Q.).....V+ | 2CCBEC04 12C451A1 29C8D004 BF88562B |
0x01a0 | ..K>.....j....zl | B1824B3E 7FE9D4C2 846AA094 E7157A6C |
0x01b0 | ..S...Vx...K...a | 0EDB53A3 BD985678 BAF8BA4B 149E8661 |
0x01c0 | .."...+.J....... | FE032203 E2B82BD6 4AAAB813 88AFDAE5 |
0x01d0 | ....hi.RQ.C..." | 0DEDB0EF 6869E252 51DC43BD DC1B2220 |
0x01e0 | ....0.).../+.Jv{ | F6BDFEE6 30AE29FA 8E022F2B A44A767B |
0x01f0 | .^.p.G%,.~.).Su. | 155ECB70 F047252C 117E9229 905375A6 |
Directory Structure - Dir Level 0 CN:x000000000005 Offset:x0
Dir Level 1 CN:x000000000007 Offset:x72
Dir Level 2 CN:x000000440bbb
Image 5(xc002) validation failed with 4 errors 0 warnings
Validation error limit reached, stopping validation
ThinRepo 0xc002 validation failed with 4 errors 0 warnings
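One observation on the report above: the "bad" values in the four error lines appear to come directly from the dumped block when its bytes are read little-endian. The sketch below reproduces them from three rows of the dump. The field offsets (0x0b, 0x10, 0x18, 0x30) and field widths are inferred from this one dump only, not taken from any documentation of the repository format, so treat them as assumptions.

```python
# Rows copied verbatim from the hex column of the Dir Level 2 dump in this post.
ROWS = {
    0x00: "7F94AB5D 987B6D38 DF1B3347 72CCF7E0",
    0x10: "1D0A5733 6B803C5A A2A3F345 11598EDB",
    0x30: "CC73C878 FA677BCE 4F405DA5 9670496B",
}

def row_bytes(offset: int) -> bytes:
    """Return the 16 raw bytes of the dump row starting at `offset`."""
    return bytes.fromhex(ROWS[offset].replace(" ", ""))

# Assumed layout (inferred, not documented): little-endian fields.
level = row_bytes(0x00)[0x0B]                              # 1 byte at 0x0b
dir_a = int.from_bytes(row_bytes(0x10)[0:8], "little")     # 8 bytes at 0x10
dir_b = int.from_bytes(row_bytes(0x10)[8:16], "little")    # 8 bytes at 0x18
data_cn = int.from_bytes(row_bytes(0x30)[0:6], "little")   # 6 bytes at 0x30

print(f"Level  {level:#x}")
print(f"DirA   {dir_a:#x}")
print(f"DirB   {dir_b:#x}")
print(f"DataCN {data_cn:#x}")
```

These match the reported `Found:0x47`, `DirA`, `DirB` and `DataCN` values. If the inferred layout is right, the validator is parsing a block whose contents do not look like directory metadata at all, which would be more consistent with the block having been overwritten or misread than with a single damaged field.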
Please help, as we are really out of ideas. The logical thing to try was expanding the thin virtual disk repository, but that errored out, again with no hint why. Is there any chance of bringing it back to life?
The deeper problem is that we do not know whether this is really an unreported hardware error. We would rather not scrap the enclosures, but we cannot be sure the same situation will not resurface if we delete the failed VD (and lose the LVM/XFS volume spanning all 4 VDs).
Regards,
Chris