PowerEdge HDD/SCSI/RAID

Last reply by 05-27-2022 Solved

Recover RAID5 volume split into two foreign configurations

We had a problem with a power cable that caused a lot of disks to disappear temporarily. After we fixed it, most volume groups came online again, but a 7-disk RAID5 volume did not. Instead, it was split into two foreign configurations (see below).

That is, all 7 disks of the volume are available, but the controller thinks 5 of them are in one volume group and the remaining two are in another (this must clearly be a firmware bug, as neither group could have been degraded). How can I recover from this situation and merge the two groups back into the single group that they really are?

 

Foreign Topology :
================

----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT Size      PDC  PI SED DS3  FSpace TR
----------------------------------------------------------------------------
0  -   -   -        -   RAID5 Frgn  N  54.571 TB dsbl N  N   dflt N      N
0  0   -   -        -   RAID5 Frgn  N  54.571 TB dsbl N  N   dflt N      N
0  0   0   :0       0   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
0  0   1   :1       1   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
0  0   2   :2       2   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
0  0   3   :3       3   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
0  0   4   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
0  0   5   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
0  0   6   :5       5   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
1  -   -   -        -   RAID5 Frgn  N  54.571 TB dsbl N  N   dflt N      N
1  0   -   -        -   RAID5 Frgn  N  54.571 TB dsbl N  N   dflt N      N
1  0   0   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
1  0   1   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
1  0   2   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
1  0   3   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
1  0   4   :6       6   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
1  0   5   :4       4   DRIVE Frgn  N  9.094 TB  dsbl N  N   dflt -      N
1  0   6   -        -   DRIVE Msng  -  9.094 TB  -    -  -   -    -      N
----------------------------------------------------------------------------


Foreign VD List :
===============

---------------------------------
DG VD  Size      Type  Name
---------------------------------
0  255 54.571 TB RAID5 RV5
1  255 54.571 TB RAID5 RV5
---------------------------------

Update:

Forgot to mention, this is with a PERC H740P controller.

Solution (1)

Accepted Solutions

Hi @RRcvr,

 

In your situation, it is odd for the foreign configuration not to match the original configuration. Support needs to analyze the PERC log to confirm what happened before deciding on the next course of action. I am unable to suggest a step myself, because it might cause data loss. Please contact support with the logs ready.

 

https://dell.to/3lM7x5S


DELL-Joey C
Social Media and Communities Professional
Dell Technologies | Enterprise Support Services
#IWork4Dell


Replies (3)

Hi, and thanks for the reply.

In the meantime, I connected the disks to an HBA and analyzed the DDF headers (the fact that PERC controllers partially follow an open standard here was very helpful). The reason for the two foreign configurations is indeed that the H740P has split the RAID5 volume into two separate volumes with separate GUIDs, in separate containers, and so on. I was able to format the volumes in DDF format under GNU/Linux, and Linux was able to read the data, so no data loss has occurred (verified by a btrfs scrub, which checksums all data and metadata blocks and found no corruption).

On the other hand, this clearly confirms a serious firmware bug in the controller, as under no circumstances should it behave like this. My theory is that this was a race condition: during the problem, groups of disks fell out of the controller and came back quickly, so the controller likely got confused while it was writing the new DDF headers to some disks as groups of other disks were being newly detected and/or disconnected.

Unfortunately, the Dell PERC controller does not support the SNIA DDF standard and therefore can't recognize those disks, so it does not detect the RAID volume at all. I will try to recreate the volume with the correct disks in the correct order and hope that a background initialisation will not destroy the data; otherwise I will face a rather lengthy restore process.

A controller log will be useless, as the system was rebooted and power-cycled multiple times before the problem was even detected.

Just for information, this is the partial dump of the ddf headers for the disks, showing the corruption:

/dev/sdf
refno 66fee9c8
guid 'ATA 999901019c64177c25b6'
pd 1 6d67850c 'ATA 9999010198734b845e34'
pd 2 2c442eef 'ATA 99990101a3ff6b169fb3'
pd 3 859c2a72 'ATA 9999010140f57d7b1911'
pd 4 2a25447d 'ATA 9999010181a40ea27a38'
pd 5 6db9e402 'SmrtStor P1tf¸'
pd 6 0176ebaa 'ATA 99990101bd73575777e4'
pd 7 a63ba301 'ATA 999901017d605c6aadf6'
pd 8 5254f474 'ATA 999901014ecf2257f8f4'
pd 9 80e8a86d 'ATA 999901014c775ca92a87'
pd 10 49416c50 'ATA 99990101d79cd13a1e1e'
pd 11 fa44428b 'ATA 9999010198bd2187a552'
pd 12 66fee9c8 'ATA 999901019c64177c25b6'
pd 13 4a94daa9 'ATA 99990101679d1776307e'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 4 start 0 ref fa44428b
disk 5 start 0 ref 66fee9c8
disk 6 start 0 ref 4a94daa9

/dev/sdg
refno fa44428b
guid 'ATA 9999010198bd2187a552'
pd 1 6d67850c 'ATA 9999010198734b845e34'
pd 2 2c442eef 'ATA 99990101a3ff6b169fb3'
pd 3 859c2a72 'ATA 9999010140f57d7b1911'
pd 4 2a25447d 'ATA 9999010181a40ea27a38'
pd 5 6db9e402 'SmrtStor P1tf¸'
pd 6 0176ebaa 'ATA 99990101bd73575777e4'
pd 7 a63ba301 'ATA 999901017d605c6aadf6'
pd 8 5254f474 'ATA 999901014ecf2257f8f4'
pd 9 80e8a86d 'ATA 999901014c775ca92a87'
pd 10 49416c50 'ATA 99990101d79cd13a1e1e'
pd 11 fa44428b 'ATA 9999010198bd2187a552'
pd 12 66fee9c8 'ATA 999901019c64177c25b6'
pd 13 4a94daa9 'ATA 99990101679d1776307e'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 4 start 0 ref fa44428b
disk 5 start 0 ref 66fee9c8
disk 6 start 0 ref 4a94daa9

/dev/sdh
refno 4a94daa9
guid 'ATA 99990101974a122c9311'
pd 1 6d67850c 'ATA 99990101be1d53ed8c7d'
pd 2 2c442eef 'ATA 99990101ff58714b7f1b'
pd 3 859c2a72 'ATA 99990101fa3ac0b94ef7'
pd 4 2a25447d 'ATA 999901017e74d11eb6e6'
pd 5 0176ebaa 'ATA 99990101f19b3355ec56'
pd 6 a63ba301 'ATA 99990101f391d36e91f9'
pd 7 5254f474 'ATA 99990101fa6d3d5b6c49'
pd 8 80e8a86d 'ATA 99990101b7ad5947d5c0'
pd 9 49416c50 'ATA 99990101d2e6918871bb'
pd 10 4a94daa9 'ATA 99990101974a122c9311'
pd 11 6db9e402 'SmrtStor P1tf¸'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 6 start 0 ref 4a94daa9

/dev/sdi
refno 49416c50
guid 'ATA 99990101d2e6918871bb'
pd 1 2a25447d 'ATA 999901017e74d11eb6e6'
pd 2 0176ebaa 'ATA 99990101f19b3355ec56'
pd 3 49416c50 'ATA 99990101d2e6918871bb'
pd 4 6db9e402 'SmrtStor P1tf¸'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 3 start 0 ref 49416c50

/dev/sdk
refno 80e8a86d
guid 'ATA 99990101b7ad5947d5c0'
pd 1 2a25447d 'ATA 999901017e74d11eb6e6'
pd 2 0176ebaa 'ATA 99990101f19b3355ec56'
pd 3 a63ba301 'ATA 99990101f391d36e91f9'
pd 4 5254f474 'ATA 99990101fa6d3d5b6c49'
pd 5 80e8a86d 'ATA 99990101b7ad5947d5c0'
pd 6 49416c50 'ATA 99990101d2e6918871bb'
pd 7 6db9e402 'SmrtStor P1tf¸'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50

/dev/sdl
refno 5254f474
guid 'ATA 99990101fa6d3d5b6c49'
pd 1 2a25447d 'ATA 999901017e74d11eb6e6'
pd 2 0176ebaa 'ATA 99990101f19b3355ec56'
pd 3 a63ba301 'ATA 99990101f391d36e91f9'
pd 4 5254f474 'ATA 99990101fa6d3d5b6c49'
pd 5 80e8a86d 'ATA 99990101b7ad5947d5c0'
pd 6 49416c50 'ATA 99990101d2e6918871bb'
pd 7 6db9e402 'SmrtStor P1tf¸'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50

/dev/sdm
refno a63ba301
guid 'ATA 99990101f391d36e91f9'
pd 1 2a25447d 'ATA 999901017e74d11eb6e6'
pd 2 0176ebaa 'ATA 99990101f19b3355ec56'
pd 3 a63ba301 'ATA 99990101f391d36e91f9'
pd 4 5254f474 'ATA 99990101fa6d3d5b6c49'
pd 5 80e8a86d 'ATA 99990101b7ad5947d5c0'
pd 6 49416c50 'ATA 99990101d2e6918871bb'
pd 7 6db9e402 'SmrtStor P1tf¸'
part 0
guid 'Dell '
size 117190950912
blocks 19531825152
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
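
For anyone wanting to derive the member order from a dump like the one above, here is a minimal sketch in Python. It assumes the text format shown in this post (a `/dev/...` line, a `refno` line, then `disk <row> start <n> ref <refno>` lines), and the function names `parse` and `merge_order` are made up for illustration. It only merges what the headers record; whether the controller expects exactly this order when the volume is recreated is an assumption that has to be checked against the actual data.

```python
import re

# Abridged copy of the dump above: per device, its own refno and the
# "disk <row> start <n> ref <refno>" lines of the volume record.
DUMP = """\
/dev/sdf
refno 66fee9c8
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 4 start 0 ref fa44428b
disk 5 start 0 ref 66fee9c8
disk 6 start 0 ref 4a94daa9
/dev/sdg
refno fa44428b
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 4 start 0 ref fa44428b
disk 5 start 0 ref 66fee9c8
disk 6 start 0 ref 4a94daa9
/dev/sdh
refno 4a94daa9
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
disk 6 start 0 ref 4a94daa9
/dev/sdi
refno 49416c50
disk 3 start 0 ref 49416c50
/dev/sdk
refno 80e8a86d
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
/dev/sdl
refno 5254f474
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
/dev/sdm
refno a63ba301
disk 0 start 0 ref a63ba301
disk 1 start 0 ref 5254f474
disk 2 start 0 ref 80e8a86d
disk 3 start 0 ref 49416c50
"""

DISK_LINE = re.compile(r"disk (\d+) start \d+ ref ([0-9a-f]{8})")


def parse(dump):
    """Return {device: {"refno": str, "rows": {row: ref}}} from the text dump."""
    views, dev = {}, None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("/dev/"):
            dev = line
            views[dev] = {"refno": None, "rows": {}}
        elif line.startswith("refno "):
            views[dev]["refno"] = line.split()[1]
        else:
            m = DISK_LINE.fullmatch(line)
            if m:
                views[dev]["rows"][int(m.group(1))] = m.group(2)
    return views


def merge_order(views):
    """Merge every header's row -> ref map and translate refs to devices.

    Raises ValueError if two headers disagree about the same row.
    """
    ref_to_dev = {v["refno"]: d for d, v in views.items()}
    order = {}
    for v in views.values():
        for row, ref in v["rows"].items():
            if order.setdefault(row, ref) != ref:
                raise ValueError(f"conflicting refs for row {row}")
    return [ref_to_dev.get(order[row]) for row in sorted(order)]


if __name__ == "__main__":
    for row, dev in enumerate(merge_order(parse(DUMP))):
        print(row, dev)
```

On the dump above, no two headers contradict each other about a row; they only differ in how many rows they list. The merged order for rows 0 to 6 is sdm, sdl, sdk, sdi, sdg, sdf, sdh. The `/dev/sdX` names are whatever the HBA enumerated at the time, so the refno values are the stable identifiers.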


And for whoever reads this in the future in a similar situation: I failed to get the order right a few times, but the takeaway is that, when using perccli (with background initialisation!), creating and deleting arrays apparently does NOT erase the data on them (maybe it erases the partition table, i.e. the first few blocks, when you use "del force", but I didn't have this case).

And with RAID5, it should be safe to background-initialise an array even when the stripe size or disk order is wrong, as long as the disks are the correct disks and the number of disks stays the same, so you can try recovery as many times as you like. This is not true for RAID6 and probably not for other modes.

Also, this is certainly only true for background initialisation. Foreground initialisation, e.g. in the firmware menu, will most certainly zero all data on the disks.

Of course, experimenting is easier if you have a backup...
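
The RAID5 claim can be illustrated with a small simulation. This is only a sketch under assumptions: it models background initialisation as "recompute each stripe's parity strip from the other strips and overwrite it", which may not be exactly what the PERC firmware does, and the parity rotation and the names `build_raid5` and `background_init` are hypothetical. The idea is that in a consistent RAID5 the XOR of all member disks is zero at every byte offset, so recomputing any one strip from the others rewrites the bytes that are already there, whatever order or strip size is assumed.

```python
import os
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def build_raid5(n_disks, strip, n_stripes):
    """Create n_disks bytearrays of random data with rotating XOR parity."""
    disks = [bytearray(strip * n_stripes) for _ in range(n_disks)]
    for s in range(n_stripes):
        lo, hi = s * strip, (s + 1) * strip
        parity_disk = (n_disks - 1 - s) % n_disks  # hypothetical rotation
        data = [os.urandom(strip) for _ in range(n_disks - 1)]
        it = iter(data)
        for d in range(n_disks):
            disks[d][lo:hi] = xor_blocks(data) if d == parity_disk else next(it)
    return disks


def background_init(disks, order, strip):
    """Model of a parity-rebuilding init over `disks` taken in `order`.

    For every stripe, the strip believed to hold parity is overwritten
    in place with the XOR of the other strips.
    """
    n = len(disks)
    members = [disks[i] for i in order]
    for s in range(len(disks[0]) // strip):
        lo, hi = s * strip, (s + 1) * strip
        p = (n - 1 - s) % n
        others = [bytes(members[d][lo:hi]) for d in range(n) if d != p]
        members[p][lo:hi] = xor_blocks(others)


if __name__ == "__main__":
    disks = build_raid5(n_disks=7, strip=64, n_stripes=12)
    before = [bytes(d) for d in disks]

    # Wrong disk order and wrong strip size, but the same 7 disks.
    background_init(disks, order=[3, 0, 6, 1, 5, 2, 4], strip=16)
    print("same 7 disks, wrong layout, unchanged:",
          [bytes(d) for d in disks] == before)

    # Leaving one disk out breaks the invariant and overwrites real data.
    background_init(disks[:6], order=list(range(6)), strip=64)
    print("only 6 of 7 disks, unchanged:",
          [bytes(d) for d in disks] == before)
```

The second call shows why the number of disks has to stay the same: with a member missing, the recomputed strip no longer equals what is on disk. RAID6 is not covered here; its second syndrome depends on each disk's position, so the same argument does not carry over.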
