Unsolved

1 Rookie

 • 

13 Posts


September 20th, 2022 19:00

Array still initializing after power loss (PS4210)

Hi,

I'm having problems with a Dell PS4210 storage server. Two weeks ago there was a general blackout and the storage was shut down completely. The configuration is three PS4210 servers, one master and two slaves. All of them came back up except the master.

After running some commands to verify the array I get the following:

CLI> support exec "raidtool"
Driver Status: Admin Intervention Requested
RAID LUN 0 Degraded.
raid status unrecoverable.
11 Drives (0,2,4,6,8,1,3,5,7,f,f)
RAID 6 (64KB sectionPerSU)
Capacity 17,617,013,637,120 bytes
Available Drives List: 9

It has been stuck at "storage array still initializing" for 10 days now. Please help, this is urgent.
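
For anyone who wants to keep a record of this kind of output while troubleshooting, here is a minimal Python sketch that pulls a few fields out of the text pasted above. The function name `summarize_raidtool` and the regular expressions are assumptions made for illustration, based only on the lines shown in this post; the real raidtool output format may differ between firmware versions, and the sketch does not interpret what the `f` entries in the drive list mean.

```python
import re

# Text copied from the post above; nothing here talks to an array.
RAIDTOOL_OUTPUT = """\
Driver Status: Admin Intervention Requested
RAID LUN 0 Degraded.
raid status unrecoverable.
11 Drives (0,2,4,6,8,1,3,5,7,f,f)
RAID 6 (64KB sectionPerSU)
Capacity 17,617,013,637,120 bytes
Available Drives List: 9
"""


def summarize_raidtool(text):
    """Extract a few fields from raidtool-style text (hypothetical helper)."""
    summary = {}
    m = re.search(r"^Driver Status:\s*(.+)$", text, re.MULTILINE)
    if m:
        summary["driver_status"] = m.group(1).strip()
    m = re.search(r"^(\d+) Drives \(([^)]*)\)", text, re.MULTILINE)
    if m:
        summary["drive_count"] = int(m.group(1))
        summary["drive_slots"] = m.group(2).split(",")
    m = re.search(r"^RAID (\d+) ", text, re.MULTILINE)
    if m:
        summary["raid_level"] = int(m.group(1))
    m = re.search(r"^Capacity ([\d,]+) bytes", text, re.MULTILINE)
    if m:
        summary["capacity_bytes"] = int(m.group(1).replace(",", ""))
    return summary


summary = summarize_raidtool(RAIDTOOL_OUTPUT)
# Count entries in the drive list that are not plain slot numbers.
non_numeric = [s for s in summary["drive_slots"] if not s.isdigit()]
capacity_tib = summary["capacity_bytes"] / 2**40
print(summary["driver_status"], summary["raid_level"], len(non_numeric))
print(f"{capacity_tib:.2f} TiB")
```

The capacity conversion uses binary units (2^40 bytes per TiB), which is an assumption about how the array reports sizes elsewhere.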

6 Operator

 • 

1.5K Posts

 • 

6.4K Points

September 22nd, 2022 16:00

Hello, 

  Once the cache is discarded the array should finish the boot up in a minute or less. 

 Regards, 

Don

 

#iworkfordell

6 Operator

 • 

1.5K Posts

 • 

6.4K Points

September 24th, 2022 06:00

Hello, 

  Re: commands. That depends on the filesystem being used by each volume. For example, on Windows it's chkdsk, on Linux it depends on the filesystem, VMware has VOMA, etc. It's not something you can do on the EQL array.

   Your group appears to be in dire trouble, with faults or issues on each member. Your first priority should be to make sure that whatever data is available is backed up.

  Re: Lost pages. There is no way to resolve that issue via this forum. It requires that the diagnostics from each member be collected and analyzed, and then engineering help is needed to get those volumes back online. For that you would need a support contract. If pages are really missing, then some data loss should be expected. More commonly the error is "missing or duplicate pages", in which case it's possible to find the duplicate page and bring the volume back online. Again, all of this requires a support contract.

 Some regions will let you buy a certain number of hours.   But I am not sure that would allow the access to engineering required for this issue. 

 At the CLI  please run:    show member 

Regards, 

Don

#iworkfordell

1 Rookie

 • 

13 Posts

September 24th, 2022 06:00

Hello,

First of all, I would like to thank everyone who spent time answering my queries and trying to find a solution. We executed the "clearlostdata" command successfully and the arrays are now up. However, I have two questions. First, what commands do you recommend I run to verify the system or files? Second, I have a volume that cannot be brought online and reports "offline - missing pages" (image attached).

WhatsApp Image 2022-09-24 at 8.59.50 AM.jpeg

I would be grateful for any help.

Cheers,
Thank you

 

1 Rookie

 • 

13 Posts

September 24th, 2022 06:00

Hi,

GRP1> show member
Name         Status  Version          Disks Capacity FreeSpace Connections
------------ ------- ---------------- ----- -------- --------- -----------
EQL1-PS4210E online  V8.1.6 (R427114) 11    15.97TB  3.38TB    32
EQL3-PS6210E online  V8.1.6 (R427114) 24    33.74TB  7.06TB    32
EQL2-PS6210E online  V8.1.6 (R427114) 24    33.74TB  7.06TB    32
GRP1>
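
As a quick sanity check on the numbers in the `show member` output above, the following Python sketch computes the free-space fraction of each member. The capacities and free space are copied from the output; the short member labels and the helper names `free_fraction` and `report` are hypothetical and only for illustration.

```python
# (label, capacity in TB, free space in TB), copied from `show member` above.
MEMBERS = [
    ("EQL1", 15.97, 3.38),
    ("EQL3", 33.74, 7.06),
    ("EQL2", 33.74, 7.06),
]


def free_fraction(capacity_tb, free_tb):
    """Fraction of a member's capacity that is still free."""
    return free_tb / capacity_tb


def report(members):
    """One formatted line per member, e.g. 'EQL1: 21.2% free'."""
    return [
        f"{name}: {free_fraction(cap, free):.1%} free"
        for name, cap, free in members
    ]


lines = report(MEMBERS)
total_capacity = sum(cap for _, cap, _ in MEMBERS)
for line in lines:
    print(line)
print(f"group capacity: {total_capacity:.2f} TB")
```

All three members come out at roughly 21% free, so free space looks evenly spread across the group at the time the command was run.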

6 Operator

 • 

1.5K Posts

 • 

6.4K Points

September 24th, 2022 07:00

Hello, 

 Thank you.  I was wondering how old the firmware was.  That's quite a few revisions behind.   You might want to inquire if you can still get a support contract for those arrays. 

  Regards, 

 Don 

 

#iworkfordell 

6 Operator

 • 

1.5K Posts

 • 

6.4K Points

September 24th, 2022 18:00

Hello, 

 I am curious, are the errors on the other members also related to failed or failing cache batteries? 

 I believe I mentioned it before but if both controllers have bad batteries the array won't boot.  At least one has to have a functioning battery.   Be careful where you source them.  The batteries have an EOL date.  Some people here in the forum have reported getting batteries only to have them fail either immediately or shortly after installation.  If at all possible try to find a vendor selling NEW battery assemblies. 

 Regards,

Don

#iworkfordell

1 Rookie

 • 

13 Posts

September 26th, 2022 05:00

Hi,

What I know is that in one CM they replaced just the batteries with generic ones, not the chip, just the batteries. Maybe the battery has a kernel or something similar that tells the storage whether it is new or original. I say this because, out of all the volumes, only two are not up.

 

Thanks

6 Operator

 • 

1.5K Posts

 • 

6.4K Points

September 26th, 2022 05:00

Hello, 

  There is a date coded on the board, so I don't believe just replacing the battery will work.

However, that's not what is causing the missing pages problem that is preventing the volumes from coming online. If it were, all volumes would be offline. Each member keeps track of the blocks it has, which are organized into larger pages. When multiple members are in the same pool, the volumes are striped across them in proportion to their relative sizes. Over time the members need to move pages between themselves to keep free space balanced and to prevent performance hot spots. This page movement is a background process. If something interrupts it, like a failed member or losing cache, the page count between the members can be lost. Since the volumes are striped, if a single member goes down or the page database for these pages is not correct, all affected volumes are put offline until the issue has been resolved.

 For missing or duplicate pages, resolving that requires support from Dell directly.
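
To make the idea above more concrete, here is a toy Python model of proportional striping and of a "missing or duplicate pages" check. It is only a sketch under stated assumptions: the function names `stripe_pages` and `audit_pages`, the page numbering, and the rounding rule are all hypothetical, and none of this reflects how the EqualLogic firmware actually stores or audits its page database.

```python
from collections import Counter


def stripe_pages(total_pages, capacities):
    """Split total_pages across members in proportion to capacity.

    Uses largest-remainder rounding so the shares add up exactly.
    """
    total_cap = sum(capacities.values())
    exact = {m: total_pages * c / total_cap for m, c in capacities.items()}
    shares = {m: int(x) for m, x in exact.items()}
    leftover = total_pages - sum(shares.values())
    # Give the remaining pages to the members with the largest fractions.
    by_fraction = sorted(exact, key=lambda m: exact[m] - shares[m], reverse=True)
    for m in by_fraction[:leftover]:
        shares[m] += 1
    return shares


def audit_pages(expected_pages, member_tables):
    """Return (missing, duplicate) page ids given what each member claims."""
    counts = Counter()
    for pages in member_tables.values():
        counts.update(pages)
    missing = sorted(p for p in expected_pages if counts[p] == 0)
    duplicate = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicate


# Capacities in TB taken from the `show member` output in this thread.
shares = stripe_pages(1000, {"EQL1": 15.97, "EQL2": 33.74, "EQL3": 33.74})
print(shares)

# A made-up volume of six pages where a page move was interrupted:
# page 2 is claimed by two members and page 4 by none.
tables = {"EQL1": {0, 1, 2}, "EQL2": {2, 3}, "EQL3": {5}}
missing, duplicate = audit_pages(range(6), tables)
print(missing, duplicate)
```

In this model a duplicate page can in principle be resolved by deciding which copy is current, while a page that no member claims has no copy to recover from, which matches the point above that truly missing pages imply some data loss.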

 Regards, 

Don 

#iworkfordell
