22 Posts

April 20th, 2023 04:00

PS6500 "not configured" after power outage

We had a short 30-minute power outage today, and now the SAN is not responding to any connections except via serial.

Only one of the controllers is responding. I am able to log in with grpadmin, but it seems to have lost all other config data?

"It appears that the storage array has not been configured. Please run setup before executing management commands"

22 Posts

April 20th, 2023 04:00

Output of some commands from another thread:

CLI> support exec "uname -a"
You are running a support command, which is normally restricted to PS Series Tec
hnical Support personnel. Do not use a support command without instruction from
Technical Support.
NetBSD 1.6.2 NetBSD 1.6.2 (EQL.PSS) #0: Wed Apr 16 11:33:43 EDT 2014 build@m64:/buildarea/V6.0.10__Wed_Apr_16_2014_11_32_34_EDT/bin/destdir.sbmips.release/EQL.PSS xlrmips
CLI> support exec "raidtool"
You are running a support command, which is normally restricted to PS Series Tec
hnical Support personnel. Do not use a support command without instruction from
Technical Support.
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33

CLI> support exec "diskview -j"
You are running a support command, which is normally restricted to PS Series Tec
hnical Support personnel. Do not use a support command without instruction from
Technical Support.
Enc/Drive  State  Write Retrys  Read Retrys  Power Cycles  Drive Timeouts  Bad Blocks  ForceWrite Retrys  Reset Fail  Read Timeout  Scan Errors  Max Cominits  Max HrstMsecs
______________________________________________________________________________________________________________________
0/ 0 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 1 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 2 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 3 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 4 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 5 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 6 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 7 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 8 Scanning 0 0 0 0 0 0 0 0 0 0 0
0/ 9 Online 0 0 0 0 0 0 0 0 0 0 0
0/10 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/11 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/12 Online 0 0 0 0 0 0 0 0 0 0 0
0/13 Online 0 0 0 0 0 0 0 0 0 0 0
0/14 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/15 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/16 Online 0 0 0 0 0 0 0 0 0 0 0
0/17 Online 0 0 0 0 0 0 0 0 0 0 0
0/18 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/19 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/20 Online 0 0 0 0 0 0 0 0 0 0 0
0/21 Online 0 0 0 0 0 0 0 0 0 0 0
0/22 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/23 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/24 Online 0 0 0 0 0 0 0 0 0 0 0
0/25 Online 0 0 0 0 0 0 0 0 0 0 0
0/26 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/27 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/28 Online 0 0 0 0 0 0 0 0 0 0 0
0/29 Online 0 0 0 0 0 0 0 0 0 0 0
0/30 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/31 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/32 Online 0 0 0 0 0 0 0 0 0 0 0
0/33 Online 0 0 0 0 0 0 0 0 0 0 0
0/34 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/35 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/36 Online 0 0 0 0 0 0 0 0 0 0 0
0/37 Online 0 0 0 0 0 0 0 0 0 0 0
0/38 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/39 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/40 Online 0 0 0 0 0 0 0 0 0 0 0
0/41 Online 0 0 0 0 0 0 0 0 0 0 0
0/42 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/43 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/44 Online 0 0 0 0 0 0 0 0 0 0 0
0/45 Online 0 0 0 0 0 0 0 0 0 0 0
0/46 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/47 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
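
For anyone reading the listing above by eye, a short script can tally the drive states. This is only a sketch: it assumes each row has the shape `enclosure/slot`, a state (one or more words), then 11 numeric counters, as in the paste. The function names and the `SAMPLE` rows are illustrative and not part of any EqualLogic tooling.

```python
from collections import Counter

# A few rows copied from the `diskview -j` paste above; replace SAMPLE
# with the full listing to tally all 48 slots.
SAMPLE = """\
0/ 0 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 2 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 8 Scanning 0 0 0 0 0 0 0 0 0 0 0
0/ 9 Online 0 0 0 0 0 0 0 0 0 0 0
"""

N_COUNTERS = 11  # numeric columns that follow the State field (assumed from the paste)


def parse_diskview(text):
    """Return {slot: (state, [counters])} for every drive row in the text."""
    rows = {}
    for line in text.splitlines():
        enc, sep, rest = line.partition("/")
        if not sep or not enc.strip().isdigit():
            continue  # header, rule, or blank line
        fields = rest.split()
        if len(fields) < N_COUNTERS + 2:
            continue  # not a complete drive row
        slot = int(fields[0])
        counters = [int(x) for x in fields[-N_COUNTERS:]]
        state = " ".join(fields[1:-N_COUNTERS])  # e.g. "Slot Empty"
        rows[slot] = (state, counters)
    return rows


def tally_states(rows):
    """Count how many slots are in each state."""
    return Counter(state for state, _ in rows.values())


rows = parse_diskview(SAMPLE)
print(tally_states(rows))
# Populated slots that are neither Online nor empty deserve a closer look.
print([slot for slot, (state, _) in sorted(rows.items())
       if state not in ("Online", "Slot Empty")])
```

Run against the full listing above, this gives 23 Online, 1 Scanning (slot 8), and 24 Slot Empty, with every error counter at zero.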

CLI> support exec "raidview"
You are running a support command, which is normally restricted to PS Series Tec
hnical Support personnel. Do not use a support command without instruction from
Technical Support.
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33
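
As a sanity check on the raidtool/raidview output above, here is a rough parser that pulls out the driver status, the drive list per RAID set, and the capacities. It is a sketch written against the exact text pasted in this thread; the regular expressions and function names are my own assumptions, not a documented output format. The per-drive figure relies on the standard RAID 6 rule that two drives' worth of space goes to parity.

```python
import re

RAID_OUTPUT = """\
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33
"""


def parse_raid_output(text):
    """Parse the pasted raidtool/raidview text into a plain dict."""
    info = {"driver_status": None, "luns": [], "available": []}
    lun = None
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"Driver Status:\s*\*?(.+?)\*?$", line):
            info["driver_status"] = m.group(1)
        elif m := re.match(r"RAID LUN (\d+)", line):
            lun = {"id": int(m.group(1)), "drives": [], "level": None, "capacity": None}
            info["luns"].append(lun)
        elif m := re.match(r"(\d+) Drives \(([\d,]+)\)", line):
            lun["drives"] = [int(d) for d in m.group(2).split(",")]
        elif m := re.match(r"RAID (\d+) \(", line):
            lun["level"] = int(m.group(1))
        elif m := re.match(r"Capacity ([\d,]+) bytes", line):
            lun["capacity"] = int(m.group(1).replace(",", ""))
        elif m := re.match(r"Available Drives List:\s*([\d,]*)", line):
            info["available"] = [int(d) for d in m.group(1).split(",") if d]
    return info


def per_drive_bytes(lun):
    """Usable bytes contributed per drive: RAID 6 keeps (n - 2) drives of data."""
    if lun["level"] != 6:
        raise ValueError("this sketch only handles RAID 6")
    return lun["capacity"] // (len(lun["drives"]) - 2)


info = parse_raid_output(RAID_OUTPUT)
total = sum(len(lun["drives"]) for lun in info["luns"]) + len(info["available"])
print(info["driver_status"], total)
print([per_drive_bytes(lun) for lun in info["luns"]])
```

On the output above, the drive lists add up to 12 + 10 + 2 spares = 24 drives, and both RAID sets work out to the same 2,957,640,007,680 bytes per drive, so the two listings are at least internally consistent.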

22 Posts

April 20th, 2023 04:00

I'm trying to restart gracefully but am afraid of proceeding further, as the other controller seems to be dead and I'm unsure what will happen if it tries to fail over to that one.

CLI> restart


After you enter the restart command, the active control module will fail over.
To continue to use the serial connection when the array restarts, connect the
serial cable to the new active control module.

Do you really want to restart the system? (yes/no) [no]:

22 Posts

April 20th, 2023 04:00

Output from "show". Seems like one of the controllers is dead? Is it safe to run "reboot/shutdown"?

 

_____________________________ Member Information ______________________________
Model: 70-0300 Serial-Number:
Disks: 24 Spares: 2
Controllers: 1 CacheMode: write-back
Service Tag:
_______________________________________________________________________________


_________________________________ Controllers _________________________________

ID Model CM Rev. SW Rev. Statu
-- ---------- ---------- --------------------------------------------- -----
0 70-0300 A05 Storage Array Firmware V6.0.10 (R390548) ok


_______________________________ Ethernet Ports ________________________________

Name ifType ifSpeed Mtu Ipaddress Status Errors
---------- --------------- ---------- ----- --------------- ------- -------
eth0 ethernet-csmacd 4294967295 9000

 

_____________________________ Temperature Sensors _____________________________

Name Value Normal Range Status
----------------------------------- --------- --------------- ---------------
Control module 0 processor 89 10-75 critical
Control module 1 processor 0 10-75 unknown
Control module 0 SAS Controller 89 10-110 normal
Control module 1 SAS Controller 0 10-110 unknown
Control module 0 Battery 40 10-45 normal
Control module 1 Battery 0 10-45 unknown


____________________________________ Fans _____________________________________

Name Speed Normal Range Status
------------------------------ --------- --------------- ---------------
Power Cooling Module 0 Fan 0 4680 2100-5350 normal
Power Cooling Module 0 Fan 1 4770 2100-5350 normal
Power Cooling Module 1 Fan 0 4620 2100-5350 normal
Power Cooling Module 1 Fan 1 4770 2100-5350 normal
Power Cooling Module 2 Fan 0 4680 2100-5350 normal
Power Cooling Module 2 Fan 1 4620 2100-5350 normal


_______________________________ Power Supplies ________________________________

Name Status FanStatus
---------------------------------------- -------- ---------------
Power Cooling Module 0 on not-applicable
Power Cooling Module 1 on not-applicable
Power Cooling Module 2 on not-applicable
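
The sensor table in the "show" output above can also be checked mechanically. The snippet below is a sketch that assumes each row ends with `value low-high status`, as in the paste; the regular expression and the names `ROW` and `out_of_range` are illustrative only.

```python
import re

SENSORS = """\
Control module 0 processor 89 10-75 critical
Control module 1 processor 0 10-75 unknown
Control module 0 SAS Controller 89 10-110 normal
Control module 1 SAS Controller 0 10-110 unknown
Control module 0 Battery 40 10-45 normal
Control module 1 Battery 0 10-45 unknown
"""

# name, reading, normal range "lo-hi", and the status word reported by the array
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<value>\d+)\s+(?P<lo>\d+)-(?P<hi>\d+)\s+(?P<status>\S+)$"
)


def out_of_range(text):
    """Return (name, value, range, status) for readings outside the normal range."""
    flagged = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # headers and rules do not match the row pattern
        value, lo, hi = int(m["value"]), int(m["lo"]), int(m["hi"])
        if not lo <= value <= hi:
            flagged.append((m["name"], value, f"{lo}-{hi}", m["status"]))
    return flagged


flagged = out_of_range(SENSORS)
# Readings of 0 with status "unknown" probably mean "no data from that
# controller" rather than a real temperature, so list them separately.
print([f for f in flagged if f[3] != "unknown"])
print([f[0] for f in flagged if f[3] == "unknown"])
```

The only real out-of-range reading is the control module 0 processor at 89 against a normal range of 10-75. All three control module 1 sensors read 0 with status unknown, which fits the picture of that controller not responding.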

Moderator

631 Posts

April 20th, 2023 12:00


Justj0sh,

 

I wouldn't recommend shutting it down. Could you confirm whether the data is also stored in another location or only on this system? I ask to find out whether we can just wipe the array and refill it. What I would suggest is that you call support, as it will take direct access to the device to have a chance at recovery. Another issue is that you are on the V6 firmware, which had an urgent replacement issued due to instability.

 

22 Posts

April 20th, 2023 12:00

We have backups but we just looked and we are missing some crucial changes in the backup versions so ideally we want the data that's resting on this EQL. 

 

Is there anything else I can run on my end? The data should be saved on the disks not the controller no? Can I just force a reload of the config into the controller?

Moderator

631 Posts

April 20th, 2023 13:00

I wouldn't recommend doing anything at this point until support looks at it directly, as anything done risks the data.

22 Posts

April 20th, 2023 15:00

We don't have a support contract on this machine, and we are also on the way out of the EQL ecosystem. What odds do you put on support being able to restore the data? I'm not entirely sure we can get approval to commit to a 50% chance, which means we'll probably have to accept the loss.

Assuming the default scenario is to write off the device, do you have any suggestions that might work out and increase the odds from 0% to 10-20%?

Moderator

631 Posts

April 20th, 2023 23:00

Honestly, it is very difficult to answer your question. What you have been advised is the only thing to do in order not to permanently lose the data.

1 Rookie

1.5K Posts

April 22nd, 2023 20:00

Hello, 

Just to confirm, this is a 24-drive configuration, correct?

The RAID sets are in a lost-cache condition.

Assuming it is a 24-drive configuration AND you have backups, there are two things you can try.

However, you may have already lost some data. You can run a command to ignore the missing cache, and the array will possibly boot up. It is also possible that the missing data is critical to the array configuration, which would leave the array offline.

Option 1: Safe and not difficult to do. Power down the array. Wait 2 minutes. Pull the active controller out about 1/2 or so. Connect the serial cable to the other controller. Power on the array. *IF* the missing cache is in that controller, it will flush it to disk and the array should boot up. Use CLI> support exec raidtool to check. If you are still in the same condition you are in now, you can try Option 2.

Option 2: Risky, but often works. There is a command to clear the cache. It's documented in the CLI guide. At the CLI> prompt, type clearlostdata and hit Enter. If it worked, the array should be back online in a couple of minutes. Again, there is a risk that this could make things worse. A rebuild of the internal DB would likely be needed. Depending on where you are located, you might be able to open a one-time support call for a fee. Otherwise you would have to reset the array and restore from backup.

If it's NOT a 24-drive configuration, then DO NOT RUN clearlostdata.

If the array does come back, you should run a filesystem check on all your volumes. If running VMs, check the virtual disks inside the VM OS.

 Regards, 

Don 

 

#iworkfordell 
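
For readers skimming the thread, the order of operations in the reply above can be written down as a small decision function. This is one reading of the advice, not official support guidance: the function, its parameter names, and the returned strings are all hypothetical, and it issues no commands against the array.

```python
def next_step(drive_count, have_backups, tried_other_controller, needs_intervention):
    """Suggest the next action, following one reading of the reply above.

    needs_intervention: raidtool still reports "Admin Intervention Requested".
    tried_other_controller: Option 1 (boot from the other controller) was attempted.
    """
    if not needs_intervention:
        return "recovered: run a filesystem check on all volumes"
    if drive_count != 24:
        # The reply is explicit: never run clearlostdata on other configurations.
        return "stop: do NOT run clearlostdata"
    if not have_backups:
        # Both options were offered on the assumption that backups exist.
        return "stop: both options assume backups exist"
    if not tried_other_controller:
        return "option 1: power down, wait 2 minutes, boot from the other controller"
    return "option 2: clearlostdata (risky)"


# The situation at this point in the thread: 24 drives, backups exist,
# nothing tried yet, raidtool still asks for intervention.
print(next_step(24, True, False, True))
```

The point of writing it this way is that the safe step always comes before the risky one, and the risky one is gated on both the drive count and the existence of backups.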

 

1 Rookie

1.5K Posts

April 22nd, 2023 21:00

Hello, 

Re: active CM. The one you are currently connected to is the active one. If it were not, you would not see any of the disks or RAID sets.

Re: raidtool. Look for the same message as before, "Admin Intervention Requested". If it is still there, then the cache was not in the other controller.

Re: CLI, part 2. When the array is online, the prompt changes to GroupName> on the active controller and remains CLI> on the passive. Because the passive can't see the disks, it can't read the configuration. Since the array is down, neither controller can read the configuration database, so the prompt defaults to CLI>. That is also why you got the "not configured" message. FYI: Should this or something like it happen again, do NOT re-run setup. That will wipe the configuration and data.

 Regards, 

Don

#iworkfordell
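
The prompt rule described above is easy to misread, so here it is as a tiny helper. It is a sketch of the explanation in this reply only; `interpret_prompt` is a made-up name and the wording of the results is mine.

```python
def interpret_prompt(prompt):
    """Map a serial-console prompt to what it implies, per the reply above."""
    prompt = prompt.strip()
    if prompt == "CLI>":
        # Either the passive controller, or the array is down and no
        # controller can read the configuration database.
        return "passive controller, or configuration not readable"
    if prompt.endswith(">") and len(prompt) > 1:
        return "active controller, group online: " + prompt[:-1]
    return "not a recognised prompt"


print(interpret_prompt("CLI>"))
print(interpret_prompt("GroupName>"))
```

So a bare CLI> prompt on its own does not prove you are on the passive controller. With the array down, the active one shows it too.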

 

22 Posts

April 22nd, 2023 21:00

Hi Don,

 

Yes this is running the half filled config. Is there a way to tell which is the active controller? There's only one responsive on serial but I read somewhere that if it only shows "CLI>" I'm not connected to the active. 

When you say use raidtool to check, what do I check for?

 

Thanks again for the response

22 Posts

April 22nd, 2023 23:00

Thanks for the explanation. Just to clarify: when you say power off, do I have to do a proper shutdown sequence or just pull all the power cables? When I type the shutdown command, it says it will fail over to the other controller automatically.

1 Rookie

1.5K Posts

April 22nd, 2023 23:00

Hello, 

You are welcome. The array isn't up, so the shutdown command isn't needed, and it won't actually power off the array. Power off the three power supplies, then wait 2 minutes. Before powering it back on, remember to pull the current active controller out about an inch and move the serial cable to the other controller.

 Regards, 

Don

#iworkfordell

1 Rookie

1.5K Posts

April 24th, 2023 08:00

Hello, 

  Just following up on your progress. 

 Don 

#iworkfordell

1 Rookie

1.5K Posts

April 24th, 2023 18:00

Hello, 

Re: 1. Yes, just push the controller back in. It will boot up, see the primary, and sync the cache automatically.

Re: 2. 2 minutes. That makes sure the power supplies drain down completely.

Re: Cache. The cache is backed up by batteries that last 72 hours from full charge. The cache has already been recovered, and the passive CM will get a fresh copy of the cache when it syncs.

At the GrpName> prompt, run show member and make note of the member's name.

Then run GrpName> member select MEMBERNAME show controllers. This should show that both controllers are online and synced.

When the array came back up, it booted off the secondary CM. That controller didn't have the up-to-date cache. It didn't match the RAID set, so the boot process stopped to prevent any damage. Newer versions of firmware automatically look at the passive for the missing cache and will sync the cache that way. You are running firmware that is many years old.

 I'm glad it worked out so well.   

 Regards,

Don 

 

#iworkfordell 

 
