PS6500 "not configured" after power outage
We had a short 30-minute outage today, and now the SAN is not responding to any connections except via serial.
Only one of the controllers is responding. I am able to log in with grpadmin, but it seems to have lost all other config data?
"It appears that the storage array has not been configured. Please run setup before executing management commands"
justj0sh
22 Posts
0
April 20th, 2023 04:00
Output of some commands suggested in another thread:
CLI> support exec "uname -a"
You are running a support command, which is normally restricted to PS Series
Technical Support personnel. Do not use a support command without instruction
from Technical Support.
NetBSD 1.6.2 NetBSD 1.6.2 (EQL.PSS) #0: Wed Apr 16 11:33:43 EDT 2014 build@m64:/buildarea/V6.0.10__Wed_Apr_16_2014_11_32_34_EDT/bin/destdir.sbmips.release/EQL.PSS xlrmips
CLI> support exec "raidtool"
You are running a support command, which is normally restricted to PS Series
Technical Support personnel. Do not use a support command without instruction
from Technical Support.
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33
CLI> support exec "diskview -j"
You are running a support command, which is normally restricted to PS Series
Technical Support personnel. Do not use a support command without instruction
from Technical Support.
Enc/Drive  State       Write   Read    Power   Drive     Bad     ForceWrite  Reset  Read     Scan    Max       Max
                       Retrys  Retrys  Cycles  Timeouts  Blocks  Retrys      Fail   Timeout  Errors  Cominits  HrstMsecs
________________________________________________________________________________________________________________________
0/ 0       Online      0       0       0       0         0       0           0      0        0       0         0
0/ 1       Online      0       0       0       0         0       0           0      0        0       0         0
0/ 2       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/ 3       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/ 4       Online      0       0       0       0         0       0           0      0        0       0         0
0/ 5       Online      0       0       0       0         0       0           0      0        0       0         0
0/ 6       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/ 7       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/ 8       Scanning    0       0       0       0         0       0           0      0        0       0         0
0/ 9       Online      0       0       0       0         0       0           0      0        0       0         0
0/10       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/11       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/12       Online      0       0       0       0         0       0           0      0        0       0         0
0/13       Online      0       0       0       0         0       0           0      0        0       0         0
0/14       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/15       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/16       Online      0       0       0       0         0       0           0      0        0       0         0
0/17       Online      0       0       0       0         0       0           0      0        0       0         0
0/18       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/19       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/20       Online      0       0       0       0         0       0           0      0        0       0         0
0/21       Online      0       0       0       0         0       0           0      0        0       0         0
0/22       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/23       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/24       Online      0       0       0       0         0       0           0      0        0       0         0
0/25       Online      0       0       0       0         0       0           0      0        0       0         0
0/26       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/27       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/28       Online      0       0       0       0         0       0           0      0        0       0         0
0/29       Online      0       0       0       0         0       0           0      0        0       0         0
0/30       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/31       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/32       Online      0       0       0       0         0       0           0      0        0       0         0
0/33       Online      0       0       0       0         0       0           0      0        0       0         0
0/34       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/35       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/36       Online      0       0       0       0         0       0           0      0        0       0         0
0/37       Online      0       0       0       0         0       0           0      0        0       0         0
0/38       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/39       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/40       Online      0       0       0       0         0       0           0      0        0       0         0
0/41       Online      0       0       0       0         0       0           0      0        0       0         0
0/42       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/43       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/44       Online      0       0       0       0         0       0           0      0        0       0         0
0/45       Online      0       0       0       0         0       0           0      0        0       0         0
0/46       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
0/47       Slot Empty  0       0       0       0         0       0           0      0        0       0         0
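For anyone who has captured this diskview listing to a text file from the serial session, a minimal Python sketch like the one below can summarize it offline. It assumes the row layout seen above (enclosure/slot, a state, then eleven numeric counters); the names `parse_diskview` and `summarize` are made up for illustration and are not part of any EqualLogic tooling.

```python
import re
from collections import Counter

# A few rows copied from the listing above; a real capture would have all 48.
SAMPLE = """\
0/ 0 Online 0 0 0 0 0 0 0 0 0 0 0
0/ 2 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
0/ 8 Scanning 0 0 0 0 0 0 0 0 0 0 0
0/10 Slot Empty 0 0 0 0 0 0 0 0 0 0 0
"""

# Assumed layout: "enc/slot", a state that may contain spaces, then 11 counters.
ROW = re.compile(r"^\s*(\d+)/\s*(\d+)\s+(.+?)((?:\s+\d+){11})\s*$")


def parse_diskview(text):
    """Return a list of (enclosure, slot, state, counters) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines simply do not match
            counters = [int(n) for n in m.group(4).split()]
            rows.append((int(m.group(1)), int(m.group(2)), m.group(3), counters))
    return rows


def summarize(rows):
    """Count drives per state and list slots with any non-zero counter."""
    states = Counter(state for _, _, state, _ in rows)
    noisy = [(enc, slot) for enc, slot, _, counters in rows if any(counters)]
    return states, noisy


if __name__ == "__main__":
    states, noisy = summarize(parse_diskview(SAMPLE))
    print(dict(states))
    print("slots with non-zero counters:", noisy)
```

Run over the full 48-row table, this should report 23 Online, 1 Scanning and 24 Slot Empty, with no slot showing a non-zero error counter.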
CLI> support exec "raidview"
You are running a support command, which is normally restricted to PS Series
Technical Support personnel. Do not use a support command without instruction
from Technical Support.
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33
justj0sh
22 Posts
0
April 20th, 2023 04:00
I'm trying to restart gracefully but am afraid to proceed, as the other controller seems to be dead and I'm unsure what will happen if the array tries to fail over to it.
CLI> restart
After you enter the restart command, the active control module will fail over.
To continue to use the serial connection when the array restarts, connect the
serial cable to the new active control module.
Do you really want to restart the system? (yes/no) [no]:
justj0sh
22 Posts
0
April 20th, 2023 04:00
Output from "show". Seems like one of the controllers is dead? Is it safe to run "reboot/shutdown"?
_____________________________ Member Information ______________________________
Model: 70-0300 Serial-Number:
Disks: 24 Spares: 2
Controllers: 1 CacheMode: write-back
Service Tag:
_______________________________________________________________________________
_________________________________ Controllers _________________________________
ID Model CM Rev. SW Rev. Statu
-- ---------- ---------- --------------------------------------------- -----
0 70-0300 A05 Storage Array Firmware V6.0.10 (R390548) ok
_______________________________ Ethernet Ports ________________________________
Name ifType ifSpeed Mtu Ipaddress Status Errors
---------- --------------- ---------- ----- --------------- ------- -------
eth0 ethernet-csmacd 4294967295 9000
_____________________________ Temperature Sensors _____________________________
Name Value Normal Range Status
----------------------------------- --------- --------------- ---------------
Control module 0 processor 89 10-75 critical
Control module 1 processor 0 10-75 unknown
Control module 0 SAS Controller 89 10-110 normal
Control module 1 SAS Controller 0 10-110 unknown
Control module 0 Battery 40 10-45 normal
Control module 1 Battery 0 10-45 unknown
____________________________________ Fans _____________________________________
Name Speed Normal Range Status
------------------------------ --------- --------------- ---------------
Power Cooling Module 0 Fan 0 4680 2100-5350 normal
Power Cooling Module 0 Fan 1 4770 2100-5350 normal
Power Cooling Module 1 Fan 0 4620 2100-5350 normal
Power Cooling Module 1 Fan 1 4770 2100-5350 normal
Power Cooling Module 2 Fan 0 4680 2100-5350 normal
Power Cooling Module 2 Fan 1 4620 2100-5350 normal
_______________________________ Power Supplies ________________________________
Name Status FanStatus
---------------------------------------- -------- ---------------
Power Cooling Module 0 on not-applicable
Power Cooling Module 1 on not-applicable
Power Cooling Module 2 on not-applicable
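The sensor table in that output is easy to misread at a glance, so here is a small Python sketch that flags readings outside their stated normal range. It assumes each sensor line has the shape "name value low-high status" as shown above; `parse_sensors` and `triage` are hypothetical helper names, and the sample rows are copied from the post.

```python
import re

# Rows copied from the "show" output above.
SAMPLE = """\
Control module 0 processor 89 10-75 critical
Control module 1 processor 0 10-75 unknown
Control module 0 SAS Controller 89 10-110 normal
Control module 0 Battery 40 10-45 normal
"""

# Assumed layout: free-text name, integer value, "low-high" range, one-word status.
SENSOR = re.compile(
    r"^(?P<name>.+?)\s+(?P<value>\d+)\s+(?P<low>\d+)-(?P<high>\d+)\s+(?P<status>\S+)\s*$"
)


def parse_sensors(text):
    """Parse sensor rows into dicts; non-matching lines are skipped."""
    sensors = []
    for line in text.splitlines():
        m = SENSOR.match(line.strip())
        if m:
            sensors.append({
                "name": m.group("name"),
                "value": int(m.group("value")),
                "low": int(m.group("low")),
                "high": int(m.group("high")),
                "status": m.group("status"),
            })
    return sensors


def triage(sensors):
    """Split sensors into out-of-range readings and sensors with no reading."""
    out_of_range = [
        s["name"] for s in sensors
        if s["status"] != "unknown" and not s["low"] <= s["value"] <= s["high"]
    ]
    no_reading = [s["name"] for s in sensors if s["status"] == "unknown"]
    return out_of_range, no_reading


if __name__ == "__main__":
    hot, silent = triage(parse_sensors(SAMPLE))
    print("out of range:", hot)
    print("no reading:", silent)
```

On the rows above, only the control module 0 processor (89 against a 10-75 range) falls out of range, while the control module 1 sensor reports no reading at all, which is consistent with that controller not responding.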
DellEMCSupport
Moderator
631 Posts
0
April 20th, 2023 12:00
Justj0sh,
I wouldn't recommend shutting it down. Could you confirm whether the data is also stored in another location or only on this system? I ask in order to confirm whether we can just wipe the array and refill it. I suggest you call support, as recovery will require direct access to the device. Another issue is that you are on the V6 firmware, which had an urgent replacement issued due to instability.
justj0sh
22 Posts
0
April 20th, 2023 12:00
We have backups, but we just looked and the backup versions are missing some crucial changes, so ideally we want the data that's resting on this EQL.
Is there anything else I can run on my end? The data should be saved on the disks, not the controller, no? Can I just force a reload of the config into the controller?
DellEMCSupport
Moderator
631 Posts
0
April 20th, 2023 13:00
I wouldn't recommend doing anything at this point until support looks at it directly, as anything done risks the data.
justj0sh
22 Posts
0
April 20th, 2023 15:00
We don't have a support contract on this machine and we are also on the way out of the EQL ecosystem. What odds do you put on support being able to restore the data? I'm not entirely sure we can get approval to commit for a 50% chance, which means we'll probably have to accept the loss.
Assuming the default scenario is to write off the device, do you have any suggestions that might work out and increase the odds from 0% to 10-20%?
DellEMCSupport
Moderator
631 Posts
0
April 20th, 2023 23:00
Honestly, it is very difficult to answer your question. What you have been advised is the only thing to do in order not to permanently lose the data.
dwilliam62
1 Rookie
1.5K Posts
0
April 22nd, 2023 20:00
Hello,
Just to confirm, this is a 24-drive configuration, correct?
The RAIDsets are in a lost cache condition.
Assuming it is a 24-drive configuration AND you have backups, there are two things you can try.
However, you may have already lost some data. You can run a command to ignore the missing cache, and the array may then boot up. It is possible, though, that the missing data is critical to the array configuration, which would leave the array offline.
Option 1: Safe and not difficult to do. Power down the array. Wait 2 mins. Pull the active controller out about 1/2 or so. Connect the serial port to the other controller. Power on the array. *IF* the missing cache is in that controller, it will flush it to disk and the array should boot up. If you end up in the same condition you are in now (use CLI>support exec raidtool to check), then you can try option 2.
Option 2: Risky but often works. There is a command to clear the cache. It's documented in the CLI guide. At the CLI> prompt, type clearlostdata and hit enter. If it worked, the array should be back online within a couple of minutes. Again, there is a risk that this could make things worse. A rebuild of the internal DB would likely be needed. Depending on where you are located, you might be able to open a one-time support call for a fee. Otherwise you would have to reset the array and restore from backup.
If it's NOT a 24-drive configuration, then DO NOT RUN clearlostdata.
If the array does come back, you should run a filesystem check on all your volumes. If running VMs, check the virtual disks inside the VM OS.
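As a way to double-check the 24-drive precondition before trying anything, the raidtool output posted earlier in the thread can be tallied offline. The Python sketch below is only an illustration under the assumption that the output keeps the "N Drives (...)" and "Available Drives List:" lines shown in this thread; `tally_drives` is a made-up helper name, not an array command.

```python
import re

# raidtool output copied from earlier in this thread.
SAMPLE = """\
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
12 Drives (37,8,29,13,45,12,4,20,1,5,9,17)
RAID 6 (64KB sectPerSU)
Capacity 29,576,400,076,800 bytes
RAID LUN 1 Ok.
raid status unrecoverable.
10 Drives (44,25,21,16,24,28,32,36,40,41)
RAID 6 (64KB sectPerSU)
Capacity 23,661,120,061,440 bytes
Available Drives List: 0,33
"""

DRIVES = re.compile(r"^(\d+) Drives \(([\d,]+)\)", re.MULTILINE)
AVAILABLE = re.compile(r"^Available Drives List:\s*([\d,]*)", re.MULTILINE)


def tally_drives(text):
    """Return (RAID member slots, available/spare slots) from raidtool text."""
    members = []
    for count, ids in DRIVES.findall(text):
        slots = [int(s) for s in ids.split(",")]
        if len(slots) != int(count):
            raise ValueError(f"expected {count} drives, listed {len(slots)}")
        members.extend(slots)
    m = AVAILABLE.search(text)
    spares = [int(s) for s in m.group(1).split(",") if s] if m else []
    return members, spares


if __name__ == "__main__":
    members, spares = tally_drives(SAMPLE)
    total = len(set(members + spares))
    print(f"{len(members)} RAID members + {len(spares)} available = {total} drives")
```

For the output in this thread that gives 22 RAID members plus 2 available drives, 24 distinct slots in total, which matches the "Disks: 24 Spares: 2" line in the show output.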
Regards,
Don
#iworkfordell
dwilliam62
1 Rookie
1.5K Posts
0
April 22nd, 2023 21:00
Hello,
Re: active CM. The one you are currently connected to is the active one. If it were not, you would not see any of the disks or RAIDsets.
Re: raidtool. Look for the same message as before, "Admin Intervention Requested". If it is still there, then the cache was not in the other controller.
Re: CLI, part 2. When the array is online, the prompt changes to GroupName> on the active and remains CLI> on the passive. Because the passive can't see the disks, it can't read the configuration. Since the array is down, neither controller can read the configuration database, so the prompt defaults to CLI>. That is also why you got the 'unconfigured' message. FYI: Should this or something like it happen again, do NOT re-run setup. That will wipe the configuration and data.
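If you are logging the serial session to a file, the raidtool check described above can be reduced to a string search. This Python sketch assumes the two indicator strings appear exactly as in the output posted earlier in the thread; `lost_cache_indicators` and `still_blocked` are hypothetical names used only for this example.

```python
# Indicator lines as they appear in the raidtool output posted in this thread.
FROM_THREAD = """\
Driver Status: *Admin Intervention Requested*
RAID LUN 0 Ok.
raid status unrecoverable.
"""


def lost_cache_indicators(raidtool_output):
    """Report which of the two warning strings are present (case-insensitive)."""
    lowered = raidtool_output.lower()
    return {
        "admin_intervention": "admin intervention requested" in lowered,
        "unrecoverable": "raid status unrecoverable" in lowered,
    }


def still_blocked(raidtool_output):
    """True if either indicator is still present after the power cycle."""
    return any(lost_cache_indicators(raidtool_output).values())


if __name__ == "__main__":
    print(lost_cache_indicators(FROM_THREAD))
    print("still blocked:", still_blocked(FROM_THREAD))
```

If `still_blocked` is still true after Option 1, the cache was presumably not in the other controller; this check only looks at the text and says nothing about whether the data itself is intact.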
Regards,
Don
#iworkfordell
justj0sh
22 Posts
0
April 22nd, 2023 21:00
Hi Don,
Yes, this is running the half-filled config. Is there a way to tell which is the active controller? Only one is responsive on serial, but I read somewhere that if it only shows "CLI>" I'm not connected to the active.
When you say use raidtool to check, what do I check for?
Thanks again for the response
justj0sh
22 Posts
0
April 22nd, 2023 23:00
Thanks for the explanation. Can I just clarify: when you say power off, do I have to do a proper shutdown sequence or just cut the power? When I type the shutdown command, it says it will fail over to the other controller automatically.
dwilliam62
1 Rookie
1.5K Posts
0
April 22nd, 2023 23:00
Hello,
You are welcome. The array isn't up, so the shutdown command isn't needed, and it won't actually power off the array anyway. Power off the three power supplies, then wait 2 mins. Remember to move the serial cable to the other controller before powering it back on, and pull the current active controller out about an inch.
Regards,
Don
#iworkfordell
dwilliam62
1 Rookie
1.5K Posts
0
April 24th, 2023 08:00
Hello,
Just following up on your progress.
Don
#iworkfordell
dwilliam62
1 Rookie
1.5K Posts
0
April 24th, 2023 18:00
Hello,
Re: 1. Yes, just push the controller back in. It will boot up, see the primary, and sync the cache automatically.
Re: 2. 2 mins. That makes sure the power supplies completely drain down.
Re: Cache. The cache is backed up by batteries that last 72 hrs from full charge. The cache has already been recovered, and the passive CM will get a fresh copy of the cache when it syncs.
At the GrpName> prompt, run show member and make note of the member's name.
Then run GrpName> member select MEMBERNAME show controllers. This should show that both controllers are online and synced.
When the array came back up, it booted off the secondary CM. That controller didn't have the up-to-date cache, so it didn't match the RAIDset, and the boot process stopped to prevent any damage. Newer versions of firmware automatically look at the passive for the missing cache and will sync the cache that way. You are running firmware that is many years old.
I'm glad it worked out so well.
Regards,
Don
#iworkfordell