388 Posts
0
3995
No alert notification during power outage!
Hi,
We are running a CX4-120 SAN. Because of a power failure at our site, the SAN went down (from the log file, it seems not gracefully) and restarted once power was restored.
We did not get any notification from the SAN. Is that expected? If not, what notification message should we have received?
Thanks
Message was edited by: TonyJK
Storagesavvy
474 Posts
1
January 2nd, 2014 16:00
The array will send several emails for the power down, and you may see the “too many alerts” message in addition to several others. However, it’s likely that the “too many alerts” message was generated during the power up, not the power down. If the SMTP server is unavailable during the power down, then those alerts will not be received because the array will have no way to send them. In a full datacenter power outage, you probably won’t see many alerts because the network switches, routers, servers, etc. all go down immediately, while the array keeps reacting and powering down for a few minutes after the power goes out. There’s simply no way for the emails to make it out.
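To test the point about the SMTP server being unreachable, a quick check from a host on the same network as the array can help. The following is only a minimal sketch: it tests plain TCP reachability of the mail port, not whether the server would actually accept the array's mail. The function name `smtp_port_reachable` and the host `smtp.example.com` are hypothetical placeholders, not anything taken from the array or from Navisphere/Unisphere.

```python
import socket


def smtp_port_reachable(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port; it does not
    prove that the server will relay alert emails.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with the SMTP server the array is set to use.
    host = "smtp.example.com"
    state = "reachable" if smtp_port_reachable(host) else "NOT reachable"
    print(f"{host}:25 is {state}")
```

If the port is not reachable while the site is on reduced power (or while the mail server itself is down), the array's alert emails have nowhere to go, which matches the behaviour described above.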
Vipin VK
812 Posts
0
January 1st, 2014 11:00
I am not sure of the exact message, but you should have received at least the power failure alerts at the moment the SPS lost its AC input. Was email notification working properly?
TonyJK2
388 Posts
0
January 1st, 2014 15:00
The SMTP server is at a remote site (the one at HQ is not used, because of email spoofing). When the power surge happened, it appears that the main SMTP server at HQ went down as well.
When the SMTP server was started the next morning, we got the following error messages:
1) Description The storage system received too many events in this poll cycle to display here. Contact your service provider to determine if any storage system error events occurred that you should address.
2) Description LOCAL DMI LOG: Found a warning during POST: WARNING: Error reading PSA1 resume PROM (0x01C2)
Thanks
Vipin VK
812 Posts
1
January 1st, 2014 18:00
I would suggest opening a support ticket for spcollect analysis, if you have not done so already. Support will be able to tell you what went wrong with your system; it may be a bugcheck.
If the SMTP server was down, that would be the reason you did not receive the alerts.
Storagesavvy
474 Posts
0
January 2nd, 2014 09:00
What did you see in the log file that makes you think it was not graceful?
Storagesavvy
474 Posts
1
January 2nd, 2014 15:00
Those are normal entries for a power down when the array detects a power outage. All disks, LCCs, etc. that are NOT in bus 0, enclosure 0 lose power immediately when the site power is lost. Then the array vaults cache to the vault disks in bus 0, enclosure 0, slots 0-4. After the vault process is complete, the array powers off.
In short: None of those entries indicate anything abnormal, assuming that the rack/datacenter lost power.
TonyJK2
388 Posts
0
January 2nd, 2014 15:00
We get the following error messages:
21:29:50 The AC Fail state has changed for this power supply and no SPS test is in progress. This indicates a problem with the AC input to this power supply.
21:29:56 Battery Online
21:30:09 K10_DGSSP FlushRegistry 76 00 01 00
21:30:09 K10_DGSSP FlushSystemDisk 76 00 01 00
21:30:12 VSC Shutdown/Removed
21:30:13 SPS Removed
21:31:26 Standby Power Supply (Enclosure SPE SPS A) is faulted. See alerts for details.
21:31:26 Power Supply (Enclosure SPE Power A0) is faulted. See alerts for details.
21:31:26 Power Supply (Enclosure SPE Power B0) is faulted. See alerts for details.
21:31:26 Disk Array Enclosure (Bus 1 Enclosure 1) is faulted. Servers may have lost access to disk drives in this enclosure. See alerts for details.
21:31:26 Disk Array Enclosure (Bus 0 Enclosure 0) is faulted. Servers may have lost access to disk drives in this enclosure. See alerts for details.
21:31:26 Disk Processor Enclosure (Enclosure SPE) is faulted. Servers may have lost access to disk drives in this storage system. See alerts for details.
Then we get a lot of
CRU Powered Down
Unit Shutdown
Trespass Failed LUN ....
21:33:49 Fault - Cache Disabling
21:33:53 The AC Fail state has changed for this power supply and no SPS test is in progress. This indicates a problem with the AC input to this power supply.
Then nothing in the log file until power is resumed.
Thanks
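The timestamps in the excerpt above show how long the array kept logging after the first AC-fail event. The following is a rough sketch, assuming an abridged copy of the entries posted here pasted in as plain text, with all entries from the same day. The helper names `parse_events` and `seconds_between_first_and_last` are made up for illustration and are not part of any EMC tool.

```python
from datetime import datetime

# Abridged copy of the timestamped entries from the post above.
LOG_EXCERPT = """\
21:29:50 The AC Fail state has changed for this power supply and no SPS test is in progress.
21:29:56 Battery Online
21:30:13 SPS Removed
21:31:26 Standby Power Supply (Enclosure SPE SPS A) is faulted. See alerts for details.
21:33:49 Fault - Cache Disabling
21:33:53 The AC Fail state has changed for this power supply and no SPS test is in progress.
"""


def parse_events(text):
    """Return (time, message) pairs for lines that start with HH:MM:SS."""
    events = []
    for line in text.splitlines():
        stamp, _, message = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        events.append((when, message))
    return events


def seconds_between_first_and_last(events):
    """Elapsed seconds from first to last event (assumes no midnight rollover)."""
    return int((events[-1][0] - events[0][0]).total_seconds())


events = parse_events(LOG_EXCERPT)
elapsed = seconds_between_first_and_last(events)
print(f"{len(events)} events over {elapsed} seconds")
```

For these entries the gap works out to 243 seconds, i.e. roughly four minutes between the first AC-fail event and the last logged entry.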
TonyJK2
388 Posts
0
January 2nd, 2014 16:00
Dear Rich,
Many thanks for confirming that the SPS is working properly.
So the missing alert notifications must be due to the SMTP server setup.
My manager would like to know: if the SMTP server had been working properly, should we have received alerts other than
"Description The storage system received too many events in this poll cycle to display here. Contact your service provider to determine if any storage system error events occurred that you should address"?
Or do we usually get this message after a power outage (because there are too many error messages)?
Thanks again