duanef

Squanky temperature probe snuffs Powervault

So here's a tale. In a temperature-controlled server room, at 22:04:04 the temp probe throws an alert that the temp dropped _below_ the minimum. Dubious, and indeed at 22:04:41, mere seconds later, the device returned to normal.

http://i44.tinypic.com/29cpbmq.jpg

The thing is, the first low temp alert immediately initiates a host system shutdown. Unfortunately, it appears that the temp probe coming right does not cancel the aforementioned shutdown. And indeed the server proceeds to shut down, shall we say, less than gracefully. At next appearance there it is, sitting in POST with the attached drive array showing all drives failed and flashing amber.

The joy is mine.

Later forensics will reveal that milliseconds after the temp returns to normal, an event is logged to commemorate the joyous happening.

http://i44.tinypic.com/noua8z.jpg

I reboot to find my happiness is complete. All logical drives on the vault are gone, and OpenManage -> Storage .... Virtual Disks offers two suggestions: blink and unblink. No online. No rebuild.

I blink. I reboot and try Ctrl+M.

While this interface brings back warm memories of EISA configuration, there is nothing on offer that hints of drive recovery or restoration. I follow the wisdom of the ages: I try turning it all off and on again.

Once more into OpenManage. I drill down into the Connector -> Array Disks and, against all logic but trusting to Fortuna, I bring a single drive Online. Green light. A second. Third. And so on. And then I wait, sniffing lavender and nibbling bonbons on my pillow, while the virtual disk goes through several hours of Initializing.... and comes back.

May your morning be filled with as much fun and excitement.

Moderator

Re: Squanky temperature probe snuffs Powervault

Duanef,

I am glad to hear that everything has come back online, and I am sorry you had to go through that nerve-racking process. One thing you may want to look into is making sure the firmware on the storage device is at the latest version. I say that because I have seen a couple of cases where out-of-date firmware caused false temperature and fan-speed errors. If you let me know which storage device you are working with, I can grab those links and send them over to you. I look forward to hearing back from you.

Kenny K.
