Unsolved

August 6th, 2015 04:00

Dell VRTX CMC failed but did not fail over properly

Hi,

We had an issue this week where we could not log in to the web GUI for our CMC, so we rebooted the entire chassis (a complete shutdown). When we powered it up again we were able to log in to the CMC, but it had two issues:

1.) The CMC had lost redundancy and the link light on one of the redundant links was not working

2.) We could not power on any of the blade servers, either through the web GUI or by powering them on manually. The web GUI reported that iDRAC was not working, and indeed it was not showing any information on the servers.

I tried reseating the blades and also using the racadm command to virtually reseat them, to no avail: I still could not power them on. There was no information in any of the logs.

After I spoke to Dell support (who were really great, it has to be said), they advised me to swap over the CMC cards in the chassis. I did, and we were then able to power up the blades, but the CMC was still reporting that it was not redundant. Dell sent out a new CMC; it is now fitted and the issue has gone away.

What is concerning is that this server is in test at the moment and about to be deployed into a colocation facility. How is it that this server (a flagship of the Dell server product line), with CMC redundancy built in, could not power up the blade servers when one CMC card failed? To me it sounds like something in the way the software or the hardware has been built is not redundant. If CMC1 fails, we should be able to power on the blades without having to swap the CMC cards over. Is iDRAC hardwired into CMC1 or something?

I would like to get to the bottom of this issue as it is concerning. At least I now know that if it happens again we need to swap the cards over, but it is a real pain to have to go to a colo facility and pull out all the kit to do it.

I would be very grateful for any help, advice or suggestions that could make sure this issue never happens again. I would expect all Dell VRTX customers to benefit from this being resolved, if indeed it is not a one-off.

thanks,

Paul

Moderator

8.5K Posts

August 6th, 2015 09:00

Paulow1978,

It sounds like the failing CMC didn't fully release control to the redundant CMC. I believe the failover actually completed only when you removed the failed CMC. Do you know whether the original CMCs matched in firmware? A large gap between the CMCs' firmware versions could have led to timeouts and behavior like this. I suggest verifying that both CMCs are up to date and on matching firmware. The current version is 2.0.4, which you can find here. After that, you can test the CMCs to verify that failover works correctly.
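
As a rough illustration of the firmware check, here is a minimal Python sketch. It assumes you have already copied the firmware version string for each CMC out of the web GUI or the CLI by hand; the function name `check_cmc_firmware` and the labels `cmc-1` and `cmc-2` are made up for this example and are not part of any Dell tool.

```python
def check_cmc_firmware(versions):
    """Report whether every CMC shows the same firmware version string.

    `versions` maps a CMC label (hypothetical, e.g. "cmc-1") to the
    version string exactly as it was reported for that CMC.
    Returns (all_match, cleaned), where `cleaned` holds the strings
    with surrounding whitespace removed.
    """
    cleaned = {name: text.strip() for name, text in versions.items()}
    # The set collapses identical strings, so one entry means they all agree.
    all_match = len(set(cleaned.values())) == 1
    return all_match, cleaned


if __name__ == "__main__":
    # Placeholder values; substitute what your own chassis reports.
    ok, seen = check_cmc_firmware({"cmc-1": "2.0.4", "cmc-2": "2.0.4"})
    print("firmware matches" if ok else f"firmware mismatch: {seen}")
```

The comparison is deliberately a plain string match, so any difference in how the two versions are written is flagged for a manual look rather than guessed at.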

Let me know if this helps. 

13 Posts

August 6th, 2015 09:00

Hi,

Thanks for replying. Yes, the CMCs were on the same firmware. I can't remember which version now, as we have already upgraded to 2.04.

You would think that a reboot of the entire system would release any hold the CMC had, but maybe not. It sounds to me like there is a flaw in the system somewhere. Hopefully this is resolved in the latest firmware.

thanks,
