M1000e: "CMC redundancy is lost" error

We have many Dell M1000e blade chassis at multiple locations that experience CMC failovers, as evidenced by this error: "CMC redundancy is lost". We have Brocade M5424 switches on C1 and C2, Cisco Catalyst WS-CBS3130X-S switches on A1, A2, B1, and B2, and redundant CMCs in the chassis. The CMCs are on firmware version 3.21, and there is a mix of all models of Dell blades in the chassis. We use 3PAR storage. These failovers seemingly happen at random and can cause storage or networking to go offline at the time of the failover. Has anyone else experienced this? We've been working with support and making little progress. Thanks!

Responses(6)

fdxpilot

143 Posts

0

May 11th, 2012 11:00

We are seeing the exact same thing. We are on firmware 3.10 but all other symptoms are exactly the same. We are running KR pass-through with Brocade BR1741m-k mezzanine CNA cards to Brocade VDX switches. Any thoughts on this?

Agabbert,

What type of storage do you use? We have several models of 3PAR arrays.

Agabbert

6 Posts

0

May 11th, 2012 11:00

We are seeing the exact same thing. We are on firmware 3.10 but all other symptoms are exactly the same. We are running KR pass-through with Brocade BR1741m-k mezzanine CNA cards to Brocade VDX switches. Any thoughts on this?

Agabbert

6 Posts

0

May 11th, 2012 12:00

We are using NetApp vSeries arrays

fdxpilot

143 Posts

0

May 13th, 2012 22:00

Do you have a case open with Dell for this issue? We are a large Dell customer and have had a case open with them for months, and they state they have not seen this issue anywhere else (essentially blaming our environment without being able to find a specific cause). If you have a case open, or could open one and provide the case number, that would help us in further escalating this issue with Dell. Feel free to PM me if you are not comfortable posting publicly. I can also share our case number if you would like. Thanks!

Agabbert

6 Posts

0

May 15th, 2012 08:00

Just another thing to compare notes. What mezzanine cards are you using in fabric B and C? We are using Brocade BR1741m-k CNA in both fabrics.

Agabbert

6 Posts

0

May 15th, 2012 08:00

I'll give a quick update. We did open a case, and they really just looked at vCenter logs and CMC logs and didn't come up with a lot. I had this happen again over the weekend and noticed that only my ESXi 4.1 hosts lost storage/network; my ESXi 5 hosts were fine. I forced a failover to test and got the same results. I upgraded another cluster to vSphere 5 on blades that were in one of the chassis and tested again, and they did NOT lose network or storage. It may have been a drive issue, but I was able to repeat the process, and I am getting no storage/network loss on any version 5 hosts during a CMC failover/heartbeat failure. Support did tell me that CMC versions 3.x had a problem with false-positive heartbeat failures. It's not a root cause, but it does trigger the issue we have been seeing. I upgraded 2 of my chassis to CMC version 4.0 to see if we get any more false positives, even though I am now upgraded to ESXi 5 across the board.
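For anyone else comparing notes across chassis, here is a minimal sketch (not something from Dell support) for pulling the failover events out of a CMC log that has been exported to a plain text file. It assumes only that each log entry sits on its own line and contains the error text quoted in this thread; the function name and the file name in the usage comment are hypothetical, and the real export format may differ.

```python
import sys
from typing import Iterable, List

# Error text as quoted in this thread; the exact wording in a real log
# export is an assumption and may need adjusting.
ERROR_TEXT = "CMC redundancy is lost"


def find_redundancy_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention the redundancy-lost error.

    Matching is case-insensitive. No timestamp parsing is attempted,
    because the log line layout is not known here.
    """
    needle = ERROR_TEXT.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_events.py exported_cmc_log.txt  (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for event in find_redundancy_events(handle):
            print(event)
```

Running it against the export from each chassis gives a list of events that can be lined up by hand against the times the hosts lost storage or network.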
