About a week ago all the VMs in my lab became inaccessible. I logged into the vSphere client and noticed all the VMs were orphaned because the hosts couldn't see the datastores anymore. Logging into the CMC, I see the following errors:
Under Storage > Controllers, the list is now empty.
Finally, on the chassis health page it still detects the SPERCs are installed, but it shows they are not powered on.
Initially I thought maybe the SPERC had gone bad, so I swapped it out with a known working one, but the results were the same. I also tried swapping out the backplane expander in case it was bad. I found another support thread where someone had a similar issue and they resolved it by swapping/reseating the CMC and SPERC cards. I've tried reseating everything multiple times, all firmware is up to date, and I've swapped out the main board battery and factory reset the CMC. It's like it knows something is plugged into those slots, but just can't power them on for some reason. The only thing left I can really think of is that the main board is bad, but I would assume other things would also be failing. Everything functions properly other than shared storage. I've got all the blades and disks pulled out right now as you can see in the above screenshot, but if I reinsert the blades they all power up fine and I can interact with them. I'm grasping at straws here. Anyone ever seen something like this?
Sorry for the trouble you are having. What is the chassis management controller FW version? CMC FW 1.30 had some errors related to PERC controller cards, and they were fixed in later updates. I see CMC 3.40 as the latest version https://dell.to/2O60hUS
Can you also check PERC FW here https://dell.to/3r7hgER
I am not sure if this is the issue you mentioned, but can you also try the solution suggested here https://www.dell.com/community/PowerEdge-Hardware-General/VRTX-Shared-PERC-8-STOR10-and-STOR9-errors...
Considering what you said and if the above things don't work, I would also be suspicious of the motherboard.
Also, there may be flea power left on the chassis. Can you try draining it? After powering off the chassis, disconnect the power cable, then press and hold the power button for 15 seconds.
Let us know how it goes!
Thanks for the suggestions. I tried draining the flea power just now, but unfortunately it didn't resolve the issue. When the issue first appeared I was on CMC version 3.20, but as part of troubleshooting this issue I did upgrade to 3.40.
That link you posted is the one I mentioned above that I found had a similar issue, but reseating the CMC and SPERC didn't resolve it for me.
I have also been suspicious of the motherboard, but it seemed strange to me that everything else would still function properly. My assumption was if the motherboard was bad then I would be seeing other problems or at least other alerts in the logs. Maybe that's not a correct assumption though.
My first thought would also be the SPERCs, but you said you already troubleshot them, right? The storage controller card has some LED indicators. If the power indicator blinks irregularly or the attention indicator blinks amber, it indicates a fault condition. Do you see any of this? What lights and colors do you see?
You can also try to reseat the integrated storage controller cards, the SAS cables, and the storage controller battery.
Please, let us know.
Actually, no lights at all on the controller cards themselves. The error says it can't power on, so I guess that makes sense? Or should the indicator lights still be working even if the controller isn't getting power? This was a single controller VRTX but I had been planning to convert to a dual controller, so I had a new unused card and backplane expander lying around that I hadn't installed yet. Originally I thought maybe the controller was bad, but I replaced it with the new parts (card, expander, cables) and saw the same behavior. I guess I'll bite the bullet and try replacing the main board. Not sure what else to try at this point.