
January 16th, 2015 05:00

R610 highly unstable with stock ESXi 5.5u2 - would the custom ISO help?

We decided to upgrade ESXi 5.1 to 5.5 U2 on our R610s to get the new hotness, and I did it the same way I've always installed: just grabbed the latest ISO from VMware. That approach worked very well with both 4.1 and 5.1; those systems have been rock solid for years.

5.5 U2, however, has severe iSCSI issues: the machines hang, drop their iSCSI connections, or both, and this actually ate one of my VMs. Thank heavens for backups.

I didn't think to grab the custom ISO from here, but would that solve these extremely annoying stability issues? I've tried both software iSCSI (the way it had to be set up originally with 4.1 to get jumbo frames) and now the built-in hardware Broadcom NICs, but the moment you put heavy pressure on the storage, the iSCSI connections start dropping and cause catastrophic issues across the entire cluster.

In the cleanup process I've applied all the latest firmware patches; the BIOS is now at 6.4.0, I believe.
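
For reference, here's roughly how I've been sanity-checking the iSCSI configuration on each host while troubleshooting. It's just a throwaway sketch for the ESXi shell (which ships a minimal Python); the esxcli subcommands are the ones I remember from 5.x, so treat them as assumptions rather than anything official.

    import subprocess

    def show(*args):
        # print the esxcli subcommand being run, then let its output follow
        print("### esxcli " + " ".join(args))
        subprocess.call(["esxcli"] + list(args))

    # vmkernel interfaces and their MTU: jumbo frames need 9000 here,
    # on the vSwitch, and on every physical switch port in the path
    show("network", "ip", "interface", "list")

    # which iSCSI adapters the host sees (software initiator vs. Broadcom offload)
    show("iscsi", "adapter", "list")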

Moderator • 8.5K Posts

January 16th, 2015 07:00

Kimmoj,

The issue is likely due to the VMware version. The Dell-customized ESXi image includes drivers specific to Dell servers, whereas the stock VMware image doesn't. The Dell version is therefore far more likely to run smoothly and stably, since its drivers are built for the hardware. You can find the 5.5 U2 version here.
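
If you want to see whether the driver set actually differs between the stock and Dell images, something like the rough sketch below, run in the ESXi shell on a host built from each ISO, will print the driver and firmware details per NIC so you can compare them. The vmnic names are only examples, and the esxcli syntax is assumed from the 5.x CLI rather than quoted from documentation.

    import subprocess

    # adjust to however many uplinks the host actually has (examples only)
    for nic in ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]:
        print("### " + nic)
        # reports the driver name, driver version and firmware for the uplink
        subprocess.call(["esxcli", "network", "nic", "get", "-n", nic])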

Let me know if this helps.

5 Posts

January 19th, 2015 01:00

I'll see how that goes. I suppose it might be an incompatibility with the iSCSI on the SAN, but that seems far-fetched. I'll give the Dell-customized image a whirl before falling back to 5.1 (which has been absolutely rock solid) until I replace the hosts in a year or so.

5 Posts

January 21st, 2015 00:00

Just thought I'd add, for the record, that the Dell custom version seems to be quite stable on the R610. I've been running it for a few days now on some of the hosts and stress testing them with Storage vMotions and other workloads, and so far so good. The build I got from VMware's site was newer, though (2143827 vs. 2068190), but better a 5.5 U2 that works than one that's a nightmare. Lesson learned: always grab the OEM-customized ESXi if one is available.
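
For anyone keeping score on their own hosts, this is the sort of quick, unofficial check I use to record which build and image profile each host ended up with (run in the ESXi shell; the esxcli subcommands are as I remember them from 5.x, so consider this a sketch):

    import subprocess

    # product, version, build and update level (e.g. build 2068190 vs. 2143827)
    subprocess.call(["esxcli", "system", "version", "get"])

    # the installed image profile name also records which ISO the host came from
    subprocess.call(["esxcli", "software", "profile", "get"])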

Moderator • 8.5K Posts

January 21st, 2015 04:00

Good to hear, glad the nightmare is over. Let us know if you need anything else. We'll be here.

4 Posts

March 10th, 2015 23:00

I'm currently chasing a similar issue where my two ESXi 5.5u2 hosts are purple-screening due to the iSCSI subsystem. I called Dell support and referenced this thread, given that I was running the VMware-sanctioned install of this version.

I was told that the Dell-sanctioned version should be installed because of its hardware-specific customizations, so I spent the late evening/early morning replacing both hypervisors with the Dell install, followed by 18 updates from VMware via Update Manager.

Since then, running on the Dell version, both of my hosts have purple-screened 6 times in the course of 8 hours.

Dell analyzed the data from the screenshot dumps off the DCUI and came up with this KB article: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2089348 which indicates that, at least in 5.5, there is no known workaround. Given that, my question to VMware is whether I should downgrade my hosts to Dell's sanctioned 5.1u3 version, assuming there are no known stability issues with that version's iSCSI subsystem.

It's worth noting that the issue is not entirely with the hypervisor, as we are also experiencing other iSCSI network-related performance issues that seem to send the ESX host into a panic trying to recover from iSCSI timeouts/aborts. Contrary to this forum thread, at least in my experience, the VMware version of 5.5u2 has been more stable, given that I didn't start having purple screens until I installed Dell's version.
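
For what it's worth, this is the kind of thing I run while a host is struggling, to gauge how much iSCSI timeout/abort chatter is hitting the vmkernel log. It's a rough, unofficial sketch; the log path and keywords are assumptions from a generic 5.x install, not anything from VMware or Dell.

    # count iSCSI-related complaints in the live vmkernel log
    LOG = "/var/log/vmkernel.log"          # assumed default location
    KEYWORDS = ("iscsi", "abort", "timed out", "task mgmt")

    counts = {}
    with open(LOG) as fh:
        for line in fh:
            low = line.lower()
            for kw in KEYWORDS:
                if kw in low:
                    counts[kw] = counts.get(kw, 0) + 1

    # a sudden jump in these numbers lines up with the hosts going sideways
    for kw, n in sorted(counts.items()):
        print("%-10s %d" % (kw, n))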

~Pete

5 Posts

March 11th, 2015 00:00

Thanks for posting. Yeah, I spoke way too soon in this thread. My 5.5 stabilized somewhat with the Dell image, in that it ran for about a week before it all died again, as opposed to falling over within minutes. That was the point when I went back to 5.1u3 and developed an unreasonable, blinding hatred for ESXi 5.5 and above. OK, well, maybe not, but I'm certainly in no hurry to upgrade past 5.1.

Unfortunately, when it happened to me the support contract with VMware had mistakenly been allowed to lapse, so I didn't really have the option of researching it exhaustively. Going to 5.1u3 bypassed the issue altogether for me, and the whole system has been stable ever since.

4 Posts

March 12th, 2015 17:00

In all fairness, I'm not here to knock Dell's ESXi 5.5u2 custom image, or VMware's ESXi 5.5u2. The issue I'm currently fighting stems from an aged iSCSI switch environment that cannot handle the load between our array and the ESX hosts.

It just so happens that when you're running with a sub-optimal iSCSI switch configuration, both the array and the ESX hosts are hard pressed to do their jobs, and when the hypervisor's iSCSI subsystem tanks trying to meet the demand, it should be no surprise that the hypervisor becomes unstable.

Given this scenario, the VMware ESXi 5.5u2 hypervisor has proven more stable under duress than the Dell ESXi 5.5u2 custom hypervisor. I'm sure both work just fine in an ideal network environment.

While waiting for our replacement switches, I'm still fighting PSODs on the Dell hypervisor and running damage control. Unfortunately, because I had already upgraded the guest hardware to v10, I found I could not revert to Dell's custom 5.1 installation, so I re-imaged one of the two hosts in the cluster with Dell's 5.5 base hypervisor image, which has proven a bit more stable than the 5.5u2 Dell install it was previously on.

At this point I'm holding off on blaming the hypervisor, since the instability is more or less a result of the underlying problem with the switches.

Cheers,

~Pete

4 Posts

March 13th, 2015 13:00

Update: I just received word back from VMware after they analyzed the core dumps I submitted.

In my particular configuration we are using QLogic dual HBA iSCSI adapters. It appears there is an ongoing problem report being worked by VMware and QLogic. They've provided an action plan and a temporary fix for the QLA4XXX driver that might alleviate the PSODs I have been seeing. I'm hoping this temp fix stabilizes things a bit more for me.
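
Before and after applying the temporary fix I'm recording exactly which QLogic driver VIB is installed, using a quick, unofficial filter over the VIB list (run in the ESXi shell; the "qla" substring is just my guess at how the driver package is named):

    import subprocess

    # grab the full VIB list and keep the header plus anything QLogic-looking
    out = subprocess.Popen(["esxcli", "software", "vib", "list"],
                           stdout=subprocess.PIPE).communicate()[0].decode()
    for line in out.splitlines():
        if line.lower().startswith("name") or "qla" in line.lower():
            print(line)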

~Pete 

5 Posts

March 16th, 2015 00:00

"Unfortunately, because I had already upgraded the guest hardware to v10, I found I could not revert to Dell's custom 5.1 installation, so I re-imaged one of the two hosts in the cluster with Dell's 5.5 base hypervisor image, which has proven a bit more stable than the 5.5u2 Dell install it was previously on."

There are several ways to downgrade from hardware version 10 to 9: a few are officially supported (like using VMware Converter to do a V2V conversion), and there's an "ugly hack" that isn't officially supported but reportedly works: shut the machine down, go in via SSH, edit the VM's .vmx file, and change the hardware version number from 10 to 9.
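
If you do go the ugly-hack route, it boils down to something like the sketch below: entirely unsupported, only with the VM powered off, and keep a backup of the .vmx. The datastore path is obviously made up, and the only thing it touches is the virtualHW.version line.

    import re, shutil

    VMX = "/vmfs/volumes/datastore1/myvm/myvm.vmx"   # hypothetical path

    shutil.copy(VMX, VMX + ".bak")                   # keep the original around

    with open(VMX) as fh:
        text = fh.read()

    # virtualHW.version = "10"  becomes  virtualHW.version = "9"
    text = re.sub(r'virtualHW\.version\s*=\s*"10"', 'virtualHW.version = "9"', text)

    with open(VMX, "w") as fh:
        fh.write(text)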

I'm very glad I only upgraded my VMs to version 9 and held off on 10 until I was sure 5.5 was stable. It turns out that, for me, it wasn't, so I'm back on 5.1 and plan to stay there. I'm about to replace my storage, though, so I may do a single-host trial of 5.5 again once that's in place.

I'm not knocking Dell's image either; for me it actually was better than the stock one. Unfortunately, in my case that just meant it took a little longer to break. My storage right now is based on Nexenta, so that may be a contributing factor somehow, but with 5.1 I get over 10k IOPS out of it and have had zero issues for the past four or so years, so I still blame VMware. :)
