1336283

June 10th, 2014 06:00

Performance issue / poor performance: VRTX with Fault Tolerant / Dual Shared PERC 8

Hi,

I have been struggling with a storage performance issue for a few months. I found out that the culprit is the Fault Tolerant (dual) Shared PERC 8 configuration in the VRTX. In this configuration you are limited to the write-through caching policy, which currently performs very badly.

Because of this big performance hit on the storage (caused by the caching policy) I had the following problems:

  • Poor performance of (Hyper-V) VMs
  • Poor copy performance (copying large files within the same volume: 70 MB/s max)
  • Unable to back up the VMs using either Veeam B&R 7 or Backup Exec 2014 RTM (backup and VSS are related)
  • VSS time-out errors (0x8000ffff - Catastrophic failure)
  • Checkpoint creation of a VM took about 2 minutes

The release notes of the latest firmware (23.8.12-0061) for the SPERC 8 state:

Firmware supports VRTX running with single SPERC8 controller;
- Write-back cache is enabled for VDs.
- Does not have high availability capability

Firmware supports VRTX running with dual SPERC8 controller;
- High availability capability present (Both SPERC8 needs to be connected to expanders)
- One of the controllers is active and the other is passive
- Write-back cache is disabled for VDs; only option for write cache is Write Through.
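
The difference between the two caching policies comes down to when the host gets its acknowledgement. The toy model below is a minimal sketch under stated assumptions, not SPERC 8 firmware behaviour: the class name `RaidController` and both latency figures are hypothetical. It shows that write-back acknowledges once data is in controller cache (leaving it "dirty" until destaged), while write-through waits for the disks, which is why write-through is safe across a controller failover but slow.

```python
from dataclasses import dataclass, field


@dataclass
class RaidController:
    """Toy model of when a controller acknowledges a host write (not SPERC 8 firmware)."""
    write_back: bool
    cache_latency_ms: float = 0.05  # hypothetical: time to land a write in controller cache
    disk_latency_ms: float = 8.0    # hypothetical: time for the RAID set to commit a write
    dirty: list = field(default_factory=list)    # acknowledged but not yet on disk
    on_disk: list = field(default_factory=list)  # committed to the disks

    def write(self, block: int) -> float:
        """Store a block and return the latency the host sees before the acknowledgement."""
        if self.write_back:
            # Write-back: acknowledge as soon as the data sits in controller cache.
            self.dirty.append(block)
            return self.cache_latency_ms
        # Write-through: acknowledge only after the disks have committed the data.
        self.on_disk.append(block)
        return self.disk_latency_ms

    def flush(self) -> None:
        """Destage cached blocks to disk (done in the background in write-back mode)."""
        self.on_disk.extend(self.dirty)
        self.dirty.clear()


wb = RaidController(write_back=True)
wt = RaidController(write_back=False)
print(wb.write(1), wt.write(1))  # 0.05 8.0
print(wb.dirty, wb.on_disk)      # [1] [] -> lost if this controller dies before flush()
```

In a dual-controller setup, that `dirty` list is exactly what the passive controller would not have, which is presumably why the firmware forces write-through there.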


To go from a dual-controller to a single-controller configuration, I removed the second SPERC8 controller and the second expander, re-cabled the SAS connections, and downgraded the chassis to 1.25. Removing the controller and expander alone wasn't enough: I still could not set write-back as the caching policy. As soon as I downgraded to 1.25, I was able to set write-back as the caching policy.



With the single controller configuration, the storage performance is as it should be. All the earlier mentioned problems are gone:

  • (Hyper-V) VMs perform excellently
  • Copy performance is excellent (copying large files within the same volume: around 650 MB/s max)
  • Backups actually work (backup and VSS are related)
  • No more VSS time-out errors (0x8000ffff - Catastrophic failure)
  • Checkpoint creation of a VM took about 4 seconds

Based on this experience, I currently strongly advise against the dual SPERC8 configuration; stick with the single-controller configuration. Hopefully a firmware update will solve this performance issue, because at the moment a dual / fault tolerant SPERC 8 configuration with this very poor storage performance makes no sense at all. Yes, you're fault tolerant... but no backups and badly performing VMs aren't really a good option in a production environment, I think...

Or is there an essential setting I am missing for the write-through setup? (I have already tried different settings...)

Below are the benchmarks made using ATTO. The difference is tremendous! Funny fact: during the benchmark with the write-back policy enabled, the virtual disk was still in its background initialization process, while during the write-through benchmark it was not.

NOTE: Both VDs were configured as RAID 6.

September 4th, 2014 00:00

Hi,

I think that in the past, with older BIOS versions, you had the option to enable and disable the write cache. I don't know how it worked because I have never seen it live, but I think there was a reason for disabling it. I read the VRTX manual and it had a screenshot of this setting. I tried to activate it, but the newer BIOS versions don't give you that possibility.

When I talked to Dell about my problem months ago, they said at first that they did not know whether it could be solved with just a firmware update, because at that time their big problem was keeping the write cache in sync.

September 5th, 2014 03:00

Hi,

I should probably share more details.

I'm running ESXi 5.5; the tests were performed on Windows Server 2012 (not really relevant information).

When I was talking about disk cache, I meant enabling/disabling the Cache Disk Policy that you can set on a virtual disk.

September 29th, 2014 13:00

Ttoonka,

Regarding "My results are not that bad as you have published above."

That is because you are running RAID 10 rather than RAID 6, and (apparently) you have also enabled the on-disk caches. You should never do this unless you are prepared to lose the data in the on-disk cache!!

In any case, those results are still very bad: RAID 10 write performance should be no less than 50% of read performance, so your results look quite ugly for RAID 10 on 10K RPM disks!!!
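
The "no less than 50%" figure follows from the usual RAID write-penalty rule of thumb: each random write costs 2 back-end I/Os on RAID 10 and 6 on RAID 6 (read-modify-write of data plus two parities). The sketch below assumes a hypothetical 8-disk array at about 140 random IOPS per 10K disk and no controller write cache; `effective_iops` and `WRITE_PENALTY` are illustrative names, not part of any Dell tool.

```python
# Rule-of-thumb back-end disk I/Os per front-end *random* write (read-modify-write).
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def effective_iops(disks: int, iops_per_disk: float, raid: str, write_fraction: float) -> float:
    """Estimate the random IOPS an array can sustain with no controller write cache."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk  # total back-end IOPS of all spindles
    cost = (1.0 - write_fraction) + write_fraction * WRITE_PENALTY[raid]
    return raw / cost            # each read costs 1 I/O, each write costs the penalty


# Hypothetical array: 8 disks at ~140 random IOPS each.
for level in ("RAID10", "RAID6"):
    reads = effective_iops(8, 140, level, 0.0)
    writes = effective_iops(8, 140, level, 1.0)
    print(level, round(reads), round(writes), f"{writes / reads:.0%}")
# RAID10 1120 560 50%
# RAID6 1120 187 17%
```

This ignores caching entirely. A write-back cache can absorb bursts and coalesce writes into full-stripe writes, which avoids much of the RAID 6 read-modify-write penalty; that is one reason losing it hurts RAID 6 so much.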

You should try something like Iometer, which also reports response times. If you do, you will see that response times are 100x longer than with the write-back cache setting.
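
Throughput and response time are tied together by Little's law: at steady state with a fixed number of outstanding I/Os (Iometer's "# of Outstanding I/Os" setting), mean response time equals outstanding I/Os divided by IOPS. The sketch below uses hypothetical numbers to show why 100x fewer IOPS at the same queue depth means roughly 100x longer response times.

```python
def mean_response_time_ms(outstanding_ios: float, iops: float) -> float:
    """Little's law in steady state: response time = outstanding I/Os / throughput."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops * 1000.0  # seconds -> milliseconds


# Same queue depth, 100x fewer completions per second -> 100x longer response time.
fast = mean_response_time_ms(32, 4000)  # about 8 ms
slow = mean_response_time_ms(32, 40)    # about 800 ms
print(round(fast, 3), round(slow, 3), round(slow / fast))
```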

IMHO, the VRTX designers' belief that FOUR server blades could share a >>single<< LSI RAID chip was silly to begin with. That chip was meant for a single-server application; it is the same LSI chip Dell uses on a standard PERC.

What's worse is that they also believed they could run without write cache!! The team that developed this box seems to know very little about storage.

September 29th, 2014 13:00

Re: "I meant enabling/disabling Cache Disk Policy you can enable on Virtual Disk".

One should NEVER enable the Cache Disk Policy at the VDisk level; it is only meant for applications that can safely lose the data in the cache in the event of a bus reset, etc.

Whatever performance you were seeing with that turned on is only relevant in non-production environments.

September 30th, 2014 02:00

Hi,

Look at this discussion: http://en.community.dell.com/support-forums/servers/f/906/p/19587459/20644031#20644031

If the link doesn't work, google: Performance issue / poor performance: VRTX with Fault Tolerant / Dual Shared PERC 8

September 30th, 2014 02:00

RAID 10 should reduce to zero the risk of having the disk cache enabled and a disk failing.

RAID 10 writes simultaneously to two disks. If one disk fails, I still have the data on the second disk, and RAID 10 will recover from this situation.

For power outages, one should always use a UPS, and the disks themselves have a small capacitor that will keep the data in their cache for a while.

Am I wrong, or what else could happen?

PS: I do agree that the performance is not as good as we would all expect.

 

Thx

T.

 

October 31st, 2014 05:00

I will be watching this thread with interest as I too have just purchased a 2nd PERC card for my VRTX for redundancy. I will do some testing next week.

Comments:

@TTOONKA - Think about what you are saying about RAID 10 with caching. Data goes from the OS through the hardware and into the cache on the active PERC; if that PERC were to fail with data in the cache, the data would be lost forever. It doesn't matter whether you have RAID 10 or RAID 1: the data is not on the disk yet, it is still in the cache that failed.

I can totally understand the issue and why Dell decided to switch off the write cache. What they need to do is work out a way for the 2nd PERC card to also be online and constantly receive the same data as the active card. That way, if the active one fails, the backup one continues. However, I imagine this will be extremely difficult to do: it would require the two cards to be in constant communication to know how much of the cache has already been written and where it has got to...
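
The scheme described above (the passive card holding the same dirty data as the active one) is usually called cache mirroring. The sketch below is a hypothetical model of that idea, not Dell's implementation: `MirroredWriteCache`, the inter-controller link, and the failover behaviour are all assumptions. It shows the two hard parts mentioned above: every write must reach the peer before the host is acknowledged, and every destage must be reported to the peer so it knows what has already been written.

```python
class MirroredWriteCache:
    """Hypothetical active/passive pair that mirrors dirty data before acknowledging."""

    def __init__(self) -> None:
        self.active: dict[int, bytes] = {}   # dirty blocks in the active controller's cache
        self.passive: dict[int, bytes] = {}  # copy held by the passive controller
        self.disk: dict[int, bytes] = {}     # data committed to the shared disks

    def write(self, lba: int, data: bytes) -> None:
        self.active[lba] = data
        # Mirror over the (assumed) inter-controller link BEFORE acknowledging the host,
        # so an acknowledged write always exists in two caches.
        self.passive[lba] = data

    def destage(self, lba: int) -> None:
        # Commit to disk, then tell the peer it can drop its copy; without this step the
        # passive side cannot know how much has already been written.
        self.disk[lba] = self.active.pop(lba)
        self.passive.pop(lba, None)

    def fail_active(self) -> None:
        # Failover: the active cache is gone; the survivor flushes everything it still holds.
        self.active.clear()
        self.disk.update(self.passive)
        self.passive.clear()


c = MirroredWriteCache()
c.write(10, b"A")
c.write(11, b"B")
c.destage(10)          # block 10 is on disk, block 11 is only in the two caches
c.fail_active()        # active controller dies before destaging block 11
print(sorted(c.disk))  # [10, 11]: no acknowledged write was lost
```

A real design also has to handle a failure in the middle of mirroring, write ordering, and cache battery or flash backup; the sketch deliberately leaves those out.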

In the meantime, I will test my performance with RAID 5. If it's not good enough, I will follow the detailed post above and disable the spare PERC in the CMC, only re-enabling it if the primary fails. That way I can have write caching enabled :)

I presume this problem affects only write caching, and that read caching can stay on?

October 31st, 2014 09:00

@BLUE407, thanks for jumping into the discussion, but you misunderstood me.

I was talking about having the disk cache enabled (not the PERC8 controller cache) to improve write speed when the controller configuration is Fault Tolerance > Active/Passive controller.

That means if the active PERC8 controller dies, the system immediately switches to the passive PERC8, which becomes the active one, and no data is lost.

To avoid data loss with the disk cache enabled, I believe it is enough to use RAID 10, where writes go to two different disks at the same time (the probability that two disks die at the same time is close to zero).

That was my thinking, and it is open for discussion.

In general, Dell needs to work on this topic and mirror the controller cache to the passive controller if they want to allow enabling it (which is not possible today with the Active/Passive configuration).

 

T.

November 7th, 2014 12:00

Erik,

Please let us know if it works for you. I just found this today as well, and I have a customer who is having random disconnect issues with their vSphere VMs running in a dual Shared PERC config.

Thanks for being the guinea pig!!

           --Mark

November 15th, 2014 11:00

Did you have a chance to test this new firmware (which seems, if I am not mistaken, to be specific to Windows Server OS)?

November 25th, 2014 02:00

Hi Sonny,

I appreciate your question and concern, but unfortunately I do not have the answers to your technical questions. I think it is better to direct those questions to Dell Support and/or create a new question on the forum. I am not a VRTX / SPERC product specialist; I only had some issues with the setup and figured out a lot on my own.

Sorry I can't be of any help regarding your questions.

Regards,

Erik

November 25th, 2014 02:00

Hi Erik,

Thanks for the post. Now that VRTX has dual redundant PERCs with write-back caching available, I have a few questions:

  1. I'm assuming that the PERCs still operate in an active/passive configuration. Is this correct?
  2. How are the caches on each card kept consistent?

Also, as the PERCs have write-back caching, I assume they will have battery backup. If so:

  3. How often do the batteries need replacing?
  4. Do I need to power off the VRTX chassis to perform this replacement?

Many thanks,

Sonny

December 29th, 2014 12:00

Guys,

I would advise against running this firmware, based on my experience over the past 6 weeks. My VRTX running a Dev/Test ESXi cluster was plagued with issues after upgrading to this firmware and enabling the 2nd PERC. The VMs and hosts would become unresponsive. Commands issued to the hosts would time out in the VI Client. vMotions (manual or DRS-initiated) would fail to complete. The host logs kept reporting 'Lost connectivity to Datastore' issues.

After the 4th outage of this environment, which had been running solidly on firmware 23.8.12 with the 2nd PERC disabled, we think there are issues with the firmware. In short:

  1. Firmware 23.11.16-0076 with both PERCs enabled - all sorts of issues with VMware.
  2. Firmware 23.11.16-0076 with the 2nd PERC disabled - same as above.
  3. Roll back to firmware 23.8.12 and disable the 2nd PERC - system stable again.

Furthermore, when I called Dell Support this morning, I was told that I should not be running this firmware with both controllers AND write-back enabled, and that the firmware that fixes the issues with write-back and dual controllers is due in Q1. And yes, I did tell the tech on the line that this is not what my TAM told me, nor what the download page for the latest firmware says, but the tech still said that information was not accurate. So I emailed my TAM for clarification. But yeah, I advise against putting this in a production environment.

April 14th, 2015 21:00

Marvin

Did this firmware update solve your problem? I mean, with this update, were you able to set the dual SPERC8 to fault tolerance and get the same performance as with one SPERC?

April 15th, 2015 00:00

We have completely lost confidence that this will ever work, and as the VRTXs have been in production for quite some time now, we are not going to test it. We have had enough trouble and downtime due to these faulty/buggy firmware releases.

 

So we don't know whether the latest firmware solves the problem, and we are unwilling to take the risk at the moment.
