
Performance issue / poor performance: VRTX with Fault Tolerant / Dual Shared PERC 8


Hi,

I have been struggling with a storage performance issue for a few months. I found out that the culprit behind this performance hit is the Fault Tolerant Shared PERC 8 configuration in the VRTX. In this configuration you are bound to the write-through caching policy, which currently performs very badly.

Because of this big performance hit on the storage (caused by the caching policy) I had the following problems:

  • Poor performance of (Hyper-V) VMs
  • Poor copy performance (copying big files within the same volume: 70 MB/s max)
  • Unable to back up the VMs using either Veeam B&R 7 or Backup Exec 2014 RTM (backup and VSS are related)
  • VSS time-out errors (0x8000ffff - Catastrophic failure)
  • Checkpoint creation of a VM took about 2 minutes

The release notes of the latest firmware (23.8.12-0061) for the SPERC 8 state:

Firmware supports VRTX running with single SPERC8 controller;
- Write-back cache is enabled for VDs.
- Does not have high availability capability

Firmware supports VRTX running with dual SPERC8 controller;
- High availability capability present (Both SPERC8 needs to be connected to expanders)
- One of the controllers is active and the other is passive
- Write-back cache is disabled for VDs; only option for write cache is Write Through.
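
The release-note rules above amount to a simple lookup from controller count to caching behaviour. The sketch below only encodes what the 23.8.12-0061 notes state; `SpercMode` and `sperc_mode` are hypothetical names, and this is not how the firmware actually decides (in practice, physically removing the second controller was not enough on its own to get write-back back).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpercMode:
    """Behaviour of a VRTX Shared PERC 8 setup as described in the release notes."""
    high_availability: bool
    write_back_available: bool


def sperc_mode(controller_count: int) -> SpercMode:
    """Hypothetical helper: map the number of installed SPERC8s to the documented mode."""
    if controller_count == 1:
        # Single controller: write-back cache enabled, no high availability.
        return SpercMode(high_availability=False, write_back_available=True)
    if controller_count == 2:
        # Dual controller: active/passive HA (both SPERC8s connected to expanders);
        # write-back disabled, write-through is the only write cache option.
        return SpercMode(high_availability=True, write_back_available=False)
    raise ValueError("a VRTX runs with one or two Shared PERC 8 controllers")


print(sperc_mode(2))  # SpercMode(high_availability=True, write_back_available=False)
```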


To downgrade from a dual-controller to a single-controller configuration, I removed the second SPERC8 controller, removed the second expander, re-cabled the SAS connections and downgraded the chassis to 1.25. Just removing the controller and expander wasn't enough: I was still unable to set write-back as the caching policy. As soon as I downgraded to 1.25, I was able to set write-back as the caching policy.

With the single controller configuration, the storage performance is as it should be. All the earlier mentioned problems are gone:

  • (Hyper-V) VMs perform excellently
  • Copy performance is excellent (copying big files within the same volume: around 650 MB/s max)
  • Backups actually work (backup and VSS are related)
  • No more VSS time-out errors (0x8000ffff - Catastrophic failure)
  • Checkpoint creation of a VM takes about 4 seconds
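
To put the before/after figures in perspective, here is a quick back-of-the-envelope calculation. The 500 GB file size is an arbitrary, hypothetical example and decimal units (1 GB = 1000 MB) are assumed; the rates are the maximums reported above, so real copies may take longer.

```python
def copy_time_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Time to copy size_gb at a sustained rate, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s


SIZE_GB = 500  # hypothetical file size, e.g. a large VHDX

for label, rate in [("write-through (dual SPERC8)", 70), ("write-back (single SPERC8)", 650)]:
    minutes = copy_time_seconds(SIZE_GB, rate) / 60
    print(f"{label}: {minutes:.1f} min")  # 119.0 min, then 12.8 min

print(f"copy speed-up: {650 / 70:.1f}x")        # 9.3x
print(f"checkpoint speed-up: {120 / 4:.0f}x")   # 2 minutes vs 4 seconds: 30x
```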

From this experience, I currently strongly advise against the dual SPERC8 configuration; stick with the single configuration. Hopefully a firmware update will solve this performance issue, because at the moment a dual / fault-tolerant SPERC 8 configuration with such poor storage performance makes no sense at all. Yes, you're fault tolerant... but no backups and badly performing VMs aren't really a good option in a production environment, I think...

Or is there an essential setting I am missing for the write-through setup? (I have already tried different settings...)

Below are the benchmarks made with ATTO. The difference is tremendous! Funny fact: while running the benchmark with the write-back policy enabled, the virtual disk was still in its background initialization process, whereas with write-through it was not.

NOTE: Both VDs were configured as RAID6

Moderator

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Hello Erik

Thanks for the great write-up!

Here are a few things to consider with this:

#1 Write cache has a HUGE performance impact on RAID arrays that use parity, HUGE difference. You should not run a parity array without write caching. Your speeds of 70MB/s on a RAID 6 without write caching are extremely good.

#2 Write caching is disabled when running dual PERCs because the functionality is not present to match the write cache. Our storage appliances that run dual PERC failover configurations have the capability to match the cache: the passive PERC runs a mirror image of the active PERC's cache so that it can take over without data loss in the event of controller failure. The VRTX is not a full-featured storage appliance.
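
Why write-back is unsafe in an active/passive pair without cache mirroring can be shown with a toy model. This is a simplified illustration, not how PERC firmware works: `Controller` and `surviving_writes` are hypothetical names, the drives are modelled as a dictionary, and a controller failure is reduced to discarding the active controller's cache.

```python
from typing import Dict, Optional


class Controller:
    """Toy RAID controller (hypothetical model, not PERC firmware behaviour)."""

    def __init__(self, disk: Dict[int, str], write_back: bool,
                 mirror: Optional["Controller"] = None) -> None:
        self.disk = disk                  # shared drives, modelled as a dict of blocks
        self.write_back = write_back
        self.mirror = mirror              # passive peer that receives a copy of dirty cache
        self.dirty: Dict[int, str] = {}   # acknowledged but not yet written to disk

    def write(self, block: int, data: str) -> None:
        if self.write_back:
            self.dirty[block] = data              # acknowledged from cache, flushed later
            if self.mirror is not None:
                self.mirror.dirty[block] = data   # cache mirroring to the peer
        else:
            self.disk[block] = data               # write-through: on disk before the ack

    def flush(self) -> None:
        self.disk.update(self.dirty)
        self.dirty.clear()


def surviving_writes(write_back: bool, mirrored: bool, n: int = 3) -> int:
    """Acknowledge n writes, fail the active controller, let the passive take over."""
    disk: Dict[int, str] = {}
    passive = Controller(disk, write_back)
    active = Controller(disk, write_back, mirror=passive if mirrored else None)
    for block in range(n):
        active.write(block, f"data-{block}")
    active.dirty.clear()   # active controller fails: its cache contents are lost
    passive.flush()        # passive flushes whatever copy of the cache it holds
    return len(disk)


print(surviving_writes(write_back=True, mirrored=False))   # 0: acknowledged data lost
print(surviving_writes(write_back=True, mirrored=True))    # 3
print(surviving_writes(write_back=False, mirrored=False))  # 3
```

Only mirrored write-back and write-through keep every acknowledged write, which is the trade-off described here: without cache mirroring, the dual-SPERC8 configuration is limited to write-through.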

The use of dual PERCs on a VRTX will be dependent on the needs of the individual customer. I think it is great that you made this write-up to point out that write caching is not available in this configuration. Someone running a RAID 10 or other non-parity array will not suffer such a HUGE performance impact due to write caching being disabled, so some configurations run very well with dual PERCs.
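
The parity penalty in point #1 is commonly estimated with the RAID write-penalty rule of thumb: without a write cache to absorb writes, each small random host write costs roughly 2 disk I/Os on RAID 10 (two mirror writes), 4 on RAID 5 (read old data and parity, write new data and parity) and 6 on RAID 6 (the same, for two parity blocks). The drive count and per-drive IOPS below are assumptions, so treat the output as a rough estimate rather than a VRTX measurement.

```python
# Rule-of-thumb back-end disk I/Os per random host write, with no write cache.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def random_write_iops(level: str, drives: int, iops_per_drive: float) -> float:
    """Estimated host-visible random write IOPS when writes are not cached."""
    return drives * iops_per_drive / WRITE_PENALTY[level]


DRIVES = 12          # assumption: fully populated 12-bay VRTX
IOPS_PER_DRIVE = 75  # assumption: ballpark for one 7,200 RPM drive

for level in WRITE_PENALTY:
    print(f"{level}: ~{random_write_iops(level, DRIVES, IOPS_PER_DRIVE):.0f} IOPS")
# RAID10: ~450 IOPS, RAID5: ~225 IOPS, RAID6: ~150 IOPS
```

The rule ignores full-stripe sequential writes, which avoid the read-modify-write cycle, so it describes the worst case for small random writes.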

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer

Get support on Twitter @DellCaresPRO

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Hi Daniel.

Thank you for the clarification! It leads me to the following question(s):

Q: In point #2 you say "because the functionality is not present to match the write cache". Will this functionality ever be present/available, or is it impossible given the way caching works?

I will test and benchmark a RAID10 setup sometime this week. The big downside of RAID10 is that we lose a lot of storage capacity: 21TB in RAID10 versus 36TB in RAID6 (a 15TB loss). Note: our VRTX is completely filled with 3.64TB disks.
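
The capacity figures above can be reproduced with simple arithmetic. The drive count of 12 is an assumption (a fully populated 3.5-inch VRTX chassis has 12 bays, which matches the quoted numbers); hot spares and filesystem overhead are ignored.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity, ignoring hot spares and filesystem overhead."""
    if level == "RAID10":
        return drives // 2 * drive_tb   # half the drives hold mirror copies
    if level == "RAID6":
        return (drives - 2) * drive_tb  # two drives' worth of dual parity
    raise ValueError(f"unsupported level: {level}")


raid10 = usable_tb("RAID10", 12, 3.64)
raid6 = usable_tb("RAID6", 12, 3.64)
print(f"RAID10 {raid10:.2f} TB, RAID6 {raid6:.2f} TB, difference {raid6 - raid10:.2f} TB")
# RAID10 21.84 TB, RAID6 36.40 TB, difference 14.56 TB
```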

If RAID6 is not the best option (or actually the worst) to use with the dual SPERC8 controller, it would be very nice if Dell published a mention / recommendation / best practice for the dual SPERC8 setup. I have now been struggling with this issue for months, time that could have been saved had this been written down somewhere. (I realize that the dual SPERC8 setup is still pretty new, so giving new customers/users a heads-up on this would be very helpful.)

So, for anyone who comes across this post:
* If you have a VRTX with dual SPERC8 controllers and want to keep the fault tolerance, do not use RAID6 for the storage; go for RAID10 instead (take the storage capacity loss into account!).
* If fault tolerance with SPERC8 is not that important to you and you want to use RAID6, downgrade the VRTX to a single-controller SPERC8 configuration and use the write-back caching policy.

Q: Currently I have found a way to downgrade to a single SPERC8 controller. Is this the best way or do you know of an easier way to downgrade?

Regards,

Erik

Moderator

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Q: In point #2 you say "because the functionality is not present to match the write cache". Will this functionality ever be present/available, or is it impossible given the way caching works?

In short, I don't know. I doubt this type of functionality could be added with a firmware update, but I'm not a developer so I'm not sure of the engineering barriers to make it work. I'm not aware of any plans, and I'm not sure if it is possible.

Q: Currently I have found a way to downgrade to a single SPERC8 controller. Is this the best way or do you know of an easier way to downgrade?

You should be able to run the latest firmware with a single PERC8 shared and have write-caching enabled. Your method works, but I'm curious if back-flashing the firmware on the PERC is required. Some changes made with firmware updates are to the default configuration file. If you don't reset the PERC to defaults then the changes will not be implemented. I would suggest trying to reset the PERC to defaults prior to back-flashing to see if that will allow the write-cache option to become available. You can do this in the CMC: Storage>Controllers>Troubleshooting>Actions drop-down.

Thanks

Daniel Mysinger
Dell EMC, Enterprise Engineer


ThiPham

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


I am facing exactly the same situation.
New Dell VRTX + 3 M620 blades running vSphere ESXi 5.5
VD as RAID-5 on dual SPERC8 controllers. Write policy is Write Through (only)
I made 2 copies of the same VM, one on the local HDD (of an M620) and one on the VD storage.

Write performance of the VM on the VD storage is very low.

Backing up or cloning VMs is now very slow.

Do you have the RAID-10 test results yet?

Should I contact local Dell support? I'm not sure they have the answer. I feel so bad now.

Thanks


RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Hi Daniel (and ThiPham),

In your first reply you said the following: "Someone running a RAID 10 or other non-parity array will not suffer such a HUGE performance impact due to write caching being disabled, so some configurations run very well with dual PERCs."

Note: by my understanding "write through" is not equal to write caching being disabled; it is just another (much slower) way of write caching.


Of course I wanted to test your claim, and I hate to say it, but the results are just as bad (judged against my own expectations and those set by the Dell Sales department). So I really wonder what kind of configuration is actually suitable for a dual PERC setup. At present it is a very bad choice for any virtual environment, since that is a very disk-intensive workload.

The results:

Taking all the results into account, I am rephrasing my advice:
If you are planning to run any virtual environment on a VRTX, do not go for the dual SPERC8 setup; stick with the single SPERC8 instead. The performance hit caused by the currently mandatory caching policy (Write Through) is far too big to run a virtual environment smoothly, no matter which RAID configuration you choose.

@ThiPham
To get a real performance gain, I would advise you to downgrade your VRTX to a single PERC.

I am still figuring out the best way to downgrade from dual to single. The procedure described in my first post does work, but I am wondering whether Daniel is right, meaning that I only have to remove the controller + expander, re-cable and then reset the single controller, without downgrading the firmware.

And yes, please do contact Dell support! I have done this as well. I think the more people "complain" about this, the more seriously they will take it.

Moderator

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Erik

[quote user="Erik Nettekoven"]Note: by my understanding "write through" is not equal to write caching being disabled; it is just another (much slower) way of write caching.[/quote]

That is not correct. Write through and write caching being disabled are the same. The controller will always run data through the cache, but with caching disabled or write through, which are the same thing, it will not queue anything in cache. It basically has a queue depth of 0 with no cache.
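
A simplified model of the distinction drawn here: with write-back the host gets its acknowledgement once the data is in controller cache, while with write-through (equivalently, caching disabled) the acknowledgement waits until the data is on the drives. The latency figures are assumed round numbers rather than PERC measurements, and the model assumes one outstanding host write at a time.

```python
CACHE_ACK_MS = 0.05  # assumption: write acknowledged from controller DRAM
DISK_ACK_MS = 8.0    # assumption: one random write committed to a 7,200 RPM drive


def ack_latency_ms(policy: str) -> float:
    """Latency until the host is told a write is done (simplified model)."""
    if policy == "write-back":
        return CACHE_ACK_MS   # destaged to the drives later, in the background
    if policy in ("write-through", "disabled"):
        return DISK_ACK_MS    # same behaviour: the ack waits for the drive
    raise ValueError(f"unknown policy: {policy}")


def writes_per_second(policy: str) -> float:
    """Throughput with exactly one outstanding host write."""
    return 1000 / ack_latency_ms(policy)


for policy in ("write-back", "write-through"):
    print(f"{policy}: {writes_per_second(policy):,.0f} writes/s")
```

The write-back figure only holds while the cache has room; sustained writes are still limited by what the drives can absorb.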

[quote user="Erik Nettekoven"]And yes, please do contact Dell support! I have done this as well. I think the more people "complain" about this, the more seriously they will take it.[/quote]

There is not an issue with the hardware. This is just an issue of understanding how pivotal a role cache plays in RAID. Your write speeds with cache enabled are almost 2500 MB/sec. Your HDDs are not capable of writing that fast. Cache allows you to achieve speeds that the drives alone are not capable of. All you are showing in the above tests is how much effect cache has on RAID performance.
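
The claim that ~2500 MB/s exceeds what the drives can deliver can be sanity-checked with an upper bound on full-stripe sequential writes. The 12-drive RAID 6 layout and the 180 MB/s per-drive sequential rate are assumptions, not measured values; real throughput is lower once seeks and parity overhead are included.

```python
def streaming_ceiling_mb_s(drives: int, parity_drives: int, per_drive_mb_s: float) -> float:
    """Upper bound on full-stripe sequential write throughput from the data drives."""
    return (drives - parity_drives) * per_drive_mb_s


# Assumptions: 12 drives in RAID 6 (2 parity), ~180 MB/s sequential per drive.
ceiling = streaming_ceiling_mb_s(drives=12, parity_drives=2, per_drive_mb_s=180)
print(f"RAID 6 streaming ceiling: {ceiling:.0f} MB/s")   # 1800 MB/s
print("2500 MB/s is above the ceiling:", 2500 > ceiling)  # True
```

Under these assumptions, a benchmark result above the ceiling can only come from the controller cache absorbing the writes.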

I have run your benchmark on an R720XD with an H710P controller to show you that this is not a hardware or firmware issue. This is simply how RAID works. My test results are not as good as yours, so I suspect you are running better HDDs than me. I have 7200 RPM SATA drives that I am testing with. This is my result with all caching disabled on a different controller in a different system on a 4 drive RAID 10:


Daniel Mysinger
Dell EMC, Enterprise Engineer


ThiPham

RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


Hi Erik,

Many thanks for your advice.

I have just contacted Dell support and am waiting for their reply.

Some test results on our VRTX system for reference


RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


[quote user="Daniel Mysinger"]Write through and write caching being disabled are the same. The controller will always run data through the cache, but with caching disabled or write through, which are the same thing, it will not queue anything in cache. It basically has a queue depth of 0 with no cache.[/quote]

Thanks for clearing that up!

[quote user="Daniel Mysinger"]There is not an issue with the hardware. This is just an issue of understanding how pivotal a role cache plays in RAID. Your write speeds with cache enabled are almost 2500 MB/sec. Your HDDs are not capable of writing that fast. Cache allows you to achieve speeds that the drives alone are not capable of. All you are showing in the above tests is how much effect cache has on RAID performance.

I have run your benchmark on an R720XD with an H710P controller to show you that this is not a hardware or firmware issue. This is simply how RAID works.[/quote]

Regarding "there is not an issue with the hardware", yes and no.

Yes: in the sense of "that is how RAID works", I agree it is indeed not an issue with the hardware.

No: in the dual SPERC8 setup in the VRTX, which is also a hardware setup, you are bound to "write through" or "caching disabled". In my opinion this is a very poor choice, because of the huge impact on write performance. It is in no way a suitable setup for a virtual environment, even though Dell sold it to us as "a very suitable solution" for one. The problems mentioned in my first post and the benchmark results prove my "poor choice" statement.

If this "write through" caching policy is the only technically possible option with dual SPERC controllers, making it unfeasible to run a virtual environment, then I feel like I have been more or less scammed by Dell. Or at least Dell didn't live up to the expectations it gave me.

So for now I stick with my advice/statement:
If you plan to run any virtual environment on a VRTX and its internal storage, stick with a single SPERC setup and stay far away from the dual SPERC setup.

Furthermore I am still wondering: what kind of environment is actually suitable for this Dual SPERC setup?


RE: write-through caching policy Fault Tolerant Shared PERC 8 big performance hit / very poor performance


I have contacted Dell Support and they acknowledge the performance problem with the dual PERC controller setup. According to Dell support, their specialists are currently looking into this performance issue.

Furthermore, support will send me an email with troubleshooting steps to improve the performance of the dual PERC setup and/or to find out whether "write through" is the actual cause of this problem.

So everybody experiencing this problem, please DO contact support!
