June 10th, 2014 06:00

Performance issue / poor performance: VRTX with Fault Tolerant / Dual Shared PERC 8

Hi,

I have been struggling with a storage performance issue for a few months. I found that the culprit is the Fault Tolerant Shared PERC 8 configuration in the VRTX. In this configuration you are bound to the write-through caching policy, which currently performs very badly.

Because of this big performance hit on the storage (caused by the caching policy) I had the following problems:

  • Poor performance of (Hyper-V) VMs
  • Poor copy performance (copying big files within the same volume: 70 MB/s max)
  • Unable to make a backup of the VMs using either Veeam B&R 7 or Backup Exec 2014 RTM (backup and VSS are related)
  • VSS time-out errors (0x8000ffff- Catastrophic failure)
  • Checkpoint creation of a VM took about 2 minutes

The release notes of the latest firmware (23.8.12-0061) for the SPERC 8 state:

Firmware supports VRTX running with single SPERC8 controller;
- Write-back cache is enabled for VDs.
- Does not have high availability capability

Firmware supports VRTX running with dual SPERC8 controller;
- High availability capability present (Both SPERC8 needs to be connected to expanders)
- One of the controllers is active and the other is passive
- Write-back cache is disabled for VDs; only option for write cache is Write Through.


To downgrade from a dual controller to a single controller, I removed the second SPERC8 controller, removed the second expander, re-cabled the SAS connections and downgraded the chassis to 1.25. Just removing the controller and expander wasn't enough, because then I still was unable to set "write-back" as caching policy. As soon as I downgraded to 1.25 I was able to set write-back as caching policy.



With the single controller configuration, the storage performance is as it should be. All the earlier mentioned problems are gone:

  • (Hyper-V) VMs perform excellently
  • Copy performance is excellent (copying big files within the same volume: around 650 MB/s max)
  • Backup actually works (backup and VSS are related)
  • No more VSS time-out errors (0x8000ffff- Catastrophic failure)
  • Checkpoint creation of a VM took about 4 seconds
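For anyone who wants to repeat the same-volume copy test in a scripted way, here is a minimal sketch in Python. It is not the method used for the numbers above (those came from plain file copies and Atto); the function name `measure_copy_throughput` and the default test size are assumptions made for illustration. It writes a scratch file on the target volume, copies it within that volume, forces the copy to disk, and reports MiB/s.

```python
import os
import shutil
import tempfile
import time


def measure_copy_throughput(directory, size_mb=64):
    """Copy a scratch file inside `directory` and return throughput in MiB/s.

    Hypothetical helper for illustration; `directory` must be on the
    volume you want to test.
    """
    # Work in a private subdirectory so nothing else on the volume is touched.
    scratch = tempfile.mkdtemp(dir=directory)
    src = os.path.join(scratch, "src.bin")
    dst = os.path.join(scratch, "dst.bin")
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    try:
        # Create the source file and make sure it has reached the disk.
        with open(src, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())

        start = time.perf_counter()
        shutil.copyfile(src, dst)
        # Flush the copy to disk so the OS write cache does not hide the cost.
        with open(dst, "rb+") as f:
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
        return size_mb / elapsed
    finally:
        # Always remove the scratch files, even if the copy fails.
        shutil.rmtree(scratch, ignore_errors=True)
```

Keep in mind that the source file may still be read from the OS cache, so use a test size well above the amount of RAM and controller cache (for example `measure_copy_throughput("D:\\", size_mb=20000)`) if you want numbers comparable to a big-file copy.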

Based on this experience, I currently strongly advise against the dual SPERC8 configuration; stick with the single-controller configuration. Hopefully a firmware update will solve this performance issue, because at the moment the dual / fault tolerant SPERC 8 configuration makes no sense at all with storage performance this poor. Yes, you're fault tolerant... but no backups and badly performing VMs aren't really an option in a production environment, I think...

Or might there be an essential setting I am missing for the write through setup? (I have tried different settings already...)

Below are the benchmarks made using Atto. The difference is tremendous! Fun fact: during the benchmark with the write-back policy enabled, the virtual disk was still in its background initialization process, while during the write-through benchmark it was not.

NOTE: Both VDs were configured as RAID6

April 15th, 2015 10:00

Not sure what your remaining issues are (after the latest firmware updates from a few months ago), Erik, but we've had half a dozen of these running in somewhat hostile environments (temperature, vibration, tilt, power cuts on half of the PSUs) and no failures so far. Longevity is still to be determined, of course.

Running them with both raid controllers with recent firmwares hasn't given us any headaches so far. We even got some scripts from Dell to simulate a failed controller to see if failover worked as intended, and it did in our tests. Performance is not stellar (hell, we're running RAID6 mostly), but certainly not troublesome so far. Our tests showed greatly increased performance with the dual controllers and write-back. Performance was within expected parameters for us (hence the choice to even go with RAID 6, since it was "good enough"). We are obviously not running high performance database or HDD intensive processing on this though. You also might not want to use write-back if you don't have redundant power and/or UPS backing.

What has proved most annoying is the phasing out of the M520 blades (there will not be an M530). Dell only offers the M630 as an alternative, but it's 10% more expensive. That's on top of the expected 5-7% price increase we're looking at swallowing due to the weak euro. That's eating into my budgets for future planned deployments.

1 Message

April 15th, 2015 17:00

Hi Erik,

Did you complete your testing with the latest FW? Does it work now as expected?

26 Posts

April 15th, 2015 18:00

Here is my update on my situation. There IS an issue with the 23.8.12 and 23.11.16 PERC 8 firmwares. Under certain load conditions, the blades will consistently lose connectivity to the storage due to insane latency spikes. We had this happen on two of our VRTX systems running vSphere that are used for dev/test.

We worked closely with Dell, including running a custom firmware with debugging flags enabled. Eventually they were able to reproduce the issue we were seeing in their labs. The result of that engagement was a new firmware, currently available, that per Dell addresses the issue. That firmware version is 23.11.46. We are currently in the process of upgrading all of our 16 VRTX systems.
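If you have a number of chassis to check, a small script can flag which ones are still below 23.11.46. This is only a sketch: the helper names `parse_version` and `needs_update` are made up for illustration, and how you collect the installed version per chassis is left out. The one thing worth getting right is comparing the numbers numerically, because as plain strings "23.8.12" would sort after "23.11.46".

```python
import re

# Firmware release named in this post as containing the fix.
FIXED_VERSION = "23.11.46"


def parse_version(text):
    """Turn a string like '23.8.12-0061' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def needs_update(installed, fixed=FIXED_VERSION):
    """Return True if `installed` is older than `fixed`.

    Only the first three fields are compared; a trailing build
    number such as '-0061' is ignored.
    """
    return parse_version(installed)[:3] < parse_version(fixed)[:3]
```

With the versions mentioned in this thread, `needs_update("23.8.12-0061")` and `needs_update("23.11.16")` both return `True`, while `needs_update("23.11.46")` returns `False`.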

1 Message

July 24th, 2015 03:00

Hi Erik,


I am just wondering, did the firmware update help solve the issue?

28 Posts

July 26th, 2017 09:00

Can anyone tell me if the issues with the dual PERC8s have been resolved?

I am seeing this presently with the latest firmware installed across the chassis and (3) M620 blades.

28 Posts

July 26th, 2017 10:00

Above is how the drives are configured, and I have to say the performance is less than desired!

As you can see, with the firmware installed and both PERC8s enabled I can use Write-Thru, Write-Back and Force Write-Back, but neither Force nor standard write-back gives any benefit. Write performance is dog slow!
