This post is more than 5 years old

February 25th, 2010 05:00

HA Cache Vault

Is there any other scenario (for CX and CX3), besides a total power failure, in which the write cache is dumped to the Vault drives rather than to the desired LUN?

I'm trying to evaluate whether disabling the HA Cache Vault option is really that critical. The Vault area is shared by both SPs and striped across all 5 FLARE drives, so once one drive fails, the whole area becomes inaccessible. Right?

The EMC CLARiiON Best Practices doc says that "clearing this selection exposes the user to the possibility of data loss in triple-fault situation: Vault drive fails, no dump. If another Vault drive fails and the array experiences a total power failure, the content of cache will be lost". I agree, but what happens if I leave HA Cache Vault on (as it is by default), my array experiences a total power failure, the cache is dumped, and during the dump I lose a Vault disk, or two? I think the result will be the same: data loss. Is that correct?

Thank you

61 Posts

February 25th, 2010 15:00

Hello Alec,

This is a tough concept to explain so people can visualize it, but I will take a shot. The "vault" is RAID 5 protected across the 5 system drives in CX and CX3 series systems. As such, if a single drive fails, the vault is still accessible for both reads and writes. If a second system drive fails before the first is replaced and the vault is rebuilt, then the vault will become inaccessible.
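To help visualize why a RAID 5 protected vault survives one drive failure but not two, here is a toy sketch of XOR parity in Python. It is only an illustration, not how FLARE actually lays out the vault: the 4-data-plus-1-parity stripe and the function names `xor_blocks`, `make_stripe`, and `rebuild` are assumptions made for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (hypothetical 4+1 stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recompute the single missing block from the surviving blocks.

    This only works when exactly one block is missing; with two blocks
    gone there is not enough information left to recover either one.
    """
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"aaaa", b"bbbb", b"cccc", b"dddd"])
recovered = rebuild(stripe, 2)  # pretend the third drive failed
```

With one block missing, XOR of the survivors gives back the lost block, which is why the vault stays readable and writable after a single failure.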

Write caching allows the CLARiiON to acknowledge host writes after putting them in write cache but before putting them onto persistent storage. Because the memory is volatile (DRAM), there needs to be some protection against a complete power failure so that data in the DRAM is not lost. That is what the SPS provides: the storage processors will stay online long enough during a power failure to "dump" all data in the write cache to the vault. Therefore, no data will be lost in that failure mode.

As you can see, there are two critical components to keeping data in your write cache safe: the SPS and the vault. Each of those systems is additionally redundant. All CX and CX3 systems can function with two SPS, or one. One SPS can keep the array up long enough to dump write cache; two allow write caching to continue if the other SPS becomes faulted. A RAID 0 set would provide protection against a power failure, but not against a power failure combined with a drive failure. So by default we require RAID 5 (HA Cache Vault): if a single drive fails, write cache is disabled. If the customer wishes to disable HA Cache Vault, then write cache will remain enabled even if one of those drives is faulted, leaving the cache in what is effectively a RAID 0 set of 4 drives.

Envision a couple of scenarios:

Scenario 1: If HA Cache Vault is disabled, a single system drive fails, and there is a power failure, the write cache will still be safely dumped to the vault.

a. When AC power is restored, if all 4 remaining system drives come up okay, then the write cache will be moved from the vault to SP DRAM.

b. When AC power is restored, if a system drive fails to come up, or if it fails before all the write cache data is moved from the vault to the SP DRAM, then the data still in the vault is lost.

c. When AC power is restored, if 2 system drives fail to come up, or if 2 fail before all the write cache data is moved from the vault to the SP DRAM, then the data still in the vault is lost.

Scenario 2: If HA Cache Vault is enabled and there is a power failure, the write cache will still be safely dumped to the vault.

a. When AC power is restored, if all 5 system drives come up okay, then the write cache will be moved from the vault to SP DRAM.

b. When AC power is restored, if a system drive fails to come up, or if it fails before all the write cache data is moved from the vault to the SP DRAM, then the data still in the vault will still be safely moved to the SP DRAM.

c. When AC power is restored, if 2 system drives fail to come up, or if 2 fail before all the write cache data is moved from the vault to the SP DRAM, then the data still in the vault is lost.
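The two scenarios above boil down to counting failed vault drives against the single failure that RAID 5 can absorb. Here is a rough sketch of that reasoning in Python. It is a simplified model, not array behavior taken from EMC documentation: the function names and constants are hypothetical, it only covers zero or one drive failed before the outage, and it assumes that once write cache is disabled there is no dirty data left in DRAM to lose.

```python
VAULT_DRIVES = 5        # system drives holding the vault on CX/CX3
PARITY_TOLERANCE = 1    # RAID 5 survives one failed drive


def write_cache_active(ha_cache_vault, failed_before_outage):
    """Is write cache still enabled when the power fails?"""
    if failed_before_outage not in (0, 1):
        raise ValueError("this sketch only models 0 or 1 prior drive failures")
    if failed_before_outage == 0:
        return True
    # One vault drive already failed: cache stays on only if HA Cache Vault is off.
    return not ha_cache_vault


def cached_writes_survive(ha_cache_vault, failed_before_outage, failed_during_restore):
    """Do the dumped cache contents make it back into SP DRAM?"""
    if not write_cache_active(ha_cache_vault, failed_before_outage):
        return True  # assumption: cache already disabled, nothing in DRAM at risk
    total_failed = failed_before_outage + failed_during_restore
    return total_failed <= PARITY_TOLERANCE


# Scenario 1 (HA Cache Vault disabled, one drive already failed): cases a, b, c
scenario_1 = [cached_writes_survive(False, 1, n) for n in (0, 1, 2)]
# Scenario 2 (HA Cache Vault enabled, no prior failure): cases a, b, c
scenario_2 = [cached_writes_survive(True, 0, n) for n in (0, 1, 2)]
```

The only difference between the two lists is case b: with HA Cache Vault enabled, the array goes into the outage with its one-drive margin still intact.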

Scenario 2 is safer, but safety comes at a cost. If a single system drive fails at any time with HA Cache Vault enabled, write cache will be disabled. This can affect the performance of the system in two ways. Obviously, writes will be slower with no write cache. Less obviously, when the system drive fails, write cache is dumped. While the array dumps the cache to the vault, host I/Os are serviced at a very low priority (if at all), so there will be a large delay in response time until the dump process is completed.

Choose wisely.

2.1K Posts

February 25th, 2010 10:00

My understanding is that the setting is not there to protect you specifically from a vault drive failure during a power outage cache dump, but to ensure that a vault drive failure doesn't leave you vulnerable to a double fault in the vault drives in the event a power failure causes a dump.

Leaving this checked will disable the write caching if a single drive fails in the vault. The odds of having a double drive fault in the vault drives and a complete power failure at the same time are low enough that we don't bother with this setting. However, there are environments where even incredibly low odds are enough reason to take the potential performance hit during a vault drive failure and replacement. Unless your environment imposes those stringent requirements I would tend to leave this checkbox empty.

19 Posts

February 27th, 2010 22:00

Hello Bertog, thank you very much for the detailed answer.

What exactly was changed in CX4/FLARE28 to resolve the issue of write cache being disabled after a single vault drive failure?

Thanks

Alec
