
K • 1 Rookie • 358 Posts

March 18th, 2014 14:00

Can Fast Cache soften RAID6 write penalty?

Just trying to decide between RAID10 (16 drives plus 1 hot spare) or RAID6 (16 drives plus 1 hot spare).  But I also have 3 Flash drives to use for Fast Cache.  I could certainly use the extra storage space that RAID6 would give me for future growth, but I am concerned about the write penalty.  However, if Fast Cache can mitigate this penalty then maybe it's OK to use RAID6?

Performance and reliability are what I'm after... hence I'm not interested in RAID5 anymore.

I have 17 SAS 10k 600GB 2.5" drives, 17 2TB NL-SAS 3.5" drives and 3 200GB Fast Cache 2.5" drives.

From the performance analysis we sent to EMC, they said RAID6 plus Fast Cache would come in about 1000 IOPS higher than our current NX4 Celerra's 95th percentile.

So I was considering two 16-drive arrays... just deciding which RAID level (6 or 10), with one hot spare per drive type (hence the 17 drives of each).

Or, four 8-drive arrays (two from SAS and two from NL-SAS).  Though I like the idea of a higher spindle count for performance.  I've heard that it is very important not to break up the spindles: if you separate them, you have no way to leverage the combined IOPS, and instead of getting the advantage of consolidation you are putting the IOPS into silos, so when one RAID group is idle and another is busy they have no way to load balance.

Think of it like your CPUs. You don't carve out one CPU for just one VM and another CPU for another VM; you want all idle CPU power available to any VM that wants to use it, so you get maximum performance with minimum waste. With one big array, pretty much every disk operation is faster than when the drives are split. And because disk operations complete more quickly, queue lengths stay low, making it even more likely that the next request will enter an empty queue. With split drives you are almost guaranteed to have some spindles sitting idle while others are getting beaten to death. Spreading the load also spreads out wear and tear, giving you more time between hardware failures.
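Here's the kind of back-of-the-envelope comparison I mean; the per-drive IOPS figure, the offered load and the 70/30 skew are made-up numbers just to show the effect of pooling versus silos:

```python
# Toy comparison: one 16-drive pool vs two 8-drive silos under a skewed load.
# 150 IOPS per 10k SAS drive, 2000 offered host IOPS and the 70/30 skew are
# all assumptions for illustration.
PER_DRIVE_IOPS = 150
total_offered = 2000                     # host IOPS arriving right now
skew = {"lun_a": 0.7, "lun_b": 0.3}      # lun_a is busy, lun_b is mostly idle

# Pooled: both LUNs share all 16 spindles, so the capacity is shared too.
pooled_capacity = 16 * PER_DRIVE_IOPS
print("pooled utilisation: %.0f%%" % (100 * total_offered / pooled_capacity))

# Siloed: each LUN only gets its own 8 spindles.
for lun, frac in skew.items():
    silo_capacity = 8 * PER_DRIVE_IOPS
    print("%s silo utilisation: %.0f%%" % (lun, 100 * total_offered * frac / silo_capacity))

# pooled utilisation: 83%
# lun_a silo utilisation: 117%   <- over capacity, queues build up
# lun_b silo utilisation: 50%    <- spindles sitting idle
```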

What do you guys think?

254 Posts

March 18th, 2014 15:00

Of course the answer is "it depends", and what it will depend upon most is your data skew. FAST Cache promotes 64KB chunks to flash, but it takes multiple hits on the same 64KB chunk to get it promoted. So when it's time to flush from write cache (which, depending on the write load, should help with latency to the hosts), if the blocks being flushed land on chunks that are already promoted, then the flush will definitely be faster and the write penalty is deferred until the chunk is destaged to disk. If not, no help. So the rate of re-use of those chunks will determine whether FAST Cache helps.
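If it helps to picture the mechanism, here's a toy model of that promotion behaviour. The hit-count threshold and the tracking details are made up for illustration; this is not the actual FLARE/MCx promotion policy, just the "multiple hits on the same 64KB chunk" idea:

```python
from collections import defaultdict

# Toy model of FAST Cache promotion: a 64KB chunk is only promoted after it
# has been hit several times. The threshold of 3 is an assumption for
# illustration, not the real array policy.
CHUNK = 64 * 1024
PROMOTE_AFTER = 3

hit_counts = defaultdict(int)
promoted = set()

def access(lba_bytes):
    """Record an I/O and return True if it lands on an already-promoted chunk."""
    chunk = lba_bytes // CHUNK
    if chunk in promoted:
        return True                      # flushes to this chunk land on flash
    hit_counts[chunk] += 1
    if hit_counts[chunk] >= PROMOTE_AFTER:
        promoted.add(chunk)              # copied up to the FAST Cache flash drives
    return False

# A workload that re-uses the same chunk gets promoted quickly...
print([access(4096 * 3) for _ in range(5)])   # [False, False, False, True, True]
# ...while purely random one-off writes never accumulate enough hits.
```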

1 Rookie • 358 Posts

March 19th, 2014 08:00

OK, so basically if we are re-writing the same data frequently, this would help?  Whereas random writes here and there won't really see much of a benefit?

Maybe it's better to just go RAID10 to guarantee better performance.  I just hate to lose so much storage space.  If I calculate usable space, 16 disks of 600GB SAS is only 4.8 TB in RAID10, while it's 8.4 TB in RAID6.  That's quite a different story.

The 2TB NL-SAS drives are in the same situation, but I am more inclined to do RAID10 there to offset the performance hit of the slower RPM drives, and they are larger, so I would still have 16 TB usable in RAID10, which is more than we need now.  In my DAEs there are 22 x 2.5" empty slots and 13 x 3.5" for future growth.
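For reference, here's the back-of-the-envelope capacity math as a quick Python sketch (a hypothetical helper, not an EMC sizing tool; it ignores hot spares, vault drives and drive right-sizing, so real usable numbers will come in a bit lower):

```python
# Back-of-the-envelope usable capacity for a single 16-drive RAID group.
def usable_tb(drive_count, drive_tb, raid_level):
    if raid_level == "RAID10":
        return (drive_count // 2) * drive_tb   # half the drives hold mirror copies
    if raid_level == "RAID6":
        return (drive_count - 2) * drive_tb    # two drives' worth of parity
    raise ValueError("unhandled RAID level: %s" % raid_level)

for drive_tb, label in [(0.6, "600GB SAS"), (2.0, "2TB NL-SAS")]:
    for raid in ("RAID10", "RAID6"):
        print("%-11s 16 drives, %s: %.1f TB usable"
              % (label, raid, usable_tb(16, drive_tb, raid)))

# 600GB SAS   16 drives, RAID10: 4.8 TB usable
# 600GB SAS   16 drives, RAID6: 8.4 TB usable
# 2TB NL-SAS  16 drives, RAID10: 16.0 TB usable
# 2TB NL-SAS  16 drives, RAID6: 28.0 TB usable
```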

Although we will be migrating all of production off the Celerra NX4, we will still keep it, just for backup/replication and test machines.  So some space can still be offloaded from the new VNX5200.

I understand there is no one size fits all, hence the wide range of options in creating storage.

254 Posts

March 19th, 2014 15:00

Unless you are seeing a real issue, I wouldn't get scared by the RAID write penalty.  People run RAID-6 (especially with NL-SAS drives) all the time and don't have problems.  It will come down to a couple of issues:

1. Your r/w ratio.  The more writing you do, the more you'll be affected.  Lots of workloads have more reads than writes, and reads don't hurt you, though of course there are write-intensive workloads out there.  (I've sketched some rough numbers after this list.)

2. Write cache.  If your write cache is handling the writes coming in, the hosts shouldn't notice any delay from the write penalty: writes are acknowledged as soon as they hit write cache, which is very quick (DRAM is faster than any disk drive, even flash), and the data is de-staged later in the background.  So as long as you don't over-run write cache, you should never notice the penalty.  On the NX4, a performance problem would come from what are called "forced flushes", which happen when the write cache goes over its high watermark.  On your new box this should be even less of a problem because of the larger amount of memory and the way MCx handles cache.  There are no more watermarks or watermark flushing; the system is smart enough not to let a host over-run the cache and will automatically pace that host's writes down based on the drive bandwidth available for the writes, all in real time.  Memory is also dynamically allocated between read cache and write cache as needed, in real time, based on your workload.

3. Big sequential full-stripe writes.  If you are doing big sequential writes to new stripes, you won't notice the RAID write penalty because the parity can be pre-calculated and written out with the stripe.  The penalty comes into play when you have to insert writes into an existing stripe, because the array has to read in the existing data, re-calculate parity for the new stripe, then write it back out.  But this will depend on workload as well as how many empty stripes you have.
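To put rough numbers on points 1 and 3, here's a quick back-of-the-envelope sketch. The 180 IOPS per 10k SAS drive, the 2000 host IOPS and the 70/30 read/write mix are assumed figures, and cache hits are deliberately ignored, which is exactly what write cache (and FAST Cache) are there to absorb:

```python
# Back-of-the-envelope backend IOPS for a given host workload and RAID level.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(host_iops, read_pct, raid_level):
    """Random small-block I/O that misses cache: reads cost 1, writes cost the penalty."""
    reads = host_iops * read_pct
    writes = host_iops * (1 - read_pct)
    return reads + writes * WRITE_PENALTY[raid_level]

host_iops, read_pct = 2000, 0.7        # assumed workload for illustration
for raid in ("RAID10", "RAID6"):
    need = backend_iops(host_iops, read_pct, raid)
    print("%s: %.0f backend IOPS, ~%.1f drives' worth" % (raid, need, need / 180))

# RAID10: 2600 backend IOPS, ~14.4 drives' worth
# RAID6: 5000 backend IOPS, ~27.8 drives' worth
# Big sequential full-stripe writes dodge most of the RAID6 hit because parity
# is calculated once for the whole new stripe (point 3 above).
```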

There is always the option to run RAID-10, but I would only do that if I knew the workload couldn't handle RAID-6 even with write cache.  Your EMC SE should be able to help you see the impact based on your real workload, especially since you have data on an existing NX4 today, and then model that workload onto a 5200 to see what the effect should be.

Hope this helps.
