Unsolved

9 Legend • 20.4K Posts

May 13th, 2012 15:00

Fast Cache - Impact on Random Write performance

I have an existing XenServer application running on a DMX4 with the following characteristics:

Write Size - 12k

Write Ratio - 90%

Random - 90%

This workload will be moved to a brand-new VNX5700 (FLARE 5.31.000.5.709) with no other hosts connected. Current config:

RAID 1/0  Pool

160 x 300G 15K SAS drives

Fast Cache

4 x 200G EFDs

So before we put this box into production, I decided to run some tests to see how FAST Cache does with this type of workload. I connected one Windows 2008 R2 server, presented 2 x 5GB LUNs (one on each SP) and configured the following in IOmeter (5 workers per LUN).

5-13-2012 6-10-43 PM.png

My first test was to run for 24 hours with FAST Cache disabled at the pool level; as you can see from the picture, I was getting about 44K IOPS. The next test was to enable FAST Cache on the pool, which I did around 1 AM. The minute I did that, total IOPS dropped by more than half. I was hoping that once it "warmed up" it would get better; not so much. As you can see, it has stayed consistent for many hours now.

5-13-2012 5-59-26 PM.png

Looking at the FAST Cache stats at the pool level, there is nothing there:

5-13-2012 6-17-12 PM.png

My questions are:

1) If this type of workload is not FAST Cache friendly, why is there such a huge impact on IOPS?

2) Is FAST Cache promoting something that does not get re-used? If that's the case, why am I not seeing it in the FAST Cache stats for the pool?

3) What else should I look at to determine why there is such a drastic decrease in performance when FAST Cache is enabled?

Thanks

9 Posts

May 13th, 2012 18:00

It would be interesting to see if you're getting any forced flushing due to the reduction in DRAM.

9 Legend • 20.4K Posts

May 13th, 2012 19:00

Yep, SPA is hovering around 99%.

5-13-2012 10-35-42 PM.png

4 Operator • 8.6K Posts

May 14th, 2012 01:00

If you are using XenServer 5 (Boston), make sure you have the latest service pack / patches installed.

We've seen some weirdness in the I/O behaviour of the initial releases.

Rainer

4 Operator • 8.6K Posts

May 14th, 2012 01:00

Strebel wrote:

It would be interesting to see if you're getting any forced flushing due to the reduction in DRAM.

If you are, you should look at adjusting the watermarks.

9 Legend • 20.4K Posts

May 14th, 2012 03:00

Thanks Rainer, the customer is still on 4.x. What I am doing here is generating the workload with IOmeter, so the virtualization platform is not in the picture.

9 Legend • 20.4K Posts

May 14th, 2012 06:00

Baif,

I have a specific application and a specific workload that I need to address; I don't have time to play "what ifs". Sorry.

May 14th, 2012 06:00

LOL.

I think FAST VP with a Tier 0 is the best way of absorbing this kind of random write at this write size...

Do you think you can run some different tests on it? I'd really like to see how FAST Cache works with different types of apps/filesystems.

You may check your NTFS volume settings.

  • IOzone on NTFS, and SQLIO on NTFS with different block settings.
  • And I'd like to know how FAST Cache works with EXT4/XFS on Linux, using IOmeter/IOzone with different block settings.

38 Posts

May 14th, 2012 07:00

The test area is small, and you're comparing FAST Cache-enabled performance with 100% cache-hit performance.

I'll explain:

2 x 5GB LUNs, so all writes will be cache hits and just stay in cache (and eventually your reads should be hitting that data at some point after the start of the test).

Any I/O that triggers a promotion to FAST Cache would only be expected to show a benefit if you were disk bound, and with a 10GB total test area you'll never see that.

Your logic dictates that 4 drives in FAST Cache should be able to absorb the load, but that equates to 20K+ IOPS per drive (flash drives won't do 20K+ with predominantly-write profiles).
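
A rough back-of-the-envelope sketch of that point. This is only a sketch: it assumes the 4 EFDs are configured as mirrored (RAID 1) pairs and that they had to absorb roughly the whole ~45K IOPS load; the input values are illustrative, not measured.

```python
# Sketch: per-EFD load if FAST Cache had to absorb the whole test workload.
# Assumes the 4 EFDs form mirrored (RAID 1) pairs, so each front-end write
# costs two back-end writes. All figures are illustrative assumptions.

host_iops   = 45_000   # approximate front-end load from the test
write_ratio = 0.90
efd_count   = 4

writes = host_iops * write_ratio            # 40,500 front-end writes
reads  = host_iops - writes                 #  4,500 front-end reads

backend_iops = writes * 2 + reads           # mirrored write penalty of 2
per_efd = backend_iops / efd_count

print(f"back-end IOPS: {backend_iops:,.0f}")  # ~85,500
print(f"per EFD:       {per_efd:,.0f}")       # ~21,400 -> the "20K+ per drive"
```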

So the underlying issue is that you're comparing a FAST Cache-affected load against cache-only hits, and with the promotion logic doing what it does, you can only ever show a negative effect with that test.

To understand and realise FAST Cache benefits you must have disk-bound activity, i.e. write cache flushing and disk utilization occurring. SP DRAM cache is inherently faster than SSDs.

Another way of looking at it:

You have a 180-disk RAID 1/0 pool.

You have 10GB of user space, which should equate to 10 x 1GB slices, each serviced by a 4+4 private RG, thus up to 80 drives (I assume we understand pool slice allocation, i.e. that pools comprise private RGs, etc.).

So that 45K IOPS, if it really were serviced by 80 drives, would work out like this:

45K at 90% random and 90% writes = 36,450 random writes = 72,900 disk IOPS (RAID 1 write penalty of 2).

Add the 10% random reads (4,050) = 76,950 back-end IOPS.

So if disk bound, we'd expect nearly 1,000 IOPS per drive in your pool from the drives that are active.
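
A minimal sketch of the same arithmetic, using only the figures above (the 80 active drives come from the 10 x 1GB slices on 4+4 private RAID groups, the RAID 1/0 write penalty of 2 is assumed, and only the random portion of the load is counted, as in the numbers above):

```python
# Sketch reproducing the back-of-the-envelope maths: back-end disk IOPS on
# the RAID 1/0 pool if the ~45K front-end IOPS were actually disk bound.
# Only the random portion is counted; the 80 active drives come from
# 10 x 1GB slices, each living on a 4+4 private RAID group.

host_iops     = 45_000
random_ratio  = 0.90
write_ratio   = 0.90
active_drives = 80            # 10 slices x (4+4) drives

random_writes = host_iops * random_ratio * write_ratio        # 36,450
random_reads  = host_iops * random_ratio * (1 - write_ratio)  #  4,050

backend   = random_writes * 2 + random_reads   # RAID 1/0 write penalty of 2
per_drive = backend / active_drives            # per active 15K SAS drive

print(f"back-end IOPS: {backend:,.0f}")        # 76,950
print(f"per drive:     {per_drive:,.0f}")      # ~962 -> "nearly 1,000 per drive"
```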

To conclude: your test area is small and you're comparing FAST Cache-enabled performance with 100% SP write cache hit performance.

I do not know why your FAST Cache stats for the pool are showing zero. I'd like to see the NAR files to check whether any of the physical disks in the pool are doing anything other than sniffer, and so on.

38 Posts

May 14th, 2012 07:00

Just to add: I expect your small test size is resulting in write cache re-hits of 100% for those LUNs. That would also show read hits being high, as those reads will be serviced from the data in write cache.

I hope that helps.

May 14th, 2012 08:00

When you are running out of SP cache.

May 14th, 2012 08:00

No worries, hopefully ksnell's reply resolves your concerns and FAST VP helps in the long term.

9 Legend • 20.4K Posts

May 14th, 2012 08:00

ksnell wrote:

So the underlying issue is that you're comparing FAST cache affected load to cache only hits and with promotion logic doing what it does, you can only ever show a negative effect with that test.

So if my workload is so small and everything is serviced by DRAM, how does enabling FAST Cache cut my performance in half?

Thanks

9 Legend • 20.4K Posts

May 14th, 2012 08:00

Where did you get FAST VP from?

4 Operator • 8.6K Posts

May 14th, 2012 12:00

That really depends on your access locality.

FAST VP doesn't relocate very frequently (often customers schedule it to run only at night), whereas FAST Cache reacts more quickly.

Rainer

9 Legend • 20.4K Posts

May 14th, 2012 12:00

15 hours to "warm up" should be long enough, don't you think?

This is not an artificial workload; this workload currently exists on the DMX4, and the stats were gathered from SPA (Symmetrix Performance Analyzer). Trust me, I am not making this up.
