March 30th, 2012 10:00

VNX5100 Fast Cache slows down writes

I'm using a SQL script to test performance of my new server/array. All read data is already cached on the DB server. This script inserts rows into a table and then updates all the rows several times. When I enable FAST Cache for the LUN, the run time doubles or more. I noticed via Analyzer collection statistics that when FAST Cache is disabled, the majority of writes go to write cache with a small percentage destaged to the disks. When FAST Cache is enabled, there is a small hit to write cache at the beginning and then all of the write hits are to FAST Cache. It appears the writes are bypassing write cache. Any ideas? I've repeated the test both ways numerous times and it's always the same.
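For reference, a minimal sketch of the kind of insert-then-update workload described, using Python with sqlite3 purely for illustration (the actual test runs against Sybase; table name, row count, and number of update passes here are made up):

```python
import sqlite3

# Hypothetical stand-in for the SQL test script: insert rows into a table,
# then update every row several times. Each update pass rewrites the same
# pages, which is the repeated-write pattern the post describes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
cur.executemany("INSERT INTO t (id, val) VALUES (?, ?)",
                [(i, 0) for i in range(10000)])
for pass_no in range(1, 6):          # several update passes over every row
    cur.execute("UPDATE t SET val = ?", (pass_no,))
conn.commit()
print(cur.execute("SELECT COUNT(*), MAX(val) FROM t").fetchone())
```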

337 Posts

May 1st, 2012 09:00

Hi,

This issue may require a detailed investigation; you can collect the NAR files and open a ticket with EMC.

Regards,

Bashanta Phukon

57 Posts

May 3rd, 2012 03:00

Hello Stutz

I would suggest you go through the 7th chapter of "EMC Solutions for Microsoft SQL Server with EMC VNX Series".

This document explains how FAST can be used to get optimum performance for SQL.

Thanks

Samir

2 Posts

May 3rd, 2012 04:00

Thanks for the responses. I guess my real question is: when FAST Cache is enabled, should writes to data in FAST Cache bypass the write cache? According to EMC documentation the answer is no. I captured Analyzer data and it shows high write counts for FAST Cache and very few for write cache when FAST Cache is enabled for the LUN. I opened a ticket and submitted the data. EMC has refused to help. I've been told that since this is not yet in production, this is a services engagement. I'm supposed to pay them to help me tune my system. I only want them to look at their data, which I've submitted, and answer the above question.

The dataset I'm working with is 200 MB. It easily fits in write cache, and this is seen in the write activity with FAST Cache disabled. With FAST Cache enabled, my database engine (Sybase) is exceeding its maximum of 5000 queued async I/Os and warnings are issued. Few I/Os are queued with FAST Cache disabled.

I've read the manuals and whitepapers and understand the types of writes which are not good for FAST Cache (for example, database logs are not using FAST Cache). FAST Cache is helping quite a bit with read activity. Unfortunately these are tables, and I can't separate reads from writes. So what I want to know is:

A. Shouldn't writes to write cache always be at least as high as writes to FAST Cache over a recorded interval in Analyzer?

B. If the answer to A is yes, is the data not being correctly recorded, or is write cache actually being bypassed?

C. Isn't it just possible the array isn't working as defined?

Sure wish someone at EMC would look at the data I sent them. I figure it would take about 5 minutes.

(As a side note, I'm now being contacted to find out why I provided low marks on the survey after they closed my ticket without helping.)

75 Posts

May 3rd, 2012 09:00

Since the write cache services all writes unless disabled, your thought in 'A' is correct; however, Analyzer does not present it the way you expect. The Analyzer stats show the net of FAST Cache activity, presented as if FAST Cache were a cache layer in front of the write cache. What you will find is that

host (LUN) writes = FAST Cache hits + FAST Cache misses. It is a FAST Cache-centric analysis.

Any FAST Cache misses are then accounted for in the DRAM write cache stats. So,

rate of FAST Cache miss = Write cache misses (Force flush) + write cache hits

What is obscured is the fact that any writes to FAST Cache are actually handled via the write cache.

DRAM Write cache --> Flush --> to FAST Cache private container
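To make the accounting above concrete, here is a small sketch reconciling the counters, with made-up numbers standing in for one Analyzer polling interval (all values are hypothetical):

```python
# Reconcile the FAST Cache-centric accounting described above, using
# invented counter values (writes/s over a hypothetical Analyzer interval).
fc_hits, fc_misses = 900, 100        # FAST Cache write hits / misses

# Host (LUN) writes are the sum of FAST Cache hits and misses.
lun_writes = fc_hits + fc_misses

# Only the misses surface in the DRAM write cache stats, split between
# write cache hits and forced-flush (write cache miss) events.
wc_hits, wc_forced_flush = 80, 20
assert fc_misses == wc_hits + wc_forced_flush

# Every one of the 1000 host writes still passed through the DRAM write
# cache first, even though Analyzer only attributes 100 of them to it.
print(lun_writes)
```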

The reason for your performance impact is not bypass of write cache. It is likely one of two things:

1) Small sequential writes (which are known to be problematic for FAST Cache). How to tell: if you look at your workload with FAST Cache disabled, you see a fairly high rate of Full Stripe Writes, which implies sequential writes to those tables. Take the disk write size, multiply by the stripe width (e.g., for 4+1, disk write * 4) to get your approximate Full Stripe Write dispatch size, then multiply that by your FSW count to get the MB/s flushed via FSW. E.g.:

Disk writes: 128 KB * 4 (4+1) = 512 KB per flush

FSW = 8/sec, so 8 * 512 KB ~= 4 MB/s flushing via FSW.

If the write rate is 5 MB/s, then you know 4/5 = 80% of your writes are sequential.
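The arithmetic above, worked through as a short script (variable names are mine; the numbers are the ones from the example):

```python
# Full-stripe-write estimate for a 4+1 RAID 5 group, using the example
# figures from the post. Only the formula is from the post; the variable
# names are illustrative.
disk_write_kb = 128          # average disk write size from Analyzer
data_disks = 4               # stripe width for a 4+1 group
fsw_per_sec = 8              # Full Stripe Writes per second
host_write_mb_s = 5.0        # total host write rate

fsw_kb = disk_write_kb * data_disks          # 512 KB dispatched per FSW
fsw_mb_s = fsw_per_sec * fsw_kb / 1024       # ~4 MB/s flushed via FSW
seq_fraction = fsw_mb_s / host_write_mb_s    # ~0.8, i.e. ~80% sequential
print(fsw_kb, fsw_mb_s, seq_fraction)
```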

2) Pathological locality. This is when we do the work to promote to FAST Cache and don't get any benefit. Certain replication processes do this - they write once and read twice (3 hits, promotion) and then never go back, so you end up spending cycles promoting with no benefit. That promotion activity has a cost, obviously, so if there is no benefit, you see a net slowdown. This has to be really pathological, over a long time (see point 3). How do you tell? See if the FAST Cache promotions per second are steady over your test run. They should not be - they should drop as the cache stabilizes.

3) Too short a test. It takes time for the FAST Cache to 'warm up', and a lot of folks are still testing as if testing DRAM cache. Testing for FAST Cache takes some adjustment: for a small setup like this you have to run for 30 minutes to an hour, depending on how random the access is, for those access counts to result in the right data getting into the cache and staying there.
