What is '% Cache WP'

Question

Hello all,

We are doing some load testing in a new environment we are building out, and I noticed that as we were loading up the system our "% Cache WP" eventually hit 100% and Avg Write Response time shot up. I can only assume that "% Cache WP" has something to do with write caching, but Google has failed to turn up any meaningful definitions for me.

Attached is a screen shot from our load testing.

Thank you kindly,

Steven

Quincy561 · Accepted Answer

Yes, it is the amount of cache that has been written but not yet destaged.

Quincy561 · Answer

Every storage array that has write cache has a maximum amount of cache that can be written to and not yet destaged to disks.

When you reach the limit, you cannot write another track without destaging something, which is why your write times increased.

Generally, you reach this limit because you are writing faster than the backend can destage.

You can speed up the destage by adding more drives, more DAs, or changing the protection of the devices.
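To make the fill-up behaviour described above concrete, here is a minimal sketch in Python. It is a toy model, not how Enginuity actually schedules destages: the rates, the limit, and the function names are all made up for illustration, and it assumes the percentage is reported relative to the write-pending limit, which is consistent with the metric reaching 100% in the original question.

```python
def simulate_write_pending(write_rate, destage_rate, wp_limit, seconds):
    """Return the write-pending slot count after each second.

    write_rate and destage_rate are in cache slots per second; wp_limit is
    the maximum number of slots allowed to be write pending. All values are
    illustrative, not taken from a real array.
    """
    pending = 0
    history = []
    for _ in range(seconds):
        # The back end destages first, freeing slots for new writes.
        pending = max(0, pending - destage_rate)
        # New writes are accepted only while there is room under the limit.
        # Writes that do not fit have to wait, which is what shows up as a
        # higher write response time on the host.
        accepted = min(write_rate, wp_limit - pending)
        pending += accepted
        history.append(pending)
    return history


def percent_cache_wp(pending, wp_limit):
    """Write-pending slots as a percentage of the write-pending limit."""
    return 100.0 * pending / wp_limit


if __name__ == "__main__":
    limit = 1000
    history = simulate_write_pending(
        write_rate=300, destage_rate=200, wp_limit=limit, seconds=12
    )
    for second, pending in enumerate(history, start=1):
        pct = percent_cache_wp(pending, limit)
        print(f"t={second:2d}s  WP={pending:4d}  %WP={pct:5.1f}")
```

With writes arriving faster than the back end destages, the count climbs until it sits at the limit. With the rates swapped it stays flat, which is why adding drives or DAs (a higher destage rate) helps.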

smaiello · Answer

Thank you Quincy, so essentially Cache WP is just a write cache plain and simple?

Fenglin1 · Answer

Boom shared a good table in the thread "Re: Symmetrix performance" which provides a detailed description of the performance metrics, for your reference.

Metric	Description
Host IOs/sec	Number of host IO operations performed each second by all Symmetrix devices, including writes and random and sequential reads.
Host Reads/sec	Number of host read operations performed each second by all Symmetrix devices.
Host Writes/sec	Number of host write operations performed each second by all Symmetrix devices.
% Reads	Percent of total read IO operations performed each second by all of the Symmetrix devices.
% Writes	Percent of total write IO operations performed by all of the Symmetrix devices.
% Hit	Percent of IO operations performed by all of the Symmetrix devices, for which the read data was in cache and the write operation could be sent directly to cache without having to wait for data to be destaged from cache to the disks.
Host MBs/sec	The number of host MBs written and read by all of the Symmetrix devices each second.
Host MBs Read/sec	The number of host MBs read by all of the Symmetrix devices each second.
Host MBs Written/sec	The number of host MBs written by all of the Symmetrix devices each second.
FE Reqs/sec	Data transfer between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both. The requests rate should be either equal to or greater than the IO rate.
FE Read Reqs/sec	A read data transfer between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both. The requests rate should be either equal to or greater than the IO rate.
FE Write Reqs/sec	A write data transfer between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both. The requests rate should be either equal to or greater than the IO rate.
BE IOs/sec	Total IO from all BE directors to the disks per second.
BE Reqs/sec	A data transfer of a read or write between the cache and the director.
BE Read Reqs/sec	A data transfer of a read between the cache and the director.
BE Write Reqs/sec	A data transfer of a write between the cache and the director.
System WP Events/sec	Number of times each second that write activity was heavy enough to use up the system limit set for write tracks occupying cache. When the limit is reached, writes are deferred until data in cache is written to disk.
Device WP Events/sec	Number of times each second that the write-pending limit for a specific Symmetrix device was reached.
WP Count	Number of system cache slots that are write pending.
System Max WP limit	The percent of the target % at which writes are delayed. The range is from 40% to 80%.
% Cache WP	Percent of system cache that is write pending.
Avg Fall Thru Time	Average time it takes a cache slot in LRU0 to be freed up. It is the average time from the first use of the contents to its reuse by another address.
FE Hit Req/sec	Total requests from all FE directors per second that were satisfied from cache.
FE Read Hit Reqs/sec	Total read requests from all FE directors per second that were satisfied from cache.
FE Write Hit Reqs/sec	Total write requests from all FE directors per second that were satisfied from cache.
Prefetched Tracks/sec	Tracks per second pre-fetched from disk to cache upon detection of a sequential read stream.
Destaged Tracks/sec	Tracks per second saved into disks.
FE Read Miss Reqs/sec	Total read requests from all FE directors per second that were misses. A miss occurs when the requested read data is not found in cache.
FE Write Miss Reqs/sec	Total write requests from all FE directors per second that were misses. A miss occurs when the write operation had to wait while data was destaged from cache to the disks.

cincystorage · Answer

Just wanted to expand on this a little bit. You've got a "Write Pending Count", which is the number of cache slots containing writes that are waiting to be destaged to disk.

Next you've got a "Max Write Pending Threshold", which is the number the write pending count must reach before the write activity for the device increases in priority and, at max, write misses start to happen. It's not a static number at all; it depends on the activity on the Symmetrix. Each device is assigned a limit of write pending slots that can dynamically change between a base value and a value three times the base. Once the max write pending threshold has reached three times the base value, writes to the device are delayed so that the cache can destage slots for that device, which frees the cache slots. As cache slots are freed, the writes begin again. While the write pending limit is reached, disk directors operate in a priority destage write mode, giving write data higher priority than usual (write misses).

The "% WP Limit" is a representation of these two statistics.

Quincy561 · Answer

Mark, the device WP limit hasn't been dynamic since DMX3. It is 5% of writable cache, or the size of the device if the device is smaller than 5%.

The DAs go into high priority mode when we get above 50% of the WP limits.
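The two rules above can be written down as a small Python sketch. This is only an illustration: it assumes "writable cache" means the system write-pending slot count, works in cache slots, and uses hypothetical function names.

```python
def device_wp_limit(system_wp_slots, device_slots, percent=5):
    """Per-device write-pending limit, in cache slots.

    The limit is a fixed percentage (5% per the post above) of the writable
    cache, capped at the size of the device itself when the device is
    smaller than that. Treating system_wp_slots as the writable cache is an
    assumption made for this sketch.
    """
    return min(system_wp_slots * percent // 100, device_slots)


def da_high_priority(device_wp_count, wp_limit, threshold_percent=50):
    """True once the write-pending count is above 50% of the limit."""
    return device_wp_count * 100 > wp_limit * threshold_percent


if __name__ == "__main__":
    # 4548336 is the system write-pending slot count posted later in this
    # thread; the device sizes are made-up examples.
    large = device_wp_limit(system_wp_slots=4548336, device_slots=2_000_000)
    small = device_wp_limit(system_wp_slots=4548336, device_slots=100_000)
    print("large device limit:", large)
    print("small device limit:", small)
    print("high priority at 120000 WP slots:", da_high_priority(120_000, large))
```

For the large device the result matches the "Max # of Device Write Pending Slots" value of 227416 quoted further down the thread, which suggests the 5% rule applies to the system write-pending slot count on that array.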

cincystorage · Answer

Quincy, you know, for some reason I thought we were talking about a DMX3... Guess that happens when you are sleepy and not paying attention... Thanks for pointing that out.

Nagraj_K · Answer

Hi Quincy,

I have a small query on % Cache WP. When we say that 50% Cache WP is reached, does this mean that, out of the total available cache (read + write), the cache available for writes has reached 50%, or is it the sum of the read cache and the write cache that is 50% of the total available cache?

Please go through the stats below, which were taken from my VMAX.

Total cache configured in the VMAX is 1024 GB (128 * 8), mirrored.

Total cache mirrored = 481 GB ((512 - 481) ~ 32 GB; 2 GB taken from every board by Enginuity for its internal use, as per an update from EMC in their documentation)
Usable cache = 370 GB (6064448 * 64 KB) (this is the cache that can be used for host reads and writes)
Max WP cache = 277 GB (4548336 * 64 KB) (this is the amount of cache that can be used for writes; beyond this, host writes experience high response times)
Max cache used for SRDF = 61% (277 GB) = 169 GB (on reaching this limit, the group with the most cache slot usage gets suspended)
Usable cache for reads = 93 GB (370 - 277 GB) (this is the cache that can be used for host reads)
(481 - 370) = 111 GB is used for the common area + device ID tables.

Am I correct with respect to the stats?

===================================================================================================
Stats regarding the cache:

System Write Pending Limit : 4548336 (277.61 GB)
Cache Slots available for all SRDF/A sessions : 2774484 (169.34 GB)
Total Local Write Pending Count : 253976
Total System Write Pending Count : 475473
Total System SRDF/A DSE Used Tracks : 0 (0.00 GB)
Total System Cache Slots In Use : 391352 (23.89 GB)
Total System Cache Slots Shared : 0 (0.00 GB)
Total System Cache Full Percentage : 8.0 = (Used Cache Slots / Total Cache Slots) * 100 = (391352 / 4548336) * 100 = 8
===================================================================================================
Cache Size (Mirrored) : 481280 (MB) = 470 GB
# of Available Cache Slots : 6064448 = 370 GB
Max # of System Write Pending Slots : 4548336 = 277 GB
Max # of DA Write Pending Slots : 0
Max # of Device Write Pending Slots : 227416
Replication Cache Usage (Percent) : 0
SRDF/A Maximum Cache Usage (Percent) : 61

1) I was told that the max system write pending slots (277 GB) are the ones used for writes done by the host. So when a write request comes to the cache, I know that it sits in a cache slot and is marked as WP (both local and remote WP). The SRDF cycle switch happens every 15 seconds, so will the incoming data wait in the cache for 15 seconds (the cycle time) and then be switched to the R2 cache, after which the WP bit is cleared? Or is the track destaged to disk, so that it no longer exists in the cache but is marked as remote WP in the device ID table?

2) Out of the 277 GB of cache available for writes, we have the SRDF/A maximum cache usage at 61%, which is 169 GB. So whenever our SRDF/A activity reaches 61% of 277 GB, we experience a suspension. What is the other 39% of the write cache used for? Apart from SRDF, there is no other activity that we run on the system, like Virtual LUN VP, DP, FAST.

3) Is there any separation like SRDF cache and the cache used for host writes? I know that whenever a write comes in, it is stored in a slot and marked as WP. So is this track part of the SRDF cache (61%) or the rest of the cache (39%)?

I want to know the above in order to investigate the SRDF suspensions happening at 10:30 on my VMAX. I have a situation here where my % Cache WP reaches 70-80 and multiple SRDF groups suspend. All this time I believed that % Cache WP means the data in cache that has been written by the host but not yet destaged to the disks, and I was investigating the LUNs that write the most during that period. But when we had a performance workshop, I was told by the analyst that % Cache WP consists of the read cache and the write cache. If that is the case, I believe the term % Cache WP doesn't make sense at all. Now we are in a fix if the reads also need to be investigated! We have backup windows starting at 6:00 in the evening and running until 6:00 in the morning. We have a virtual environment with 10-12 ESX clusters, and in some clusters we have at least 60 LUNs.

I just need some clarity on the above terms.

Thanks,
Nagaraju K
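The slot arithmetic in the post above can be checked with a few lines of Python. This is only a sketch: it assumes a 64 KB slot and binary gigabytes (1 GB = 1024 * 1024 KB), which is what reproduces the posted figures, and the function and variable names are made up for illustration.

```python
SLOT_KB = 64  # slot size used in the post above


def slots_to_gb(slots, slot_kb=SLOT_KB):
    """Convert a cache slot count to GB (1 GB = 1024 * 1024 KB)."""
    return slots * slot_kb / (1024 * 1024)


def percent_of(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole


# Values copied from the posted stats.
available_slots = 6064448   # "# of Available Cache Slots"
system_wp_limit = 4548336   # "Max # of System Write Pending Slots"
srdfa_slots = 2774484       # "Cache Slots available for all SRDF/A sessions"
system_wp_count = 475473    # "Total System Write Pending Count"

if __name__ == "__main__":
    print(f"usable cache    : {slots_to_gb(available_slots):.2f} GB")
    print(f"system WP limit : {slots_to_gb(system_wp_limit):.2f} GB")
    print(f"SRDF/A ceiling  : {slots_to_gb(srdfa_slots):.2f} GB")
    # The WP limit as a share of the available slots.
    print(f"WP limit share  : {percent_of(system_wp_limit, available_slots):.1f} %")
    # The SRDF/A ceiling as a share of the WP limit.
    print(f"SRDF/A share    : {percent_of(srdfa_slots, system_wp_limit):.1f} %")
    # Current write-pending count relative to the WP limit.
    print(f"WP count share  : {percent_of(system_wp_count, system_wp_limit):.2f} %")
```

On these numbers the system write-pending limit works out to 75% of the available slots, and the SRDF/A ceiling to 61% of that limit, matching the "SRDF/A Maximum Cache Usage (Percent)" setting.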

Quincy561 · Answer

One correction: 100% of user cache can be used for reads. It isn't total cache less the limit for write data.

Nagraj_K · Answer

If I have configured 1 TB of cache on the VMAX, what is the actual amount of usable cache I get?

Quincy561 · Answer

That will depend on the usable capacity configured in the system. There is a cache calculator available to EMC folks that can give you an estimate based on the types of volumes and the capacity you are planning to install in the system.

It sounds like you need a performance analysis done to understand why your writes are not clearing faster. Could be a backend issue, or an issue with your links.

Nagraj_K · Answer

Thanks for the update, Quincy. The performance analysis has been done on the VMAX twice now. Every time EMC does this, they give some suggestions like making the RA ports unidirectional, decreasing the workload, and so on.

We are currently looking at balancing the workload, but I am stuck deciding whether I should concentrate on reads or writes to get this fixed.

If cache is occupied by reads, then the title % Cache WP (WRITE PENDING) is not the right one, as it misleads by suggesting the cause is host writes. It could be called % Cache FULL, and that would make some sense.

To resolve something you first need to understand how it works, right?

Can you please answer the 1st and 2nd questions?

Nagraj_K · Answer

Thanks for the reply, Richotte.

I thought of that as an option, but my fear was that my % Cache WP is going up to 90 and at times even 98%.

So if I increase the max SRDF usable cache, does that mean it takes available cache slots for SRDF, making it difficult for the host reads and writes?

Am I correct?

Does restoring the value to 74 have any other adverse effect on the rest of the cache that is available for read/write operations?

Nagraj_K · Answer

Shown below is the status of the RDF usable cache before and during the SRDF group suspension issue. As the title says "SRDF/A Session Cache Summary", I believe that this section gives the information relating to SRDF/A cache usage, and it being occupied to 61% should mean that this amount of cache is used for replicating data, that is, new writes or updates to existing tracks on the VMAX.

So this means that I am getting a lot of writes. Am I correct?

I agree with the fact that 100% of the cache can be used for reads, but the SRDF cache shown below will be used only for holding the data that needs to be replicated to the other site.

So this getting filled means that the system is experiencing a lot of write activity at the time, correct?

But the performance analyst still argues that the system is experiencing the suspension due to reads and not writes, and says that writes can wait in the cache for some time as reads will always be the priority.

SRDF Jus Before Suspension.jpg

In SRDF/A mode the data is temporarily held in the cache until the cycle switch, but how exactly does Adaptive Copy transfer the data from the primary system to the secondary system?

The EMC documentation says that when an SRDF group operates in Adaptive Copy Disk mode, the writes are acknowledged as if they were writes to local devices and are then transferred at a later point over the links.

So how exactly does the replication of a track take place when operating in Acp_Disk mode?

Can this bypass the primary cache before getting replicated to the secondary cache?

Nagraj_K · Answer

Thanks, Richotte, for your patience in answering all my questions. I really appreciate your help, and thanks for the information you have shared. Regards, Nagaraju K

