Quincy561
June 25th, 2013 06:00
Yes, it is the amount of cache that is written, but not yet destaged
Quincy561
June 25th, 2013 05:00
Every storage array that has write cache has a maximum amount of cache that can be written to and not yet destaged to disks.
When you reach the limit, you cannot write another track without destaging something, which is why your write times increased.
Generally, you reach this limit because you are writing faster than the backend can destage.
You can speed up the destage by adding more drives, more DAs, or changing the protection of the devices.
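Quincy's point that the limit is reached when writes arrive faster than the back end can destage can be illustrated with a simple rate model. This is a sketch under stated assumptions, not how Enginuity actually manages cache: it treats new write-pending slots and destaged slots as constant per-second rates and ignores rewrites to slots that are already pending. The function name and the example numbers are hypothetical.

```python
from typing import Optional


def seconds_until_wp_limit(wp_count: int, wp_limit: int,
                           write_slots_per_sec: float,
                           destage_slots_per_sec: float) -> Optional[float]:
    """Estimate seconds until the write-pending count reaches wp_limit.

    Hypothetical helper: assumes constant arrival and destage rates and
    ignores rewrites to slots that are already write pending.
    Returns 0.0 if the limit is already reached, None if it is never reached.
    """
    if wp_count >= wp_limit:
        return 0.0
    net_growth = write_slots_per_sec - destage_slots_per_sec
    if net_growth <= 0:
        # The back end destages at least as fast as new writes arrive.
        return None
    return (wp_limit - wp_count) / net_growth


# Hypothetical numbers: 20,000 new WP slots/s against 15,000 destaged slots/s.
print(seconds_until_wp_limit(1_000_000, 4_000_000, 20_000, 15_000))
```

In this model, adding drives or DAs, or changing the protection of the devices, corresponds to raising destage_slots_per_sec until net growth is zero or negative.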
smaiello
June 25th, 2013 06:00
Thank you, Quincy. So essentially, Cache WP is just write cache, plain and simple?
Fenglin1
June 25th, 2013 20:00
Boom shared a good table in the thread "Re: Symmetrix performance" that provides a detailed description of the performance metrics, for your reference.
Metric: Description
Host IOs/sec: Number of host IO operations performed each second by all Symmetrix devices, including writes and random and sequential reads.
Host Reads/sec: Number of host read operations performed each second by all Symmetrix devices.
Host Writes/sec: Number of host write operations performed each second by all Symmetrix devices.
% Reads: Percentage of the IO operations performed each second by all Symmetrix devices that are reads.
% Writes: Percentage of the IO operations performed by all Symmetrix devices that are writes.
% Hit: Percentage of IO operations performed by all Symmetrix devices for which the read data was in cache, or the write could be sent directly to cache without waiting for data to be destaged from cache to disk.
Host MBs/sec: Number of host MBs written and read by all Symmetrix devices each second.
Host MBs Read/sec: Number of host MBs read by all Symmetrix devices each second.
Host MBs Written/sec: Number of host MBs written by all Symmetrix devices each second.
FE Reqs/sec: Data transfers between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both, so the request rate should be equal to or greater than the IO rate.
FE Read Reqs/sec: Read data transfers between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both, so the request rate should be equal to or greater than the IO rate.
FE Write Reqs/sec: Write data transfers between the director and the cache. An IO may require multiple requests depending on IO size, alignment, or both, so the request rate should be equal to or greater than the IO rate.
BE IOs/sec: Total IOs per second from all BE directors to the disks.
BE Reqs/sec: Read or write data transfers between the cache and the director.
BE Read Reqs/sec: Read data transfers between the cache and the director.
BE Write Reqs/sec: Write data transfers between the cache and the director.
System WP Events/sec: Number of times each second that write activity was heavy enough to use up the system limit set for write tracks occupying cache. When the limit is reached, writes are deferred until data in cache is written to disk.
Device WP Events/sec: Number of times each second that the write-pending limit for a specific Symmetrix device was reached.
WP Count: Number of system cache slots that are write pending.
System Max WP limit: The percent of the target % at which writes are delayed. The range is from 40% to 80%.
% Cache WP: Percentage of system cache that is write pending.
Avg Fall Thru Time: Average time it takes a cache slot in LRU0 to be freed up, that is, the average time from the first use of its contents to its reuse by another address.
FE Hit Req/sec: Total requests per second from all FE directors that were satisfied from cache.
FE Read Hit Reqs/sec: Total read requests per second from all FE directors that were satisfied from cache.
FE Write Hit Reqs/sec: Total write requests per second from all FE directors that were satisfied from cache.
Prefetched Tracks/sec: Tracks per second prefetched from disk to cache upon detection of a sequential read stream.
Destaged Tracks/sec: Tracks per second written from cache to disk.
FE Read Miss Reqs/sec: Total read requests per second from all FE directors that were misses. A miss occurs when the requested read data is not found in cache.
FE Write Miss Reqs/sec: Total write requests per second from all FE directors that were misses. A miss occurs when the write had to wait while data was destaged from cache to disk.
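Several of the percentage metrics in this table are ratios of the per-second counters. The sketch below derives % Reads, % Writes and a front-end hit percentage from one sample of counters. It assumes that FE requests are the sum of hits and misses and that host IOs are the sum of host reads and writes; the class, field names and sample values are hypothetical, and the array's own tools may compute these figures differently.

```python
from dataclasses import dataclass


@dataclass
class FrontEndSample:
    """Hypothetical per-second counter sample, named after the metrics above."""
    host_reads: float
    host_writes: float
    fe_read_hits: float
    fe_write_hits: float
    fe_read_misses: float
    fe_write_misses: float


def pct(part: float, whole: float) -> float:
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def derived_ratios(s: FrontEndSample) -> dict:
    host_ios = s.host_reads + s.host_writes                    # Host IOs/sec
    fe_hits = s.fe_read_hits + s.fe_write_hits                 # FE Hit Reqs/sec
    fe_reqs = fe_hits + s.fe_read_misses + s.fe_write_misses   # assumed FE Reqs/sec
    return {
        "% Reads": pct(s.host_reads, host_ios),
        "% Writes": pct(s.host_writes, host_ios),
        "FE hit %": pct(fe_hits, fe_reqs),
    }


sample = FrontEndSample(6000, 4000, 7000, 3900, 1500, 100)
print(derived_ratios(sample))
```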
cincystorage
June 25th, 2013 20:00
Just wanted to expand on this a little bit. You've got a "Write Pending Count", which is the number of cache slots containing writes that are waiting to be destaged to disk.
Next, you've got a "Max Write Pending Threshold", which is the number the write pending count must reach before write activity for the device increases in priority and, at the maximum, write misses start to happen. It's not a static number at all; it depends on the activity on the Symmetrix. Each device is assigned a limit of write pending slots that can change dynamically between a base value and three times the base value. Once the max write pending threshold reaches three times the base value, writes to the device are delayed so that the cache can destage slots for that device, which frees cache slots. As cache slots are freed, the writes resume. While the write pending limit is reached, disk directors operate in a priority destage write mode, giving write data higher priority than usual (write misses).
The "% WP Limit" metric is a representation of these two statistics.
Quincy561
June 25th, 2013 22:00
Mark, the device WP limit hasn't been dynamic since DMX3. It is 5% of writable cache, or the size of the device if the device is smaller than that.
The DAs go into high priority mode when we get above 50% of the WP limits.
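Quincy's two rules can be written down directly. This is a minimal sketch: it assumes "writable cache" is measured in cache slots and corresponds to the system write-pending limit, which is consistent with the stats posted later in this thread (5% of 4548336 system WP slots, rounded down, gives the 227416 device WP slots reported there). The function names are hypothetical, and the 50% check is applied to whichever WP limit you pass in.

```python
def device_wp_limit(writable_cache_slots: int, device_size_slots: int) -> int:
    """Device WP limit: 5% of writable cache, or the device size if smaller."""
    five_percent = writable_cache_slots * 5 // 100  # whole slots, rounded down (assumption)
    return min(five_percent, device_size_slots)


def da_high_priority(wp_count: int, wp_limit: int) -> bool:
    """True when the write-pending count is above 50% of the given WP limit."""
    return wp_count * 2 > wp_limit


print(device_wp_limit(4548336, 10_000_000))  # large device: 5% rule applies
print(device_wp_limit(4548336, 50_000))      # small device: its own size applies
print(da_high_priority(120_000, 227416))
```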
cincystorage
June 26th, 2013 02:00
Quincy, for some reason I thought we were talking about a DMX3. I guess that happens when you are sleepy and not paying attention. Thanks for pointing that out.
Nagraj_K
September 24th, 2013 02:00
Hi Quincy,
I have a small question about % Cache WP.
When we say that 50% Cache WP is reached, does this mean that, out of the total available cache (read + write), the cache available for writes has reached 50%, or that the read cache plus the write cache together make up 50% of the total available cache?
Please go through the stats below, which were taken from my VMAX.
Total cache configured in the VMAX is 1024 GB (128*8, mirrored).
Total cache, mirrored = 481 GB. The difference (512 - 481) ≈ 32 GB is taken by Enginuity for its internal use (2 GB from every board, according to an update in EMC's documentation).
Usable cache = 370 GB (6064448 * 64 KB). This is the cache that can be used for host reads and writes.
Max WP cache = 277 GB (4548336 * 64 KB). This is the amount of cache that can be used for writes; beyond this, host writes experience high response times.
Max cache used for SRDF/A = 61% of 277 GB = 169 GB. When this limit is reached, the group using the most cache slots gets suspended.
Usable cache for reads = 93 GB (370 - 277 GB). This is the cache that can be used for host reads.
(481 - 370) = 111 GB is used for the common area and device ID tables. Am I correct about these stats?
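The GB figures above come from multiplying slot counts by the 64 KB slot size. The sketch below reproduces that conversion, assuming binary units (1 GB = 1024 * 1024 KB), which matches the 277.61 GB and 169.34 GB values the tool reports below. It also shows that the 481280 MB cache size is 470 GB in the same units; 481 GB is that value divided by 1000, so subtracting 370 GB from 481 GB appears to mix units. The function names are hypothetical.

```python
SLOT_KB = 64  # cache slot size used in the calculations above


def slots_to_gb(slots: int) -> float:
    """Convert 64 KB cache slots to binary GB (1 GB = 1024 * 1024 KB)."""
    return slots * SLOT_KB / (1024 * 1024)


def mb_to_gb(mb: float) -> float:
    """Convert binary MB to binary GB."""
    return mb / 1024


for label, slots in [("Available cache slots", 6064448),
                     ("System WP limit", 4548336),
                     ("SRDF/A cache slots", 2774484)]:
    print(f"{label}: {slots_to_gb(slots):.2f} GB")

print(f"Cache size (mirrored): {mb_to_gb(481280):.2f} GB")
```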
===================================================================================================
Cache statistics:
System Write Pending Limit : 4548336 (277.61 GB)
Cache Slots available for all SRDF/A sessions : 2774484 (169.34 GB)
Total Local Write Pending Count : 253976
Total System Write Pending Count : 475473
Total System SRDF/A DSE Used Tracks : 0 (0.00 GB)
Total System Cache Slots In Use : 391352 (23.89 GB)
Total System Cache Slots Shared : 0 (0.00 GB)
Total System Cache Full Percentage : 8.0 = (Used Cache Slots/Total Cache Slots)*100 = (391352/4548336)*100 ≈ 8.6
======================================================================================================
Cache Size (Mirrored) : 481280 (MB) = 470 GB
# of Available Cache Slots : 6064448 = 370 GB
Max # of System Write Pending Slots : 4548336 = 277 GB
Max # of DA Write Pending Slots : 0
Max # of Device Write Pending Slots : 227416
Replication Cache Usage (Percent) : 0
SRDF/A Maximum Cache Usage (Percent) : 61
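The limits in this output relate to each other by simple ratios. The sketch below computes them from the reported slot counts. The 75% and 5% relationships are inferred from these numbers rather than taken from EMC documentation, and the cache-full ratio uses the formula above, which works out to about 8.6% even though the tool reports 8.0. The function name is hypothetical.

```python
def limit_ratios(available_slots: int, system_wp_slots: int, srdfa_slots: int,
                 device_wp_slots: int, used_slots: int) -> dict:
    """Express each reported limit as a percentage of the limit above it."""
    return {
        "system WP limit as % of available slots": 100 * system_wp_slots / available_slots,
        "SRDF/A slots as % of system WP limit": 100 * srdfa_slots / system_wp_slots,
        "device WP limit as % of system WP limit": 100 * device_wp_slots / system_wp_slots,
        "slots in use as % of system WP limit": 100 * used_slots / system_wp_slots,
    }


# Slot counts taken from the VMAX output above.
ratios = limit_ratios(6064448, 4548336, 2774484, 227416, 391352)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```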
1) I was told that the Max System Write Pending slots (277 GB) are the ones used for host writes. When a write request arrives, I know it sits in a cache slot and is marked as WP (both local and remote WP). SRDF/A cycle switching happens every 15 seconds, so incoming data waits in cache for 15 seconds (the cycle time) and is then switched to the R2 cache. Is the WP bit then cleared, or is the track destaged to disk so that it no longer exists in cache but is marked as remote WP in the device ID table?
2) Out of the 277 GB of cache available for writes, our SRDF/A maximum cache usage is 61%, which is 169 GB. Whenever our SRDF/A usage reaches 61% of 277 GB, we experience a suspension.
What is the other 39% of the write cache used for? Apart from SRDF, we run no other activity on the system, such as Virtual LUN, VP, DP, or FAST.
3) Is there any separation between SRDF cache and the cache used for host writes? I know that whenever a write arrives, it is stored in a slot and marked as WP. Is that track part of the SRDF cache (61%) or the rest of the cache (39%)?
I want to know this to investigate the SRDF suspensions happening at 10:30 on my VMAX.
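A rough way to reason about question 2 is to estimate how much cache SRDF/A needs. This is a back-of-the-envelope sketch with loudly stated assumptions: it assumes the R1 side holds two cycles of data (the capture cycle and the transmit cycle), ignores write folding (rewrites of the same track within a cycle, which would reduce the figure), and assumes a constant write rate. It also shows that if the links drain more slowly than the host writes, the surplus accumulates until it reaches the SRDF/A limit. The function names and rates are hypothetical; the 169.34 GB limit is taken from the stats above.

```python
from typing import Optional


def srdfa_cache_estimate_gb(write_mb_per_sec: float, cycle_seconds: float,
                            cycles_held: int = 2) -> float:
    """Rough R1-side estimate: cycles_held cycles of writes kept in cache.

    Assumes no write folding and a steady write rate (assumptions).
    """
    return write_mb_per_sec * cycle_seconds * cycles_held / 1024


def seconds_until_srdfa_limit(limit_gb: float, write_mb_per_sec: float,
                              link_mb_per_sec: float) -> Optional[float]:
    """Seconds until unsent data fills limit_gb, starting from an empty cache.

    Returns None if the links keep up with the host writes.
    """
    surplus = write_mb_per_sec - link_mb_per_sec
    if surplus <= 0:
        return None
    return limit_gb * 1024 / surplus


# Hypothetical rate: 400 MB/s of host writes with a 15 s cycle time.
print(srdfa_cache_estimate_gb(400, 15))
# Hypothetical burst: 600 MB/s of writes against 400 MB/s of link throughput.
print(seconds_until_srdfa_limit(169.34, 600, 400))
```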
My % Cache WP reaches 70-80% and multiple SRDF groups suspend. All along, I believed that % Cache WP meant data in cache that had been written by the host but not yet destaged to disk, and I was investigating the LUNs that write the most during that period. But at a performance workshop, the analyst told me that % Cache WP consists of both the read cache and the write cache. If that is the case, I believe the term % Cache WP doesn't make sense at all.
Now we are in a fix if the reads also need to be investigated! Our backup windows run from 6:00 in the evening until 6:00 in the morning. We have a virtual environment with 10-12 ESX clusters, and some clusters have at least 60 LUNs.
I just need some clarity on the above terms.
Thanks
Nagaraju K
Quincy561
September 24th, 2013 06:00
One correction.
100% of user cache can be used for reads. It isn't total cache less the limit for write data.
Nagraj_K
September 24th, 2013 08:00
If I have configured 1 TB of cache on the VMAX, what is the actual amount of usable cache I get?
Quincy561
September 24th, 2013 08:00
That will depend on the usable capacity configured in the system. There is a cache calculator available to EMC folks that can give you an estimate based on the types of volumes and the capacity you are planning to install in the system.
It sounds like you need a performance analysis done to understand why your writes are not clearing faster. Could be a backend issue, or an issue with your links.
Nagraj_K
September 24th, 2013 09:00
Thanks for the update, Quincy. The performance analysis has been done on the VMAX twice now. Every time EMC does this, they give suggestions such as making the RA ports unidirectional, decreasing the workload, etc.
We are currently looking at balancing the workload, but I am stuck deciding whether I should concentrate on reads or writes to get this fixed.
If cache is occupied by reads, then the title % Cache WP (Write Pending) is not the right one, as it misleads you into thinking the usage is due to host writes. Something like % Cache Full would make more sense.
To resolve something, you first need to understand how it works, right?
Can you please answer the first and second questions?
Nagraj_K
September 24th, 2013 14:00
Thanks for the reply, Richotte.
I thought of that as an option, but my fear was that my % Cache WP goes up to 90% and at times even 98%.
So if I increase the Max SRDF Usable Cache, does that mean it takes available cache slots for SRDF, making it difficult for host reads and writes?
Am I correct?
Does restoring the value to 74 have any other adverse effect on the rest of the cache available for read/write operations?
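To see what the change from 61 to 74 would mean in absolute terms, the percentage can be applied to the system write-pending limit, which is how the 61% and the 2774484 SRDF/A slots in the earlier stats relate. This is a minimal sketch assuming that relationship and 64 KB slots; it only sizes the SRDF/A ceiling and says nothing about how the array arbitrates between SRDF/A data and other cache use. The function name is hypothetical.

```python
from typing import Tuple

SLOT_KB = 64  # cache slot size, as in the earlier stats


def srdfa_ceiling(system_wp_slots: int, percent: int) -> Tuple[int, float]:
    """Return (slots, binary GB) allowed for SRDF/A at the given percentage."""
    slots = system_wp_slots * percent // 100
    return slots, slots * SLOT_KB / (1024 * 1024)


for percent in (61, 74):
    slots, gb = srdfa_ceiling(4548336, percent)
    print(f"{percent}%: {slots} slots = {gb:.2f} GB")
```

Under these assumptions, moving from 61% to 74% adds roughly 36 GB of SRDF/A headroom.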
Nagraj_K
September 25th, 2013 01:00
Below is the status of the RDF usable cache before and during the SRDF group suspension issue.
As the title "SRDF/A Session Cache Summary" says, I believe this section gives information about SRDF/A cache usage, and its reaching 61% should mean that this amount of cache is being used for replicating data, that is, new writes or updates to existing tracks on the VMAX.
So this means I am getting a lot of writes. Am I correct?
I agree that 100% of the cache can be used for reads, but the SRDF cache shown below is used only for holding data that needs to be replicated to the other site.
So this filling up means that the system is experiencing a lot of write activity at that time, correct?
But the performance analyst still argues that the suspension is due to reads, not writes, and says that writes can wait in cache for some time because reads always take priority.
In SRDF/A mode, data is held temporarily in cache until the cycle switch. But how exactly does Adaptive Copy transfer data from the primary system to the secondary system?
When an SRDF group operates in Adaptive Copy Disk mode, the EMC documentation says that writes are acknowledged as if they were writes to local devices and then transferred over the links at a later point.
So how exactly does replication of a track take place when operating in Acp_Disk mode?
Can this bypass the primary cache before being replicated to the secondary cache?
Nagraj_K
September 25th, 2013 15:00
Thanks, Richotte, for your patience in answering all my questions. I really appreciate your help and the information you have shared.
Regards
Nagaraju K