11 Legend

20.4K Posts

87.4K Points

October 19th, 2010 13:00

Write times and Write pending correlation

Hello guys,

I am trying to make sense of this situation. Why is it that when I have over 20000 write pendings (9:15am), average write times are good (less than 20ms), yet when write pendings are very low (around 10:15am), write times are almost 90ms? The write pending threshold is around 55000 per device (5% of total system write pending slots). This is a 100% write workload. Any clues?
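As a quick sanity check on the figures quoted above, here is a minimal sketch of the arithmetic. It assumes the 55000 per-device limit is exactly 5% of the system total; the variable names are illustrative only.

```python
# Figures quoted in the post above; variable names are illustrative only.
per_device_limit = 55_000   # write pending limit per device
limit_fraction = 0.05       # stated as 5% of system write pending slots
observed_wp = 20_000        # write pendings seen around 9:15am

# Implied system-wide slot count, assuming the 5% figure is exact.
system_slots = round(per_device_limit / limit_fraction)

# How close the 9:15am peak came to the per-device limit.
pct_of_limit = 100 * observed_wp / per_device_limit

print(f"implied system WP slots: {system_slots}")
print(f"9:15am peak as % of device limit: {pct_of_limit:.1f}%")
```

So even at the 9:15am peak the device was well under its write pending limit, which fits with write times staying good at that point.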

10-19-2010 4-34-56 PM.png

1.3K Posts

October 19th, 2010 14:00

When the backend can't keep up with the frontend, the WP counts get high.

If you are seeing low WP counts along with poor write response times, I would suspect a frontend bottleneck.

131 Posts

October 19th, 2010 14:00

It could be possible.

You are flushing your cache and you have low WP counts, so your backend is good.

But you have high write response times, so your frontends are not pushing data into cache or to the backend fast enough.

20.4K Posts

October 19th, 2010 14:00

"If you are seeing low WP counts along with poor write response times, I would suspect a frontend bottleneck."

I don't think I am following you. If I have very low WP counts, why would I have very high write times in the Symmetrix (as seen in that chart)?

1.3K Posts

October 19th, 2010 15:00

I was suggesting that the writes can't get into the system fast enough because you don't have enough FA resources.

20.4K Posts

October 20th, 2010 07:00

Quincy,

I could be looking at this all wrong, so please correct me. When I look at the chart of "Sampled average write times", this is from the FA's perspective: how long it took to send the request to the system and get a response back. So if write pendings are low, there is nothing sitting in cache, so why would the FA wait so long?

Thank you

1.3K Posts

October 20th, 2010 08:00

I'm suggesting that you are writing to the FA faster than it can deal with the IOs, so it has to queue them.  I would suggest adding FA CPUs (not just ports) and see if the write response time gets better.
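To illustrate how queueing at the front end can inflate response time even when cache is nearly empty, here is a rough sketch using a textbook M/M/1 queue. This is only a generic queueing model, not how an FA is actually implemented, and the 0.5 ms service time and the IOPS values are hypothetical.

```python
def mm1_response_ms(service_ms: float, arrival_per_s: float) -> float:
    """Mean response time (ms) of an M/M/1 queue; infinite once saturated."""
    utilization = arrival_per_s * service_ms / 1000.0
    if utilization >= 1.0:
        return float("inf")
    return service_ms / (1.0 - utilization)


service_ms = 0.5  # hypothetical per-IO service time of one FA CPU

# Response time climbs steeply as the arrival rate nears the CPU's capacity.
for iops in (500, 1500, 1900, 1990):
    rt = mm1_response_ms(service_ms, iops)
    print(f"{iops:>5} IOPS -> {rt:.2f} ms")
```

In this model the write pending count never enters the picture: the delay comes entirely from IOs waiting their turn at the front end.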

20.4K Posts

October 20th, 2010 09:00

This device is mapped to four FAs. You tell me, but they don't seem busy to me (these are CPU stats).

10-20-2010 12-36-46 PM.png

20.4K Posts

October 20th, 2010 10:00

Could be, but if you look at "sample average write time (ms)", it stays very high for almost an hour. These devices are configured for SRDF/S. What metrics can I look at in Performance Manager to see if SRDF could have anything to do with that?

1.3K Posts

October 20th, 2010 10:00

No, they don't look busy over that sample interval, but what about during a one or two second sample?

We have seen many cases where writes burst for just a very short period of time, then stop.  During that short period of time they can overwhelm the front end causing high response times.

It still could be something else such as a SAN problem or slot contention.
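A small sketch of why a long sample interval can hide a short burst. All numbers here are made up for illustration, including the assumed FA limit.

```python
# Hypothetical per-second write rates over one 5-minute sample interval.
fa_capacity_iops = 10_000   # assumed FA CPU limit, for illustration only
interval_s = 300
baseline_iops = 1_000
burst_iops = 30_000
burst_seconds = 2

per_second = [baseline_iops] * interval_s
for t in range(100, 100 + burst_seconds):
    per_second[t] = burst_iops  # a two-second burst in the middle

interval_avg = sum(per_second) / interval_s
peak = max(per_second)

print(f"interval average: {interval_avg:.0f} IOPS "
      f"({100 * interval_avg / fa_capacity_iops:.0f}% of assumed capacity)")
print(f"one-second peak:  {peak} IOPS "
      f"({100 * peak / fa_capacity_iops:.0f}% of assumed capacity)")
```

The interval average stays far below the assumed limit while the two-second peak exceeds it, so a CPU chart built from long samples can look idle even though IOs queued during the burst.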

20.4K Posts

October 20th, 2010 11:00

We have a case open, but PSE kicked it back to local support, and it takes the local guys forever to set up STP and all that stuff. I am paying a boatload of money for ECC/Performance Manager, so I am trying to research it myself as well. Thanks for the suggestions so far.

1.3K Posts

October 20th, 2010 11:00

Then it could be slot locks or something else. One quick way to determine if RDF is causing the issue is to suspend RDF for a brief period and see if the response times improve.

I would also suggest opening a case with CS if this is really an issue for you, rather than trying to troubleshoot it here.

1.3K Posts

October 20th, 2010 13:00

Any host with the API/CLI installed can start and collect STP data via the stordaemon.  You can set it up to collect up to 1 minute samples.

Also, you might want to run symstat -i 5 and see if it shows any bursting of IO.

And again, if you can suspend RDF for a short time and performance improves greatly, that would rule out the FA.
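Once short-interval samples are in hand, a simple check like the following could flag bursts. This is a sketch only: it assumes the write rates have already been pulled out of the tool output into a plain list of numbers, the sample values are made up, and find_bursts is a hypothetical helper, not part of any EMC tool.

```python
from statistics import median


def find_bursts(samples, factor=3.0):
    """Return (index, value) pairs where a sample exceeds factor x the median."""
    if not samples:
        return []
    threshold = factor * median(samples)
    return [(i, v) for i, v in enumerate(samples) if v > threshold]


# Made-up writes/sec values, one per 5-second sample.
writes_per_sec = [900, 1100, 950, 1000, 12000, 1050, 980, 15000, 1020]

for index, value in find_bursts(writes_per_sec):
    print(f"sample {index}: {value} writes/sec looks like a burst")
```

The median is used as the baseline rather than the mean so that the bursts themselves do not drag the threshold upward.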

20.4K Posts

October 20th, 2010 13:00

We have put SRDF in adaptive copy mode. Response times have improved but are still higher than usual. We might suspend it for a little while and see what happens. In terms of IOPS, this host is doing the same thing that it did a week ago (according to Performance Manager). The only thing that changed in the environment is that we started replicating from a new VMAX to the DMX3 (thin LUNs). This DMX3 is also an SRDF target for this application, which resides on a DMX4 (regular thick LUN replication).

20.4K Posts

October 20th, 2010 14:00

If the target SRDF box is having problems destaging from cache to disk, what items would indicate that?

1.3K Posts

October 20th, 2010 14:00

Putting RDF into adaptive copy mode should have removed the RDF impact to the writes. 
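The reasoning can be sketched with a very simplified model of where host write time goes. In synchronous mode the host is acknowledged only after the remote array has the write, so link round trip and remote service time add to every write; in adaptive copy mode the acknowledgment comes from the local array alone. The function and all millisecond values below are hypothetical.

```python
def write_time_ms(local_ms, link_rtt_ms, remote_ms, synchronous):
    """Simplified host write time: the remote leg counts only in synchronous mode."""
    if synchronous:
        return local_ms + link_rtt_ms + remote_ms
    return local_ms


# Hypothetical values: a fast local cache write, a slow remote array.
local_ms, link_rtt_ms, remote_ms = 1.0, 2.0, 80.0

sync_ms = write_time_ms(local_ms, link_rtt_ms, remote_ms, synchronous=True)
acp_ms = write_time_ms(local_ms, link_rtt_ms, remote_ms, synchronous=False)

print(f"synchronous:   {sync_ms:.1f} ms")
print(f"adaptive copy: {acp_ms:.1f} ms")
```

Under this model, write times that stay elevated after switching to adaptive copy point at something other than the remote leg.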
