53 Posts

November 22nd, 2011 06:00

Trouble with SRDF/A filling up cache. . .

All:

I'm testing SRDF/A between two sites 18ms apart.  The device I'm testing on is a thin meta with 4 members.  The performance seen on that device is terrible.  I see read and write latency in the 80+ ms range.  

My implementation engineer told me that I should try striped metas, as opposed to concatenated, b/c they'd have more cache and more spindles to work with.  I've tested with both striped and concatenated, and I cannot see a significant difference in speed between them.  I am new to Symmetrix, and I've taken the Config Management course, but I'm not very familiar with how the cache works.  My engineer tells me that each device gets a set amount of cache, and SRDF/A is filling that up b/c it's waiting to send, which hurts performance on that LUN.  If that were true, then the striped meta should theoretically show much better performance than a concatenated one, simply b/c a write uses the cache of all the member devices, versus just the head.  But again, I don't see much difference in speed either way.

So my question is, does Symmetrix really only dedicate a single, fixed slice of cache to each device?  When that's full, it can't borrow cache from somewhere else? 

There's a setting in there for Dynamic Cache Partitioning.  The approximate number of documents on Powerlink referencing this is 0.  What is Dynamic Cache Partitioning?  Will it help my situation by providing a larger pool of cache to the SRDF/A devices?  What are the recommended settings for this?

By the way, this is a new implementation, and nothing else was going on during my tests.  I had a single ESXi 5 host running 2x 4Gbps FC links into 2 FE directors for this test. 

Thanks for the information.

Brandon

1.3K Posts

November 22nd, 2011 07:00

There is NOT a fixed amount of cache allocated per device.  There is a maximum WP (write pending) count per device, but nothing is allocated.  One device can consume all of read cache if it is the only active device.  Striped metas can give better performance, but that has less to do with cache and more to do with concurrency.

Without any performance data to go on, my guess is you are limited by your FA (front-end) if the WP counts are not high.

1.3K Posts

November 22nd, 2011 07:00

If you can post the TTP/BTP files on the EMC ftp site, I can take a look.  I would not mess with DCP.

I would look at the queue depth on the front end to start. 
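As a rough way to see why queue depth is worth checking first: by Little's law, the IOPS a host can drive is capped at outstanding I/Os divided by latency. A minimal Python sketch, where the 80 ms figure comes from the original post and the queue depth of 32 is only an assumed example value, not something measured on this array:

```python
# Little's law sketch: throughput = outstanding I/Os / latency.
# The function name and the queue depth of 32 are illustrative assumptions.

def iops_ceiling(outstanding_ios: int, latency_ms: float) -> float:
    """Upper bound on IOPS for a given number of in-flight I/Os and per-I/O latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # Multiply before dividing so round numbers stay exact in floating point.
    return outstanding_ios * 1000.0 / latency_ms


if __name__ == "__main__":
    # 80 ms is the latency reported in the original post; 32 is hypothetical.
    print(iops_ceiling(32, 80.0))
```

If the observed IOPS sit near this ceiling, the queue is probably the constraint; if they sit far below it, the bottleneck is likely somewhere else.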

53 Posts

November 22nd, 2011 07:00

I'm so glad to hear you say that.  I was appalled to hear this guy's description of how cache was allocated on VMAX.  But he's the EMC guy.  I assumed he knew what he was talking about.  Is there any need to mess with Dynamic Cache Partitioning? 

I have performance data, but I only looked at read / write latency, and % Cache WP.  What else should I look at to narrow down what's going on?  I had no idea a single HP blade could cause this kind of latency doing a snapshot and a storage vMotion with VAAI. 

Thanks for your response.

2 Intern

1.3K Posts

December 5th, 2011 12:00

When you say "less to do with cache", I'd argue that cache still matters. Typically the WP limit on each device is 5% of cache, so 20 devices reaching their WP limits will result in reaching the system WP limit. More devices therefore mean more total cache slots available if needed. I do agree that additional devices will increase concurrency as well. I'm just making an argument for what may have been in the other person's head when he said that.
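To make that arithmetic concrete, here is a small Python sketch. It takes the 5% per-device figure quoted above at face value; the total slot count and the function names are made up for illustration and are not real array values or Symmetrix tooling.

```python
# Sketch of the per-device WP (write pending) cap arithmetic from the post.
# Percentages are integers so the division stays exact.

def per_device_wp_slots(total_cache_slots: int, per_device_pct: int) -> int:
    """Cache slots one device may hold as write pending under a percentage cap."""
    return total_cache_slots * per_device_pct // 100


def devices_to_reach(per_device_pct: int, target_pct: int = 100) -> int:
    """Smallest number of devices, each at its cap, whose sum reaches target_pct of cache."""
    if per_device_pct <= 0:
        raise ValueError("per_device_pct must be positive")
    # Ceiling division using integers only.
    return -(-target_pct // per_device_pct)


if __name__ == "__main__":
    total_slots = 1_000_000  # hypothetical cache size in slots
    print(per_device_wp_slots(total_slots, 5))  # slots a single device can hold as WP
    print(devices_to_reach(5))                  # devices needed to cover 100% of cache
    print(devices_to_reach(5, 75))              # 75 is an arbitrary example target
```

With a 5% cap, a single 4-member meta can hold up to four devices' worth of write-pending slots rather than one, which is presumably the point the implementation engineer was making.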

6 Operator

2.1K Posts

December 11th, 2011 17:00

The part of this that has me scratching my head is where SRDF/A is supposedly heavily impacting performance on the source meta. The absolute first thing I would be doing is turning off SRDF/A and seeing if the performance improved from the host perspective. Unless I'm completely losing it (and I don't totally rule out temporary insanity) the WP limit is only going to affect the SRDF/A session dropping out of asynchronous consistency. That shouldn't affect the source meta significantly.

I think there must be something more going on here and I would be curious to hear the results of a closer look at this.

53 Posts

December 12th, 2011 06:00

Allen:

There was PLENTY going on here. 

Our VMAX wasn't set up optimally from the start.  I think the guy who set it up was used to setting up old DMX arrays, and came up with some TDAT sizes that were bad for concurrency.  We got in touch with some really great people from the SPEED team, and they gave us a clue on how to set it up properly.

After we got it rebuilt using real VMAX recommended performance-based numbers, the thing is absolutely smoking!  Unfortunately, I haven't had time to test the replication yet.  I have since taken the class on SRDF, so I know quite a bit more about how to keep an eye on things than I did during my first test.  Once I get time, I'll report back here.

Thanks!

Brandon

1.3K Posts

December 12th, 2011 06:00

Glad to hear things are working well now.

6 Operator

2.1K Posts

December 13th, 2011 20:00

That makes a lot more sense, Brandon. Glad to hear that it got cleared up too.
