
November 4th, 2011 07:00

Analyzer CX4 - good or bad? Having constant dirty pages % from 60 to 85

Can anyone explain how to understand this info in an easy way? What is dirty pages %? Thank you.

474 Posts

November 4th, 2011 08:00

Dirty pages are pages of data in the array's write cache that have been written by a host but have not yet been committed to disk.

“Good” dirty pages % depends on the low/high watermarks for the most part. So if your watermarks are set to 60/80, then dirty pages between 60-ish and 80-ish are expected. The cache engine maintains as many dirty pages as possible within the watermarks in order to increase cache hit ratios. When the dirty pages % climbs over the high watermark, the cache engine begins “high watermark flushing” to start moving the dirty pages out to disk. Flushing continues until the dirty pages % drops back down to the low watermark, at which point it stops. As long as dirty pages is essentially between the two watermarks, the system is operating normally.

If host write throughput exceeds the capability of cache to flush data to disk, then dirty pages will begin to climb over the high watermark until the problem subsides. If dirty pages hits 100%, this means that all data in cache needs to be committed to disk and there is no room in cache to accept new writes until some amount of data is flushed. This will cause “forced flushing” which will impact performance for not only the active host, but other hosts sharing the array.

I wouldn’t worry about 5% over the high watermark, but if you are seeing 99% regularly then you could be experiencing forced flushing which is bad.
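
To make that concrete, here is a rough sketch in Python (just an illustration of the logic described above, not how the array firmware is actually implemented), using the example 60/80 watermarks and some made-up dirty pages % samples:

# Rough illustration of the watermark behaviour described above - not EMC code.
LOW_WATERMARK = 60    # flushing stops once dirty pages % drops back to this
HIGH_WATERMARK = 80   # flushing starts once dirty pages % climbs past this

def next_flush_state(dirty_pct, currently_flushing):
    """Classify one Analyzer sample of SP dirty pages %."""
    if dirty_pct >= 100:
        # Cache is full: new writes must wait for forced flushes.
        return True, "forced flushing likely - expect host impact"
    if dirty_pct > HIGH_WATERMARK:
        return True, "high watermark flushing (fine if brief)"
    if currently_flushing and dirty_pct > LOW_WATERMARK:
        return True, "still flushing down toward the low watermark"
    return False, "normal - dirty pages held between the watermarks"

flushing = False
for sample in (65, 72, 83, 78, 61, 58, 99):   # example dirty pages % samples
    flushing, note = next_flush_state(sample, flushing)
    print(f"{sample:3d}% -> {note}")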

Hope that helps!

Richard J Anderson

295 Posts

November 4th, 2011 14:00

Excellent explanation. One more thing: "SP Cache Idle Flush On" is showing me values from 16 to 50 over the 3 days,

and the "SP Cache Flush Ratio" values over the same 3 days are between 0 and 2. Any idea?

One more question: why, in Analyzer, is the value "0" for all 3 days when accessing the LUNs for the following measures: "SP Cache Read Hits/s", "SP Cache Read Misses/s", "SP Cache Read Hit Ratio", "SP Cache Write Misses/s", "SP Cache Forced Flushes/s" and many more? My cache is working and enabled, so what is happening?

thanks.

259 Posts

November 7th, 2011 04:00

What do you do to correct high dirty page symptoms?

295 Posts

November 7th, 2011 14:00

I have checked the watermarks, and the settings look like this: high 85 and low 55. So I think that is normal, since for all 3 days during working hours the dirty pages stayed inside this threshold.

A few more questions:

What do these mean?

"SP Cache Read Misses/s"

"SP Cache Read Hit Ratio"

"Disk Crossings/s and %"

And where can I find or calculate IOPS in Analyzer?

474 Posts

November 7th, 2011 15:00

Are you experiencing high response time / latency within the cluster hosts? What is the application running on the hosts?

In Analyzer, have you looked at Response Time for the SPs? What about the LUNs specifically used by the problem hosts?

Richard J Anderson

474 Posts

November 7th, 2011 15:00

Yep, sounds normal…

As far as your other questions, there are some decent descriptions in the Navisphere/Unisphere online help (the online help is very extensive). Briefly…

SP Cache Read Misses/s is the number of read IOs per second issued by a host for which the data was NOT in the read cache, so the data had to be read from the physical disks. Unless you have highly sequential read activity (which activates pre-fetching) or highly localized reads, you will see quite a few of these.

SP Cache Read Hit Ratio is the ratio of the number of times a host read data that WAS in cache, compared to the total number of times the host read data. So if a host issued 10 reads, and 5 of the reads were for data that was in the read cache, the ratio would be 0.5 (ie: 50%)
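
As a trivial illustration of that arithmetic (the counter values below are hypothetical):

# Hypothetical Analyzer counter values, just to show the arithmetic.
read_hits_per_sec = 5.0     # "SP Cache Read Hits/s"
read_misses_per_sec = 5.0   # "SP Cache Read Misses/s"

total_reads = read_hits_per_sec + read_misses_per_sec
read_hit_ratio = read_hits_per_sec / total_reads if total_reads else 0.0
print(read_hit_ratio)       # 0.5, i.e. 50% of host reads were served from cache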

Disk Crossings are caused when a single host IO crosses a stripe element boundary and results in more than 1 IO to the back-end disks. The CLARiiON element size is 64KB, so an example would be a host issuing a 128KB IO: this requires reading/writing against two independent physical drives. If the IO is 64KB or smaller, it should only need to access a single physical disk. In general, if IO sizes are larger than 64KB you will see lots of these; if they are predominately smaller than 64KB you should see fewer. Be sure that your host partitions are properly aligned to the 64KB offset to prevent unnecessary disk crossings.

Disk Crossings/s is the number of IO’s that required accessing more than 1 physical disk in a second

Disk Crossings % is the % of total IO’s that caused disk crossings.

Assuming your host partitions are all properly aligned, any disk crossings you see are not an issue and there is nothing you can really do about them. Unaligned partitions may cause additional disk crossings and that means essentially more load on the array which drives up disk, bus, and CPU utilization of the array.
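
If it helps to see the alignment point spelled out, here is a small sketch (the 64KB element size is from the explanation above; the offsets and IO sizes are made-up examples):

ELEMENT_SIZE = 64 * 1024  # CLARiiON stripe element size in bytes

def crosses_element(offset_bytes, size_bytes):
    """True if a single host IO spans more than one 64KB stripe element."""
    first_element = offset_bytes // ELEMENT_SIZE
    last_element = (offset_bytes + size_bytes - 1) // ELEMENT_SIZE
    return last_element != first_element

print(crosses_element(0, 64 * 1024))          # False - aligned 64KB IO stays in one element
print(crosses_element(63 * 512, 64 * 1024))   # True  - same IO from an unaligned (63-sector) partition
print(crosses_element(0, 128 * 1024))         # True  - a 128KB IO always spans two elements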

As far as the other question of how to reduce Dirty Pages % if it is high… Dirty pages % climbing up to 99% is generally caused by a host writing data to the array faster than the physical disks for that LUN/host can sustain. If this happens for a short period of time, the cache buffers the burst of writes and there is no performance issue. If the burst lasts longer, the cache must be flushed to make room for more writes, and the disks must then keep up with the speed of the flushing. If the disks are unable to keep up, and the write activity is still occurring, then the cache dirty pages will climb and eventually hit 99 or 100%. The only way to permanently solve the problem is to move the LUN to a faster set of disks (more disks in the RAID group, higher RPM, RAID10 vs. RAID5, etc). You may be able to alleviate the problem slightly by lowering the watermarks, which creates a larger buffer in cache, but that just delays the problem a bit. In some rare cases you could configure specific LUNs to bypass cache, but there are many considerations before doing that. As long as dirty pages is within the watermarks, don’t do anything. If you are seeing issues then consult EMC for more help.
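
As a rough back-of-envelope sanity check of whether the disks behind a LUN can keep up with a sustained write rate (the per-disk IOPS figures and RAID write penalties below are generic rules of thumb, not EMC sizing guidance):

DISK_IOPS = {"15k_fc": 180, "10k_fc": 140, "7.2k_sata": 80}   # rough per-disk IOPS
WRITE_PENALTY = {"raid5": 4, "raid10": 2}                      # back-end IOs per host write

def sustainable_host_write_iops(disk_count, disk_type, raid_type):
    """Approximate host write IOPS a RAID group can flush without cache filling up."""
    return disk_count * DISK_IOPS[disk_type] / WRITE_PENALTY[raid_type]

host_write_iops = 2000   # example sustained host write rate taken from Analyzer Throughput
print(sustainable_host_write_iops(5, "15k_fc", "raid5"))    # ~225 -> flushing cannot keep up
print(sustainable_host_write_iops(10, "15k_fc", "raid10"))  # ~900 -> closer, but still short of 2000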

In Analyzer, IOPS is represented as “Throughput”. You can look at Total, Read, or Write Throughput (IOPS) for the storage processors, disks, LUNs, RAID Groups, etc depending on what you are trying to figure out.

Hope that helps…

Richard J Anderson

295 Posts

November 7th, 2011 15:00

Thanks Richard,

This is the point:

I don't find any evidence of bottlenecks in this CX4 storage, but some hosts (out of many) are experiencing performance issues. What I really need is something to justify and make the case for implementing FAST Virtual Provisioning with 3 tiers in order to solve this issue. These hosts are the 2 most important cluster hosts on this storage.

295 Posts

November 7th, 2011 16:00

AIX with Oracle RAC. Response times for the 2 SPs over these 3 days both stay between 5 and 9 ms.

What I notice is that there are more reads than writes, and the cache is set like this:

Read cache SP A - 250 MB

Read cache SP B - 250 MB

Write cache - 1000 MB

Response time for the specific metaLUNs stays around a 20 ms average during working hours, with 25% average utilization.

474 Posts

November 7th, 2011 22:00

Is the Oracle RAC using ASM filesystem across multiple LUNs? How are the LUNs laid out on the backend storage array?

Has the Oracle/AIX Admin done performance monitoring on the hosts that clearly points at the storage performance? It could be possible that the performance issue is CPU related.

Richard J Anderson
