We have had these discussions a couple of times, but I can't find what I am looking for. When analyzing MetaLUN performance, do we rely on the numbers displayed for the MetaLUN, or do we have to look at the component LUNs? For example: I have a striped MetaLUN that consists of two component LUNs. Each component LUN resides on its own RAID group. There are no other LUNs on those RAID groups. When I look at MetaLUN service time it shows 0.250 ms, but the individual physical drives that make up my component LUNs show 4 ms. What am I missing?
This question is difficult to answer without knowing the details of the workload. That said, an expert's analysis of the NAR file would quickly explain it.
It is possible that the difference is the effect of caching. It's also possible that the distribution of I/Os to the two component LUNs is not the same.
Look at an I/O size histogram for each component LUN, and at cache utilization, for clues.
I just checked the I/O size for each component LUN and this is what I see:
1) Write I/O size is identical for each component LUN: 8 KB
2) Read I/O size differs between the component LUNs. The meta head shows a 16 KB I/O size while the meta tail shows 8 KB (can you explain this?)
Looking at cache metrics (not sure which ones I need to look at specifically):
1) Read Cache hits/s shows twice as many hits for the meta head component as for the meta tail component
All other metrics for the component LUNs show identical stats. Is there anything specific I should look for?
The metaLUN head will ordinarily contain the first 1MB of the LUN mapped to the server. (I assume you've got a Windows volume on the metaLUN.) So Component 0 would hold the file allocation table, boot blocks, file system metadata (maybe), journaling data, etc. It will get more I/O from the host if the host is updating the FAT or other metadata for that file system. That extra 8KB (again, I assume you have an 8KB random read/write workload?) has nothing to do with the metaLUN itself. The extra 8KB comes from the additional I/O the host generates to the LBA region at the start of the volume.
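To illustrate why the start of the volume lands on the head component, here is a minimal sketch of round-robin stripe mapping. The function name, the 1 MiB segment size (taken from the "first 1MB" remark above) and the simple modulo layout are assumptions for illustration, not documented FLARE behavior.

```python
MIB = 1024 * 1024

def component_for_offset(offset_bytes, segment_bytes=MIB, n_components=2):
    """Return the index of the component LUN that holds a host byte offset.

    Assumes plain round-robin striping in fixed-size segments. The 1 MiB
    default is an assumption based on the "first 1MB" remark, not a
    documented value.
    """
    return (offset_bytes // segment_bytes) % n_components

# Offsets near the start of the volume (boot blocks, file system metadata)
# all map to component 0, the metaLUN head.
for offset in (0, 512 * 1024, MIB, 2 * MIB):
    print(offset, component_for_offset(offset))
```

Under these assumptions, any host I/O aimed at the first segment of the volume is served by component 0, which is consistent with the head seeing extra reads.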
BTW, this is why Best Practices recommends that if you have more than one metaLUN using the same RAID group, you should offset the start of each metaLUN to a different RAID group in round robin fashion. I believe this is explained in the EMC CLARiiON MetaLUNs paper, available on Powerlink.
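The round-robin recommendation above can be sketched as a simple rotation of the RAID group order for each metaLUN. The function and the RAID group names are hypothetical and only show the idea of giving each metaLUN head a different RAID group.

```python
def component_order(raid_groups, metalun_index):
    """Rotate the RAID group list so each metaLUN starts on a different group.

    raid_groups and metalun_index are illustrative inputs, not array objects.
    """
    shift = metalun_index % len(raid_groups)
    return raid_groups[shift:] + raid_groups[:shift]

groups = ["RG0", "RG1", "RG2"]
for i in range(4):
    # The first entry is where that metaLUN's head component would be placed.
    print(i, component_order(groups, i))
```

With three RAID groups, metaLUN 0 starts on RG0, metaLUN 1 on RG1, metaLUN 2 on RG2, and metaLUN 3 wraps back to RG0, so the heavier head I/O is spread across the groups.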
Thank you John. This MetaLUN is presented to a Linux box (Citrix XenDesktop cache area). XenDesktop places the VMs' page files and temp files on this LUN, so I expect a lot of random read/write access. I have two RAID groups dedicated entirely to this striped MetaLUN. Again, my confusion is why I see service times of 4 ms for the drives comprising the component LUNs while the MetaLUN itself shows only a 0.25 ms response time.
Another thing occurred to me.
What version of FLARE are you on? The measurements of response times differ depending on the FLARE version.
The measurement of response time on 28.0 is an approximation based on the Average Busy Queue Length. With a light I/O load, the approximation of response time will be less accurate at the component LUNs than at the metaLUN. The metaLUN metric is 'more accurate' because it 'sees' more I/Os: the I/Os of all the component LUNs contribute to its metric, improving the accuracy of the measurement. Accuracy of the response time measurement improves as the load increases.
Note that the measurement of the component LUNs' response time would also be affected if they were sharing their underlying RAID groups with another metaLUN or a regular LUN.
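One common way to approximate response time from a queue length is Little's Law (response time = average queue length / throughput). Whether FLARE 28 computes its estimate exactly this way is an assumption; the sketch below, with a hypothetical function name and made-up inputs, only shows the shape of such an approximation.

```python
def estimated_response_time_ms(avg_queue_length, iops):
    """Little's Law estimate: R = N / X, converted from seconds to ms.

    avg_queue_length stands in for a queue-length metric and iops for total
    throughput; both are illustrative inputs, not Analyzer field names.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return avg_queue_length / iops * 1000.0

# Hypothetical numbers: an average of 0.5 outstanding I/Os at 2000 IOPS.
print(estimated_response_time_ms(0.5, 2000))
```

An estimate of this kind is built from sampled averages, so with few I/Os per sample the result is noisier, which matches the point that accuracy improves as load increases.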
The measurement on 29.0 and later is an actual measurement of response time.
I'm fairly certain this is your issue.
I am on FLARE 28.5.707. There is significant load on the MetaLUN (~2000 IOPS). I am looking at service time metrics and am still not sure why MetaLUN service times are 0.25 ms while the physical disks show 4 ms. I am not looking at component metrics at all.
I believe the answer is caching. Since we are averaging the estimated response time for each I/O that hits the MetaLUN as a whole, write cache hits and read cache hits can bring down the average response time for the LUN. If there are a lot of re-hits, this also brings down the response time and service time averages. The backend I/O probably does not represent what the MetaLUN as a whole is doing. Is there low I/O to each drive? This could be the averaging "issue" that was previously mentioned.
Each drive is showing around 57 IOPS (10k drives), and each component LUN is created out of a 6+1 R5 group, so 14 drives * 57 = 798 IOPS. Write cache re-hits are around 1000, so is it correct to assume that the MetaLUN is getting very low service times because it keeps re-reading the same stuff from cache? It was read once from disk, with a service time of 4 ms at the disk level, but now that it's in cache XenDesktop keeps re-reading/writing the same data over and over? The load test Citrix is running tries to power up 500 VMs at once.
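The back-end arithmetic above, set against the roughly 2000 host IOPS reported earlier in the thread, can be written out as a small sketch. The function name is hypothetical, and the comparison ignores RAID 5 write overhead and I/O coalescing, so it is only a rough indication of how much host I/O never reaches the disks.

```python
def total_drive_iops(drives_per_group, raid_groups, iops_per_drive):
    """Sum of per-drive IOPS across all RAID groups behind the metaLUN."""
    return drives_per_group * raid_groups * iops_per_drive

# 6+1 RAID 5 means 7 drives per group; two groups back the metaLUN.
backend = total_drive_iops(drives_per_group=7, raid_groups=2, iops_per_drive=57)
host_iops = 2000  # approximate host load on the metaLUN, from the thread

print(backend)              # total disk IOPS across 14 drives
print(backend / host_iops)  # disk I/Os per host I/O (rough ratio)
```

Fewer disk I/Os than host I/Os suggests a large share of host requests is being satisfied in cache rather than on the drives.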
dynamox - "is it correct to assume that MetaLUN is getting very low service times because it keeps re-reading the same stuff from cache ?"
Yes, the re-hits bring down the average response times and service times, whether reads or writes. Most likely you are missing read cache at times, and those misses are where the 4 ms disk times come from, but the quick cache hits bring down the averages that the MetaLUN shows.
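The averaging effect described above can be sketched as a weighted mean of cache-hit time and disk time. The function names and the near-zero cache-hit time are assumptions, and the model treats every miss as costing the 4 ms seen at the drives, which is a simplification.

```python
def blended_time_ms(hit_ratio, hit_ms, miss_ms):
    """Average time per I/O when a fraction hit_ratio is served from cache."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms

def required_hit_ratio(target_ms, hit_ms, miss_ms):
    """Hit ratio needed for the blended average to equal target_ms."""
    return (miss_ms - target_ms) / (miss_ms - hit_ms)

# Assumption: cache hits cost roughly 0 ms, misses cost the 4 ms disk time.
ratio = required_hit_ratio(target_ms=0.25, hit_ms=0.0, miss_ms=4.0)
print(ratio)
print(blended_time_ms(ratio, hit_ms=0.0, miss_ms=4.0))
```

Under these simplified assumptions, a 0.25 ms average alongside 4 ms disk times would require the large majority of I/Os to be cache hits, which fits a boot storm that keeps touching the same data.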