Welcome to the EMC Support Community Ask the Expert conversation. This is an opportunity to learn about performance calculations on Clariion/VNX systems and the various considerations that must be taken into account.
This discussion begins on Monday, August 13th. Get ready by bookmarking this page or signing up for email notifications.
Rob Koper has been working in the IT industry since 1994 and for Open Line Consultancy since 2004. He started with Clariion CX300 and DMX-2 and has worked with all newer arrays ever since, up to current technologies like VNX 5700 and the larger DMX-4 and VMAX 20k systems. He's mainly involved in managing and migrating data to storage arrays over large Cisco and Brocade SANs that span multiple sites spread widely across the Netherlands. Since 2007 he has been an active member on ECN and the Support Forums, and he currently holds Proven Professional certifications like Implementation Engineer for VNX, Clariion (expert) and Symmetrix as well as Technology Architect for Clariion and Symmetrix.
Jon Klaus has been working at Open Line since 2008 as a project consultant on various storage and server virtualization projects. To prepare for these projects, an intensive one year barrage of courses on CLARiiON and Celerra has yielded him the EMCTAe and EMCIEe certifications on CLARiiON and EMCIE + EMCTA status on Celerra.
Currently Jon is contracted by a large multinational and part of a team that is responsible for running and maintaining several (EMC) storage and backup systems throughout Europe. Amongst his day-to-day activities are: performance troubleshooting, storage migrations and designing a new architecture for the Europe storage and backup environment.
This event ran from the 13th until the 31st of August.
Here is a summary document of the highlights of that discussion as set out by the experts: Ask The Expert: Performance Calculations on Clariion/VNX wrap up
The discussion itself follows below.
This discussion is now open for questions to Rob and Jon. We look forward to a lively, interesting and hopefully fun discussion over the course of the event.
We are using an EMC Clariion 240. I want to check throughput while my backups are running, and we don't have any analyzer installed on the box. Is it possible to do this from the command line?
You can set up VNX Monitoring and Reporting under a trial license.
This application works with CX4...
It's a very good tool and easier to use than Analyzer.
Another item you can check to get performance related data is the SAN ports! If you're using Brocade as your SAN, run the free tool "SAN Health" during the backup and make sure you have the tool gather performance data during the backup hours. For Cisco you might need Fabric Manager (or DCNM these days) to get performance data.
Welcome everybody! We don't pretend to know it all, but we like to think that we understand how things work. Feel free to comment if you think something needs to be added to our story and ask questions if you wonder how to do the math.
To start the discussion I would like to open with "performance, what exactly is it?"
Performance is how users / admins perceive responsiveness of applications. One person might think everything is working just fine while another person thinks it's very bad. Can we say that I/O response times of 30 ms are bad? No we can't. "It depends" is a popular answer EMC often uses and they are right: it depends!
Storage performance in the end always comes down to the physical storage medium, whether it is a rotating disk or an EFD (SSD). If your storage array is heavily used, the relative performance improvement from the cache tends to get smaller and smaller. What I mean is this: if you have a single LUN of 1 GB, the cache is used only for that single LUN and performance is great. But when you have 200 TB sliced up into LUNs and you're using them all, the same amount of cache has to serve all of them. The FIFO (first in, first out) mechanism deletes the oldest data from the cache first, and with that many cached LUNs the "oldest" data might not be old at all, which you will notice when you need to access that "old" data again!
I always do my calculations by adding up the physical storage devices. The cache improves the outcome, which is always nice, but at least I'm being cautious and I'm on the safe side.
The magic word in performance discussions is "IOps". 1 IOps is 1 "input or output operation per second". A rule of thumb EMC uses is that a single disk rotating at 15k RPM can handle 180 IOps, a 10k disk can handle about 140 IOps, a 7200 RPM disk can do 80 and a power efficient 5400 RPM drive can only do 40 IOps. An EFD (flash drive or SSD) can handle 2500 IOps. As said, this is a rule of thumb based on small random (4kB) blocks. When the blocksize increases the number of IOps will go down, but the amount of MBps goes up. When the blocksize decreases the amount of IOps goes up and the amount of MBps goes down.
When you combine multiple disks in a raid group (RG) or a pool you simply multiply the numbers by the amount of disks.
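The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the per-drive figures are the ones quoted in the post, while the function and dictionary names are made up for this example. Keep in mind that the per-drive numbers assume small random (about 4 kB) I/O, so the throughput helper is only meaningful with an IOps figure that matches the block size you pass in.

```python
# Rule-of-thumb IOps per drive for small random (about 4 kB) I/O,
# as quoted in the post. The names below are illustrative only.
RULE_OF_THUMB_IOPS = {
    "15k": 180,
    "10k": 140,
    "7200": 80,
    "5400": 40,
    "efd": 2500,
}


def raid_group_iops(drive_type, disk_count):
    """Raw back-end IOps of a RAID group or pool: per-disk IOps times disks."""
    return RULE_OF_THUMB_IOPS[drive_type] * disk_count


def throughput_mbps(iops, block_size_kb):
    """Throughput in MB/s for an IOps figure at a given block size."""
    return iops * block_size_kb / 1024


# A 4+1 group of 15k drives: 5 disks x 180 IOps each.
print(raid_group_iops("15k", 5))
# The same group at 4 kB blocks, expressed as MB/s.
print(throughput_mbps(raid_group_iops("15k", 5), 4))
```

The small MB/s figure at 4 kB shows why IOps, not bandwidth, is usually the limiting number for small random workloads.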
But the issue isn't what your array can deliver at its best, but what your applications need!! And most of the time the administrators don't know what they need. "I need 1TB and it needs to be fast!" That sounds familiar, doesn't it? But how fast is "fast"? Asking them how many IOps they need is almost always answered by some weird look on their face, as if you suddenly spoke some foreign language. On a Windows host you can do some measurements using the tool "perfmon". You need to find the number of read and write operations per second and you should check the I/O size as well. If the I/O size is around 4kB you can use the rule of thumb provided by EMC; if the block size is larger you need to use lower numbers. The ratio between read and write IOps is important too, since 1 write I/O will trigger more I/Os on the storage back end, depending on the raid level used for that particular LUN. This handicap for write I/Os is called the "Write Penalty" (WP).
The Write Penalty is the mechanism that makes write I/Os slower than reads. The random write WPs for the following RAID levels are:
RAID 1 and RAID 1/0: 2
RAID 5: 4
RAID 6: 6
For sequential writes the WP is mostly less than what I mentioned earlier, since a whole stripe needs to be written and the parity will change only once. For a 4RAID5 RG (4+1) the WP for sequential write I/O is 1.25 (25% overhead because of the 1 extra disk on 4 data disks). In a 8RAID5 RG the WP is only 1.125 (12.5% overhead). In RAID10 the WP is still 2, since there is no parity, but true mirroring.
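The sequential (full stripe) write penalty described here is just the ratio of all disks written to data disks written. A minimal sketch, with a hypothetical function name and parameter names chosen for this example:

```python
def sequential_write_penalty(data_disks, redundancy_disks=1):
    """Full-stripe write penalty: disks written per data disk written.

    redundancy_disks is 1 for RAID5 parity; for mirroring it equals
    data_disks, because every data disk has a mirror copy.
    """
    return (data_disks + redundancy_disks) / data_disks


print(sequential_write_penalty(4))      # 4+1 RAID5
print(sequential_write_penalty(8))      # 8+1 RAID5
print(sequential_write_penalty(4, 4))   # 4+4 RAID10, mirroring
```

This reproduces the 1.25 and 1.125 figures for 4+1 and 8+1 RAID5, and shows why mirroring stays at 2 regardless of the number of disks.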
If a host has 90% reads and 10% writes, only 10% of the I/Os will suffer from the WP, but if the host has only 10% reads and 90% writes, 90% of the I/Os will suffer from the WP. Choose your RAID level carefully, since choosing the wrong one can be disastrous for your performance!
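To see how much the read/write mix matters, the host load can be converted to back-end load. The sketch below is an illustration, not an EMC tool. The WP of 2 for RAID10 comes from the post; the values of 4 for RAID5 and 6 for RAID6 are the commonly cited random write penalties and are assumed here. The 1000 host IOps figure is a made-up example.

```python
# Random write penalties. RAID10 = 2 is stated in the post; the RAID5
# and RAID6 values are the commonly cited figures (an assumption here).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(host_iops, read_percent, raid_level):
    """Back-end IOps generated by a host load with a given read percentage."""
    reads = host_iops * read_percent / 100
    writes = host_iops * (100 - read_percent) / 100
    return reads + writes * WRITE_PENALTY[raid_level]


# The same hypothetical 1000 host IOps with opposite read/write mixes.
for read_percent in (90, 10):
    for level in ("RAID10", "RAID5"):
        print(read_percent, level, backend_iops(1000, read_percent, level))
```

With 90% reads the two RAID levels differ only modestly in back-end load, while with 90% writes the RAID5 back end has to do roughly twice the work of the RAID10 one.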
Thank you for the elaborate post, Rob.
I agree with you when you say, "Storage performance in the end always comes down to the physical storage medium".
With that in mind, could you explain the importance of RTO and RPO? As a customer or implementation specialist, what do you keep in mind at the planning stage?
Performance comes at a cost! Having said that, it is also up to storage admins to make the right choice depending on the storage system.
I always recommend the following configurations, depending on the customer's requirements:
Performance: RAID 1 + 0 (Expensive compared to RAID 5 but gives better performance)
Capacity + Performance: RAID 5.
Moreover, could you share your experience with best performance practices for EMC CLARiiON storage systems for different types of servers/applications (MS Exchange, database servers, etc.)?
Ankit, RTO and RPO are usually used in backup scenarios. I can imagine that during restores (RTO) the storage is hammered hard because of all the data coming back, and you might take that into consideration while designing your storage config, but I've never come across such a scenario. Backing up, however, does cost performance on a regular basis, but then again: the research you did before creating the design should have revealed that, and you should have designed your storage to be able to handle the extra I/O during backups. If you want backups to go faster, you should make sure that you're looking at the right bottleneck. Using an old LTO-1 tape drive will not get you anywhere near backing up at 750 GB per hour. Look at all components if you want to create a new design (and you have the time for it). If time is the issue, make sure you make the right assumptions.
You could take into account that you want to be able to handle peak I/O without delays, but you can also consider a lighter config and accept that peak I/O will flatline at 100% utilization for a bit longer, if that saves you money. But be aware that production can then feel the impact, and it might even be undesirable to take a chance there.
Your advice about RAID10 for performance and RAID5 for Capacity + Performance is a rule of thumb. You should always investigate what you need. RAID5 isn't necessarily bad for performance. If you have enough drives in a RAID5 configuration, it can even outperform a RAID10 config; simply do the math and see for yourself! Jon will publish a new post shortly and he will explain how to do the calculation. If you have questions, ask! We can give examples if you want.
Thirdly: I can't say you need RAID10 for Exchange and RAID5 for a file server. Look at the previous explanation. Investigate before making decisions!
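As a quick illustration of "do the math", the sketch below estimates how many host IOps a RAID group can sustain for a given read/write mix. The function name, the 70/30 mix and the two group sizes are hypothetical choices for this example; 180 IOps per 15k drive and WP=2 for RAID10 come from the discussion above, and WP=4 for RAID5 is the commonly cited random write penalty, assumed here.

```python
def max_host_iops(disk_count, iops_per_disk, read_percent, write_penalty):
    """Host IOps a RAID group can sustain for a given read/write mix."""
    raw_backend_iops = disk_count * iops_per_disk
    # Average number of back-end I/Os caused by one host I/O.
    backend_per_host_io = (
        read_percent + (100 - read_percent) * write_penalty
    ) / 100
    return raw_backend_iops / backend_per_host_io


# Hypothetical comparison at 70% reads on 15k drives (180 IOps each).
raid10_2plus2 = max_host_iops(4, 180, 70, write_penalty=2)
raid5_8plus1 = max_host_iops(9, 180, 70, write_penalty=4)
print(round(raid10_2plus2), round(raid5_8plus1))
```

In this made-up case the nine-drive RAID5 group sustains more host IOps than the four-drive RAID10 group, despite the higher write penalty. With equal drive counts and a write-heavy mix the outcome reverses, which is why the calculation has to be done per workload.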
So, according to all this, I can drop the idea that log disks of database servers have to be placed on a RAID10 LUN?
At this moment, the policy at our company is to use RAID10 for production databases (high performance), and for testing, deployment and acceptance databases we are using 4RAID5 for higher storage utilization, thereby saving some money.
As a rule of thumb it's an understandable rule, but if you really want to be sure, you should calculate what you need!
Suppose you have a database server that has 100% writes on the log disk and 85R/15W on the database drive, but the number of write IOps to the log drive is only 100 as seen on the host. Now it doesn't matter whether you use RAID10 or RAID5, because a 2 drive RAID10 (or RAID1) on 15k RPM can handle 2 x 180 = 360 IOps, so the 200 back-end IOps (2 x 100 because of the WP=2) are no problem at all. On 4RAID5 (4+1) the RG can deliver 5 x 180 = 900 IOps, while at most 400 are needed (4 x 100 with the random write WP of 4 for RAID5, and fewer for sequential log writes). So RAID5 is also not a problem if you have IOps to spare. If all IOps of a RG are already accounted for and you have some space left, do not think you can safely hand this space out to new servers: your calculations say that you don't have any IOps left, so eventually you will get a performance problem somewhere.
The scenario you may be thinking of is a dedicated RAID10 RG of several disks (for example 6+6) which is used only for the logs of more than one server. Add up all the required host write IOps and see if your RAID10 RG can handle that and if the space is sufficient. If you have a dedicated pool with RAID10 RGs especially for log drives I think you're safe, because you can easily expand the pool to provide more space or IOps. With the new Flare 32 the existing LUNs will even be redistributed across all drives after the addition of new drives.
Calculation is always the best method, but if you have lots of servers and the amount of space for the logs and the IOps they need are in balance (you don't have room left when the IOps are all accounted for) you just might have a best practice for your company to do what you do now.
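The check for a shared RAID10 log group (add up the host write IOps and the space, then compare against what the group offers) could be sketched as follows. Everything specific here is hypothetical: the function name, the three servers with their IOps and capacity figures, and the nominal 300 GB drive size. The 180 IOps per 15k drive and WP=2 for RAID10 are taken from the discussion.

```python
def check_shared_log_group(host_write_iops, required_gb, mirrored_pairs,
                           iops_per_disk=180, disk_size_gb=300,
                           write_penalty=2):
    """Spare back-end IOps and spare capacity of a RAID10 log group."""
    disk_count = 2 * mirrored_pairs
    backend_needed = sum(host_write_iops) * write_penalty
    backend_available = disk_count * iops_per_disk
    # Mirroring halves the usable capacity; 300 GB is a nominal size.
    usable_gb = mirrored_pairs * disk_size_gb
    return {
        "spare_iops": backend_available - backend_needed,
        "spare_gb": usable_gb - sum(required_gb),
    }


# Three hypothetical servers sharing a 6+6 RAID10 group for their logs.
result = check_shared_log_group(
    host_write_iops=[100, 250, 400],
    required_gb=[200, 300, 500],
    mirrored_pairs=6,
)
print(result)
```

A negative value for either figure means the group is oversubscribed on that dimension. When both approach zero together, IOps and space are in balance, which is the situation described in the last paragraph.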