Welcome to the EMC Support Community Ask the Expert conversation. This is an opportunity to learn how to troubleshoot VNXe front-end configurations with VMware.
Henri is an IT professional with over 10 years of experience in the Storage domain.
For the past six years Henri has been focusing on planning and implementation of virtual environments using VMware and EMC products. Henri has extensive experience with the EMC CLARiiON and VNXe product families, including successful disaster recovery implementation using EMC MirrorView and VMware SRM. Henri is a VMware Certified Professional and an EMC Proven Professional. He has also published several posts about his hands-on experience with VNXe on his blog.
Matt Brender is a Systems Engineer with EMC in the Unified Storage Division, developing on the VNXe and VNX systems. Over the last four years Matt has moved from CLARiiON & Celerra performance analysis to being part of the team that brought VNXe to market. Matt is an EMC Proven Professional and holds Specialist certifications in NAS and CLARiiON. He is active within the VMware User Group community, follows VNXe conversations on Twitter, and responds from his personal account @mjbrender.
This event concluded on January 27th, 2012. However, you can continue the conversation as a regular discussion.
Matt and Henri put together this wrap-up summary document of the Ask the Expert event, covering the key takeaway points they found.
The event discussions themselves, in detail and chronologically, are below.
Well, my question isn't really about troubleshooting, but if you have the time I'd appreciate the advice. If this is not the appropriate venue for this question, please say so.
I'm new to VMware/VNXe. I'm currently replacing some aging hardware (some of it 10+ years old). We recently purchased a VNXe3100 and built a 3-node VM cluster (ESXi 5). I'm only likely to have about 8-10 VMs, all lightly loaded. I haven't put anything critical on it yet (just a print server where I can revert to physical hardware easily if needed).
I have been looking around and couldn't find much in the way of best practices or rules of thumb, i.e., is one large pool and one large VMFS volume a "bad" thing to do?
My setup is:
Two RAID5 sets on the VNXe, one is allocated to an iSCSI server and the other to a shared file server for CIFS shares.
One large data pool for each storage type.
I do not have snapshots enabled on the storage pool used for the iSCSI server. The recommended snapshot space of 235% seemed like a lot of storage to give up for what I saw as little gain. Correct me if I'm wrong here, but when would one really be likely to restore a snapshot of a VMFS volume? I would think I'd be much better off with snapshots on the VMware side, so I could roll back an individual VM rather than the entire VMFS volume.
Just wanted to make sure I'm not doing anything stupid before I put production VMs on here and find out the only recourse is to delete the entire datastore and start over.
Thanks for your question.
How many disks do you have, and what type are they?
We are using 28-disk R5 pools on a 3300, and soon we will have a 3100 with 20-disk R5 pools. On the 3300 we are using 2TB VMFS volumes. The issue with big volumes is that you tend to put too many VMs on them. 8-10 VMs should be fine, depending on the load of course. I would personally go with the bigger pools and not with the smaller RGs.
On the snapshots, I agree with you.
I have a 3100 with dual SPs, the extra 4x1GB Ethernet modules, and 11 600GB SAS disks (two 5-disk R5 sets with a hot spare).
I have one set allocated to iSCSI and the other to shared folders (each on a different SP). It seemed like a good idea to keep the iSCSI and shared folders on separate spindle sets; do you agree? I have the two base Ethernet ports on each SP serving the iSCSI server (set up more or less per the VNXe HA whitepaper, using two HP ProCurve 2910 switches). The shared folder server is using ports on the Ethernet module.
So at the moment given the size of my environment, you'd agree that just making one large (1.5-2TB thick provisioned) VMFS volume with no snapshotting is a reasonable decision? In the future if I need more space I'll add a disk enclosure and another R5 set and create a second VMFS volume rather than growing the first.
For multipathing, round robin or most recently used?
I'm planning backups with Backup Exec going to LTO5 tape, using the NDMP plugin for the CIFS server on the VNXe, the virtual infrastructure agent to back up the VM disks for DR, and agents in the VMs for file-level backup.
Any other things I should watch out for as an ignorant newbie?
Your question is perfect for this forum, and I love talking about this kind of detail. I'll focus on the drives for this response.
You mention two R5 RAID Groups on the VNXe, so we know you have ten drives configured for user data. The volume manager on the VNXe will slice these drives up using logic to balance the disk consumption across spindles. Now comes the decision point: a single pool, or split-up pools? I'm happy to lay out the checks and balances of a decision like this one.
What multiple pools offer is IO segregation. You know with certainty which storage is tapping into which spindles. It's handy for troubleshooting. For the SAN performance buff, it means you get 5 drives' worth of IOPS per server, which is calculable. You know which systems will be impacted by a specific drive removal or fault. There is comfort in this certainty.
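That "calculable" point can be sketched with a back-of-envelope estimate. This is illustrative only: the ~180 IOPS-per-drive figure is an assumed ballpark for 15k RPM SAS drives, and the RAID5 write penalty of 4 is the standard rule of thumb; plug in your own measured numbers.

```python
# Rough host-visible IOPS for a RAID5 set of small-block random IO.
# Assumed ballpark: ~180 IOPS per 15k SAS drive (substitute real data).

def raid5_host_iops(drives, iops_per_drive, read_fraction):
    """Effective host IOPS, applying the RAID5 write penalty of 4."""
    raw = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    # Each host write costs 4 back-end IOs (read data, read parity,
    # write data, write parity); each host read costs 1.
    return raw / (read_fraction + 4 * write_fraction)

print(round(raid5_host_iops(5, 180, 0.7)))   # one 5-drive RG, 70% reads
print(round(raid5_host_iops(10, 180, 0.7)))  # a single 10-drive pool
```

The same arithmetic shows why a bursty workload benefits from the 10-drive pool: it has twice the back-end IOPS to draw on when the other workload is quiet.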
What you lose with multiple pools is simplicity. You spend more time managing the system. There is also always that moment when you have 30GB free in one pool and 25GB in another, and you realize you wish you could carve up 50GB.
What a single pool offers is simplicity of management. You know the slices of drive capacity that make up the storage will be spread across all the drives, so that's no problem. Life is easier and your storage is more readily consumable. There is also a potential performance benefit from a very active stream tapping into all 10 drives rather than just 5.
What you don't know is who is where - there are slices of each server's storage on all the drives. This extends the impact of a double faulted RG for example. It also increases the chance of spindle contention, which is when servers are requesting more IOPS than a drive can service within normal latency windows.
Now that the options are laid down, you asked for an opinion, so I'll give one. I am all for simplicity of management, so a single pool per tier of storage is my default option. The workloads of an iSCSI server and a file server can be complementary, so I would spread the load across all available drives unless I had good reason to segregate a specific workload. Remember that write cache is shared and mirrored between the SPs, so the idea of isolating server storage is not going to hold. Placing one server per SP is a smart thought, though - it lets you maximize your processing power.
And let's not forget - at some point you'll want to reclaim space. Planning for a double-faulted RG on 11 drives is like planning for lightning to strike every single day. I always find it easier when all my eggs are in the same basket. Plan well, configure a hot spare like you did, and have a backup for the worst case.
So there you have it - there's a breakdown of my logic related to drive layout. We'll get to more details tomorrow.
I would also go with the 10-disk pool. Like Matt mentioned, sometimes you need IO segregation and one pool isn't an option. Putting all database server disks (db, log, tmp, backup) on the same pool might not be the best option.
Matt also mentioned the impact of a double-faulted RG. What does a RAID group have to do with a pool? A pool has two or more RAID groups in it, and the data is striped across the RGs. VNXe can use up to four RGs in a pool. If you create a pool with 10 disks, it will have two 5-disk RGs in it, and the data will be striped across the two RGs. If you then add ten more disks to that pool, the data that was already there is still only striped across the first two RGs. New datastores created on the extended pool can use all 20 spindles. But if the whole capacity of the 10-disk pool is already in use when 10 more disks are added, then a new datastore will only be striped across the two new RGs in the pool.
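That striping behaviour can be shown with a toy model. This is a simplification for illustration only, not actual VNXe internals; the `Pool` class and RG names are hypothetical.

```python
# Toy model of pool extension: a datastore is striped only across
# the RAID groups that existed in the pool when it was created.

class Pool:
    def __init__(self, raid_groups):
        self.raid_groups = list(raid_groups)
        self.datastores = {}

    def create_datastore(self, name):
        # Striped across whatever RGs the pool has *right now*.
        self.datastores[name] = list(self.raid_groups)

    def extend(self, new_raid_groups):
        # Adding RGs does not restripe existing datastores.
        self.raid_groups.extend(new_raid_groups)

pool = Pool(["RG1", "RG2"])        # 10-disk pool: two 5-disk RGs
pool.create_datastore("vmfs_old")
pool.extend(["RG3", "RG4"])        # add 10 more disks
pool.create_datastore("vmfs_new")

print(pool.datastores["vmfs_old"])  # still only RG1 and RG2
print(pool.datastores["vmfs_new"])  # all four RGs
```

This is also why, later in the thread, the advice is to create a new datastore after extending the pool and migrate the data over, rather than growing the existing one.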
Definitely multipathing with round robin. I have run some tests with and without RR and got about 100MBps more throughput from a VNXe 3300 when RR was enabled. I'm still testing performance on the 3100, and it seems that ESX 4.x with two separate ports and RR performs better than link aggregation.
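As a rough illustration of why round robin helps: successive IOs alternate across the available paths, so both 1GbE ports carry traffic instead of one sitting idle as with a fixed or MRU policy. The path names below are made up, and note that ESXi's actual NMP round-robin policy switches paths after a configurable number of IOs (1000 by default) rather than per IO.

```python
from itertools import cycle

# Minimal sketch of round-robin path selection across two iSCSI paths.
# Hypothetical path names; real ESXi paths are identified by runtime
# names like vmhba33:C0:T0:L0.
paths = cycle(["vmnic0->SPA_port0", "vmnic1->SPA_port1"])

# Issue six IOs and record which path each one took.
io_log = [next(paths) for _ in range(6)]
print(io_log)  # alternates between the two paths
```

On ESXi 5 the policy can be set per device with `esxcli storage nmp device set --device <naa_id> --psp VMW_PSP_RR`.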
OK, so a 10-disk pool it is... any advice on the easiest way to change things around? As I said previously, I have the two pools, each with one RG. I don't really have any data on the pool allocated to the shared folders. Can I just delete the shared folder server, recycle the second pool, add those disks to the first pool that currently has the VMFS volume, and then recreate the shared folder server using the one and only pool?
I'd rather not delete the pool with my VMFS volume on it if it can be avoided.
Yes, you can recycle the RG that the CIFS shares are on and then extend the first pool with the recycled five disks. You don't need to delete the server, just the datastore. I would also suggest creating a new VMFS datastore on the extended pool and then moving the data from the current datastore to the new one, to get the benefit of all 10 spindles. The VMFS volume that you have now will still be on one RG after the extension.
Henri hit an expert level point that's worth repeating:
The VNXe's volume manager stripes across the available drives at creation. Since this process will add 5 more drives we want to take advantage of, you'll want to make a new datastore and migrate the data over. There is no dynamic restriping, which is important to keep in mind as you plan out capacity consumption.
Keep the questions coming.