Management Networks and SAN Snapshotting

February 9th, 2012 12:00

Hi all!

Long-time lurker, first-time poster. I must say that I'm very happy with the purchase of our EqualLogic PS4000 series SANs. I've been poring over both EqualLogic's and VMware's best practice guides, and I have a couple of questions hanging around in my mind that I'm hoping someone can answer. I suspect it might help other folks who have searched for what I'm asking and come up short. I feel like I have quite a few of the pieces, but I need some help putting them together. I hope you read this, Don!

I'm using my SAN solely to host VMware vSphere VMs, and I'm using Veeam Backup & Replication 6 to back up these VMs to a NAS box. It's been working great! Veeam runs on a physical server connected to the SAN with dual gigabit NICs, and I'm using the 'SAN Mode' backup (Veeam's name for an off-host backup) to snapshot the VMs and dump them on my NAS box.

...now for the EqualLogic questions!

  1. I understand I can use my EqualLogic SAN to snapshot a LUN, set up replica LUNs, and replicate LUNs between two EqualLogic boxes. The LUNs I have carved out are solely for VMFS. Considering I'm using Veeam with the SAN Mode backup option, is there any reason I'd want to reserve space for snapshots on my EqualLogic? Right now I have the default 20% reserve, and for my situation it seems like wasted space. Am I missing something here? Note that I get deduplication and compression with Veeam!
  2. My PS4000 has a dedicated 100Mbit management port that I'm not able to use for iSCSI traffic. My plan was to carve out a management VLAN for it on the switches I'm using solely for my VM hosts and the SAN. Then I found out that the dedicated management port needs to be on a separate subnet from my other interfaces. Now I'm thinking about not using the dedicated 100Mbit management port at all and just doing management through the iSCSI group IP address. To me it seems like routing that management traffic is not worthwhile. Are there negatives to managing the EqualLogic through the iSCSI group IP address that I'm missing?
  3. My EqualLogic is on a physically separated network segment with no route to the Internet or my production network. I like this idea, but it means I cannot use NTP or SMTP. Of course I've manually set the time correctly on my EqualLogic, but I'm interested in how accurate that time needs to be. Is it only used for time and date stamps on SAN-based snapshots and logs? Regarding SMTP: since I'm not able to reach my normal SMTP server, can I set up SAN HQ to collect alerts from the SAN and e-mail me that way? Are there any alerts that wouldn't come through? What are other folks doing to address this?
  4. Adding multiple volumes does not increase SAN performance in any way, right? Is the main reason I'd set up multiple smaller volumes versus a single larger one (2TB minus 512 bytes, to make VMware happy) just the ability to set up different RAID types? I understand this question gets more complex with multiple SANs, but for this question I'm talking about a single SAN.
I apologize for being so verbose. I blame it on all the forum lurking I've been doing where people don't post enough information.

5 Practitioner • 274.2K Posts

February 9th, 2012 14:00

Hello,

Thanks for the shout out!

re: 1  If you are not using EQL snapshots in addition to Veeam, then yes, go ahead and set the snapshot reserve to 0%.

re: 2  You can manage the array via the iSCSI subnet without issue. That was the norm for years, until a dedicated management option was added at customer request. There's no downside.

re: 3  You can either use a firewall/router to limit what can be routed, or use something like "rinetd", a port-forwarding service that runs on Windows 2003. It would allow connections to NTP/SMTP, etc. via a server that sits on both networks.

EQL Support has a PDF that explains how to install and configure rinetd. However, if you end up adding replication, you'll need to be able to route to your DR site.
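To give a feel for the rinetd side, here's a minimal sketch of a rinetd.conf. The addresses are just placeholders for a server with a leg in each network; note that classic rinetd only forwards TCP, so SMTP forwards cleanly while a UDP service like NTP may need a UDP-capable build or another mechanism.

# rinetd.conf  --  bindaddress  bindport  connectaddress  connectport
# Listen on the iSCSI-side interface of the dual-homed server and
# forward SMTP to the production mail server.
10.10.10.5    25    192.168.1.25    25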

re: 3a  Alerts: I believe the only ones you would miss are E-Mail Home (EMHOME) messages. Those actually create a support case via the EqualLogic website; alerts from SANHQ do not.

re: 4  Multiple volumes CAN increase your VMFS performance compared to a single volume. At the SCSI level, each volume negotiates a queue called the Command Tag Queue (CTQ), a.k.a. the CTQ depth. That's how many outstanding requests can be sent to the volume before it tells the host to stop and finish processing commands. A typical value is 32, so the 33rd I/O request will be held until a slot frees up. If you have two volumes, each can have a CTQ of 32; add another and you get 32 more. So smaller volumes, each with fewer heavy-I/O VMs, are typically better. Also, RAID level is set at the MEMBER level, so you would have to add another array to have a different RAID type. A nice benefit is that you can seamlessly move volumes between members in the same group. And with firmware 5.2.1 the array will balance I/O between members in the same pool based on latency: if one gets busier, it will swap blocks with the less busy member to even things out.
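If you want to see how close a volume is getting to that limit, esxtop on a host is one quick way to check (a rough sketch using the standard disk-device columns):

# esxtop

Then press 'u' for the disk device view and watch DQLEN (the device queue depth on the host), ACTV (commands currently active on the device) and QUED (commands waiting in the VMkernel for a free slot). If QUED stays at 0 and ACTV stays well below DQLEN, that volume isn't queue-bound.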

Hope this answers your questions.

Regards,

5 Practitioner • 274.2K Posts

February 10th, 2012 07:00

Good morning,

If you are using VMware Round Robin, have you changed the IOPS value to 3? The default is 1000, so ESX sends 1,000 I/Os down one path before selecting the next, which isn't optimal, especially if you have more than two NICs for iSCSI.

This post has info on that:  virtualgeek.typepad.com/.../a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html

EMC/VMware, Dell/EQL, HP/LeftHand and NetApp all agreed on that: IOPS to 3, delayed ACK off, and Large Receive Offload (LRO) off. LRO isn't covered in that blog posting, but many folks have found disabling it helpful.

Here's a link that talks about LRO. It focuses on guest impact, but customers have reported ESX iSCSI performance issues as well with LRO enabled.

docwiki.cisco.com/.../Disable_LRO

Solution Title

HOWTO: Disable Large Receive Offload (LRO) in ESX v4/v5

Solution Details

Within VMware, the following command will query the current LRO value.

# esxcfg-advcfg -g /Net/TcpipDefLROEnabled

To set the LRO value to zero (disabled):

# esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled

A server reboot is required for the change to take effect.
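On ESXi v5 the same setting can also be queried and changed via esxcli (a sketch; the esxcfg-advcfg commands above work there as well):

# esxcli system settings advanced list -o /Net/TcpipDefLROEnabled

# esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0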

You didn't mention which version of ESX you are using, so I didn't include how to change the IOPS value; the command is different between ESX v4 and v5. You can also find it via Google.
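That said, for anyone on v4 who finds this thread, it was roughly along these lines (a sketch based on the multivendor post linked above; substitute your own naa. device name, which you can find with the device list command):

# esxcli nmp device list

# esxcli nmp roundrobin setconfig --type "iops" --iops=3 --device naa.6090a098703e30ced7dcc413d201303e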

Re: Exchange. For Exchange or SQL, I've always suggested using "Storage Direct" with ESX, so that the MS iSCSI initiator in the guest connects directly to the data and log volumes. This takes all the VMFS issues out of the way and lets you leverage all the integration features of the EQL HIT Kit, i.e. you could do a true off-host backup where an array snapshot is mounted to the backup server. Software like Backup Exec supports doing that automatically with its ADBO module.

It doesn't work for VMFS, unfortunately, since the VMFS filesystem isn't supported by Windows.
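If you go the Storage Direct route, the HIT Kit's Remote Setup Wizard and Auto-Snapshot Manager normally handle the connections for you, but a bare-bones sketch of attaching a data volume with the Microsoft iSCSI initiator CLI looks roughly like this (the group IP and target IQN below are placeholders, not real values):

C:\> iscsicli QAddTargetPortal 10.10.10.10

C:\> iscsicli ListTargets

C:\> iscsicli QLoginTarget iqn.2001-05.com.equallogic:0-8a0906-xxxxxxxxx-sql-data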

RE: Queue depth. In some ways, yes, but consider what happens as the load goes up: you could end up in trouble, with no way to fix it, if you've run out of space.

Regards,

February 10th, 2012 07:00

Thank you so much for your input, Don. I really appreciate it. Your answer regarding multiple volumes makes complete sense to me. Since I'm using multipathing (via VMware Round Robin), I'm assuming I need to balance the number of iSCSI connections to the SAN against the performance benefits offered by multiple volumes. I suspect this won't be as much of an issue with current firmware versus older versions (I know the iSCSI connection limit was doubled in recent firmware).

I already have SANHQ 2.20 installed, so I can use that to look at my current queue depths. I also did some poking around the forums and see there is some LUN design discussion as it relates to VMware. I understand there is no one-size-fits-all solution, so I'll tackle all of that next.

If I see in SANHQ 2.20 that my queue depth never hits 32, do you imagine I would see less performance benefit from chopping my LUNs up into smaller pieces? I imagine more simultaneous iSCSI connections are better; I just want to make sure the pain of redesigning and copying all my VMs into additional smaller LUNs is going to be worth it.

Should I limit my redesign scope to focus on servers that use the most IOPS? It'd basically be my single Exchange 2010 box, two separate SQL servers, and my file server.

February 11th, 2012 16:00

Thanks for all the great info, excellent post. I'm using vSphere 5 Standard. I did some testing with the IOPS value when I first got my SANs, and setting it to 3 didn't improve my results with Iometer (using settings that take advantage of the 1MB block size), so I didn't keep the change. I understand now why I didn't see an improvement with Iometer; I knew the results wouldn't be 'real world', but I didn't know how else to test. I'm glad the consensus is to set it to 3.

Now I've set my IOPS value to 3, disabled LRO, and turned delayed ACK off on all my hosts. The LRO thing was new to me, so thanks for mentioning it! I know there is a lot of information out there about LUN design and SQL/Exchange best practices, so I'll read up on that next.

Thanks again for taking the time to reply.

5 Practitioner • 274.2K Posts

February 12th, 2012 04:00

You are very, very welcome.   I'm glad I could help.

4 Posts

February 12th, 2012 05:00

Those would actually create a support case via the Equallogic Website. Alerts from SANHQ do not.

5 Practitioner • 274.2K Posts

February 13th, 2012 07:00

FYI:  Here's a link to the port redirector, rinetd, that I mentioned.

codewut.de/Port-Redirection-with-Windows

203 Posts

February 14th, 2012 19:00

Hey Don, can you confirm that this is still advisable in vSphere 5.0 when using Round Robin? (I dug around some Dell/EQL docs, but they didn't mention it; for some reason I thought this was no longer an issue.) You covered the LRO command-line change, but what is the string needed for setting the IOPS to 3 under vSphere 5? And what have you found to be the best way to monitor the balance of the Round Robin paths?

5 Practitioner • 274.2K Posts

February 15th, 2012 03:00

Yes, ESXi v5 uses the same 1000-I/O default, so it needs to be changed, especially when you have more than two interfaces for iSCSI and multi-member pools.

Setting the default policy for EQL devices to Round Robin, so that newly discovered volumes are set to Round Robin:

#esxcli storage nmp satp set --default-psp=VMW_PSP_RR --satp=VMW_SATP_EQL

**These new volumes will still need to have the IOPs value changed.

To gather a list of devices use:

#esxcli storage nmp device list

You'll need the naa. name that corresponds to each EQL volume in that list; that's the device name used in the next command.

Existing volumes can be changed to Round Robin with:

#esxcli storage nmp device set -d naa.6090a098703e30ced7dcc413d201303e --psp=VMW_PSP_RR

You can set how many IOs are sent down one path before switching to the next. This is akin to rr_min_io under Linux.

NOTE: This will only work if the policy has been changed to Round Robin ahead of time.

The "naa.XXXXXXXXXXXXX" is the MPIO device name.

You can get a list of devices with:

#esxcli storage nmp device list

naa.6090a098703e5059e3e2e483c401f002

Device Display Name: EQLOGIC iSCSI Disk (naa.6090a098703e5059e3e2e483c401f002)

Storage Array Type: VMW_SATP_EQL

Storage Array Type Device Config: SATP VMW_SATP_EQL does not support device configuration.

Path Selection Policy: VMW_PSP_RR

Path Selection Policy Device Config: {policy=iops,iops=3,bytes=10485760,useANO=0;lastPathIndex=3: NumIOsPending=0,numBytesPending=0}

Path Selection Policy Device Custom Config:

Working Paths: vmhba36:C0:T1:L0, vmhba36:C1:T1:L0, vmhba36:C2:T1:L0, vmhba36:C3:T1:L0

This also lets you confirm that the path policy is "VMW_PSP_RR" (VMware Path Selection Policy, Round Robin) and note that the IOPS value has already been set to '3'.

#esxcli storage nmp psp roundrobin deviceconfig set -d naa.6090a098703e30ced7dcc413d201303e -I 3 -t iops

#esxcli storage nmp psp roundrobin deviceconfig get -d naa.6090a098703e30ced7dcc413d201303e

Byte Limit: 10485760

Device: naa.6090a098703e30ced7dcc413d201303e

IOOperation Limit: 3

Limit Type: Iops

Use Active Unoptimized Paths: false

For monitoring I use a combination of the performance monitoring in ESXi and SANHQ.  
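If you have a lot of volumes, here's a quick sketch of a loop that applies the IOPS setting to every EQL device from the ESXi shell. It assumes the device-list output format shown above and that the volumes are already set to Round Robin, so sanity-check the grep/awk against your own output before running it:

for d in `esxcli storage nmp device list | grep EQLOGIC | awk '{print $7}' | sed -e 's/(//' -e 's/)//'` ; do
    esxcli storage nmp psp roundrobin deviceconfig set -d $d -I 3 -t iops ;
done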

203 Posts

February 15th, 2012 06:00

Wow... I overlooked this somehow. Thanks Don. Having previously set each datastore to Round Robin, I executed the following on one of my datastores:

#esxcli storage nmp psp roundrobin deviceconfig set -d naa.6090a09860213c7843eb34e9380180de -I 3 -t iops

Looking at the details again via esxcli storage nmp device list, I see that "iops=3" confirms the change took, but I also see that the string "policy=rr" was changed to "policy=iops". Is this expected?

Also, is it advisable that this only be performed while in maintenance mode, or can it be changed while in production?

203 Posts

February 15th, 2012 07:00

Perfect.  The test I had done was on a system in maintenance mode.  Just wanted to verify.  Thanks.

5 Practitioner • 274.2K Posts

February 15th, 2012 07:00

Yes policy=iops is correct.  From the solution I included:  

     Path Selection Policy Device Config: {policy=iops,iops=3,bytes=10485760,useANO=0;lastPathIndex=3:

It 'can' be done live, but personally I prefer not to, just in case the value gets set to something wrong and performance is impacted.

5 Practitioner • 274.2K Posts

February 15th, 2012 07:00

You're welcome!

February 15th, 2012 15:00

Thanks for the link to rinetd, Don. I plan on forwarding NTP and SMTP traffic from my isolated iSCSI network onto my production network via my physical backup server. It seems like a better idea than routing that traffic... why make things more complicated than they need to be, right?

5 Practitioner • 274.2K Posts

February 15th, 2012 16:00

That works. Or, since you have a PS4000, there's a dedicated management port @ 100Mb that you can put on your LAN, so you can manage the array and get alerts to your Exchange server.
