Path bouncing on MD3220 SAS with VMware

Question

I have two VMware clusters, and on one of them I receive warning alerts when I/O is under stress.

This is the warning message:

Node ID: md3220-cgn-bct
Host IP Address:
Host ID: Out-of-Band
Event Error Code: 4011
Event occurred: Nov 18, 2013 6:39:30 PM
Event Message: Virtual disk not on preferred path due to failover
Event Priority: Warning
Component Type: RAID Controller Module
Component Location: Enclosure 0, Slot 1

I disabled one path in VMware and assigned ownership of all VDs to the controller on the live path. The problem became less frequent, but it persists.

This is how the hosts are connected to the MD3220:

vmware1 - hba1 - - | - - ->controller 1
        - hba2 - - | - | ->controller 2
vmware2 - hba1 - - | |
        - hba2 - - - - |

VMware multipathing is set to Most Recently Used (VMware).

All hosts run ESXi 5.1.

The last warning was 8 hours ago; I received the alert and the recovery within the same minute.

Both paths bounce, and it happens randomly across all virtual disks.

Best regards.

alfonsograna · Answer

I've received the alert again, and looking at the logs raises several questions.

Why is the path change alert followed by a "Cache not enabled" message? Is the cache disabled after a path change?

Why do I receive this alert if I have the failover delay set to five minutes?

alfonsograna · Answer

Here is the log sequence I'm talking about:

Date/Time: 11/20/13 6:01:29 AM
Sequence number: 1134
Event type: 210A
Event category: Internal
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Cache not enabled
Event specific codes: 0/0/0
Component type: RAID Controller Module
Component location: Enclosure 0, Slot 0
Logged by: RAID Controller Module in slot 0
Raw data:
4d 45 4c 48 03 00 00 00 6e 04 00 00 00 00 00 00
0a 21 48 00 b9 c0 8c 52 00 00 00 00 00 80 00 00
00 00 00 00 04 00 00 00 22 00 00 00 22 00 00 00
08 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00
0a 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 04 80 00 00 00 20 00 00 00
43 61 63 68 65 20 52 65 63 6f 6e 66 69 67 75 72
65 20 53 79 6e 63 2d 30 00 00 00 00 00 00 00 00
20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00

Date/Time: 11/20/13 6:01:26 AM
Sequence number: 1132
Event type: 4011
Event category: Error
Priority: Warning
Event needs attention: true
Event send alert: true
Event visibility: true
Description: Virtual disk not on preferred path due to failover
Event specific codes: 0/0/0
Component type: RAID Controller Module
Component location: Enclosure 0, Slot 1
Logged by: RAID Controller Module in slot 1
Raw data:
4d 45 4c 48 03 00 00 00 6c 04 00 00 00 00 00 00
11 40 18 01 b6 c0 8c 52 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 22 00 00 00 22 00 00 00
08 00 00 00 00 00 00 00 02 00 00 00 01 00 00 00
0a 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 01 00 00 00 00 00

Date/Time: 11/20/13 5:56:28 AM
Sequence number: 1131
Event type: 2044
Event category: Internal
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Virtual disk I/O shipping implicit transfer
Event specific codes: 0/0/0
Component type: Virtual Disk
Component location: Virtual Disk sas10kr10
Logged by: RAID Controller Module in slot 0
Raw data:
4d 45 4c 48 03 00 00 00 6b 04 00 00 00 00 00 00
44 20 4d 00 8c bf 8c 52 00 00 00 10 13 00 00 00
00 00 00 00 04 00 00 00 0d 00 00 00 0d 00 00 00
12 00 00 00 00 73 00 61 00 73 00 31 00 30 00 6b
00 72 00 31 00 30 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 00 00 00 00 00
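The raw data in these entries can be partly decoded. Lining the bytes up against the fields each entry already prints suggests a header layout: the ASCII marker "MELH" at byte 0, the sequence number at byte 8, the event type at byte 16 and a Unix timestamp at byte 20, all little-endian. This layout is inferred from these three dumps only, not from any Dell specification, so the sketch below is a hypothesis rather than a documented parser. The 1134 dump is truncated to the rows needed, and only the first two rows of the other two are used.

```python
import struct
from datetime import datetime, timezone

# Raw data copied from the log entries above (1134 truncated; 1132/1131 header rows only).
RAW = {
    1134: """
4d 45 4c 48 03 00 00 00 6e 04 00 00 00 00 00 00
0a 21 48 00 b9 c0 8c 52 00 00 00 00 00 80 00 00
00 00 00 00 04 00 00 00 22 00 00 00 22 00 00 00
08 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00
0a 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
01 00 00 00 00 00 00 04 80 00 00 00 20 00 00 00
43 61 63 68 65 20 52 65 63 6f 6e 66 69 67 75 72
65 20 53 79 6e 63 2d 30 00 00 00 00 00 00 00 00
""",
    1132: """
4d 45 4c 48 03 00 00 00 6c 04 00 00 00 00 00 00
11 40 18 01 b6 c0 8c 52 00 00 00 00 00 00 00 00
""",
    1131: """
4d 45 4c 48 03 00 00 00 6b 04 00 00 00 00 00 00
44 20 4d 00 8c bf 8c 52 00 00 00 10 13 00 00 00
""",
}


def parse_hex(dump: str) -> bytes:
    """Convert a whitespace-separated hex dump into bytes."""
    return bytes.fromhex(" ".join(dump.split()))


def decode_header(raw: bytes) -> dict:
    """Decode header fields under an ASSUMED layout (inferred, not documented):
    bytes 0-3 ASCII marker, 8-15 sequence number (uint64 LE),
    16-17 event type (uint16 LE), 20-23 Unix timestamp in UTC (uint32 LE).
    """
    seq = struct.unpack_from("<Q", raw, 8)[0]
    event_type = struct.unpack_from("<H", raw, 16)[0]
    epoch = struct.unpack_from("<I", raw, 20)[0]
    return {
        "magic": raw[0:4].decode("ascii"),
        "sequence": seq,
        "event_type": f"{event_type:04X}",
        "epoch": epoch,
        "utc": datetime.fromtimestamp(epoch, tz=timezone.utc),
    }


def ascii_strings(raw: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII bytes, similar to the Unix `strings` tool."""
    found, run = [], bytearray()
    for b in raw + b"\x00":  # trailing NUL sentinel flushes the last run
        if 0x20 <= b < 0x7F:
            run.append(b)
        else:
            if len(run) >= min_len:
                found.append(run.decode("ascii"))
            run = bytearray()
    return found


if __name__ == "__main__":
    events = {seq: decode_header(parse_hex(dump)) for seq, dump in RAW.items()}
    for seq, h in sorted(events.items()):
        print(seq, h["event_type"], h["utc"].isoformat())
    # Seconds between the 2044 implicit transfer and the 4011 warning.
    print("2044 -> 4011 gap (s):", events[1132]["epoch"] - events[1131]["epoch"])
    print("strings in 1134:", ascii_strings(parse_hex(RAW[1134])))
```

Run on these three entries, the sketch shows the 4011 warning logged 298 seconds after the 2044 implicit transfer, close to the five-minute failover delay asked about above, and it finds the text "Cache Reconfigure Sync-0" inside the "Cache not enabled" event, which fits the later reply that the cache is briefly disabled while a VD moves to the other controller. The decoded times are in UTC and run 8 hours ahead of the displayed times, which presumably reflect the management station's time zone.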

alfonsograna · Answer

I pasted some images, but it seems the site filtered them out.

TechnicalDiffic · Answer

The cache is momentarily disabled while the VD is moving to the other controller, so this is expected when a non-preferred path occurs.
Go into vSphere -> Configuration -> Storage Adapters -> click on your HBA -> right-click the virtual disk and select Manage Paths.
You should see at least two paths to the array. If you see only one, that is why you're getting a non-preferred path.

The two paths should be Active (I/O) and Active. If ALUA is not enabled, you may see Active (I/O) and Standby. If you don't see two paths, check the link lights on the SAS connectors on the HBA and the storage to make sure they are all green. Also check MD Storage Manager and make sure your servers have both of their SAS WWNs associated with them.
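As a toy illustration, the checklist in this reply can be written as a small decision function over the path states listed in the Manage Paths dialog for one virtual disk. The state strings ("Active (I/O)", "Active", "Standby") are assumed to be typed in by hand as they appear in the vSphere client, and the function name is hypothetical; it only mirrors the reasoning above and does not query ESXi.

```python
def normalize(state: str) -> str:
    """Lowercase and drop spaces and slashes, so "Active (I/O)" and "Active(io)" compare equal."""
    return state.lower().replace(" ", "").replace("/", "")


def diagnose_paths(states: list) -> str:
    """Mirror the troubleshooting checklist for one virtual disk's paths."""
    norm = [normalize(s) for s in states]
    if len(norm) < 2:
        # A single path means no redundancy: check cabling, link LEDs and
        # whether both SAS host-port WWNs are associated with the host.
        return "single path: check SAS links and host port associations"
    if "standby" in norm:
        # Active (I/O) + Standby suggests ALUA is not in effect.
        return "standby path: enable ALUA"
    if "active(io)" in norm and "active" in norm:
        # Active (I/O) + Active is the state the reply describes as expected.
        return "ok: ALUA paths present"
    return "unexpected path states: inspect each path"
```

The original poster's later report (one path Active, the other Active (I/O)) would land in the "ok" branch, so in that case this checklist alone does not explain the bouncing.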

If both paths are there but one of them is 'standby', you should enable ALUA to help reduce the number of 'not on preferred path' occurrences. To do this, update the MD3200 to the newest firmware, then SSH into the servers, run these commands, and reboot the servers.

esxcli storage nmp satp rule add -s VMW_SATP_ALUA -V DELL -M MD32xx -c tpgs_on;

esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp VMW_SATP_ALUA;
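To check the per-device result of these two commands, the output of `esxcli storage nmp device list` can be scanned for the Storage Array Type (SATP) and Path Selection Policy (PSP) of each device. The sample text below is hypothetical with placeholder device IDs, the helper names are made up for illustration, and the field names are assumed from typical ESXi output; compare them with what your own host prints before relying on this sketch.

```python
# Hypothetical excerpt of `esxcli storage nmp device list` output; the device
# IDs are placeholders, and only the fields used below are shown.
SAMPLE = """\
naa.example-vd1
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_RR
   Working Paths: vmhba1:C0:T0:L0

naa.example-vd2
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_MRU
   Working Paths: vmhba2:C0:T1:L1
"""


def parse_nmp_device_list(text: str) -> dict:
    """Group 'Key: value' lines under the unindented device ID above them."""
    devices, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start a new device record.
            current = devices.setdefault(line.strip(), {})
        elif current is not None:
            # Split on the first colon only, so path names like vmhba2:C0:T1:L1 survive.
            key, sep, value = line.strip().partition(":")
            if sep:
                current[key] = value.strip()
    return devices


def devices_not_matching(devices: dict, satp: str = "VMW_SATP_ALUA",
                         psp: str = "VMW_PSP_RR") -> list:
    """Return device IDs whose SATP or PSP differs from the expected pair."""
    return [dev for dev, fields in devices.items()
            if fields.get("Storage Array Type") != satp
            or fields.get("Path Selection Policy") != psp]


if __name__ == "__main__":
    print(devices_not_matching(parse_nmp_device_list(SAMPLE)))
```

A device still on VMW_PSP_MRU, like the Most Recently Used setup described at the top of the thread, is listed by `devices_not_matching`; the expected pair defaults to the SATP and PSP named in the commands above.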

alfonsograna · Answer

From the VMware side, I always see one path as Active and the other as Active (I/O).

The LEDs on the controller are OK.

There are no ports pending association.

The storage array type plugin shown is VMW_SATP_ALUA.

I find that the problem gets worse when the storage is stressed. Yesterday the DBAs enabled replication in SQL Server, and the path started bouncing every hour throughout the night on two virtual disks: one that stores the operating systems of the database servers, and the other that holds the databases.

Could somebody give me a sense of how severe the problem is?

I need to decide whether or not to move the VMs to other storage.

alfonsograna · Answer

By the way, thanks for your reply, TechnicalDifficulties.

alfonsograna · Answer

Anyone? I really need help with this issue.

Dev Mgr · Answer

Did you open a support case with Dell's PowerVault support? They may have some insight into things with the help of the array logs and server logs.

Michael_Tdot · Answer

Hi alfonsograna

We are experiencing pretty much the same problems here. We have noticed it changes paths when we are doing a large file copy or transfer (backup files of 250 GB or more, some nearly 2 TB).

Did you ever hear from Dell or Vmware on the problem?

alfonsograna · Answer

Nope.

DELL-Kenny K · Answer

Sorry for the delay; it looks like this fell through the cracks. I see you are having issues with the VD not staying on the preferred path. There are a number of things that can cause this kind of issue. Please pull a new support bundle, and I will send you a direct email address you can forward it to so I can review it for you.

Phil Sperry · Answer

Did you find an answer for this? I'm seeing a similar thing :-)

DELL-Sam L · Answer

Hello Phil Sperry,

What is the current version of VMware that you are running? Also, what is the current version of firmware on your MD3220? What multipathing policy are you currently using? How many virtual disks do you have, and how many virtual disks are owned by each controller?

Please let us know if you have any other questions.

Corey.McCormick · Answer

I am seeing the exact same thing, but only when I lean on the storage system with somewhat heavy I/O for a few minutes, for example when VMware copies a few 50 GB VMDK files while cloning a VM. Normally things are fine, with no errors in the logs. It is almost as if it accumulates some sort of latency or lag it doesn't like and throws the error to express displeasure.

MD3200 SAS, FW 8.20.20.60
MD1200 x 3

connected to:

ESX 6.0.0 on R710, Dell build 5224934
3 R710 hosts with dual SAS ports, running ESX
1 R710 host running Windows with Veritas BE

The issue had not occurred until today.

Other than the firmware not being current, I didn't see anything in the release notes that would cover an issue like this.

One thing I did catch is that the EMM modules in the MD1200s are at 1.0.5 instead of 1.0.6.
