
June 27th, 2014 04:00

iSCSIprt errors 9, 39 and 129 on Windows Server 2012 R2 Hyper-V hosts

Hi,

We’re migrating from VMware to Microsoft Hyper-V 2012 R2. We’re setting up a 4-node cluster and have created a Cluster Shared Volume for storage; three nodes are up and running so far. We originally planned to team our SAN NICs, but after reading several white papers and browsing several web sites we changed our plan and settled on two SAN NICs, each with its own IP.
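
For reference, the two-path layout looks roughly like this with the Microsoft iSCSI initiator and MPIO. This is a sketch only: the portal address and NIC IPs are placeholders, not our real values, and the EqualLogic Host Integration Tools would normally manage much of this for you.

    # Install MPIO and claim iSCSI devices (-n suppresses the automatic reboot; one is still required)
    Install-WindowsFeature Multipath-IO
    mpclaim -n -i -d "MSFT2005iSCSIBusType_0x9"

    # Register the EqualLogic group IP as a target portal (placeholder address)
    New-IscsiTargetPortal -TargetPortalAddress 10.10.10.10

    # Log in to the volume once per SAN NIC so MPIO sees two independent paths
    $target = (Get-IscsiTarget | Select-Object -First 1).NodeAddress
    Connect-IscsiTarget -NodeAddress $target -IsPersistent $true -IsMultipathEnabled $true -InitiatorPortalAddress 10.10.10.21
    Connect-IscsiTarget -NodeAddress $target -IsPersistent $true -IsMultipathEnabled $true -InitiatorPortalAddress 10.10.10.22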

We’ve created two virtual SAN switches in Hyper-V for fault tolerance.

The VMs are a mix of Windows Server 2003 R2, Windows Server 2008, Windows Server 2008 R2, Windows Server 2012 R2 and Windows 7, with roles such as IIS, file cluster services, app servers and DB servers.

We’re now seeing VMs freeze in the middle of operations, and our users are starting to complain about the freezes and system halts. Users access these resources through our Citrix farm.

In the Event Viewer on our Hyper-V hosts we see a lot of iScsiPrt errors, and we suspect they are related to our problem.
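
For the record, this is how we pull just those events out of the System log on a host (event IDs 9, 39 and 129 from the subject line):

    # List the 50 most recent iScsiPrt errors 9, 39 and 129 from the System log
    Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'iScsiPrt'; Id = 9, 39, 129 } -MaxEvents 50 |
        Format-Table TimeCreated, Id, Message -AutoSize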

We haven’t been able to find a suitable solution by googling the errors we receive, and we’re not sure whom to raise this with. Is it Microsoft, or is it Dell?

Hardware:

Dell PowerEdge M1000e chassis

Pass-through module

Dell PowerEdge M710HD

192 GB memory

2 x 146 GB disks

2 x Intel Xeon E5645

6 x Broadcom BCM5709S NICs

 

Dell PowerConnect 6248

Jumbo frames are not enabled on the servers with access to our SAN. SAN traffic runs on a separate LAN that is not reachable through our firewall.
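
Should we later enable jumbo frames end to end (NICs, switch ports and arrays all have to match), checking and setting the MTU on the SAN-facing adapters would look roughly like this; the adapter names "SAN1" and "SAN2" are assumptions:

    # Show the current jumbo packet value on the SAN NICs
    Get-NetAdapterAdvancedProperty -Name "SAN1","SAN2" -RegistryKeyword "*JumboPacket"

    # Raise the MTU to 9014 bytes (only after the 6248 ports and the arrays are set to match)
    Set-NetAdapterAdvancedProperty -Name "SAN1","SAN2" -RegistryKeyword "*JumboPacket" -RegistryValue 9014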

 

3xEquallogic PS5000X

1xEquallogic PS6000

1xEquallogic PS6100

Firmware is 6.0.4


June 27th, 2014 07:00

I had similar iScsiPrt errors in a Server 2008 R2 Hyper-V environment, on both hosts and guests. I was never able to find a root cause, but Dell technicians strongly believed it was an issue with the iSCSI initiator built into Windows.

Aside from that, I'd seriously consider dumping the Broadcom NICs. The list of problems I've experienced with them is endless, particularly in the Hyper-V world, and I've worked with many Dell technicians who believe Broadcom NICs are not production-worthy devices. We switched over to Intel NICs after years of troubleshooting various issues, and nearly all of them were resolved immediately. It might be worth doing the same to rule out Broadcom as the root of your problem. Hope this helps.
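
One concrete thing worth checking before ripping the cards out: offload features such as VMQ have been a common Hyper-V pain point with Broadcom NICs of this era. Whether the 5709S even exposes VMQ depends on firmware, but if it shows as enabled, turning it off during testing is a cheap way to rule it out (the adapter name below is a placeholder):

    # See which adapters currently have VMQ enabled
    Get-NetAdapterVmq

    # Temporarily turn VMQ off on an adapter bound to a virtual switch while testing
    Disable-NetAdapterVmq -Name "VMSwitchNIC1"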


June 27th, 2014 09:00

One side note: while you are still running ESX, please upgrade the array firmware to at least 6.0.10. There is an issue in firmware versions lower than 6.0.6-H2 that can result in VMFS datastore corruption, specifically in the VMFS storage heartbeat metadata area. That corruption requires manual repair by VMware and can result in data loss.

While the issue shows up in ESXi, other cluster filesystems can potentially hit it as well.

Making sure ESXi is configured according to our best practices is also suggested:

en.community.dell.com/.../download.aspx

Re: the errors. When they happen, are there any events in the EQL GUI? I would expect INFO messages of either "logout request by initiator" or "load balancing request".

How many members are in any one pool? If you have more than three, those errors can occur when space is moved off one member to another.

Re: Broadcoms. I will say that upgrading the Broadcom firmware and drivers is important. There are also some specific Windows settings to apply:

en.community.dell.com/.../download.aspx
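
As a quick sketch, checking the installed drivers and applying the delayed-ACK tuning commonly referenced for EqualLogic on Windows looks something like the following. The TcpAckFrequency value is my reading of the settings that document covers, so verify against it before applying, and the interface GUID is a placeholder:

    # Confirm the driver version and date each Broadcom port is actually running
    Get-NetAdapter -Physical | Select-Object Name, InterfaceDescription, DriverVersion, DriverDate

    # Find the interface GUIDs of the SAN NICs, then disable delayed ACKs on each one
    Get-NetAdapter | Select-Object Name, InterfaceGuid
    $if = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{INTERFACE-GUID}'  # placeholder GUID
    New-ItemProperty -Path $if -Name TcpAckFrequency -Value 1 -PropertyType DWord -Force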

Also make sure the 6248 switches have current firmware.

Regards,


July 1st, 2014 06:00

We're not licensed to upgrade our ESX hosts beyond 4.1. As soon as we've resolved the issues mentioned above, we'll reinstall our remaining ESX host and add it to our Hyper-V cluster.

RE: INFO messages. Yes, we have a lot of "logout request by initiator" messages. What do they mean?

One pool has two members and the other has three.

RE: Broadcom. We're on Broadcom firmware 6.4.5.

Regards,

Tommy


July 1st, 2014 09:00

The logout messages mean a connection is being balanced to another physical port on an array to maintain even performance. Another reason to upgrade: on older firmware, the HIT or MEM module could conflict with the network load balancer on the group, causing more frequent balance messages.
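
To see the effect from the host side, you can list the active iSCSI sessions and which portal each connection is currently using; after a balance event the target-side address changes. A sketch:

    # One row per iSCSI connection, showing the local and array-side portals
    Get-IscsiSession | Get-IscsiConnection |
        Format-Table InitiatorAddress, InitiatorPortNumber, TargetAddress, TargetPortNumber -AutoSize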

The 6.x Broadcom firmware tended to be more problematic; upgrading to the current 7.x release is recommended.
