July 31st, 2018 09:00
VMware: nic failed criteria 128
I have over a dozen PowerEdge M610 and M910 servers, all with X520-k 10GbE mezzanine cards in them and all with the same problem.
I've recently upgraded them to ESXi 6.0 U3, using Dell's custom A10 .iso. After the upgrade, I had to manually remove the Mellanox nmlx5-core driver, otherwise EsxUpdate failed with an error 99. I don't think this is related, but I'm mentioning it here for completeness.
The main issue is that since upgrading to 6.0, NICs fail during vMotion. Sometimes a few migrations go through, sometimes none do. Inevitably I get a message like:
"2018-07-25T13:50:19.871Z: [netCorrelator] 508215881911us: [vob.net.pg.uplink.transition.down] Uplink: vmnic4 is down. Affected portgroup: vMotion-1. 1 uplinks up. Failed criteria: 128"
This causes the current and any remaining vMotions to fail. Worse, my vMotion and VM networks share NICs, so every VM remaining on the source host drops off the network.
The NICs that fail are the 10GbE NICs on the X520-k using driver ixgben 1.6.5.
VMware has punted, telling me to contact Dell to check drivers and firmware. The equipment is so old that support is no longer offered. I found a couple of firmware updates for the X520-k. The most recent is compatible with 6.5, so I applied the next most recent one, which works with 6.0. It didn't help.
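For anyone who wants to see which uplinks and portgroups are affected over time, here is a minimal sketch that pulls the fields out of a message like the one quoted above. The function name `parse_uplink_down` is made up for illustration, and the pattern is written only against the single sample line in this post, so other message variants may not match.

```shell
#!/bin/sh
# Hypothetical helper: extract uplink, portgroup, and failed-criteria value
# from a vob.net.pg.uplink.transition.down message.
# The pattern is based solely on the one sample line quoted in this thread.
parse_uplink_down() {
    # $1: one log line. Prints "uplink portgroup criteria", or nothing if no match.
    printf '%s\n' "$1" | sed -n \
        's/.*Uplink: \([^ ]*\) is down\. Affected portgroup: \(.*\)\. [0-9]* uplinks up\. Failed criteria: \([0-9]*\).*/\1 \2 \3/p'
}
```

Fed the message above, it prints `vmnic4 vMotion-1 128`. To scan a whole log, something like `grep 'uplink.transition.down' /var/log/vobd.log | while read -r l; do parse_uplink_down "$l"; done` should work, assuming the messages land in that file on your hosts.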
Everything I've checked shows this equipment is supported on ESXi 6.0, and I have all the recommended firmware/driver versions.
As a workaround, I discovered that administratively cycling the NICs fixes the problem (esxcli network nic down -n vmnicX, then esxcli network nic up -n vmnicX). I've also considered separating the vMotion and VM network NICs, but I have no good options there: either I lose back-end switch redundancy for some things, or when the problem happens I take down management or iSCSI instead of the VM networks. Going back to 5.5 is a terrible option, as I'm trying to get us to vCenter 6.7, which doesn't support 5.5.
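The down/up workaround can be wrapped in a small function so it is quick to run from the ESXi shell. This is only a sketch: the function name `cycle_nic`, the `ESXCLI` override variable, and the 5-second default pause are all assumptions of mine, not anything tested across hosts.

```shell
#!/bin/sh
# ESXCLI can be overridden (e.g. ESXCLI=echo) to preview the commands
# without touching the host. The variable itself is a hypothetical convenience.
ESXCLI="${ESXCLI:-esxcli}"

cycle_nic() {
    # $1: uplink name, e.g. vmnic4
    # $2: optional pause in seconds between down and up (assumed default: 5)
    nic="$1"
    pause="${2:-5}"
    if [ -z "$nic" ]; then
        echo "usage: cycle_nic vmnicX [seconds]" >&2
        return 1
    fi
    # Same two commands as the manual workaround, in the same order.
    $ESXCLI network nic down -n "$nic" || return 1
    sleep "$pause"
    $ESXCLI network nic up -n "$nic"
}
```

Calling `cycle_nic vmnic4` runs the two commands for that uplink. Setting `ESXCLI=echo` first just prints them, which is a safe way to check what would be run.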


FirestormF
August 1st, 2018 10:00
Someone in another forum suggested trying the async driver. So far this seems to be working.
esxcli system module set -e true -m ixgbe
esxcli system module set -e false -m ixgben
I'll update when I've done more testing.
UPDATE: All my servers have been updated and the problem hasn't come back since switching the driver.
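For repeating the driver switch on several hosts, the two commands can be wrapped as below, with a second function to check which driver a NIC ends up on. This is a sketch under assumptions: the function names and the `ESXCLI` override are hypothetical, and the reboot reminder reflects my understanding that module enable/disable changes only apply once the host is restarted.

```shell
#!/bin/sh
# ESXCLI can be overridden (e.g. ESXCLI=echo) for a dry run; hypothetical convenience.
ESXCLI="${ESXCLI:-esxcli}"

switch_to_ixgbe() {
    # Enable ixgbe and disable ixgben, exactly as in the two commands above.
    $ESXCLI system module set -e true -m ixgbe || return 1
    $ESXCLI system module set -e false -m ixgben || return 1
    # Assumption: the change is only picked up after a host reboot.
    echo "Module settings changed; reboot the host to apply."
}

show_nic_driver() {
    # $1: uplink name, e.g. vmnic4. Prints what esxcli reports for that NIC,
    # including its driver information.
    $ESXCLI network nic get -n "$1"
}
```

After the reboot, `show_nic_driver vmnic4` is one way to confirm the uplink is actually bound to the driver you expect.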
FirestormF
August 1st, 2018 10:00
I just updated the X520-k firmware on two of my servers to 18.5. It did not fix the problem.
The CMC is at 6.10.
The switches are M8024-k 10GbE switches, running 5.1.6.3 or 5.1.3.7.
DELL-Josh Cr
Moderator
August 1st, 2018 10:00
Hi,
What is the firmware version on the NICs that you are using? Is the CMC on the chassis up to date? Which switches are in the chassis?