Dell R740XD VMware 6.7U3 Unstable Network Connection

Hi guys, relative noob here. Apologies for the long-winded post, but I'm trying to give as much information as I can. I was running Dell VMware 6.5U3 on a Dell R720 and it was rock solid. I replaced the R720 with an R740XD in the rack, using the same network cables, the same ports on the Netgear switch and the same power cables, and set up VMware to use the same NIC and IP configuration. The R740XD was loaded with Dell VMware 6.7U3.

The vSwitch on the R740 is the default standard vSwitch, with two port groups and four uplinks. The port groups are VM Networks and Management Network. This is the same configuration that worked without issues on the replaced R720 and still works on the remaining R720.

I tried to copy a 3.6TB CentOS VM from another Dell R720 running Dell VMware 6.5U3 to the new Dell R740XD using VMware vCenter Converter Standalone. I had done this many times before between the two Dell R720s without any problems, so I was quite surprised to see the operation fail after about 30 minutes. The reason for failure given by Converter Standalone was Network Error.

I then pinged the Dell R740XD from the Windows 10 PC running Converter Standalone and discovered that after about 20 to 24 successful pings I would get 4 to 6 packets timing out, then another 20 to 24 successful pings, then 4 to 6 timeouts, and so on in a repeating cycle. I pinged the iDRAC interface from the same Windows 10 PC and this was rock solid - no timeouts. I also connected to the ESXi shell on the R740 over SSH using PuTTY, and the session terminated intermittently.
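For anyone who wants to put numbers on a cycle like this, here is a minimal Python sketch that collapses saved ping output into runs of replies and timeouts. It is only an illustration: the function names and the 192.0.2.10 address are made up, and it assumes the English wording of the Windows ping command ("Reply from ..." and "Request timed out.").

```python
from itertools import groupby


def summarize_ping_runs(lines):
    """Collapse ping output lines into consecutive (status, count) runs."""
    statuses = []
    for line in lines:
        text = line.strip().lower()
        if text.startswith("reply from") and "unreachable" not in text:
            statuses.append("ok")
        elif "timed out" in text or "unreachable" in text:
            statuses.append("timeout")
        # Header and statistics lines are ignored.
    return [(status, len(list(group))) for status, group in groupby(statuses)]


def loss_percent(runs):
    """Percentage of probes that were lost across all runs."""
    total = sum(count for _, count in runs)
    lost = sum(count for status, count in runs if status == "timeout")
    return 100.0 * lost / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample: two cycles of 20 replies followed by 5 timeouts.
    sample = (["Reply from 192.0.2.10: bytes=32 time<1ms TTL=64"] * 20
              + ["Request timed out."] * 5) * 2
    runs = summarize_ping_runs(sample)
    print(runs)
    print(f"{loss_percent(runs):.1f}% loss")
```

Feeding it the redirected output of a long ping run shows whether the timeouts really arrive in regular bursts or are scattered more randomly.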

My next logical step was to eliminate the differences between the R720, which worked without any issues, and the Dell R740XD. The obvious difference was the 4-port network interface card in each server. The R740 is configured with a Broadcom card with 2x 10Gb/s and 2x 1Gb/s ports. I disabled the two 10Gb/s ports (vmnic0 and vmnic1) and ran the ping test again. Same result - a cycle of successful pings and timeouts. I then re-enabled the 10Gb/s ports.

I then connected to ESXi on the R740 over SSH and ran a reverse ping, i.e. from the R740 to the Windows 10 host, and the connection was rock solid - no timeouts. So I'm wondering why the ping test run from a command prompt on the Windows 10 PC has timeouts but the same test from the ESXi shell has none. I had no answer, so I decided to try loading Dell VMware 6.5U3 on the R740, as this ran on the R720 with no issues.

There is an improvement with Dell VMware 6.5U3. I still get timeouts in the ping cycle from the Windows 10 PC to the Dell R740, but there is now only 1 timeout for every 20 to 24 successful pings. That is better, but it is not rock solid. The reverse ping from the R740 ESXi shell is still rock solid - no timeouts. The ping from the Windows 10 PC to the iDRAC is also still rock solid - no timeouts.
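As a rough way to compare the two installs, the ping cycles described in this post can be turned into approximate loss percentages. This is only back-of-the-envelope arithmetic on the counts I observed by eye; the function name is made up for illustration.

```python
def loss_range(ok_min, ok_max, lost_min, lost_max):
    """Return (lowest, highest) loss percentage for one ping cycle."""
    # Lowest loss: fewest timeouts alongside the most successful pings.
    lowest = 100.0 * lost_min / (ok_max + lost_min)
    # Highest loss: most timeouts alongside the fewest successful pings.
    highest = 100.0 * lost_max / (ok_min + lost_max)
    return lowest, highest


# 6.7U3: 20 to 24 successful pings, then 4 to 6 timeouts.
low_67, high_67 = loss_range(20, 24, 4, 6)
# 6.5U3: 20 to 24 successful pings, then 1 timeout.
low_65, high_65 = loss_range(20, 24, 1, 1)

print(f"6.7U3: {low_67:.1f}% to {high_67:.1f}% loss")
print(f"6.5U3: {low_65:.1f}% to {high_65:.1f}% loss")
```

On those counts the loss drops from somewhere around 14% to 23% on 6.7U3 to roughly 4% to 5% on 6.5U3, which is still far from a clean link.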

I retried copying the 3.6TB CentOS VM from the Dell R720 running Dell VMware 6.5U3 to the new Dell R740XD using VMware vCenter Converter Standalone. It would appear that Converter Standalone can handle 1 timeout every 20 to 24 packets, and so far the copy has been running for about an hour without failing due to a network error. Estimated time remaining is 19 hours.
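For context, those figures imply a rough average transfer rate. The sketch below assumes the full 3.6TB is actually copied, that TB means 10^12 bytes, and that the rate stays constant over about 20 hours in total (1 hour elapsed plus 19 remaining). None of those assumptions is confirmed, and the function name is just for illustration.

```python
def average_throughput(size_bytes, hours):
    """Return the average rate as (megabytes per second, megabits per second)."""
    seconds = hours * 3600
    bytes_per_second = size_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


# Assumed values: 3.6 TB (decimal) over roughly 20 hours.
mb_per_s, mbit_per_s = average_throughput(3.6e12, 20)
print(f"about {mb_per_s:.0f} MB/s, or {mbit_per_s:.0f} Mbit/s")
```

Under those assumptions the copy averages about 50 MB/s, or around 400 Mbit/s.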

While the copy appears to be working (it's still early days), the network connection is not 100%. Where do I go from here? How do I get the network connection to 100%? Like I said, I'm a relative noob when it comes to VMware and a total noob on the R740XD. Any assistance will be greatly appreciated. As a precaution, to eliminate another possible point of failure, I have ordered new 2 metre CAT 6 flyleads, which will be delivered on Monday.