Windows 10 Packet Loss – vSphere 7 and NSX-T 3
I was recently working with a customer that was experiencing some interesting network packet loss in their environment – insofar as any packet loss can be interesting. The reported issue was that any Windows 10 virtual machine was impacted with up to 60% packet loss when testing with ICMP packets. This impact was not observed with any other operating system.
After tearing apart the NSX-T installation looking for a reason why the VMs might be impacted at that level, and coming up blank, I installed an additional machine from scratch rather than from the Windows 10 template. To satisfy curiosity, the same tests were run without VMware Tools installed and with the E1000 driver… There was no packet loss.
That zeroed in on the culprit: as soon as VMware Tools was introduced with a complete installation, the packet loss returned. Packet captures showed that the missing ICMP packets were never generated at the guest OS level. The sequence numbers seen at the NSX-T layer had gaps in them, for example 100, 101, 102, then nothing until 107, 108 and so on.
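If you export the ICMP sequence numbers from a capture, spotting the gaps can be automated. The following is only a minimal sketch: the function names and the sample list are illustrative and not taken from the customer's capture, and it assumes the sequence numbers do not wrap around within the sample.

```python
def missing_sequences(seen):
    """Return the sequence numbers absent between the lowest and highest seen."""
    if not seen:
        return []
    observed = set(seen)
    return [n for n in range(min(observed), max(observed) + 1) if n not in observed]


def loss_percent(seen):
    """Percentage of the expected range that never appeared in the capture."""
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return 100.0 * len(missing_sequences(seen)) / expected


# Illustrative values following the pattern described above.
captured = [100, 101, 102, 107, 108]
print(missing_sequences(captured))        # [103, 104, 105, 106]
print(round(loss_percent(captured), 1))   # 44.4
```

Running the same check against captures taken inside the guest and at the NSX-T layer should show whether the gaps already exist before the traffic reaches the virtual network.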
Looking at the release notes for VMware Tools, you can see that the NSX Network Introspection drivers are included in VMware Tools and support ICMP from Tools version 11.0.0. Under the resolved issues section we have:
Outbound IPv6 traffic for ICMP and UDP protocols could experience packet drops.
NSX Network Introspection Driver is used to retrieve the network context from the Guest VMs.
From VMTools 11.0.0, support has been added for ICMP and UDP protocols. The driver now intercepts the UDP and ICMP traffic, collects the required context, and re-injects the packets back. There was a problem in the packet re-injection code for the ICMP and UDP IPv6 packets for outbound traffic. As a result, outbound IPv6 traffic for ICMP and UDP protocols could experience packet drops.
Note: Network Introspection Driver is not installed by default and can be installed with VMware Tools ‘Complete’ installation for NSX IDFW and NSX Intelligent features only.
This issue is fixed in this release.
Whilst this references IPv6, the resolved issue looks very similar to what is being observed in the customer's estate with IPv4 ICMP traffic.
Confirmation is simple: stop the NSX Network Introspection service and run the tests again:
sc stop vnetWFP
After running that command there was no more packet loss on the Windows 10 guest OS and the correct packet sequence numbers were observed.
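To compare the loss before and after stopping the service without eyeballing the output, the summary line that Windows ping prints can be parsed. This is a rough sketch that assumes the English-language output format ("Sent = …, Received = …, Lost = …"); the function name and the two sample strings are made up for illustration and are not output from the customer's machines.

```python
import re

# Matches the statistics line printed by the English-language Windows ping.
SUMMARY = re.compile(r"Sent = (\d+), Received = (\d+), Lost = (\d+)")


def parse_ping_summary(output):
    """Extract sent/received/lost counts and the loss percentage from ping output."""
    match = SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary found in output")
    sent, received, lost = (int(group) for group in match.groups())
    loss_pct = 100.0 * lost / sent if sent else 0.0
    return {"sent": sent, "received": received, "lost": lost, "loss_pct": loss_pct}


# Illustrative summary lines, not real captures.
before = "    Packets: Sent = 100, Received = 40, Lost = 60 (60% loss),"
after = "    Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),"

print(parse_ping_summary(before)["loss_pct"])  # 60.0
print(parse_ping_summary(after)["loss_pct"])   # 0.0
```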
Obviously this is not ideal if you rely on the NSX Network Introspection driver, for example for NSX IDFW. If that is the case, the alternative I would suggest is rolling back to a previous version of VMware Tools.
Hopefully this is useful to someone.