It seems I have had the same conversation several times in recent weeks about a very particular topic, and what some could argue is a design flaw, in the most basic of VMware vSphere deployments. It comes down to an overly simplistic deployment in which the ESX host is configured with only one VMKernel port, VMK0. While this “works fine”, there are some simple inherent issues that can occur downstream with such a basic configuration. Let’s take a short look at the problem, although I am surprised it’s 2022 and some basic fundamentals of vSphere are still misunderstood.
Let’s take the simple design shown below and break down what is happening. We need to assume the following is configured as well:
- LACP is not configured
- Management and vMotion services are configured on VMK0 (default out of the box)
- VMK0 is the only routable interface
- The virtual switch is configured with both pNICs active/active
The question we now have to ask is: how is traffic flowing for everything in this host? It’s pretty simple really. On boot, VMK0 will bind to ONE of the two pNICs based on the assumptions above. This means ALL host service traffic will flow over a single pNIC, including most of your network-based storage connections. As you add services and storage, the only path to mount these is via VMK0 and thus pNIC1. Are we starting to see the problem? The second NIC is totally “unused”. This is a big deal on a 1G connection, and sometimes even on a 10G or 40G connection. The real issue IMO is “wasting” the other pNIC for any host-level services, let alone the potential security implications and risks associated with this.
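To make the problem concrete, here is a minimal Python sketch of the single-VMKernel layout described above. It is only a toy model, not anything vSphere itself runs: the function name `services_per_uplink`, the labels `pNIC1` and `pNIC2`, and the exact service list are assumptions chosen for illustration.

```python
from collections import defaultdict


def services_per_uplink(vmk_services, vmk_active_uplink):
    """Map each uplink to the host services whose traffic it carries."""
    load = defaultdict(list)
    for vmk, services in vmk_services.items():
        # A VMKernel port sends its traffic out of the one uplink it is bound to.
        load[vmk_active_uplink[vmk]].extend(services)
    return dict(load)


# Single-VMKernel design: every host service rides on vmk0.
# The service list is illustrative, not an inventory of a real host.
vmk_services = {"vmk0": ["Management", "vMotion", "NFS", "iSCSI"]}

# Assumption: vmk0 happened to bind to the first pNIC at boot.
vmk_active_uplink = {"vmk0": "pNIC1"}

uplinks = ["pNIC1", "pNIC2"]
load = services_per_uplink(vmk_services, vmk_active_uplink)
idle = [u for u in uplinks if u not in load]

print(load)  # every service lands on pNIC1
print(idle)  # pNIC2 carries no host-level traffic
```

However many services you pile onto vmk0, the result stays the same: one uplink carries all of it and the other carries none of the host-level traffic.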
How most vSphere architects have solved this is simple: add more VMKernel ports for specific traffic and adjust the Distributed Port Group teaming settings to re-path the communications. In some cases this is done over L2 connections to ensure minimal routing. Here is a simple example of this concept.
Lastly, if you want to expand on this even more, here is a table I have used many times in past designs that breaks down the use of multiple VMKernel ports as shown above. Here you can see we force traffic to active and standby uplinks (still assuming there is no LACP), thus utilizing both pNICs to their fullest extent.
| VMKernel | Name | VLAN Type | TCP Stack | Services | Uplink1 | Uplink2 |
|----------|------|-----------|-----------|----------|---------|---------|
| vmk0 | Management | L3 | Default | Mgmt Only | Active | Standby |
| vmk1 | vMotion-1 | L2 | Default | vMotion Only | Standby | Active |
| vmk2 | vMotion-2 | L2 | Default | vMotion Only | Active | Standby |
| vmk3 | ESX-NFS | L2 | Default | none | Standby | Active |
| vmk5 | iSCSI-1 | L2 | Default | none | Active | UNUSED |
| vmk6 | iSCSI-2 | L2 | Default | none | UNUSED | Active |
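
As a quick sanity check on the table above, the teaming roles can be written down as plain data and tallied per uplink. This is a rough sketch only: `TEAMING`, `active_by_uplink` and `stranded_if_fails` are hypothetical names made up for this example, and the model simply assumes a VMKernel port can use an uplink marked Active or Standby and can never use one marked UNUSED.

```python
# (vmk, Uplink1 role, Uplink2 role), copied from the table above.
TEAMING = [
    ("vmk0", "Active", "Standby"),
    ("vmk1", "Standby", "Active"),
    ("vmk2", "Active", "Standby"),
    ("vmk3", "Standby", "Active"),
    ("vmk5", "Active", "UNUSED"),
    ("vmk6", "UNUSED", "Active"),
]


def active_by_uplink(rows):
    """List the VMKernel ports whose active path is each uplink."""
    result = {"Uplink1": [], "Uplink2": []}
    for vmk, up1, up2 in rows:
        if up1 == "Active":
            result["Uplink1"].append(vmk)
        if up2 == "Active":
            result["Uplink2"].append(vmk)
    return result


def stranded_if_fails(rows, failed_index):
    """VMKernel ports left with no usable uplink when one uplink
    (index 0 for Uplink1, 1 for Uplink2) goes down."""
    stranded = []
    for vmk, *roles in rows:
        remaining = [r for i, r in enumerate(roles) if i != failed_index]
        if not any(r in ("Active", "Standby") for r in remaining):
            stranded.append(vmk)
    return stranded


active = active_by_uplink(TEAMING)
print(active)                         # three VMKernel ports active per uplink
print(stranded_if_fails(TEAMING, 0))  # only vmk5 has no standby to fall back on
print(stranded_if_fails(TEAMING, 1))  # only vmk6 has no standby to fall back on
```

Each uplink ends up as the active path for three VMKernel ports, so both pNICs carry host traffic. The only ports without a standby are the two iSCSI ones, which are each pinned to a single uplink and cover for one another as a pair.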
There are uses for additional VMKernel ports for other traffic as well, such as NSX-T, backup, cold migration, replication, etc. However, this basic foundational mistake seems to be all too common. Once you are set up this way, you have a lot more options in your overall design. In another post I may show how this can also affect the Storage Migration (Storage vMotion) process.