Chris Colotti's Blog

Gotcha: vCloud Director Clone Wars Part 1 (Overview)

A couple of weeks ago there was a community post about a possible issue with vCloud Director: the I/O load of clone operations all hitting a single ESX host.  Duncan Epping and I did a little investigation and discovered that this is not so much a vCloud Director issue as it is a function of vSphere’s cloning process.  First we need to understand a few things.

  1. vApp templates in vCloud Director are not stored in vSphere as a “Template”; they are simply powered-off Virtual Machines.
  2. Powered-off Virtual Machines remain registered to whichever host was last running them.
  3. DRS only moves powered-on Virtual Machines with its load-based algorithm; the exception is Maintenance Mode, during which you have the option to move powered-off Virtual Machines as well.
  4. You can manually migrate powered-off Virtual Machines.
  5. When a clone happens on a given host, if the target location is a datastore visible to that host, it will be a block-level copy; if the target host has no access to the source files, it will usually be a network copy over the management interface. (This will be investigated further in Part 2.)
  6. The host where the Virtual Machine is registered is the one that performs the I/O workload of the clone.
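
You can check points 2 and 6 for yourself with a quick PowerCLI query. This is a minimal sketch, assuming an existing Connect-VIServer session; the template name pattern here is a placeholder, not a real object in any environment:

```
# Assumes you are already connected with Connect-VIServer.
# "My-vApp-Template" is a hypothetical name pattern - substitute your own.
# vApp templates appear in vSphere as ordinary powered-off VMs, so Get-VM finds them.
Get-VM -Name "My-vApp-Template*" |
      Where { $_.PowerState -eq "PoweredOff" } |
      Select Name, @{N="RegisteredHost";E={$_.VMHost.Name}}
```

If several templates report the same RegisteredHost, that host will shoulder the clone I/O for all of them.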

The last bullet is the key issue.  Essentially, the possibility exists that multiple vApp templates could live on only a few hosts.  That being said, we should also assume that each vApp template was at one point powered on, and that DRS properly placed those Virtual Machines on different hosts.  Once powered off, they will remain on the last known host until they are manually moved or a Maintenance Mode triggers a move.

There is also the possibility that, over multiple Maintenance Mode cycles, the powered-off Virtual Machines end up on only a couple of hosts.  When consumers deploy a vApp, the cloning would then happen on those hosts, dragging them down.  The other issue is that if you are a provider and consumers deploy the SAME vApp template, the host where that template is registered will handle most of the load for all of those deployments.  So how do we work around this for now?

Alan Renouf was kind enough to provide PowerCLI scripts to balance out the powered-off Virtual Machines across a cluster.  This solves the issue of too many powered-off Virtual Machines ending up on the same host or group of hosts.

Script 1 – lists the hosts in each cluster and the number of powered-off VMs on them, so you can see whether this is relevant to you.

# For each cluster, list every host along with a count of the powered-off VMs registered to it
Get-Cluster | Sort Name | Foreach {
      Write-Host "Cluster: $_"
      $_ | Get-VMHost | Sort Name | Select Name, @{N="NumVMPoweredOff";E={@($_ | Get-VM | Where {$_.PowerState -eq "PoweredOff"}).Count}}
}

Script 2 – moves the powered-off VMs equally among the hosts.  Note that it assumes networks and datastores are the same on all hosts in a cluster.

Get-Cluster | Foreach {
      Write "Balancing Cluster: $($_.Name)"
      # Build a sorted host list and a round-robin index across it
      $HostsinCluster = @($_ | Get-VMHost | Sort Name)
      $numberofhosts = $HostsinCluster.Count
      $hostnumber = 0

      # Walk every powered-off VM and relocate it to the next host in the rotation
      $_ | Get-VMHost | Get-VM | Where { $_.PowerState -eq "PoweredOff" } | Foreach {
            $MoveHost = $HostsinCluster[$hostnumber]
            if ($_.VMHost -eq $MoveHost) {
                  Write-Host "Leaving $($_) on $MoveHost"
            } Else {
                  Write-Host "Moving $($_) to $MoveHost"
                  Move-VM -VM $_ -Destination $MoveHost -Confirm:$false
            }
            # Advance to the next host, wrapping back to the first
            If ($hostnumber -eq ($numberofhosts - 1)) {
                  $hostnumber = 0
            } Else {
                  $hostnumber++
            }
      }
}

This takes care of the original issue of a single host or small group of hosts getting all of the clone I/O workload.  In Part 2 of this post I will examine a slightly different, more interesting twist.  I am in the process of reconfiguring my lab so we can see the deeper effects of the clone wars in vCloud Director.  The real question there is: regardless of balance, what happens if 100 people all deploy the SAME vApp template?  Is there a way to mitigate that I/O in the vSphere design under the covers so it does not adversely affect the other Virtual Machines running on that same host?

Jump to Part Two
