In part one of this series, we examined the minor issue of the clone process being managed and loaded onto a single host in a cluster under vCloud Director. In part two, we expanded on that to see how the different vApp deployment scenarios actually clone to other hosts. Now in part three we want to understand what, if anything, we can do from a design perspective to deal with these known challenges.
The script located in part one is certainly useful for balancing the powered off Virtual Machines across the hosts in a cluster. This can help a lot on the source side of the clone process. That being said, what can we do to design a large scale provider hosted catalog infrastructure where the possibility exists to have multiple vCenters, clusters, and Virtual Datacenters? Let’s take a look at the original layout I used for testing below. The only change is let’s assume there is one Catalog vApp instead of two.
- vCloud Director Cells will proxy traffic on their HTTP port for exports, provided you have not configured a management interface that also has direct access to the ESX hosts, in which case the copy may hit there instead.
- ESX hosts will utilize their management interface for network based copy to a Cell or to another ESX host in a different cluster in the same vCenter where storage is not equally presented.
Knowing this, here are some initial recommendations you might want to consider. None of them are gospel, just some things to think about.
- Consider the network connectivity of the vCD Cells carefully. Possibly put HTTP ports on a separate subnet and use a management interface that is localized to the ESX management network layer 2 to isolate incoming copy traffic from HTTP portal traffic. Essentially keep all “VMware Management” on a separate subnet for traffic flow. A good reference for understanding some of the networking and static routing to make some of the suggestions work is Hany Michael’s post about publishing vCD on the internet. Although it is specific to that scenario, the networking segmentation is good to understand now that we know how the traffic flows. You may even look to have more interfaces on your vCD cell to meet the varying requirements.
- On vCD Cells use separate VMNIC interfaces for HTTP/Console/NFS/Management and enable Jumbo Frames on the Management/NFS ports. The goal is to prevent HTTP portal traffic from getting caught up in the copy processes.
- Enable Jumbo frames on the NFS Storage, switches in between, and on the VMKernel ports on ESX hosts. This may provide some performance improvement on the Management Network copy processes that we have seen.
- Alternatively, create separate VMKernel interfaces for vCloud Director communication, distinct from the Management Interface. I have not tested this, but in theory it may achieve the same thing.
- Within at least the initial primary vCenter instance consider a cross-cluster datastore to host the catalog items on. This will at least provide block based copy within that vCenter, but will not help when additional vCenters are added to vCD. You will be left with network copy when deploying a vApp from a catalog across vCenters. As pointed out by a comment on Part One, having a VAAI enabled array in this case would help offload the block based copy process. However, once you need to deploy to any cluster not seeing that Datastore, OR across to another vCenter you are back to network copy and/or vCD Export and OVF import.
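To make the copy behavior described above easier to reason about during design, here is a minimal sketch in Python. It is not vCloud Director code and calls no VMware API. The `Cluster` type, the `expected_copy_path` function, and all cluster and datastore names are hypothetical, made up for illustration. It also simplifies the cross-vCenter case to "export and import", where in practice you may see network copy and/or vCD export and OVF import.

```python
from dataclasses import dataclass
from enum import Enum


class CopyPath(Enum):
    BLOCK = "block based copy on the shared datastore"
    NETWORK = "network copy over the ESX management interfaces"
    EXPORT_IMPORT = "vCD export and OVF import through a cell"


@dataclass(frozen=True)
class Cluster:
    """Hypothetical model of a cluster: its vCenter and the datastores it sees."""
    name: str
    vcenter: str
    datastores: frozenset


def expected_copy_path(catalog_datastore, catalog_vcenter, target):
    """Return the copy path we would expect when deploying a catalog item.

    Assumption: encodes only the rules of thumb from this series.
    """
    # A different vCenter cannot see the source directly, so the copy
    # leaves the vCenter (simplified here to export and import).
    if target.vcenter != catalog_vcenter:
        return CopyPath.EXPORT_IMPORT
    # Same vCenter and the target cluster sees the catalog datastore.
    if catalog_datastore in target.datastores:
        return CopyPath.BLOCK
    # Same vCenter, but storage is not equally presented.
    return CopyPath.NETWORK


if __name__ == "__main__":
    clusters = [
        Cluster("cluster-a", "vc01", frozenset({"catalog-ds", "ds-a"})),
        Cluster("cluster-b", "vc01", frozenset({"ds-b"})),
        Cluster("cluster-c", "vc02", frozenset({"ds-c"})),
    ]
    for c in clusters:
        path = expected_copy_path("catalog-ds", "vc01", c)
        print(f"{c.name}: {path.value}")
```

Running a planned layout through something like this is a quick way to spot which target clusters will fall back to network copy before you commit to a datastore presentation.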
One last thought is that you could dedicate a cluster and Provider vDC to just hosting and deploying templates. This is slightly different from just creating an Organization for catalogs and templates. I am suggesting an actual Provider vDC be used so the hosts are dedicated to these tasks. This would not need to be large, as there would not be a lot of load from running Virtual Machines. Maybe this is also a cluster used for provider-only workloads rather than customer workloads. This could also then be used by Organizations to host their catalogs, and maybe there is a different cost associated with catalog items versus running Virtual Machines. Of course all customers would then need to be sure to place catalog Virtual Machines in this vDC. This would ensure that all I/O and network load is consumed by dedicated hosts for a good portion of the process.
What I can confidently say is there is no single best solution to this challenge. What I can offer is that most customers I have seen thus far are installing vCloud Director in smaller Proof of Concept or lab situations and have not really thought about larger scale deployments where a second vCenter will be in play. I do personally know of at least two customers that have two vCenters planned or currently deployed in their vCloud Director as of today. The purpose of this three part series was to give you some deep dive background on what is going on under the covers. How you choose to address these challenges is up to you at the end of the day, but I hope it helps to understand how the parts are actually communicating behind the scenes so you can make better design decisions. Comments and other ideas are certainly welcome.