In my previous post about setting up a vCloud In a Box for your Lab, I also mentioned that I happened to install the Zeus Load Balancer to examine some of the options available for load balancing your vCloud Director Cells. Based on some of my lab testing I wanted to share a few findings some folks may find interesting. Generally speaking, a Load Balancer is easy to manage and provides some high availability across multiple cells, as well as user layer abstraction, like any other stateless web service. Bear in mind there are specific configuration requirements when using multiple cells; they are covered only briefly below.
Configuration Requirements For Multiple Cells
- Create an fstab entry on each Cell to mount an NFS share on the transfer folder, similar to this:
192.168.120.2:/nfs/vCloudTransfer /opt/vmware/cloud-director/data/transfer/ nfs intr 0 0
- Ensure the transfer mount permissions are properly set. The vcloud user AND group must have access to the share. Some NAS devices may require running chmod and chown, or the Cell may not start. You will know because the log will state the transfer folder is not writable. In my lab, when this was first mounted the owner and group were root and needed to be changed.
drwxr-x---+ 4 vcloud vcloud 33 Nov 20 18:14 transfer
- Configure the additional Cells per the instructions found in the Installation Guide. If you got the first cell to start after the transfer folder changes, the second one should also start with the same modifications.
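The mount requirements above can be sanity-checked with a small script. This is only a sketch in Python, not something shipped with vCloud Director: the function names are my own, and it assumes the transfer path and the vcloud owner/group shown in the example lines above.

```python
import grp
import os
import pwd
from typing import NamedTuple, Optional, Tuple

# Transfer path taken from the example fstab entry in this post.
TRANSFER_DIR = "/opt/vmware/cloud-director/data/transfer"


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: str


def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Parse one fstab line; return None for blank lines, comments, or short lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        return None
    return FstabEntry(fields[0], fields[1], fields[2], fields[3])


def is_transfer_mount(entry: FstabEntry, transfer_dir: str = TRANSFER_DIR) -> bool:
    """True if the entry mounts an NFS share on the transfer folder."""
    same_path = entry.mount_point.rstrip("/") == transfer_dir.rstrip("/")
    return entry.fs_type == "nfs" and same_path


def owner_and_group(path: str) -> Tuple[str, str]:
    """Return the (owner, group) names of a path, e.g. ("vcloud", "vcloud")."""
    st = os.stat(path)
    return pwd.getpwuid(st.st_uid).pw_name, grp.getgrgid(st.st_gid).gr_name
```

Run on a Cell, `owner_and_group(TRANSFER_DIR)` should come back as vcloud for both values once the chown has been done; if it still reports root, expect the "not writable" message in the log.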
Handling Of SSL Certificates For vCloud Cells
Q. What is the best way to handle SSL Certificates for multi-cell deployments: self-signed, or through trusted CAs, when DNS entries are used to point to the virtual IPs of the load balancer?
A. It depends. There are a couple of ways to handle this and the deciding factor is the load balancer itself. In most cases we need to understand that the provider will also create a DNS record for the virtual IP. For this example, let’s use vcloud.company.com. If the CA is not trusted, the user will always see an “Untrusted” error on self-signed certificates. The self-signed certificates in vCD typically expire in a short timeframe, so most customers are generating trusted certificates. Based on this basic understanding, there are three real options I found that may work for you. Also note that today there is no way to connect a Load Balancer to the Cells using HTTP, only HTTPS, which can cause a problem with some load balancers.
- If the load balancer does NOT support SSL, OR if it cannot do BOTH SSL offload as well as HTTPS to the pools, which is the case with F5 and Zeus, then generate an individual CSR or self-signed certificate on each cell. When creating the CSR or self-signed certificate on each cell, use the same FQDN in the request, matching the DNS entry of the virtual IP (vcloud.company.com). This means the user will not get a DNS/hostname mismatch when connecting to the load balancer, because the user will be directly hitting each server with the HTTPS request. For F5 and other devices this is referred to as SSL passthrough mode. If you do happen to use the cell hostname in the certificate generation, and the users are connecting through a DNS name to the virtual IP, their browsers will show an SSL error for name mismatch. By the same token, if you generate the certificate on each host with the shared FQDN and attempt to connect to the individual cells, you will also see a hostname mismatch. This situation cannot be avoided, and the lesser of two evils is to make sure the load balanced connection is the correctly resolved one. This seems to be the most common Load Balancer configuration option.
- If the load balancer supports SSL offload as well as HTTPS to the Cell pools, and direct creation of CSRs, then generate the CSR for vcloud.company.com on the load balancer. The cells can then create hostname-based CSRs or self-signed certificates to match the hosts cell01.company.com and cell02.company.com. This allows a user to hit the DNS name of the virtual IP or the cell hostname in a browser, both without error. This assumes the load balancer supports SSL from the device to the nodes in the pool, because in the current release of vCloud SSL is still required from the Load Balancer to the Cells. To date I have not seen a Load Balancer support BOTH SSL offload and SSL to the hosts in the backend pool, so this option will most likely not work for a while. As mentioned earlier, most load balancers require the load balancer to node connection to be HTTP instead of HTTPS if you want SSL offload, and today there is no option to disable HTTPS and connect to the Cells through HTTP only. In that case, stick with scenario #1 and do not use the SSL offload feature of the load balancer.
- Additionally, you may want to set up a VIP for the console proxy, because without one you will be directed to the Cell you were load balanced to for the console connection. You can in fact load balance this as well by creating a VIP and DNS name for the Console proxy and redirecting all connections back to that address through the administration options in vCloud Director. IMPORTANT NOTE ABOUT THE CONSOLE CERTIFICATE: If you are using subject alternative names in your certificates, it seems the Console Connection ONLY resolves the first one in the list. Therefore, to avoid a certificate mismatch error on your client, make sure the first name in the list is the FQDN that matches your Console Public Address defined in the external URLs section of vCloud Director.
- UPDATE 7/18/11: It has also been said that trying to do SSL offload on the Console IPs does not work and you HAVE to do SSL pass-through. I have not been able to verify this myself yet but I may try in the coming weeks. This may be due to the fact that the Console connection is NOT an HTTPS connection but rather a pure socket connection. This fact also made me realize that an HTTPS Health monitor for this IP will NOT work. See additional information below.
- Lastly, if you have installed the cells behind the load balancer, be sure to configure the Administration option for external URLs as shown in the example screen shot below.
The procedures for updating and creating SSL certificates on the vCloud cells are documented in this VMware KB, as well as the administrator guides. Based on your configuration you will need to decide the best method of user access to the load balancer and cells. Generally the Cells are stateless, so you can use the Load Balancer VIP/DNS for any access to the environment, whether to the built-in portal pages or through the supplied vCloud APIs.
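To make the name mismatch cases above concrete, here is a rough Python model of the comparison a browser performs, plus the "first name only" behavior I saw on the console connection. It is a simplified sketch, not how vCloud Director or any particular browser is implemented. The function names are mine, and console.company.com is a made-up name for the console proxy VIP.

```python
from typing import Sequence


def hostname_matches(pattern: str, hostname: str) -> bool:
    """Case-insensitive name match; a leading '*.' covers exactly one label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        head, sep, tail = hostname.partition(".")
        return bool(head) and sep == "." and tail == pattern[2:]
    return pattern == hostname


def browser_accepts(cert_names: Sequence[str], requested_host: str) -> bool:
    """A browser is satisfied if ANY name in the certificate matches."""
    return any(hostname_matches(name, requested_host) for name in cert_names)


def console_accepts(cert_names: Sequence[str], console_public_address: str) -> bool:
    """Models the observation in this post: only the FIRST name is considered."""
    return bool(cert_names) and hostname_matches(cert_names[0], console_public_address)


# Scenario #1: every cell carries a certificate for the shared FQDN.
shared_cert = ["vcloud.company.com"]
via_vip = browser_accepts(shared_cert, "vcloud.company.com")      # no warning
direct_to_cell = browser_accepts(shared_cert, "cell01.company.com")  # mismatch

# Console proxy: the order of the subject alternative names matters.
good_order = console_accepts(
    ["console.company.com", "vcloud.company.com"], "console.company.com")
bad_order = console_accepts(
    ["vcloud.company.com", "console.company.com"], "console.company.com")
```

With the shared FQDN certificate, `via_vip` is true and `direct_to_cell` is false, which is the lesser of two evils described in scenario #1. The last two values show why the console FQDN has to be listed first.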
Handling Health Check Rules For The Cells
Q. What is the best way to handle a service health monitor for the vCloud Cells?
A. You can point your Load Balancer to http://<Cell-Hostname>/cloud/server_status. UPDATE 7/18/11: This check ONLY monitors the HTTPS IP and not the Console IP directly. If the services are stopped on the Cell it should be taken out of service, but this check does not independently monitor the Console Proxy IP for the load balanced Console connections.
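If you want to try the same check by hand before configuring the monitor, something like the Python sketch below should do. It assumes a plain 200 response means the Cell is up; I have not pinned down the exact response body here, so the code does not look at it. The function names and the `verify_tls` switch (handy for self-signed lab certificates) are my own additions. The scheme is a parameter because the URL above is written with http:// while the Cells themselves talk HTTPS.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def server_status_url(cell_host: str, scheme: str = "https") -> str:
    """Build the status URL for one cell (path taken from this post)."""
    return f"{scheme}://{cell_host}/cloud/server_status"


def cell_is_healthy(url: str, timeout: float = 5.0, verify_tls: bool = True) -> bool:
    """Return True only if the URL answers with HTTP 200 within the timeout."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: accept self-signed certificates on the cells.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, TLS failures and non-2xx answers
        # all count as "unhealthy" for load balancing purposes.
        return False
```

For example, `cell_is_healthy(server_status_url("cell01.company.com"), verify_tls=False)` gives a quick yes/no for one cell in a lab with self-signed certificates.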
UPDATE 7/18/11:
Q. What is the best way to monitor the Console Proxy IP’s independently?
A. You will have to configure the Load Balancer for a pure socket connection check on port 443 to the Console Proxy IPs. This port will not accept HTTP calls like the HTTP port does. If you configure two separate Health Monitors and have the Console Proxy and HTTP ports in separate pools, then you can ensure independent health status for HTTP and Console connections. This means that if a Cell's HTTP services are up but its console proxy is down for some reason, the Load Balancer will prevent console connections to that Cell but still allow portal connections. Below is a screen shot of a Zeus Load Balancer set of pools, with a different Health Monitor for each pool.
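The pure socket check is simple enough to reproduce by hand. Here is a minimal Python sketch of what such a monitor does: it opens a TCP connection and sends nothing. The function name and the default timeout are my own choices.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # Connect and immediately close; no HTTP request is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling this against each Console Proxy IP gives the same up/down answer the connect-only monitor would use for the console pool.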
Update 8/5/11:
I got word from the engineer that wrote the code that you can monitor the Console Proxy Pool with
http://<Console_ProxyIP>/sdk/vimServiceVersions.xml and not a TCP/IP Connect String.
Preventing Access To A Cell by Users
Q. Can I use my Load Balancer to disable user access to the Cells in the load balanced pool?
A. The easy answer is of course yes. In most Load Balancers you can disable a node in the pool and the users will no longer have access to that Cell. HOWEVER, it is important to note that the Cells will still be talking to each other on the back end for task scheduling. This means a user may only access Cell 2, but their tasks and commands could be run on Cell 1. The Cells themselves load balance tasks through an internal scheduler. Really, the best way to ensure a Cell is FULLY offline is to simply stop the vCloud Service on the cell. If you configured the health check above, the Load Balancer should automatically remove the Cell from the pool. This also ensures that the Cell will not accept new tasks in the Cell Cluster.
IMPORTANT NOTE: If you stop a Cell’s service abruptly while tasks are running, those tasks will fail and will need to be restarted manually.
Q. Is there a tool available to gracefully shut down a Cell then?
A. YES!! There is now a tool called the vCloud Cell Management Tool you can download and install. It is a set of commands rather than a GUI, but it does help you “Drain” a cell properly for upgrades and other Operating System patches. Provided the health check pages are configured, this should also take care of the load balancing.
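To illustrate what "draining" means in practice, here is the general order of operations as a Python sketch. This is NOT the Cell Management Tool's interface. The three callables are hypothetical hooks you would have to wire up yourself, and the code only shows the sequence: stop taking new tasks, wait for running ones to finish, then stop the service so the health check pulls the Cell out of the pool.

```python
import time
from typing import Callable


def drain_cell(
    stop_accepting_tasks: Callable[[], None],  # hypothetical hook
    active_task_count: Callable[[], int],      # hypothetical hook
    stop_service: Callable[[], None],          # hypothetical hook
    poll_seconds: float = 5.0,
    max_wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Drain a cell; return False (service left running) if tasks never finish."""
    stop_accepting_tasks()
    waited = 0.0
    while active_task_count() > 0:
        if waited >= max_wait_seconds:
            # Give up rather than kill running tasks, which would fail them.
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    stop_service()
    return True
```

The point is that the service is only stopped once the task count reaches zero, which avoids the failed tasks described in the important note above.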