I’ve already fielded a number of questions on this topic, and I want to start by saying this is not an “Official” recommendation. That is why I am posting it on my personal blog: this is more about considerations than actual calculations. The trick with anything like this is the risk of people taking what you say as gospel, and this is just intended to help get you thinking. First we need to understand how the service is configured, and we all need to remember how vSphere works. I will not go into all those details here; you will need to read about them elsewhere on this blog.
Performance vs. Being “Up and Running”
What I mean by this is that many people mistake the need to be up and running in a disaster event for the need to run at production performance. I think this is an important distinction. Most folks who understand DR realize that when you have an event, you are just trying to get back online, not always back online at the same scale you were at before. There may be exceptions to this rule, of course, as there always are. However, from my field experience, most people just want to get systems running so they can still conduct business. If things are “slower,” that’s acceptable. Let’s not forget the human factor as well. Frankly, you may not even be one of the people working on the failover. You could be trapped at home, or part of the disaster… I’m just saying.
vCloud Air-DR Allocation Model
The 1.0 release of vCloud Air Disaster Recovery uses the Allocation Pool Model in vCloud Director. This really does not mean anything to you as a user, except for how a couple of key settings are configured, namely how the vSphere resources are configured based on the vCD settings. If you want to know what these mean, you can reference the previous link or read up on vCloud Director Allocation Models in general. For this service, the compute is allocated in the following manner.
- 10GHz CPU
- 20GB Memory
Calculating Compute Resources
In my opinion, the best order in which to size the vCloud Air Disaster Recovery offering, and the way I have described it to people, is the following.
- Storage
- Memory
- CPU
For the purposes of keeping it simple, I am leaving off bandwidth so we can focus on getting the machines running after an event is declared. Storage is the easy place to start because you simply need to make sure you have enough storage to host all the placeholders on disk initially. That’s easy math to figure out based on your machine configurations.
Storage NOTE: Be sure to account for VSWAP storage based on your machines’ memory configuration. Purchase enough to host them powered off AND powered on, when the VSWAP files are created. If you are too tight on size, you might come up short when you go to power them on.
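To make the storage note concrete, here is a rough sketch of the math, not an official calculator. The machine names and sizes are made up, and it assumes the worst case where each VSWAP file is as large as the machine's configured memory (in vSphere the swap file shrinks by whatever memory is reserved, so your real number may be lower).

```python
# Hypothetical inventory: (name, provisioned disk in GB, configured memory in GB).
# These machines are invented for illustration only.
vms = [
    ("web01", 100, 8),
    ("app01", 60, 4),
    ("db01", 200, 16),
]


def storage_needed_gb(vms):
    """Return (powered_off_gb, powered_on_gb) for the replicated machines.

    Assumption: each VSWAP file is as large as the machine's configured
    memory, which is the worst case when no memory is reserved.
    """
    disk_total = sum(disk for _, disk, _ in vms)
    vswap_total = sum(mem for _, _, mem in vms)
    return disk_total, disk_total + vswap_total


powered_off, powered_on = storage_needed_gb(vms)
print(f"Powered off: {powered_off} GB, powered on: {powered_on} GB")
```

The number to purchase against is the second one, since that is what the machines will consume once they are actually powered on after a failover.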
Memory
Like the standard Virtual Private Cloud offering, this becomes extremely easy. Basically, you can run whatever machines will fit in the memory you have purchased. So if you got a basic SKU with 20GB of RAM, compare that with the total configured memory of the machines you want to fail over. If the total configured is more than 20GB, you will need to purchase an add-on. You simply get the right number of add-ons to cover the configured memory of the machines you are replicating.
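As a quick sketch of that memory math, assuming the base SKU is 20GB and each add-on brings another 20GB (the function name and the sample machine sizes are mine, purely for illustration):

```python
import math

BASE_MEMORY_GB = 20   # memory included in the basic SKU
ADDON_MEMORY_GB = 20  # memory assumed to come with each add-on


def memory_addons_needed(vm_memory_gb):
    """Return how many add-ons cover the configured memory of all machines."""
    total = sum(vm_memory_gb)
    shortfall = max(0, total - BASE_MEMORY_GB)
    # Add-ons come in whole units, so round any partial shortfall up.
    return math.ceil(shortfall / ADDON_MEMORY_GB)


# Three hypothetical machines with 8GB, 4GB and 16GB configured (28GB total).
print(memory_addons_needed([8, 4, 16]))
```

If you only plan to fail over a subset of machines in an event, pass in just that subset; sizing for "up and running" rather than full production is exactly the kind of decision this is meant to get you thinking about.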
CPU
CPU to me is always last. Why? Because most applications are not CPU constrained. Also, think about how you add capacity in vCloud Air: when you add 20GB of memory, you also add 10GHz more CPU. So if you are focusing on memory first, your CPU capacity will naturally increase. In most cases you should have plenty of CPU, but there may be exceptions out there.
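Here is a small sketch of how CPU follows memory, assuming a 10GHz base and 10GHz per add-on as described above. The "expected demand" figure is a hypothetical number you would have to estimate yourself from your own monitoring; nothing in the service hands it to you.

```python
BASE_CPU_GHZ = 10   # CPU included in the basic SKU
ADDON_CPU_GHZ = 10  # CPU that arrives with each 20GB memory add-on


def cpu_capacity_ghz(addons):
    """Total CPU you end up with after buying add-ons for memory."""
    return BASE_CPU_GHZ + addons * ADDON_CPU_GHZ


def cpu_is_enough(addons, expected_demand_ghz):
    """Check an estimated CPU demand (your own guess) against capacity."""
    return expected_demand_ghz <= cpu_capacity_ghz(addons)


# Two add-ons bought for memory reasons also leave you with 30GHz of CPU.
print(cpu_capacity_ghz(2))
print(cpu_is_enough(2, expected_demand_ghz=18))
```

If that check ever comes back false, you have found one of the exceptions, and that is the point where CPU should drive the number of add-ons instead of memory.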
It’s not Rocket Science, But…
The bottom line is you don’t need to be a rocket scientist to figure this out. You just need a decent understanding of how things work in both vCloud Air and vSphere, plus a bit of common sense sprinkled in to make sure you’re on the right track. All of this being said, there is a VMware Fling for sizing that may help with the vSphere Replication component. As always, Flings have limited support, and that appliance is really only meant to help with the replication side of things. It will not help you calculate how much compute and memory capacity you need to support your machines in a disaster event.
Lastly, always remember why you are doing Disaster Recovery, set your expectations of performance vs. availability, and be sure to document which you are targeting with your plans. I’d argue most people are more interested in having applications back online even if they are running a bit slower, again, with some exceptions.