The main document to work through for the VCAP-DCA is the Availability Guide, but there are plenty of good white papers and blog posts which give useful background information (see the bottom of this post). If you have access to the 2010 VMworld content, it’s worth watching session BC8274, which covers most of the material on the blueprint.
- Identify VMware FT hardware requirements
- Identify VMware FT compatibility requirements
Skills and Abilities
- Modify VM and ESX/ESXi Host settings to allow for FT compatibility
- Use VMware best practices to prepare a vSphere environment for FT
- Configure FT logging
- Prepare the infrastructure for FT compliance
- Test FT failover, secondary restart and application fault tolerance in a FT Virtual Machine
FT requirements (hardware, software and feature compatibility)
- Firstly you have to make sure your host hardware will support FT – it’s more demanding than many other VMware features.
- The main requirement is CPU support for VMware’s vLockstep technology, which is limited to specific Intel and AMD processor families. Rather than list the processor families which support FT, you can read VMwareKB1008027.
- Hardware virtualisation must also be enabled in the BIOS (not always on by default).
- You need to ensure the guest OS and CPU combination is supported (for example, the Availability Guide states that Solaris on AMD is not).
- Must have HA enabled on the cluster
- Licensing – you need Advanced or higher to run FT
- Host certificate checking needs to be enabled. If you did a clean install of vSphere 4.x this is enabled by default, but if you upgraded from VI3.x you have to explicitly enable it (vCenter settings, SSL)
- You should avoid mixing ESX and ESXi hosts in a cluster with FT-enabled VMs (VMwareKB1013637)
There are also VM level requirements:
- No USB or sound devices
- No NPIV
- No paravirtualized guest OS
- No physical mode RDMs
- Hot plug (memory, CPU, hard disks etc) is automatically disabled for FT-enabled VMs
- No serial or parallel ports
FT places quite a few restrictions on the features you can use:
- No SMP (FT-enabled VMs are limited to a single vCPU)
- No snapshots – this also means no VCB or any other backup technology which relies on an underlying snapshot. VMwareKB1016619 describes how you can do backups using templates or storage array level snapshots. Both seem pretty awkward and far from ideal.
- No storage vMotion
- No thin provisioned disks
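The two lists above lend themselves to a simple pre-flight check. The following is only a sketch in Python: `VMConfig` and its fields are a hypothetical, simplified description of a VM invented for illustration, not objects from the vSphere API, and the checks cover only the restrictions listed in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VMConfig:
    """Hypothetical, simplified VM description (not a vSphere API object)."""
    name: str
    vcpus: int = 1
    devices: List[str] = field(default_factory=list)  # e.g. "usb", "sound", "nic"
    uses_npiv: bool = False
    paravirtualized_guest: bool = False
    physical_mode_rdm: bool = False
    has_snapshots: bool = False
    thin_disks: bool = False


# Device types the lists above say an FT-enabled VM cannot have.
BLOCKED_DEVICES = {"usb", "sound", "serial", "parallel"}


def ft_blockers(vm: VMConfig) -> List[str]:
    """Return a human-readable list of reasons the VM would not qualify for FT."""
    problems = []
    for dev in vm.devices:
        if dev in BLOCKED_DEVICES:
            problems.append(f"{dev} device attached")
    if vm.vcpus > 1:
        problems.append("more than one vCPU (no SMP)")
    if vm.uses_npiv:
        problems.append("NPIV in use")
    if vm.paravirtualized_guest:
        problems.append("paravirtualized guest OS")
    if vm.physical_mode_rdm:
        problems.append("physical mode RDM attached")
    if vm.has_snapshots:
        problems.append("snapshots present")
    if vm.thin_disks:
        problems.append("thin provisioned disks")
    return problems


if __name__ == "__main__":
    vm = VMConfig(name="web01", vcpus=2, devices=["usb", "nic"], has_snapshots=True)
    for problem in ft_blockers(vm):
        print(f"{vm.name}: {problem}")
```

An empty list means none of the restrictions listed here apply. It does not replace the native compliance checks.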
So what features are supported?
- HA, DRS and vMotion (DRS with FT support is new with vSphere v4.1, vMotion always worked)
NOTE: DRS for FT-enabled VMs is only available if EVC is enabled on the cluster. If you disable EVC there is a warning that features such as FT with DRS will not work.
Preparing the infrastructure for FT compliance
The main requirement is an additional gigabit-capable NIC for FT logging.
- Configuring networking for FT
- Check compliance with FT
- Enable FT per VM
Just like enabling vMotion, it’s recommended (but not enforced) to have a dedicated NIC (preferably on a separate vSwitch and subnet) where you enable Fault Tolerance logging (on a VMkernel port).
Sharing a NIC between vMotion and FT logging might be OK in a lab environment or on 10GbE NICs.
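If you are planning the VMkernel ports on paper first, the recommendations above can be sanity-checked with a few lines of Python. This is a sketch only: the function name, its parameters and the example addresses are assumptions made for illustration, and it checks a plan you type in rather than querying a real host.

```python
import ipaddress
from typing import List


def check_ft_logging_plan(ft_cidr: str, vmotion_cidr: str,
                          ft_nic: str, vmotion_nic: str,
                          ft_nic_speed_mbps: int) -> List[str]:
    """Return warnings for a planned FT logging setup (hypothetical helper)."""
    warnings = []
    # FT logging needs at least a gigabit-capable NIC.
    if ft_nic_speed_mbps < 1000:
        warnings.append("FT logging NIC is slower than 1 Gbit/s")
    # Sharing with vMotion is only suggested for labs or 10GbE NICs.
    if ft_nic == vmotion_nic and ft_nic_speed_mbps < 10000:
        warnings.append("FT logging shares a NIC with vMotion")
    # strict=False lets us pass an interface address such as 10.0.1.11/24.
    ft_net = ipaddress.ip_network(ft_cidr, strict=False)
    vmotion_net = ipaddress.ip_network(vmotion_cidr, strict=False)
    if ft_net.overlaps(vmotion_net):
        warnings.append("FT logging and vMotion are on overlapping subnets")
    return warnings


if __name__ == "__main__":
    for warning in check_ft_logging_plan("10.0.1.11/24", "10.0.1.12/24",
                                         "vmnic2", "vmnic2", 1000):
        print(warning)
```

None of these are hard failures, since the dedicated NIC is recommended rather than enforced, which is why the function returns warnings instead of raising an error.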
So besides manually working through these extensive lists, how can you check a host for FT compatibility? There are various ways (only the first two are ‘native’ tools, so these will probably be your only choice during the VCAP-DCA exam):
- Run a cluster compliance check (from the Profile Compliance tab of a cluster). The ‘Description…’ field will detail any issues.
- Slightly less comprehensive is the host’s summary screen.
- Boot the server in question using the CPUInfo CD (provided as an .ISO by VMware)
- Install and run the SiteSurvey plugin for vCenter. This simply integrates as a plugin to the VI client and illustrates any issues.
Enabling FT per VM
This is simply done from the context menu for the VM. If the prerequisites aren’t satisfied the option will be greyed out. If the VMDK files are thin provisioned they will be converted to eagerzeroedthick when you enable FT on the VM. For large VMs this could take a considerable time.
Determining if a VM can be FT-enabled using vim-cmd – VMwareKB1026509
Once FT is enabled, anti-affinity rules keep the primary and secondary VMs on separate hosts.
There are times when you need to disable FT, often due to its restrictions (for instance you can’t have FT running while you patch the underlying host, or when you need to svMotion an FT-enabled VM). There are two options:
- Disable – the secondary remains but isn’t up to date. Use when temporarily disabling FT
- Turn off – the secondary is deleted. Use when the change is permanent.
VMwareKB1008026 details these options.
You can enable a CPU reservation (which is then applied to both primary and secondary) if you’re concerned about contention with other VMs (an FT-enabled VM will automatically have a full memory reservation).
Remember that FT-enabling a VM adds a second running VM, hence extra resources are being used.
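As a rough illustration of that extra cost, the reservations can be added up as below. This is a back-of-envelope sketch in Python, assuming the secondary carries the same full memory reservation and CPU reservation as the primary. The function name is made up, and per-VM memory overhead and FT logging traffic are ignored.

```python
def ft_reserved_resources(configured_memory_mb: int,
                          cpu_reservation_mhz: int = 0) -> dict:
    """Estimate cluster-wide reservations for one FT-enabled VM.

    Assumption: primary and secondary each reserve the VM's full configured
    memory, and any CPU reservation is applied to both copies.
    """
    copies = 2  # primary + secondary
    return {
        "memory_mb": copies * configured_memory_mb,
        "cpu_mhz": copies * cpu_reservation_mhz,
    }


if __name__ == "__main__":
    # A 4 GB VM with a 1000 MHz CPU reservation.
    print(ft_reserved_resources(4096, 1000))
```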
Powering off the primary VM will also power off the secondary VM
There are several default alarms you can use with FT (VMwareKB1025755):
- VM Fault Tolerance state changed
- VM FT vLockstep interval status changed
- No compatible host for Secondary VM
- Secondary VM log latency exceeded
Most of the best practices can be summarised as ‘maintain consistency’. This applies at various levels:
- The hosts in the cluster should be of similar spec and performance (otherwise a secondary VM may not keep up with a primary VM or vice versa). This also enhances chances of compatibility. This is a best practice for VMware clusters in general but doubly so for FT.
- The FT-enabled VMs should be spread between the available hosts to avoid overloading either logging NICs or host CPU. By default you can’t run more than four FT-enabled VMs on a given host anyway.
- Use identical power management and hyperthreading features on hosts used with FT. Typically disable power management.
- You can change the NIC teaming policy to better utilise multiple NICs for FT logging. See VMwareKB1011966 for details.
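To make the ‘spread the FT-enabled VMs between hosts’ guideline concrete, here is a small planning sketch in Python. It is a simple greedy heuristic of my own, not how DRS or FT actually places VMs. It assumes the four-per-host limit counts primaries and secondaries alike, and the VM and host names are invented.

```python
from typing import Dict, List, Tuple

MAX_FT_VMS_PER_HOST = 4  # limit mentioned above; assumed to count primaries and secondaries


def place_ft_vms(vms: List[str], hosts: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each VM to a (primary_host, secondary_host) pair, least-loaded first."""
    if len(hosts) < 2:
        raise ValueError("FT needs at least two hosts")
    load = {h: 0 for h in hosts}
    placement = {}
    for vm in vms:
        # Hosts that still have room, least loaded first (ties keep input order).
        candidates = sorted(
            (h for h in hosts if load[h] < MAX_FT_VMS_PER_HOST),
            key=lambda h: load[h],
        )
        if len(candidates) < 2:
            raise ValueError(f"no capacity left for {vm}")
        # Primary and secondary always land on different hosts.
        primary, secondary = candidates[0], candidates[1]
        load[primary] += 1
        load[secondary] += 1
        placement[vm] = (primary, secondary)
    return placement


if __name__ == "__main__":
    plan = place_ft_vms(["vm1", "vm2", "vm3"], ["esx1", "esx2", "esx3"])
    for vm, (primary, secondary) in plan.items():
        print(f"{vm}: primary on {primary}, secondary on {secondary}")
```

With three VMs on three hosts, each host ends up with two FT VMs, so neither logging traffic nor CPU load piles up on a single host.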
Testing and experimenting with FT is tricky if your home lab doesn’t support FT, and it is one of the harder features to get compliant hardware for. Luckily several people have posted great articles about low-cost hardware which is FT compliant.
Some people have been successful running virtual FT, i.e. running virtual ESX hosts and enabling FT on a nested VM, but I couldn’t get this to work in my lab, even though the physical host’s CPU was compatible.
- VMware KB1008026 – Disabling and turning off FT
- VMware KB1008027 – Processors and guest OS support for FT
- VMware KB1017714 – Disable FT compliance checks
- VMware white paper on FT Architecture and Performance
- VMware white paper on Creation and Design of FT (academic and technical)
- VMware forums for HA and FT (a good place to learn typical issues)
- VMworld 2010 session BC8274 – FT best practices and use cases (subscription only)
- Brian Atkinson’s blogpost
- Eric Siebert’s Master of FT blogpost
- Barry Combs’ FT blogpost (real world use cases)
- Hany Michael’s prize winning blogpost on FT (with an uber diagram!)