- Shared storage
- Common networks
- Ideally similar (or identical) hardware for each host
A good way to check that all hosts have access to the same networks and datastores is to use the ‘Maps’ feature. Select your cluster, then deselect every option except ‘Host to Network’ or ‘Host to Datastore’.
As you can see in this diagram the ’15 VLAN’ portgroup is not presented to every host (it’s slightly removed from the circle) and at least one VM in the cluster has a network assigned (in the top right) which isn’t available in this cluster at all.
Clusters consist of up to 32 hosts. The first five hosts in a cluster will be primaries, the rest secondaries. You can’t set a host to primary or secondary using the VI client, but you can using the AAM CLI (not supported; see this Yellow Bricks article for how). One of the primaries will be the ‘active primary’, which collates resource information and places VMs after a failover event.
Heartbeat options and dependencies
Heartbeats are used to determine whether a host is still operational
Heartbeats use the service console networks by default, or the management network for ESXi hosts.
They’re sent every second by default. This can be amended using das.failuredetectioninterval.
Primaries send heartbeats to both other primaries and secondaries; secondaries only send to primaries.
After no heartbeats have been received for 13 seconds the host will ping its isolation address.
HA operates even when vCenter is down (the AAM agent talks directly from host to host), although vCenter is required when first enabling HA on a cluster.
Diagnosing issues with heartbeats – see VMware KB1010991
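The primary/secondary roles and heartbeat rules described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in these notes, not how the AAM agent is implemented; the host names and the function names `assign_roles` and `heartbeat_targets` are made up for the example.

```python
from typing import Dict, List

MAX_PRIMARIES = 5  # per the notes: the first five hosts to join become primaries


def assign_roles(hosts_in_join_order: List[str]) -> Dict[str, str]:
    """Label the first five hosts as primary and the rest as secondary."""
    return {
        host: "primary" if index < MAX_PRIMARIES else "secondary"
        for index, host in enumerate(hosts_in_join_order)
    }


def heartbeat_targets(host: str, roles: Dict[str, str]) -> List[str]:
    """Return the hosts that `host` sends heartbeats to.

    Primaries heartbeat to every other host; secondaries only to primaries.
    """
    others = [h for h in roles if h != host]
    if roles[host] == "primary":
        return others
    return [h for h in others if roles[h] == "primary"]


if __name__ == "__main__":
    hosts = [f"esx{n:02d}" for n in range(1, 9)]  # hypothetical host names
    roles = assign_roles(hosts)
    for name in ("esx01", "esx08"):
        print(name, roles[name], "->", heartbeat_targets(name, roles))
```

In an eight-host cluster this shows why losing every primary is so damaging: the three secondaries have nobody left to heartbeat to.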
- Primary/secondary distribution
- No more than four blades from the same cluster per chassis (so a single chassis failure can’t take out all five primaries)
- At least one primary must be online to join new hosts to the cluster
- Can be configured with the AAM CLI (but these settings are not supported and do not persist across reboots)
- Use the Get-HAPrimaryVMHost PowerCLI cmdlet (vSphere 4.1 onwards) to list the primaries
- Interactions between HA and DRS
- Resource defragmentation (v4.1 onwards)
- Restart VM on one host, then DRS kicks in and load balances. Priority is to restart the VM.
- Large vs small cluster size – a larger cluster reduces the overhead of an N+1 architecture, but consider other factors such as LUN paths per host (only 255 LUNs per host, 64 NFS datastores). See this great post at Scott Drummonds’ Pivot Point blog and Duncan Epping’s follow-up post.
- Enough capacity? Look at performance stats in vCenter for the running workloads.
- DRS host affinity rules may be useful depending on storage implementation. You can pin VMs to specific hosts if the storage is not 100% shared (see VMworld session BC7803 for details)
- Design networking to be resilient (dual pNICs for Service Console for example)
- Avoid using ‘must’ host-affinity rules (introduced in vSphere 4.1) where possible as they limit the ability of HA to recover VMs.
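To put a number on the ‘larger cluster reduces N+1 overhead’ point in the list above, here is a small Python sketch. It assumes all hosts are identical, which is a simplification, and the function name `failover_overhead` is just for illustration.

```python
def failover_overhead(hosts: int, failures_to_tolerate: int = 1) -> float:
    """Fraction of cluster capacity held back for failover.

    Assumes identical hosts, so each host contributes 1/hosts of capacity.
    """
    if failures_to_tolerate < 0 or hosts <= failures_to_tolerate:
        raise ValueError("cluster must have more hosts than failures to tolerate")
    return failures_to_tolerate / hosts


if __name__ == "__main__":
    for size in (4, 8, 16, 32):
        share = failover_overhead(size)
        print(f"{size:>2} hosts: {share:.1%} of capacity reserved for N+1")
```

The reserved share halves each time the cluster doubles, which is the saving to weigh against limits such as LUNs per host.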
Admission Control is a mechanism to ensure VMs get the resources they require, even when a host (or hosts) in a cluster fails. Admission control is ON by default.
Three admission policies:
- No. of host failures to tolerate (default)
- Generally conservative
- Uses slots (can be customised). Reservations can cause sizing issues.
- % of resources
- More flexible when VMs have varying resource requirements
- Resource fragmentation can be an issue
- Dedicated failover host
- Simple – what you see is what you get
- Wastes capacity – specified host is not used during normal operations.
- Sometimes dictated by organisational policies
Both DRS and DPM respect the chosen admission control policy. This means, for example, that hosts would not be put into standby mode if doing so would violate the failover level. See VMware KB1007006 for details.
Analyse a cluster to determine appropriate admission control policy
Factors to consider;
- Required failover capacity vs available failover capacity. A dedicated failover host allows a maximum of one host, for example.
- Similarity of hosts (the percentage of resources policy is better for disparate hardware)
- Similarity of VMs (one oversized VM can affect slot sizing but also percentage of resources)
Analyse slot sizing (inc. custom sizes)
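- Memory = largest memory reservation among powered-on VMs + memory overhead for the VM. Cap the slot size using das.slotMemInMB. Set the minimum (used for VMs with no reservation) using das.vmMemoryMinMB.
- CPU = largest CPU reservation among powered-on VMs, or 256MHz if no reservation is higher. Cap the slot size using das.slotCpuInMHz. Set the minimum (used for VMs with no reservation) using das.vmCpuMinMHz.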
- Current slot size is shown in ‘Advanced Runtime Info’ for cluster
NOTE: This only shows the total slots in the cluster rather than slots per host. If a particular host has more memory or CPU than the others it will have a higher number of slots.
- The ‘available slots’ figure shown in the ‘Advanced Runtime Info’ tab will be equal to (total slots – used slots) – slots reserved for failover (which isn’t shown in the dialog). This is why ‘used slots’ and ‘available slots’ don’t add up to ‘total slots’.
- The total number of slots will take into account virtualisation overhead. For example, in a cluster with 240GB RAM total only 210GB may be available to VMs (the rest being used for the VMkernel, service console (on ESX), device drivers etc.). If slot size is 2.2GB RAM there will be roughly 95 slots total. See this VMware communities thread for more info on virtualisation overhead.
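The slot arithmetic above is easier to follow as a worked example. The Python sketch below assumes the slot is sized from the largest reservation among powered-on VMs (with 256MHz as the CPU default) and that a slot must fit in both CPU and memory. The `VM` class, the sample figures and the function names are all hypothetical, chosen to reproduce the ‘210GB usable, 2.2GB slot, roughly 95 slots’ example.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

DEFAULT_CPU_SLOT_MHZ = 256  # used when no VM has a larger CPU reservation


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int


def slot_size(vms: Iterable[VM]) -> Tuple[int, int]:
    """Return (cpu_mhz, mem_mb) for one slot, driven by the largest reservation."""
    vms = list(vms)
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_reservation_mhz for vm in vms])
    mem = max([0] + [vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms])
    return cpu, mem


def total_slots(usable_cpu_mhz: int, usable_mem_mb: int,
                cpu_slot_mhz: int, mem_slot_mb: int) -> int:
    """Slots are limited by whichever resource runs out first."""
    if cpu_slot_mhz <= 0 or mem_slot_mb <= 0:
        raise ValueError("slot sizes must be positive")
    return min(usable_cpu_mhz // cpu_slot_mhz, usable_mem_mb // mem_slot_mb)


def available_slots(total: int, used: int, reserved_for_failover: int) -> int:
    """(total - used) - reserved, as described for 'Advanced Runtime Info'."""
    return (total - used) - reserved_for_failover


if __name__ == "__main__":
    vms = [
        VM("small", 0, 512, 100),
        VM("big", 0, 2048, 205),  # 2253MB, about 2.2GB: this VM sets the slot
    ]
    cpu_slot, mem_slot = slot_size(vms)
    # 210GB usable RAM; the CPU figure is an arbitrary, non-limiting value
    total = total_slots(100_000, 210 * 1024, cpu_slot, mem_slot)
    print(cpu_slot, mem_slot, total)
    print(available_slots(total, used=40, reserved_for_failover=24))
```

Note how a single VM with a large reservation sets the slot size for the whole cluster, which is the sizing issue mentioned under the ‘host failures’ policy.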
To calculate failover capacity
- Decide how many host failures you want to cope with
- Calculate the number of slots for each host in the cluster and therefore the total slots available. If all hosts are identical (CPU, memory) then simply divide the total number of slots by the number of hosts (see Advanced Runtime Info).
- Subtract the slots provided by the largest x hosts from the total (where x is the number of failures to tolerate). The slots you subtract are what HA keeps in reserve; the remainder is what’s available for running VMs.
NOTE: Using ‘No. of host failures’ often leads to a conservative consolidation ratio.
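The failover capacity steps above can be written out as a short Python sketch. The per-host slot counts are invented for illustration and the function names are hypothetical; the logic simply removes the largest hosts, which is the worst case HA plans for.

```python
from typing import List


def reserved_slots(slots_per_host: List[int], failures_to_tolerate: int) -> int:
    """Slots held back: the combined slots of the largest hosts that could fail."""
    if not 0 <= failures_to_tolerate < len(slots_per_host):
        raise ValueError("failures to tolerate must be fewer than the number of hosts")
    largest_first = sorted(slots_per_host, reverse=True)
    return sum(largest_first[:failures_to_tolerate])


def usable_slots(slots_per_host: List[int], failures_to_tolerate: int) -> int:
    """Slots left for running VMs once the failover reserve is removed."""
    return sum(slots_per_host) - reserved_slots(slots_per_host, failures_to_tolerate)


if __name__ == "__main__":
    cluster = [20, 20, 25, 30]  # hypothetical slots per host
    for failures in (1, 2):
        print(failures, reserved_slots(cluster, failures), usable_slots(cluster, failures))
```

With uneven hosts, the biggest host is always the one ‘lost’ in the calculation, which is part of why this policy gives a conservative consolidation ratio.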
Percentage of resources gotcha – if you set it to 50% but you have more than 10 hosts in your cluster you can run into problems. In theory you can still reserve enough capacity, but only five hosts are primaries, so a failure on that scale could take out all of them and you can’t guarantee that a primary node will still be working to coordinate restarts.
- Heartbeat pings every second (das.failuredetectioninterval = 1000)
- 15 second timeout (das.failuredetectiontimeout = 15000)
- Increase to 20 seconds (20000) for a second service console or second isolation address
- Increase to 60 seconds (60000) if PortFast is not set (to allow time for spanning tree to converge)
- Advanced settings
- das.isolationaddressX (das.isolationaddress1, das.isolationaddress2 and so on) – used to define additional isolation addresses
- das.allowVmotionNetworks – lets HA use a vMotion-enabled NIC; used with ESXi (which has no service console)
NOTE: There is a small chance that HA could shut down VMs and not restart them on another host. This only occurs when the isolated host returns to the network between the 14th and 15th second. In this case the isolation response is triggered but the restart isn’t, because by then the host is no longer considered failed (VMware KB2956923).
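The timings in this section can be laid out as a simple timeline. The Python sketch below uses the default figures from these notes (ping at 13 seconds, isolation response at 14, host declared failed at 15). Expressing them as ‘timeout minus 2’ and ‘timeout minus 1’ for other timeout values is my assumption, not something stated in VMware’s documentation, and the function names are hypothetical.

```python
from typing import Dict


def isolation_timeline(failure_detection_timeout_ms: int = 15000) -> Dict[str, float]:
    """Key moments, in seconds after the last heartbeat was received.

    Assumption: the 13s/14s/15s defaults scale as timeout-2, timeout-1, timeout.
    """
    timeout_s = failure_detection_timeout_ms / 1000
    return {
        "ping_isolation_address": timeout_s - 2,
        "isolation_response": timeout_s - 1,
        "host_declared_failed": timeout_s,
    }


def shut_down_but_not_restarted(reconnect_s: float,
                                failure_detection_timeout_ms: int = 15000) -> bool:
    """True if the host returns after the isolation response fires but
    before the other hosts declare it failed (the risk window in the NOTE)."""
    t = isolation_timeline(failure_detection_timeout_ms)
    return t["isolation_response"] <= reconnect_s < t["host_declared_failed"]


if __name__ == "__main__":
    print(isolation_timeline())
    for seconds in (13.5, 14.5, 16.0):
        print(seconds, shut_down_but_not_restarted(seconds))
```

Only the middle case falls in the window: the VMs have already been shut down by the isolated host, but nobody else considers it failed, so nothing restarts them.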
Default settings for isolation response:
| ESX 3.5 (U3 through U5) | Leave powered on |
Restart interval after a failover
- 2, 6, 14, 22, 30 mins
- Hosts may be in standby mode (when using DPM) so it could take several mins for the host to power up and be ready to host VMs
- VM restart count (default 5) – das.maxvmrestartcount
Split brain
- Occurs when both management network and storage fail (more likely with NFS, iSCSI or FCoE)
- VM is restarted on another host but continues to run in memory on the isolated host. When that host rejoins the network the VM is running simultaneously on two hosts. Bad!
- vSphere 4.0U2 solves this. For prior versions either avoid ‘Leave powered on’ as an isolation response or manually close processes on isolated ESX host before rejoining.
Not in the blueprint, but useful to know.
You can monitor clusters using the following vCenter alarms:
- the usual host alerts – host failed, thermal, memory usage over threshold etc.
- cluster high availability error – a specific error which you can set actions for
If you’re doing network maintenance, disable Host Monitoring on the cluster (rather than putting the individual hosts into maintenance mode) to avoid the isolation response being triggered.
Tools & learning resources
- vSphere Availability Guide
- VMware HA: Deployment Best Practices
- Duncan Epping’s HA Deepdive
- Sample chapter from Duncan Epping and Frank Denneman’s HA/DRS book (the book’s now out!)
- Session BC7803 (VMworld 2010, sign-in required)
- VMware KB1006421 – Advanced configuration options for VMware HA
- VMware KB1001596 – Troubleshooting HA