The main guide for this section is the ‘Setup for Failover Clustering and Microsoft Cluster Service’ whitepaper. It’s a difficult topic to test in a lab unless you’re lucky enough to have FC available! Very little has changed with regard to running MSCS on VMware since the VI3 days, so if you’re familiar with that (and it was on the VCP syllabus) you don’t need to read any further. If you want a refresher, however (and a few tidbits which are new to vSphere 4.1), read on.
Knowledge
- Identify MSCS clustering solution requirements
- Identify the three supported MSCS configurations
Skills and Abilities
- Configure Virtual Machine hardware to support cluster type and guest OS
- Configure a MSCS cluster on a single ESX/ESXi Host
- Configure a MSCS cluster across ESX/ESXi Hosts
- Configure standby host clustering
Tools & learning resources
Supported MSCS configurations
Three options;
- Cluster in a box
- Cluster across boxes
- Standby (one physical node, one virtual node)
Solution requirements
Physical hardware
One of the main requirements is an FC SAN (this is one of the rare features which doesn’t work with NFS or iSCSI).
Virtual hardware
- Use the correct SCSI adaptor.
- LSI Parallel for all OSs except Win2k8 which needs the newer LSI SAS
- Use the correct storage abstraction
- Cluster in a box – use virtual disks (local or remote)
- Cluster across boxes – use physical mode RDM (can be virtual mode for W2k3)
- Standby clustering – use physical mode RDM
- Use thick provisioned disks (eagerzeroedthick)
- You must use h/w v7 with ESX/ESXi 4.1
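The adapter and storage rules above boil down to a small lookup. Here is a minimal sketch in Python, purely as a memory aid; the function name, the configuration labels and the guest OS labels (`w2k3`, `w2k8`) are invented for illustration and are not part of any VMware tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDiskPlan:
    scsi_adapter: str
    storage_options: tuple  # preferred option first


def plan_shared_disk(config: str, guest_os: str) -> SharedDiskPlan:
    """Return the adapter and storage choices from the list above.

    config   -- "cluster_in_a_box", "cluster_across_boxes" or "standby"
    guest_os -- "w2k8" for Windows 2008, anything else is treated as older
    (both sets of labels are hypothetical, chosen for this sketch)
    """
    # Windows 2008 needs the newer SAS adapter; everything else uses Parallel.
    adapter = "LSI Logic SAS" if guest_os == "w2k8" else "LSI Logic Parallel"

    if config == "cluster_in_a_box":
        storage = ("eagerzeroedthick virtual disk",)
    elif config == "cluster_across_boxes":
        storage = ("physical mode RDM",)
        if guest_os == "w2k3":
            # Virtual mode is tolerated for Windows 2003 only.
            storage += ("virtual mode RDM",)
    elif config == "standby":
        storage = ("physical mode RDM",)
    else:
        raise ValueError(f"unknown configuration: {config}")
    return SharedDiskPlan(adapter, storage)
```

For example, `plan_shared_disk("cluster_across_boxes", "w2k3")` lists physical mode RDM first and virtual mode RDM as the fallback.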
Restrictions
Features you can’t use in conjunction with MSCS;
- vMotion. Interestingly this is not recommended (a very vague support stance!) as opposed to not supported. See p30 of the Setup for Failover Clustering and Microsoft Cluster Service guide.
- HA and DRS clusters. Prior to vSphere 4.1 you couldn’t put VMs running in an MSCS in a VMware cluster at all (it worked, but wasn’t supported). With vSphere 4.1 this is now supported (you have to use VM-Host affinity rules – see chapter 5 of the Setup for Failover Clustering and Microsoft Cluster Service guide).
- NPIV
- iSCSI and NFS
- FT (would be rather pointless)
- Round robin multipathing (when using the NMP) – see VMwareKB1010041
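If you want to sanity check a design against the list above, something like the following sketch would do. The feature labels are made up for this example, and the "not supported" wording for everything other than vMotion is my reading of the list rather than a quote from the guide. HA/DRS is left out because it is allowed in 4.1 when the affinity rules are in place.

```python
# Restricted features from the list above, with the stance given for each.
# Keys are hypothetical labels used only in this sketch.
RESTRICTED = {
    "vmotion": "not recommended",
    "npiv": "not supported",
    "iscsi": "not supported",
    "nfs": "not supported",
    "ft": "not supported",
    "round_robin_nmp": "not supported",
}


def find_restricted(features_in_use):
    """Return sorted (feature, stance) pairs for restricted features in use."""
    return sorted(
        (feature, RESTRICTED[feature])
        for feature in set(features_in_use)
        if feature in RESTRICTED
    )
```

Calling `find_restricted(["vmotion", "nfs"])` flags both entries along with the stance for each.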
For all three solutions you follow a similar process;
- Create the first VM (including at least two vNICs, don’t attach the storage yet)
- Create the second server (either clone or from scratch, don’t attach storage yet)
- Attach the shared storage to the first server (using the applicable abstraction type)
- Attach the shared storage to the second server (using the same abstraction type)
Cluster in a box
Follow this process;
- Create the first VM
- Two NICs (one public, one heartbeat)
- Set the disk timeout to 60 seconds (registry entry)
- Clone the first VM using vCenter, remembering to change the SID
- Add a quorum hard disk (and therefore SCSI adapter) to the first VM, assign to a new SCSI ID (1,0)
- Select ‘Support clustering features such as Fault Tolerance’ to ensure the VMDKs are eagerzeroedthick. Follow VMwareKB1011170 for details on how to check the VMDK format (it’s not as simple as you think!). Can PowerCLI do this more easily?
- Set the SCSI adapter type – LSI Parallel for W2k3, LSI SAS for W2k8
- Set the SCSI bus sharing mode to virtual
- Add the same hard disk to the second VM
- Use the same SCSI ID (1,0) etc
- Use the same adapter type and adapter mode
- Set up MSCS (you can follow this useful step by step guide to building a cluster)
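The disk timeout step above only says "registry entry". The value usually cited for this is `TimeoutValue` under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk`; treat that path as an assumption on my part and confirm it against the setup guide before relying on it. As a rough sketch, this Python helper (the function name is hypothetical) builds the text of a .reg file for a given timeout:

```python
def disk_timeout_reg(seconds: int = 60) -> str:
    """Build .reg file text that sets the Windows disk I/O timeout.

    The key path and value name are assumptions (see the note above).
    """
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk]",
        # DWORD values in .reg files are written as eight hex digits.
        f'"TimeoutValue"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)
```

Saving the returned text to a file and importing it on each node is one way to keep both nodes identical; setting the value by hand in regedit works just as well.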
Cluster across boxes
Follow an almost identical process to cluster in a box with a few exceptions;
- Create the two VMs as before
- You should have an extra NIC (compared to cluster in a box) as both networks (public and private) need to span multiple hosts
- When adding the quorum disk to the first node;
- Use an RDM (preferably in physical mode, but virtual is OK for W2k3) on an unformatted SAN LUN.
- Select ‘Support clustering features such as FT’, set the SCSI adapter type (LSI Parallel for W2k3, LSI SAS for W2k8), and use a new SCSI ID (1,0) (both as before)
- Set the SCSI bus sharing mode to physical (instead of virtual)
- Add the quorum disk to the second VM
- Add an existing hard disk, specify the same RDM as the first VM, also physical mode
- Use the same SCSI ID and SCSI adapter mode as for the first VM (ie physical)
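Both procedures so far depend on the two nodes presenting the shared disk identically: same SCSI ID, same adapter type, and the bus sharing mode that matches the cluster type (virtual for cluster in a box, physical for cluster across boxes). A minimal sketch of that check in Python follows; the `SharedDisk` class and `check_pair` function are invented for illustration and do not talk to vCenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedDisk:
    scsi_id: tuple    # (controller, unit), e.g. (1, 0)
    adapter: str      # e.g. "LSI Logic SAS"
    bus_sharing: str  # "virtual" or "physical"


def check_pair(node1: SharedDisk, node2: SharedDisk, expected_bus_sharing: str):
    """Return a list of mismatches between the two nodes' shared disk settings."""
    problems = []
    if node1.scsi_id != node2.scsi_id:
        problems.append("SCSI IDs differ")
    if node1.adapter != node2.adapter:
        problems.append("SCSI adapter types differ")
    for name, disk in (("node1", node1), ("node2", node2)):
        if disk.bus_sharing != expected_bus_sharing:
            problems.append(
                f"{name}: bus sharing is {disk.bus_sharing}, "
                f"expected {expected_bus_sharing}"
            )
    return problems
```

An empty list means the two descriptions agree; pass `"virtual"` for cluster in a box and `"physical"` for cluster across boxes.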
Standby clustering
Follow a very similar process to cluster across boxes with a few exceptions;
- Create the first node using a physical server.
- This server must have access to the same SAN LUN as the ESX host being used to host the standby VM
- Unlike the other methods you attach the storage to the first node before creating the second (virtual) node
- Create a single VM for the second node
- Ensure it has access to both the public and private networks available to the physical node
- Add the shared disk/s (quorum and optionally other disks) to the VM (second node);
- Use an RDM (MUST be physical mode) pointing to the same LUN used by the physical node. As you’re not creating a new disk you won’t have to specify thick provisioning.
- Set the SCSI adapter type (LSI Parallel for W2k3, LSI SAS for W2k8), and use a new SCSI ID (1,0) (both as before)
- Set the SCSI bus sharing mode to physical (instead of virtual)
- Add the quorum disk to the second VM
- Add an existing hard disk, specify the same RDM as the first VM, also physical mode
- Use the same SCSI ID and SCSI adapter mode as for the first VM (ie physical)
- Install MSCS.
Note: If using Windows 2003 you must configure the Microsoft Cluster Service to use the ‘minimum configuration’ option.
There are a few extra constraints when using standby clustering;
- As stated previously you can’t use Round Robin multipathing in the ESX host. Similarly you can’t install multipathing software in the guest OS of either the VM or the physical node.
- Use the STORport Miniport driver for the FC HBA in the physical node (instead of the default)
Using MSCS in HA/DRS enabled clusters
This is a new feature introduced in vSphere 4.1. Previously an MSCS cluster wasn’t supported (by either Microsoft or VMware) if it resided in a cluster with HA or DRS enabled, even if you disabled those features for the VMs in question. This meant using separate hardware which negated some of the benefits of virtualisation. By using VM-VM and VM-Host affinity rules it’s now fully supported.
VM-VM affinity rules (for DRS)
VM-VM affinity rules are used with DRS to ensure VMs stay either together or apart
- Use VM-VM affinity rules to keep VMs together (cluster in a box)
- Use VM-VM anti-affinity rules to keep VMs apart (cluster across boxes)
VM-Host affinity rules (for HA)
VM-Host affinity rules are used to compensate for the fact that HA doesn’t obey the above VM-VM rules (whereas DRS does).
For cluster in a box
- Create a Host DRS Group containing a maximum of two hosts (one to run the VMs and one for failover)
- Create a VM DRS Group containing the MSCS VMs
- Set up an affinity rule between the above two groups using the new ‘Virtual Machines to Hosts’ type (new in v4.1). Use ‘MUST run on hosts in group’
For cluster across boxes
- Create two Host DRS Groups (each can contain more than two hosts but the same host must not occur in both groups)
- Create two VM DRS Groups with one cluster node in each
- Set up an affinity rule between each set of Host DRS groups and VM DRS groups. Use the same ‘MUST run on hosts in group’ setting.
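The group constraints above are easy to get wrong on a larger cluster, so here is a small Python sketch that checks a planned layout on paper. The function names and the host/VM names in the usage note are hypothetical; nothing here queries vCenter.

```python
def check_cluster_in_a_box(host_group, vm_group):
    """Check one Host DRS Group and one VM DRS Group for cluster in a box."""
    problems = []
    if not 1 <= len(set(host_group)) <= 2:
        problems.append("host group should contain one or two hosts")
    if len(set(vm_group)) < 2:
        problems.append("VM group should contain all MSCS nodes")
    return problems


def check_cluster_across_boxes(host_groups, vm_groups):
    """Check two Host DRS Groups and two VM DRS Groups for cluster across boxes."""
    if len(host_groups) != 2 or len(vm_groups) != 2:
        return ["expected exactly two host groups and two VM groups"]
    problems = []
    # The same host must not appear in both host groups.
    overlap = set(host_groups[0]) & set(host_groups[1])
    if overlap:
        problems.append("hosts in both groups: " + ", ".join(sorted(overlap)))
    # Each VM group holds exactly one cluster node.
    for number, vms in enumerate(vm_groups, start=1):
        if len(set(vms)) != 1:
            problems.append(f"VM group {number} should hold exactly one cluster node")
    return problems
```

For instance, `check_cluster_across_boxes([["esx1", "esx2"], ["esx2"]], [["node1"], ["node2"]])` reports that `esx2` sits in both host groups.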
Real world considerations
Both my current and previous employers run only physical Microsoft clusters, so I’ve no real world experience of implementing the solutions covered here.
Microsoft have very limited support for clustered solutions – the cluster must be running a certified architecture (which includes hardware and software configuration);
- For Windows 2003 this only includes two EMC storage devices (full details here)
- For Windows 2008 the policy is more flexible (and is documented here and here)
Exchange CCR – see this blogpost at VMGuru.nl for a good discussion around this, also this VMware community post. Exchange 2010 (when used with DAGs) has its own issues – see this post.
SQL2005/2008 – you need a custom installation of VMtools – see VMwareKB1021946
Cheers Ed, a good collection of all relevant material. I am running a couple of SQL 2005 Standby Clusters (with ESX 4.0 U2) & just thought I would add to your post as there was little ‘real world’ info around when I set mine up.
One of the bigger, and slightly obscure, constraints was “you can’t install multipathing software in the guest OS of either the VM or the physical node”. Pretty obvious statement for the virtual node, but to not install MP S/W on the primary, physical node in a cluster? Not something we could afford to do. I looked into this a fair amount at the time to get to the bottom of why this might be the case and couldn’t find a firm answer. The clusters passed MS validation with EMC PowerPath installed on the physical node and we run it. It would appear that EMC and VMware might have been working on this, as I interpreted VMware KB article http://kb.vmware.com/kb/1017393 to suggest that the latest version of PowerPath VE can be used to manage your multipathing on the RDMs in a MS cluster. Also good news that 4.1 and host affinity rules now officially allow these guests to sit in a HA/DRS cluster, I missed that one.
In the 10 months since I set up my clusters a lot of the info has been updated and the constraints removed or relaxed via KB articles… but when VMware are asking the exam questions – stick to the PDFs I guess!?
Thanks for the feedback Duncan. During the exam I’ll stick to what the printed materials say, although I’m not sure how far this particular topic can be tested without assuming Microsoft knowledge (and it’s a VMware exam).