This is one objective where you definitely have to get hands-on – there’s no way you’ll learn esxtop otherwise. Ideally you’ll have a real infrastructure to experiment with, as you want hosts with memory contention, ballooning, swapping, NUMA optimisations etc. so you can see these features in action and understand them.
Knowledge
Identify hot keys and fields used with resxtop/esxtop
Identify fields used with vscsiStats
Skills and Abilities
Configure esxtop/resxtop custom profiles
Determine use cases for and apply esxtop/resxtop Interactive, Batch and Replay modes
Use vscsiStats to gather storage performance data
Use esxtop/resxtop to collect performance data
Given esxtop/resxtop output, identify relative performance data for capacity planning purposes
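If you want something concrete to practice with, here’s a rough sketch of the interactive, batch and replay modes plus a basic vscsiStats collection. The intervals, iteration counts, filenames and the worldGroupID placeholder below are just examples – switches have shifted slightly between releases, so check the command help on your build;

# Interactive mode – run esxtop locally on the host, or resxtop --server <host> remotely
esxtop
# Custom profiles – press W in interactive mode to save your field/column selection to a file, then load it later
esxtop -c /tmp/my-esxtop-profile

# Batch mode – capture all counters every 10 seconds for an hour, for later analysis in perfmon/Excel
esxtop -b -a -d 10 -n 360 > /tmp/esxtop-capture.csv

# Replay mode – first take a performance snapshot with vm-support, then replay it through esxtop
vm-support -S -d 300 -i 10
esxtop -R /path/to/extracted-snapshot

# vscsiStats – list VMs (worldGroupIDs), start collecting for one VM, print a histogram, then stop
vscsiStats -l
vscsiStats -s -w <worldGroupID>
vscsiStats -p ioLength -w <worldGroupID>
vscsiStats -x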
Use the (new to vSphere) DRS Faults and DRS History tabs to investigate issues with DRS
By default DRS recalculates every 5 minutes (including DPM recommendations), but it also does so when resource settings are changed (reservations, adding/removing hosts etc). For a full list of actions which trigger DRS calculations see Frank Denneman’s HA/DRS book.
It’s perfectly possible to turn on DRS even though all prerequisite functionality isn’t enabled – for example if vMotion isn’t enabled you won’t be prompted (at least until you try to migrate a VM)!
Affinity and anti-affinity rules
There are two types of affinity/anti-affinity rules;
VM-VM (new in vSphere v4.0)
VM-Host (new to vSphere 4.1)
The VM-VM affinity is pretty straightforward. Simply select a group of two or more VMs and decide if they should be kept together (affinity) or apart (anti-affinity). Typical use cases;
Webservers acting in a web farm (set anti-affinity to keep them on separate hosts for redundancy)
A webserver and associated application server (set affinity to optimise networking by keeping them on the same host)
VM-Host affinity is a new feature (with vSphere 4.1) which lets you ‘pin’ one or more VMs to a particular host or group of hosts. Use cases I can think of;
Pin the vCenter server to a couple of known hosts in a large cluster
Pin VMs for licence compliance (think Oracle, although apparently they don’t recognise this new feature as being valid – see the comments in this post)
Microsoft clustering (see section 4.3 for more details on how to configure this)
Multi-tenancy (cloud infrastructures)
Blade environments (ensure VMs run on different chassis in case of backplane failure)
It’s hard to know what to cover in this objective as performance tuning often implies troubleshooting (note the recommended reading of Performance Troubleshooting!), hence there’s significant overlap with the troubleshooting section. Luckily there are plenty of excellent resources in the blogosphere and from VMware so it’s just a case of reading and practicing.
Knowledge
Identify appropriate BIOS and firmware setting requirements for optimal ESX/ESXi Host performance
NUMA settings (node interleaving on a DL385 for instance; normally disabled – check Frank Denneman’s post).
Wake-on-LAN (WOL) support on NICs (used with DPM)
Identify appropriate ESX driver revisions required for optimal host performance
I guess they mean the HCL. Let’s hope you don’t need an encyclopaedic knowledge of driver version histories!
Tune ESX/i host and VM memory configurations
Read this great series of blog posts from Arnim Van Lieshout on memory management – part one, two and three. And as always the Frank Denneman post.
Check your Service Console memory usage using esxtop.
Hardware assisted memory virtualisation
Check this is enabled (per VM). Edit Settings -> Options -> CPU/MMU Virtualisation;
NOTE: VMware strongly recommend you use large pages in conjunction with hardware assisted memory virtualisation. See section 3.2 for details on enabling large memory pages. However enabling large memory pages will negate the efficiency of TPS so you gain performance at the cost of higher memory usage. Pick your poison…(and read this interesting thread on the VMware forums)
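As a rough illustration of that trade-off (setting names quoted from memory – verify against your build before changing anything), guest large page allocation is controlled by a host advanced setting, and the resulting memory behaviour can be watched in esxtop;

# Check, then explicitly enable, guest large page allocation (1 = enabled, which is the default)
esxcfg-advcfg -g /Mem/AllocGuestLargePage
esxcfg-advcfg -s 1 /Mem/AllocGuestLargePage

# Then press 'm' in esxtop and keep an eye on PSHARE (TPS savings), SWAP and MEMCTL (ballooning)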
The PSA layout is well documented here and here. The PSA architecture is for block level protocols (FC and iSCSI) – it isn’t used for NFS.
Terminology;
MPP = multipathing plugin (either VMware’s NMP or a third-party plugin such as EMC PowerPath)
NMP = Native Multipathing Plugin, VMware’s default MPP; it combines one or more SATPs with one or more PSPs
SATP = Storage Array Type Plugin – the ‘traffic cop’ which handles array-specific failover behaviour
PSP = Path Selection Plugin – the ‘driver’ which decides which path each I/O actually uses
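You can see how these pieces fit together from the command line. The syntax below is the vSphere 4.x form (it moved under ‘esxcli storage’ in later releases), so double-check against esxcli --help on your host;

# List the SATPs and PSPs loaded on the host, along with the default PSP per SATP
esxcli nmp satp list
esxcli nmp psp list

# Show every device with the SATP and PSP it has been claimed by (a VMW_SATP_ALUA* entry indicates an ALUA array)
esxcli nmp device list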
There are four possible pathing policies;
MRU = Most Recently Used. Typically used with active/passive (low end) arrays.
Fixed = The path is fixed, with a ‘preferred path’. On failover the alternative paths are used, but when the original path is restored it again becomes the active path.
Fixed_AP = new to vSphere 4.1. This enhances the ‘Fixed’ pathing policy to make it applicable to active/passive arrays and ALUA capable arrays. If no user preferred path is set it will use its knowledge of optimised paths to set preferred paths.
RR = Round Robin
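The policy can be changed per LUN in the vSphere Client, or from the command line. A hedged vSphere 4.x example (the naa identifier is obviously a placeholder for one of your own devices);

# Set a specific LUN to Round Robin, then confirm the change
esxcli nmp device setpolicy -d naa.60a98000486e5334524a6c4f63624558 --psp VMW_PSP_RR
esxcli nmp device list -d naa.60a98000486e5334524a6c4f63624558

# Or change the default PSP for every device claimed by a given SATP
esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA --psp VMW_PSP_RR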
One way to think of ALUA is as a form of ‘auto negotiate’. The array communicates with the ESX host and lets it know the available paths for each LUN, and in particular which ones are optimised. ALUA tends to be offered on midrange arrays, which are typically asymmetric active/active rather than symmetric active/active (which tend to be even more expensive). Determining whether an array is ‘true’ active/active is not as simple as you might think – read Frank Denneman’s excellent blogpost on the subject. Our NetApp 3000 series arrays are asymmetric active/active rather than ‘true’ active/active.