VCAP-DCA Study notes – 3.1 Tune and Optimize vSphere Performance

It’s hard to know what to cover in this objective as performance tuning often implies troubleshooting (note the recommended reading of Performance Troubleshooting!) hence there’s a significant overlap with the troubleshooting section. Luckily there are plenty of excellent resources in the blogosphere and from VMware so it’s just a case of reading and practicing.

Knowledge

  • Identify appropriate BIOS and firmware setting requirements for optimal ESX/ESXi Host performance
  • Identify appropriate ESX driver revisions required for optimal ESX/ESXi Host performance
  • Recall where to locate information resources to verify compliance with VMware and third party vendor best practices

Skills and Abilities

  • Tune ESX/ESXi Host and Virtual Machine memory configurations
  • Tune ESX/ESXi Host and Virtual Machine networking configurations
  • Tune ESX/ESXi Host and Virtual Machine CPU configurations
  • Tune ESX/ESXi Host and Virtual Machine storage configurations
  • Configure and apply advanced ESX/ESXi Host attributes
  • Configure and apply advanced Virtual Machine attributes
  • Tune and optimize NUMA controls

Tools & learning resources

Identify BIOS and firmware settings for optimal performance

This will vary for each vendor but typical things to check;

  • Power saving for the CPU – disable (or set to OS control) if consistent performance is the priority
  • Hyperthreading – should be enabled
  • Hardware virtualisation (Intel VT, EPT etc) – required for EVC, Fault Tolerance etc
    NOTE: You should also enable the ‘No Execute’ (NX/XD) memory protection bit.
  • NUMA settings (node interleaving on a DL385, for instance). Normally disabled – check Frank Denneman’s post.
  • WOL for the NICs (used with DPM)

Identify appropriate ESX driver revisions required for optimal host performance

I guess they mean the HCL. Let’s hope you don’t need an encyclopaedic knowledge of driver version histories!

Tune ESX/i host and VM memory configurations

Read this great series of blog posts from Arnim van Lieshout on memory management – parts one, two and three. And, as always, the Frank Denneman post.

Check your Service Console memory usage using esxtop.

Hardware assisted memory virtualisation

Check this is enabled (per VM). Edit Settings -> Options -> CPU/MMU Virtualisation;

image
Enabling h/w CPU/memory assist for a VM
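
For reference, the radio buttons in that dialog map to .vmx entries – a hedged sketch (valid values are ‘automatic’, ‘software’ and ‘hardware’);

  # Force hardware assist for both CPU and MMU virtualisation on this VM
  monitor.virtual_exec = "hardware"
  monitor.virtual_mmu = "hardware"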

NOTE: VMware strongly recommend you use large pages in conjunction with hardware assisted memory virtualisation. See section 3.2 for details on enabling large memory pages. However, enabling large memory pages negates the efficiency of TPS, so you gain performance at the cost of higher memory usage. Pick your poison… (and read this interesting thread on the VMware forums)

Order of preference for the memory overcommit techniques (most effective at the top);

  1. Transparent page sharing (negligible performance impact)
  2. Ballooning
  3. Memory compression
  4. VMkernel swap files (significant performance impact)

Transparent Page Sharing (TPS) – otherwise known as memory dedupe!

  • Enabled by default
  • Refreshed periodically
  • Can be disabled (see the command sketch after this list);
    • Disable per ESX host – add Mem.ShareScanGHz = 0 in Advanced Settings (VMwareKB1004901)
    • Disable per VM – add sched.mem.pshare.enable = FALSE in the .VMX (entry not present by default)
  • Efficiency is impacted if you enable large memory pages (see this discussion)
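
As a hedged sketch (service console syntax – check VMwareKB1004901 for the definitive steps), disabling TPS looks something like this;

  # Stop the TPS scanner on this host (0 GHz = never scan)
  esxcfg-advcfg -s 0 /Mem/ShareScanGHz
  # Confirm the new value
  esxcfg-advcfg -g /Mem/ShareScanGHz

  # Per-VM equivalent – add to the .vmx while the VM is powered off
  sched.mem.pshare.enable = "FALSE"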

Balloon driver

  • Uses a guest OS driver (vmmemctl) which is installed with VMware Tools (all supported OSs)
  • Guest OS must have enough swapfile configured for balloon driver to work effectively
  • Default max for the balloon driver to reclaim is 65%. Can be tuned using sched.mem.maxmemctl in the .VMX (entry not present by default – see the example below). Read this blogpost before considering disabling!
  • Ballooning is normal when overcommitting memory and may impact performance
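
For illustration, capping (rather than disabling) the balloon driver is a one-line .vmx entry – the 4096 here is an assumed example value, in MB;

  # Limit the balloon driver to reclaiming at most 4096MB from this VM
  sched.mem.maxmemctl = "4096"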

Swapfiles

  • VMware swapfiles
    • Stored (by default) in the same datastore as the VM (as a .vswp file). Size = configured memory – memory reservation (worked example below).
    • Include in storage capacity sizing
    • Can be configured to use a local datastore but that can impact vMotion performance. Configured at cluster/host level or overridden per VM (Edit Settings -> Options -> Swapfile location)
    • Swapping to the .vswp will almost certainly impact performance
  • Guest OS swapfiles
    • Should be configured for worst case (VM pages all memory to guest swapfile) when ballooning is used

NOTE: While both are classified as ‘memory optimisations’ they also impact storage capacity.
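
As a quick worked example of the sizing formula: a VM configured with 8192MB RAM and a 2048MB memory reservation creates a .vswp of 8192 – 2048 = 6144MB, so twenty such VMs need roughly 120GB of datastore capacity just for swapfiles.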

Memory compression

Memory compression is a new feature to vSphere 4.1 (which isn’t covered in the lab yet) so I won’t cover it here.

Monitoring memory optimisations

TPS

  • esxtop;
    • PSHARE/MB – check ‘shared’, ‘common’ and ‘savings’ (memory overcommit)
    • Overcommit % shown on the top line of the memory view (press m).  0.19 = 19%.
    • NOTE: On Xeon 5500 (Nehalem) hosts TPS won’t show much benefit until you overcommit memory (VMwareKB1021095)
  • vCenter performance charts (under ‘Memory’);
    • ‘Memory shared’. For VMs and hosts, collection level 2.
    • ‘Memory common’. For hosts only, collection level 2

Ballooning

  • esxtop
    • MEMCTL/MB – check current, target.
      MCTL? to see if driver is active (press ‘f’ then ‘i’ to add memctl columns)
  • vCenter performance charts (under ‘Memory’);
    • ‘memory balloon’. For hosts and VMs, collection level 1.
    • ‘memory balloon target’. For VMs only, collection level 2.

Swapfiles

  • esxtop
    • SWAP/MB – check current, r/s, w/s.
    • SWCUR to see current swap in MB (press ‘f’ then ‘j’ to add swap columns)
  • vCenter performance charts (under ‘Memory’);
    • ‘Memory Swap Used’ (hosts) or ‘Swapped’ (VMs). Collection level 2.
    • ‘Swap in rate’, ‘Swap out rate’. For hosts and VMs, collection level 1.

NOTE: Remember you can tailor statistics levels – vCenter Server Settings -> Statistics. Default is all level one metrics kept for one year.

Read Duncan Epping’s blogpost for some interesting points on using esxtop to monitor ballooning and swapping. See Troubleshooting section 6.2 for more information on CPU/memory performance.
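
To capture these counters over time rather than watching them live, esxtop’s batch mode is useful – a minimal sketch (5 second samples, 120 iterations; the filename is arbitrary);

  # Capture all esxtop counters to CSV for offline analysis (readable in perfmon/Excel)
  esxtop -b -d 5 -n 120 > memstats.csv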

Tune ESX/ESXi Host and Virtual Machine networking configurations

Things to consider;

  • Check you’re using the latest NIC driver both for the ESX host and the guest OS (VMTools installed and VMXNET3 driver where possible)
  • Check NIC teaming is correctly configured
  • Check physical NIC properties – speed and duplex are correct, enable TOE if possible
  • Add physical NICs to increase bandwidth
  • Enable Netqueue – see section 2.1.3
  • Consider DirectPath I/O – see section 1.1.4
  • Consider use of jumbo frames (though some studies show little performance improvement – see the MTU sketch below)
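
As a hedged example (the vSwitch name is an assumption), enabling jumbo frames on a standard vSwitch in ESX 4.x is a command line job;

  # Set a 9000 byte MTU on the vSwitch (must also be supported end to end on the physical network)
  esxcfg-vswitch -m 9000 vSwitch1
  # Confirm the change
  esxcfg-vswitch -l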

Monitoring network optimisations

esxtop (press ‘n’ to get network statistics);

  • %DRPTX – should be 0
  • %DRPRX – should be 0
  • You can also see which VM is using which pNIC in a team (assuming it’s using virtual port ID load balancing), pNIC speed and duplex

vCenter (Performance -> Advanced -> ‘Network’);

  • Network usage average (KB/s). VMs and hosts, collection level 2.
  • Dropped rx – should be 0, collection level 2
  • Dropped tx – should be 0, collection level 2

See Troubleshooting section 6.3 for more information on networking performance.

Tune ESX/ESXi Host and Virtual Machine CPU configurations

Hyperthreading

  • Enable hyperthreading in the BIOS (it’s enabled by default in ESX)
  • Set hyperthreading sharing options on a per VM basis (Edit Settings -> Options). Default is to allow sharing with other VMs and shouldn’t be changed unless specific conditions require it (cache thrashing).
  • Can’t enable with more than 32 cores (ESX has a 64 logical CPU limit)

CPU affinity

  • Avoid where possible – impacts DRS, vMotion, NUMA, CPU scheduler efficiency
  • Consider hyperthreading – don’t set two VMs to use CPU 0 & 1 as that might be a single hyperthreaded core
  • Use cases – licencing, copy protection
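
For reference, affinity can also be set directly in the .vmx – a hedged example (the ‘2,3’ value is purely illustrative);

  # Pin this VM’s vCPUs to logical CPUs 2 and 3 (avoid unless you have a specific requirement)
  sched.cpu.affinity = "2,3"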

CPU power management (vSphere v4.1 only)

  • Enabled in BIOS and ESX
  • Four levels;
    • High performance (default) – no power management features invoked unless triggered by thermal or power capping events
    • Balanced
    • Low power
    • Custom

NOTE: VMware recommend disabling CPU power management in the BIOS if performance concerns outweigh power saving.

Monitoring CPU optimisations

esxtop (press ‘c’ to get CPU statistics);

  • CPU load average (top line) – for example 0.19 = 19%.
  • %PCPU – should not be 100%! If one PCPU is constantly higher than the others, check for VM CPU affinity
  • %RDY – should be below 10%
  • %MLMTD – should be zero. If not check for VM CPU limits.
  • You can also use ‘e’ to expand a specific VM and see the load on each vCPU. Good to check if vSMP is working effectively.

vCenter (Performance -> Advanced -> ‘CPU’) offers the equivalent counters.

See Troubleshooting section 6.2 for more information on CPU/memory performance.

Tune ESX/ESXi Host and Virtual Machine storage configurations

In reality there’s not that much tuning you can do at the VMware level to improve storage; most tuning needs to be done at the storage array (reiterated in the ESXTOP Statistics guide). So what can you tune? Watch VMworld 2010 session TA8065 (subscription required).

Multipathing – select the right policy for your array (check with your vendor);

  • MRU (active passive)
  • Fixed (active/active)
  • Fixed_AP (active/passive and ALUA)
  • RR (active/active, typically with ALUA)

Check multipathing configuration using esxcli and vicfg-mpath. For iSCSI check the software port binding.
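
A hedged sketch of checking and changing the path selection policy using the vSphere 4.x esxcli namespace (the naa device ID below is a made-up placeholder);

  # List devices with their current path selection policy (PSP)
  esxcli nmp device list
  # Switch a device to Round Robin – confirm your array vendor supports it first
  esxcli nmp device setpolicy --device naa.600601601234 --psp VMW_PSP_RR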

Storage alignment

You should always align storage at array, VMFS, and guest OS level.

Storage related queues

Use esxcfg-module to amend LUN (HBA) queue depth (default 32). Syntax varies per vendor.

Use esxcfg-advcfg to amend the VMkernel queue depth (Disk.SchedNumReqOutstanding); it should be set to the same value as the LUN queue depth.

NOTE: If you adjust the LUN queue you have to adjust it on every host in the cluster (it’s a per-host setting)
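
Putting both together, a hedged example for a QLogic HBA (the module name and the value of 64 are assumptions – check your vendor’s documentation);

  # Raise the HBA LUN queue depth to 64 (QLogic shown; option and module names vary per vendor)
  esxcfg-module -s ql2xmaxqdepth=64 qla2xxx   # requires a host reboot
  # Match the VMkernel limit for outstanding requests per LUN
  esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding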

Using vscsiStats

See section 3.5 for details of using vscsiStats.

NOTE: Prior to vSphere 4.1 (which includes NFS latency in both vCenter charts and esxtop) vscsiStats was the only VMware tool to see NFS performance issues. Use array based tools!

Monitoring storage optimisations

esxtop (press ‘d’, ‘u’ or ‘v’ to get storage metrics for HBA, LUN and per VM respectively);

  • KAVG/cmd should be less than 2ms (delay while the kernel empties the storage queue)
  • DAVG/cmd should be under 15-20ms (approx)
  • ABRTS/s should be zero (this equates to guest OS SCSI timeouts)
  • CONS/s should be zero (SCSI reservation conflicts. May indicate too many VMs in a LUN). v4.1 only.

vCenter (Performance -> Advanced -> ‘Disk’, or ‘Datastore’ which is only available in vSphere 4.1);

  • Read latency – collection level 2
  • Write latency – collection level 2
  • Disk command aborts – greater than 1 indicates overloaded storage. v4.1 only.

Generic tips for optimising storage performance

  • Check IOps
  • Check latency
  • Check bandwidth
  • Remember for iSCSI and NAS you may also have to check network performance

See Troubleshooting section 6.4 for more information on storage performance.

Configure and apply advanced ESX/ESXi Host attributes

These can be configured via Configuration -> Advanced Settings. Things you’ll have used this for;

  • Checking if Netqueue is enabled/disabled (VMkernel -> Boot)
  • Updating your NFS settings to apply NetApp recommendations if you use NetApp storage (example below)
  • Allowing snapshots on a virtual ESX host in your lab (unsupported but very useful!)
  • Disabling transparent page sharing
  • Setting preferred AD controllers (when using AD integration in vSphere 4.1)
image
Setting advanced NFS host parameters
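
As a hedged example of applying those NFS settings from the command line – NetApp-style values circa vSphere 4 (check the current vendor documentation rather than trusting these numbers);

  # Allow more NFS datastores to be mounted
  esxcfg-advcfg -s 64 /NFS/MaxVolumes
  # Increase the VMkernel TCP/IP heap to match (values in MB)
  esxcfg-advcfg -s 30 /Net/TcpipHeapSize
  esxcfg-advcfg -s 120 /Net/TcpipHeapMax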

Configure and apply advanced Virtual Machine attributes

These are configured on a per-VM basis via Edit Settings -> Options -> General -> Configuration Parameters. These are typically things you’ll only change at VMware support’s recommendation;

image
Setting advanced VM attributes
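
A couple of illustrative entries as they’d appear in the Configuration Parameters dialog (or the .vmx directly) – both covered elsewhere in these notes;

  # Disable page sharing for this VM (see TPS above)
  sched.mem.pshare.enable = "FALSE"
  # Cap the balloon driver for this VM at 4096MB (see the balloon driver section above)
  sched.mem.maxmemctl = "4096"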

Tune and optimize NUMA controls

Non-Uniform Memory Access (NUMA) is a memory architecture designed to optimise motherboard design. Rather than providing a single pool of physical memory shared by all CPUs, each CPU is given a set of ‘local’ memory which it can access very quickly. The disadvantage is that not all memory is equally fast to access for every CPU. Read VMware vSphere™: The CPU Scheduler in VMware® ESX™ 4.1 for more info.

If you want to understand NUMA, you need to check out Frank Denneman’s site.  As of March 2011 he’s got 11 in-depth articles about NUMA.

Practical implications for VCAP-DCA exam?

  • Configured per VM. Go to Edit Settings -> Resources tab -> Advanced CPU.
  • Can have a performance impact if VMs aren’t balancing properly across the NUMA nodes
  • Can be monitored using esxtop (Frank Denneman’s post shows how)
  • Setting CPU affinity breaks NUMA optimisations

Configure CPU and memory NUMA affinity;

image
Configuring CPU NUMA settings
image
Configuring memory NUMA settings
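
The GUI settings above map to .vmx entries – a hedged sketch (node 0 is chosen arbitrarily; like CPU affinity, this defeats the scheduler’s automatic balancing);

  # Restrict this VM’s CPU and memory to NUMA node 0
  numa.nodeAffinity = "0"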

Monitoring performance impact of NUMA

Using esxtop, go to the memory view (‘m’). The first figure is the total memory per NUMA node (approx. 20GB in the screenshot below) and the figure in brackets is the memory free per node. To get more NUMA related statistics press ‘f’ (to select fields to add) and then ‘g’ for NUMA statistics.


image
Using ESXTOP to monitor NUMA

As you can see, the server in this example is very imbalanced, which could point to performance issues (it’s a 4.0u1 server so no ‘wide NUMA’). Looking further you can see that zhc1unodb01 only has 54% memory locality, which is not good (Duncan Epping suggests under 80% is worth worrying about). Other people have seen similar situations and VMwareKB1026063 is close but doesn’t perfectly match my symptoms. vSphere 4.1 has improvements using ‘wide’ NUMA support – maybe that’ll help…
