It’s hard to know what to cover in this objective as performance tuning often implies troubleshooting (note the recommended reading of Performance Troubleshooting!), so there’s significant overlap with the troubleshooting section. Luckily there are plenty of excellent resources in the blogosphere and from VMware, so it’s just a case of reading and practising.
Knowledge
- Identify appropriate BIOS and firmware setting requirements for optimal ESX/ESXi Host performance
- Identify appropriate ESX driver revisions required for optimal ESX/ESXi Host performance
- Recall where to locate information resources to verify compliance with VMware and third party vendor best practices
Skills and Abilities
- Tune ESX/ESXi Host and Virtual Machine memory configurations
- Tune ESX/ESXi Host and Virtual Machine networking configurations
- Tune ESX/ESXi Host and Virtual Machine CPU configurations
- Tune ESX/ESXi Host and Virtual Machine storage configurations
- Configure and apply advanced ESX/ESXi Host attributes
- Configure and apply advanced Virtual Machine attributes
- Tune and optimize NUMA controls
Tools & learning resources
- Product Documentation
- vSphere Client
- Performance Graphs
- vSphere CLI
- vicfg-*, resxtop/esxtop, vscsiStats
- VMworld 2010 session TA7750 Understanding Virtualisation Memory Management (subscription required)
- VMworld 2010 session TA7171 – Performance Best Practices for vSphere (subscription required)
- VMworld 2010 session TA8129 – Beginners guide to performance management on vSphere (subscription required)
- Performance Troubleshooting in Virtual Infrastructure (TA3324, VMworld ’09)
- Scott Sauer’s blogpost on storage performance
- VMware’s Performance Best Practices white paper
Identify BIOS and firmware settings for optimal performance
This will vary for each vendor, but typical things to check;
- Power saving for the CPU.
- Hyperthreading – should be enabled
- Hardware virtualisation (Intel VT, EPT etc) – required for EVC, Fault Tolerance etc
NOTE: You should also enable the ‘No Execute’ memory protection bit.
- NUMA settings (node interleaving for a DL385, for instance). Normally disabled – check Frank Denneman’s post.
- WOL for NIC cards (used with DPM)
Identify appropriate ESX driver revisions required for optimal host performance
I guess they mean the HCL. Let’s hope you don’t need an encyclopaedic knowledge of driver version histories!
Tune ESX/i host and VM memory configurations
Read this great series of blog posts from Arnim Van Lieshout on memory management – part one, two and three. And as always the Frank Denneman post.
Check your Service Console memory usage using esxtop.
Hardware assisted memory virtualisation
Check this is enabled (per VM). Edit Settings -> Options -> CPU/MMU Virtualisation;
NOTE: VMware strongly recommend you use large pages in conjunction with hardware assisted memory virtualisation. See section 3.2 for details on enabling large memory pages. However, enabling large memory pages negates the efficiency of TPS, so you gain performance at the cost of higher memory usage. Pick your poison… (and read this interesting thread on the VMware forums)
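If you’d rather check the large page behaviour from the command line, the advanced option is (as far as I recall) Mem.AllocGuestLargePage – verify the name against the Advanced Settings dialog before trusting it:
# Check whether guest large pages are enabled (1 = enabled, which is the default)
esxcfg-advcfg -g /Mem/AllocGuestLargePage
# Enable them explicitly
esxcfg-advcfg -s 1 /Mem/AllocGuestLargePage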
Order of preference for memory overcommit techniques (most preferable at the top);
- Transparent page sharing (negligible performance impact)
- Ballooning
- Memory compression
- VMkernel swap files (significant performance impact)
Transparent Page Sharing (TPS) – otherwise known as memory dedupe!
- Enabled by default
- Refreshed periodically
- Can be disabled;
- Disable per ESX host – add Mem.ShareScanGHz = 0 in Advanced Settings (VMwareKB1004901)
- Disable per VM – add sched.mem.pshare.enable = FALSE to the .VMX (entry not present by default) – see the example after this list
- Efficiency is impacted if you enable large memory pages (see this discussion)
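A minimal sketch of the two disable options outside the GUI (check VMwareKB1004901 before relying on either). Host-wide, from the console or vCLI:
# Stop the TPS scanner – equivalent to Mem.ShareScanGHz = 0 in Advanced Settings
esxcfg-advcfg -s 0 /Mem/ShareScanGHz
And per VM, by adding this line to the .VMX (with the VM powered off) or via Configuration Parameters:
sched.mem.pshare.enable = "FALSE"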
Balloon driver
- Uses a guest OS driver (vmmemctl) which is installed with VMware Tools (all supported OSs)
- Guest OS must have enough swapfile configured for balloon driver to work effectively
- Default max for the balloon driver to reclaim is 65%. Can be tuned using sched.mem.maxmemctl in the .VMX (entry not present by default) – see the sketch after this list. Read this blogpost before considering disabling!
- Ballooning is normal when overcommitting memory and may impact performance
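For illustration only – I believe the value is specified in MB, but verify that before using it. Capping (rather than disabling) the balloon driver for a single VM would look like this in the .VMX:
sched.mem.maxmemctl = "1024"
Setting it to 0 effectively disables ballooning for that VM, which (as the blogpost above argues) is rarely a good idea.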
Swapfiles
- VMware swapfiles
- Stored (by default) in the same datastore as the VM (as a .vswp file). Size = configured memory – memory reservation (see the worked example after this list).
- Include in storage capacity sizing
- Can be configured to use a local datastore, but that can impact vMotion performance. Configured at either cluster/host level or overridden per VM (Edit Settings -> Options -> Swapfile location)
- Will almost certainly impact performance
- Guest OS swapfiles
- Should be configured for worst case (VM pages all memory to guest swapfile) when ballooning is used
NOTE: While both are classified as ‘memory optimisations’ they both impact storage capacity.
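A quick worked example of the sizing formula above: a VM configured with 8GB of RAM and a 2GB memory reservation creates a 6GB .vswp file, so ten such VMs on a single datastore need an extra 60GB of capacity just for VMkernel swap – before you even count the guest OS pagefiles inside the VMDKs.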
Memory compression
Memory compression is a new feature to vSphere 4.1 (which isn’t covered in the lab yet) so I won’t cover it here.
Monitoring memory optimisations
TPS
- esxtop;
- PSHARE/MB – check ‘shared’, ‘common’ and ‘savings’ (memory overcommit)
- Overcommit % shown on the top line of the memory view (press m). 0.19 = 19%.
- NOTE: On Xeon 5500 (Nehalem) hosts TPS won’t show much benefit until you overcommit memory (VMwareKB1021095)
- vCenter performance charts (under ‘Memory’);
- ‘Memory shared’. For VMs and hosts, collection level 2.
- ‘Memory common’. For hosts only, collection level 2
Ballooning
- esxtop
- MEMCTL/MB – check current, target.
- MCTL? to see if the balloon driver is active (press ‘f’ then ‘i’ to add the memctl columns)
- vCenter performance charts (under ‘Memory’);
- ‘memory balloon’. For hosts and VMs, collection level 1.
- ‘memory balloon target’. For VMs only, collection level 2.
Swapfiles
- esxtop
- SWAP/MB – check current, r/s, w/s.
- SWCUR to see current swap in MB (press ‘f’ then ‘j’ to add swap columns)
- vCenter performance charts (under ‘Memory’);
- ‘Memory Swap Used’ (hosts) or ‘Swapped’ (VMs). Collection level 2.
- ‘Swap in rate’, ‘Swap out rate’. For hosts and VMs, collection level 1.
NOTE: Remember you can tailor statistics levels – vCenter Server Settings -> Statistics. Default is all level one metrics kept for one year.
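If you want to capture these counters over time rather than eyeball them live, esxtop/resxtop batch mode is handy (the hostname below is just a placeholder):
# 10 minutes of samples at 5 second intervals, for later analysis in perfmon or esxplot
resxtop --server myesxhost -b -d 5 -n 120 > memstats.csv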
Read Duncan Epping’s blogpost for some interesting points on using esxtop to monitor ballooning and swapping. See Troubleshooting section 6.2 for more information on CPU/memory performance.
Tune ESX/ESXi Host and Virtual Machine networking configurations
Things to consider;
- Check you’re using the latest NIC driver both for the ESX host and the guest OS (VMware Tools installed and the VMXNET3 driver where possible)
- Check NIC teaming is correctly configured
- Check physical NIC properties – speed and duplex are correct, enable TOE if possible (see the example after this list)
- Add physical NICs to increase bandwidth
- Enable Netqueue – see section 2.1.3
- Consider DirectPath I/O – see section 1.1.4
- Consider use of jumbo frames (though some studies show little performance improvement)
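A quick sketch of checking and forcing the physical NIC properties mentioned above (hostname and vmnic are placeholders; remember the switch port must match whatever you force):
# List pNICs with driver, link state, speed and duplex
vicfg-nics --server myesxhost -l
# Force a NIC to 1000/Full if auto-negotiation is misbehaving
vicfg-nics --server myesxhost -s 1000 -d full vmnic2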
Monitoring network optimisations
esxtop (press ‘n’ to get network statistics);
- %DRPTX – should be 0
- %DRPRX – should be 0
- You can also see which VM is using which pNIC in a team (assuming it’s using virtual port ID load balancing), pNIC speed and duplex
vCenter (Performance -> Advanced -> ‘Network’);
- Network usage average (KB/s). VMs and hosts, collection level 2.
- Dropped rx – should be 0, collection level 2
- Dropped tx – should be 0, collection level 2
See Troubleshooting section 6.3 for more information on networking performance.
Tune ESX/ESXi Host and Virtual Machine CPU configurations
Hyperthreading
- Enable hyperthreading in the BIOS (it’s enabled by default in ESX)
- Set hyperthreading sharing options on a per VM basis (Edit Settings -> Options). Default is to allow sharing with other VMs and shouldn’t be changed unless specific conditions require it (cache thrashing).
- Can’t enable with more than 32 cores (ESX has a 64 logical CPU limit)
CPU affinity
- Avoid where possible – impacts DRS, vMotion, NUMA, CPU scheduler efficiency
- Consider hyperthreading – don’t set two VMs to use CPU 0 & 1 as those might be a single hyperthreaded core
- Use cases – licensing, copy protection
CPU power management (vSphere v4.1 only)
- Enabled in BIOS and ESX
- Four levels;
- High performance (default) – no power management features invoked unless triggered by thermal or power capping events
- Balanced
- Low power
- Custom
NOTE: VMware recommend disabling CPU power management in the BIOS if performance concerns outweigh power saving.
Monitoring CPU optimisations
esxtop (press ‘c’ to get CPU statistics);
- CPU load average (top line) – for example 0.19 = 19%.
- %PCPU – should not be 100%! If one pCPU is constantly higher than the others, check for VM CPU affinity
- %RDY – should be below 10%
- %MLMTD – should be zero. If not check for VM CPU limits.
- You can also use ‘e’ to expand a specific VM and see the load on each vCPU. Good to check if vSMP is working effectively.
vCenter (Performance -> Advanced -> ‘CPU’);
- ‘%CPU usage’ – for both VMs and hosts, collection level 1
- ‘CPU Ready’ – for VMs only, collection level 1. Not a percentage like esxtop – see this blog entry about converting vCenter metrics into something meaningful, and the worked example below.
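A rough worked example of that conversion: the realtime chart samples every 20 seconds (20,000ms), so a ‘CPU Ready’ summation of 1,000ms on a single vCPU works out as 1,000 / 20,000 = 5% ready time – still comfortably under the 10% rule of thumb used with esxtop above.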
See Troubleshooting section 6.2 for more information on CPU/memory performance.
Tune ESX/ESXi Host and Virtual Machine storage configurations
In reality there’s not that much tuning you can do at the VMware level to improve storage; most tuning needs to be done at the storage array (reiterated in the ESXTOP Statistics guide). So what can you tune? Watch VMworld 2010 session TA8065 (subscription required).
Multipathing – select the right policy for your array (check with your vendor);
- MRU (active passive)
- Fixed (active/active)
- Fixed_AP (active/passive and ALUA)
- RR (active/active, typically with ALUA)
Check multipathing configuration using esxcli and vicfg-mpath. For iSCSI check the software port binding.
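A sketch of checking and changing the policy from the CLI (the device ID is a made-up placeholder, and you should confirm the PSP with your array vendor first):
# List devices with the SATP and path selection policy currently claiming them
esxcli nmp device list
# List all paths (vicfg-mpath works remotely via the vSphere CLI)
vicfg-mpath --server myesxhost -l
# Switch a single device to Round Robin
esxcli nmp device setpolicy -d naa.60030480001234567890123456789012 --psp VMW_PSP_RR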
Storage alignment
You should always align storage at array, VMFS, and guest OS level.
Storage related queues
Use esxcfg-module to amend the LUN (HBA) queue depth (default 32). The syntax varies per HBA vendor – see the sketch after the note below.
Use esxcfg-advcfg to amend the VMkernel queue depth. This should be set to the same value as the LUN queue depth.
NOTE: If you adjust the LUN queue you have to adjust on every host in a cluster (it’s a per host setting)
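A sketch for a QLogic HBA – the module and parameter names differ for Emulex and other vendors, so treat these as illustrative and check your vendor’s documentation:
# Set the HBA LUN queue depth to 64 for the QLogic driver (takes effect after a reboot)
esxcfg-module -s ql2xmaxqdepth=64 qla2xxx
# Match the VMkernel’s outstanding requests per LUN to the new queue depth
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding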
Using vscsiStats
See section 3.5 for details of using vscsiStats.
NOTE: Prior to vSphere 4.1 (which includes NFS latency in both vCenter charts and esxtop) vscsiStats was the only VMware tool to see NFS performance issues. Use array based tools!
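The basic workflow, briefly (section 3.5 covers it properly; the worldGroup ID below is just an example):
vscsiStats -l                        # list running VMs and their worldGroup IDs
vscsiStats -s -w 12345               # start collection for a VM
vscsiStats -p latency -w 12345       # print the latency histogram once some data has accumulated
vscsiStats -x -w 12345               # stop collection when you’re done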
Monitoring storage optimisations
esxtop (press ‘d’,’u’, or ‘v’ to get storage metrics for HBA, LUN and per VM respectively);
- KAVG/cmd should be less than 2ms (delay while the VMkernel processes its storage queue)
- DAVG/cmd should be under 15-20ms (approx)
- ABRTS/s should be zero (this equates to guest OS SCSI timeouts)
- CONS/s should be zero (SCSI reservation conflicts. May indicate too many VMs in a LUN). v4.1 only.
vCenter (Performance -> Advanced -> Disk or Datastore; the Datastore view is only available in vSphere 4.1)
- Read latency – collection level 2
- Write latency – collection level 2
- Disk command aborts – if greater than 1 indicates overloaded storage. V4.1 only.
Generic tips for optimising storage performance
- Check IOps
- Check latency
- Check bandwidth
- Remember for iSCSI and NAS you may also have to check network performance
See Troubleshooting section 6.4 for more information on storage performance.
Configure and apply advanced ESX/ESXi Host attributes
These can be configured via Configuration -> Advanced Settings. Things you’ll have used this for;
- Checking if Netqueue is enabled/disabled (vmKernel -> Boot)
- Updating your NFS settings to apply NetApp recommendations if you use NetApp storage (see the example after this list)
- Allowing snapshots on a virtual ESX host in your lab (unsupported but very useful!)
- Disabling transparent page sharing
- Setting preferred AD controllers (when using AD integration in vSphere 4.1)
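As an example, the NetApp NFS recommendations mentioned above boil down to a handful of advanced settings. The values below are quoted from memory of the vSphere 4 era guidance (TR-3749), so verify them against the current best practice doc before applying:
esxcfg-advcfg -s 64 /NFS/MaxVolumes
esxcfg-advcfg -s 30 /Net/TcpipHeapSize
esxcfg-advcfg -s 120 /Net/TcpipHeapMax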
Configure and apply advanced Virtual Machine attributes
These are configured on a per VM basis via Edit Settings -> Options -> General -> Configuration Parameters. Things you might use this for, often at VMware support’s recommendation;
- Disabling alerts about a missing SCSI driver (courtesy of this blogpost)
- Enabling Fault Tolerance on a virtual ESX host for your lab (not that this worked for me)
- Enabling nested VMs to run on a virtual ESX host
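Whatever the parameter, the mechanics are the same – the entry you add via Configuration Parameters ends up as a simple key/value pair in the .VMX. A commonly cited example (illustrative rather than exam-specific) is disabling device hotplug so users can’t ‘Safely Remove’ the virtual NIC from the Windows systray:
devices.hotplug = "false"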
Tune and optimize NUMA controls
Non-Uniform Memory Access (NUMA) is an architecture designed to optimise memory access on multi-processor motherboards. Rather than providing a single pool of physical memory shared by every CPU, each CPU is given a set of ‘local’ memory which it can access very quickly. The disadvantage is that not all memory is equally fast for all CPUs. Read VMware vSphere™: The CPU Scheduler in VMware® ESX™ 4.1 for more info.
If you want to understand NUMA, you need to check out Frank Denneman’s site. As of March 2011 he’s got 11 in-depth articles about NUMA.
Practical implications for VCAP-DCA exam?
- Configured per VM. Go to Edit Settings -> Resources tab -> Advanced CPU.
- Can have a performance impact if not balancing properly
- Can be monitored using esxtop (Frank Denneman’s post shows how)
- Setting CPU affinity breaks NUMA optimisations
Configure CPU and memory NUMA affinity;
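For reference, those GUI settings map to .VMX entries along these lines – the parameter names are quoted from memory, so double-check them in the Resource Management Guide before use:
sched.cpu.affinity = "0,1,2,3"
sched.mem.affinity = "0"
The first pins the vCPUs to specific physical CPUs (ideally all within one NUMA node); the second keeps the VM’s memory allocations on NUMA node 0.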
Monitoring performance impact of NUMA
Using esxtop go to the memory view (m). The first figure is the total memory per NUMA node (approx. 20GB in the screenshot below) and the figure in brackets is the memory free per node. To get more NUMA related statistics press ‘f’ (to select fields to add) and then ‘g’ for NUMA statistics.
DIAGRAM
As you can see, the server in this example is very imbalanced, which could point to performance issues (it’s a 4.0u1 server so there’s no ‘wide NUMA’). Looking further you can see that zhc1unodb01 only has 54% memory locality, which is not good (Duncan Epping suggests anything under 80% is worth worrying about). Other people have seen similar situations and VMwareKB1026063 is close but doesn’t perfectly match my symptoms. vSphere 4.1 has improvements via ‘wide’ NUMA support – maybe that’ll help…