Knowledge
- Identify resxtop/esxtop metrics related to memory and CPU
- Identify vCenter Server Performance Chart metrics related to memory and CPU
Skills and Abilities
- Troubleshoot ESX/ESXi Host and Virtual Machine CPU performance issues using appropriate metrics
- Troubleshoot ESX/ESXi Host and Virtual Machine memory performance issues using appropriate metrics
- Use Hot‐Add functionality to resolve identified Virtual Machine CPU and memory performance issues
Tools & learning resources
This is another objective that’s hard to quantify – experience will be the main requirement! There are some great general purpose resources out there;
Note that resxtop (built in to the vMA) does not offer the ‘replay’ mode available in ESX classic. Source: VMworld session MA6580, vMA Tips and Tricks.
Knowledge
- Identify virtual switch entries in a Virtual Machine’s configuration file
- Identify virtual switch entries in the ESX/ESXi Host configuration file
- Identify CLI commands and tools used to troubleshoot vSphere networking configurations
- Identify logs used to troubleshoot network issues
Skills and Abilities
- Utilize net-dvs to troubleshoot vNetwork Distributed Switch configurations
- Utilize vicfg-* commands to troubleshoot ESX/ESXi network configurations
- Configure a network packet analyzer in a vSphere environment
- Troubleshoot Private VLANs
- Troubleshoot Service Console and vmkernel network configuration issues
- Troubleshooting related issues
- Use esxtop/resxtop to identify network performance problems
- Use CDP and/or network hints to identify connectivity issues
- Analyze troubleshooting data to determine if the root cause for a given network problem originates in the physical infrastructure or vSphere environment
Tools & learning resources
Identify virtual switch entries in a VM’s configuration file
Contains both vSS and vDS entries;

The example VM has three vNICs on two separate vDSs. When troubleshooting you may need to correlate the values here with the net-dvs output on the host;
- NetworkName will show “” when on a vDS.
- The .VMX will show the dvPortID, dvPortGroupID and port.connectid used by the VM – all three values can be matched against the net-dvs output and used to check the port configuration details – load balancing, VLAN, packet statistics, security etc
NOTE: Entries are not grouped together in the .VMX file so check the whole file to ensure you see all relevant entries.
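Because the entries are scattered through the file, it can help to gather them per vNIC before comparing against net-dvs. The following is a minimal sketch only: the sample text is a made-up excerpt, and the key names in it (such as dvs.portId) are illustrative assumptions rather than a definitive list of what your .VMX will contain.

```python
import re
from collections import defaultdict

# Hypothetical .vmx excerpt; key names and values are illustrative only.
SAMPLE_VMX = '''
ethernet0.present = "TRUE"
displayName = "web01"
ethernet0.networkName = ""
ethernet1.present = "TRUE"
ethernet0.dvs.portId = "132"
ethernet1.networkName = "VM Network"
ethernet0.dvs.portgroupId = "dvportgroup-45"
ethernet0.dvs.connectionId = "1029384756"
'''

# Matches lines of the form: key = "value"
LINE_RE = re.compile(r'^\s*([^=\s]+)\s*=\s*"(.*)"\s*$')


def vnic_entries(vmx_text):
    """Group ethernetN.* settings by vNIC, wherever they appear in the file."""
    nics = defaultdict(dict)
    for line in vmx_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        nic, dot, setting = key.partition(".")
        if dot and re.fullmatch(r"ethernet\d+", nic):
            nics[nic][setting] = value
    return dict(nics)


def on_vds(settings):
    """Treat a vNIC as vDS-attached if it carries any dvs.* setting."""
    return any(name.startswith("dvs.") for name in settings)


if __name__ == "__main__":
    for nic, settings in sorted(vnic_entries(SAMPLE_VMX).items()):
        kind = "vDS" if on_vds(settings) else "vSS"
        print(nic, kind, settings)
```

Run it against a copy of the real .VMX (read the file into a string and pass it to vnic_entries) and you get one dictionary per vNIC to check against the host.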

Identify virtual switch entries in the ESX/i host configuration file
The host configuration file (same file for both ESX and ESXi);
Like the .VMX file it contains entries for both switch types although there are only minimal entries for the vDS. Most vDS configuration is held in a separate database and can be viewed using net-dvs (see section 6.3.7).
Command line tools for network troubleshooting
The usual suspects;
- vicfg-nics
- vicfg-vmknic
- vicfg-vswitch (-b) for CDP
- vicfg-vswif
- vicfg-route
- cat /etc/resolv.conf, /etc/hosts
- net-dvs
- ping and vmkping
Knowledge
- Identify CLI commands and tools used to troubleshoot management issues
Skills and Abilities
- Troubleshoot vCenter Server service and database connection issues
- Troubleshoot the ESX Service Console firewall
- Troubleshoot ESX/ESXi server management and connectivity issues
- Determine the root cause of vSphere management or connectivity issue
Tools
Identify CLI tools used to troubleshoot management issues
- vicfg-vswitch
- vicfg-vmknic
- vicfg-vswif
- vpxd.exe -s
There are a few more covered later in this objective for restarting management agents on ESX/i hosts. This VMware article on resolution paths is a great place to start learning about troubleshooting.
Troubleshoot vCenter Server service and database connection issues
- Check the VMware vCenter service is started and the account it’s configured to run as. Check that account isn’t locked out.
- Start vCenter using vpxd.exe;
- ‘vpxd.exe -s’ to start it as an application rather than a service. This will show error messages in plain text rather than the cryptic service codes.
- ‘vpxd.exe -p’ refreshes the password hash used to connect to the database. Used after replacing the default SSL certificates (VMwareKB1003070)
- How to set SQL as a service dependency – blog post
- With a lab setup and SQL Express the database often grows to the 4GB limit, at which point the vCenter service will fail. Follow VMwareKB1025914 for details of how to clear down data in the vCenter database.
- Check the ODBC connectivity using the ‘Test’ button. Check the SQL security logs to see failed authentication attempts.

VMwareKB1003979 gives a good overview of the previous processes.
This is one objective where you definitely have to get hands on – there’s no way you’ll learn esxtop otherwise. Ideally you’ll have a real infrastructure to play with as you want hosts with memory contention, ballooning, swapping, NUMA optimisations etc so you can play with and understand the features.
Knowledge
- Identify hot keys and fields used with resxtop/esxtop
- Identify fields used with vscsiStats
Skills and Abilities
- Configure esxtop/resxtop custom profiles
- Determine use cases for and apply esxtop/resxtop Interactive, Batch and Replay modes
- Use vscsiStats to gather storage performance data
- Use esxtop/resxtop to collect performance data
- Given esxtop/resxtop output, identify relative performance data for capacity planning purposes
Tools & learning resources
Using resxtop
Two ways of invoking;
- resxtop --server <esxi host>
- resxtop --server <vCenter server> --vihost <esxi host>
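If you script your data collection, the two invocations can be wrapped in a small helper. This is a rough sketch under a few assumptions: the hostnames are placeholders, the function name is my own, and the -b (batch), -d (delay in seconds) and -n (iterations) switches are the usual esxtop/resxtop batch options.

```python
def resxtop_command(esx_host, vcenter=None, batch=False, delay=None, iterations=None):
    """Build a resxtop argument list for a direct or vCenter-proxied connection."""
    if vcenter:
        # Connect to vCenter and name the target host with --vihost.
        args = ["resxtop", "--server", vcenter, "--vihost", esx_host]
    else:
        # Connect straight to the ESX/ESXi host.
        args = ["resxtop", "--server", esx_host]
    if batch:
        args.append("-b")
    if delay is not None:
        args += ["-d", str(delay)]
    if iterations is not None:
        args += ["-n", str(iterations)]
    return args


if __name__ == "__main__":
    # Placeholder hostnames; substitute your own.
    print(" ".join(resxtop_command("esx01.lab.local")))
    print(" ".join(resxtop_command("esx01.lab.local", vcenter="vc01.lab.local",
                                   batch=True, delay=5, iterations=12)))
```

The returned list can be handed to subprocess.run on the vMA, with stdout redirected to a CSV file when running in batch mode.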
Knowledge
- Explain DRS affinity and anti‐affinity rules
- Identify required hardware components to support DPM
- Identify EVC requirements, baselines and components
- Understand the DRS slot‐size algorithm and its impact on migration recommendations
Skills and Abilities
- Properly configure BIOS and management settings to support DPM
- Test DPM to verify proper configuration
- Configure appropriate DPM Threshold to meet business requirements
- Configure EVC using appropriate baseline
- Change the EVC mode on an existing DRS cluster
- Create DRS and DPM alarms
- Configure applicable power management settings for ESX Hosts
- Properly size virtual machines and clusters for optimal DRS efficiency
- Properly apply virtual machine automation levels based upon application requirements
Tools & learning resources
Advanced DRS
- Read the DRS deepdive at Yellow Bricks.
- Use the (new to vSphere) DRS Faults and DRS History tabs to investigate issues with DRS
- By default DRS recalculates every 5 minutes (including DPM recommendations), but it also does so when resource settings are changed (reservations, adding/removing hosts etc). For a full list of actions which trigger DRS calculations see Frank Denneman’s HA/DRS book.
- It’s perfectly possible to turn on DRS even though all prerequisite functionality isn’t enabled – for example if vMotion isn’t enabled you won’t be prompted (at least until you try to migrate a VM)!
Affinity and anti-affinity rules
There are two types of affinity/anti-affinity rules;
- VM-VM (new in vSphere v4.0)
- VM-Host (new to vSphere 4.1)
The VM-VM affinity is pretty straightforward. Simply select a group of two or more VMs and decide if they should be kept together (affinity) or apart (anti-affinity). Typical use cases;
- Webservers acting in a web farm (set anti-affinity to keep them on separate hosts for redundancy)
- A webserver and associated application server (set affinity to optimise networking by keeping them on the same host)
VM-Host affinity is a new feature (with vSphere 4.1) which lets you ‘pin’ one or more VMs to a particular host or group of hosts. Use cases I can think of;
- Pin the vCenter server to a couple of known hosts in a large cluster
- Pin VMs for licence compliance (think Oracle, although apparently they don’t recognise this new feature as being valid – see the comments in this post)
- Microsoft clustering (see section 4.3 for more details on how to configure this)
- Multi-tenancy (cloud infrastructures)
- Blade environments (ensure VMs run on different chassis in case of backplane failure)
- Stretched clusters (spread between sites. See this Netapp post for Metrocluster details)
To implement them;
- Define ‘pools’ of hosts.
- Define ‘pools’ of VMs.
- Create a rule pairing one VM group with one host group.
- Specify either affinity (keep together) or anti-affinity (keep apart).
- Specify either ‘should’ or ‘must’ (preference or mandatory)
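To make the moving parts concrete, here is a toy model of a VM-Host rule and how ‘should’ differs from ‘must’. It is purely conceptual and not the vSphere API: the class, function and VM/host names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmHostRule:
    vm_group: frozenset    # the 'pool' of VMs
    host_group: frozenset  # the 'pool' of hosts
    affinity: bool         # True = keep on these hosts, False = keep off them
    mandatory: bool        # True = 'must', False = 'should'

    def satisfied_by(self, vm, host):
        """A rule only constrains VMs that belong to its VM group."""
        if vm not in self.vm_group:
            return True
        in_group = host in self.host_group
        return in_group if self.affinity else not in_group


def check_placement(rules, vm, host):
    """Return 'ok', 'warning' (a 'should' rule is broken) or 'blocked' (a 'must' rule is broken)."""
    result = "ok"
    for rule in rules:
        if rule.satisfied_by(vm, host):
            continue
        if rule.mandatory:
            return "blocked"
        result = "warning"
    return result


if __name__ == "__main__":
    rules = [
        # Oracle VMs must run on the licensed hosts.
        VmHostRule(frozenset({"ora01"}), frozenset({"esx01", "esx02"}), True, True),
        # vCenter should run on known hosts, but may run elsewhere if needed.
        VmHostRule(frozenset({"vc01"}), frozenset({"esx01", "esx02"}), True, False),
    ]
    for vm, host in [("ora01", "esx03"), ("vc01", "esx03"), ("vc01", "esx01")]:
        print(vm, host, check_placement(rules, vm, host))
```

The point of the sketch is the last two steps in the list: a broken ‘should’ rule is only a preference that can be overridden, whereas a broken ‘must’ rule rules the placement out entirely.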
This objective is focused on the VMs rather than the hosts but there’s still a large overlap between this objective and the previous one.
Knowledge
- Compare and contrast virtual and physical hardware resources
- Identify VMware memory management techniques
- Identify VMware CPU load balancing techniques
- Identify pre‐requisites for Hot Add features
Skills and Abilities
- Calculate available resources
- Properly size a Virtual Machine based on application workload
- Configure large memory pages
- Understand appropriate use cases for CPU affinity
Tools & learning resources
Identify memory management techniques
The theory – read the following blogposts;
The following memory mechanisms were covered in section 3.1 so I won’t duplicate;
- transparent page sharing
- ballooning (via VMTools)
- memory compression (vSphere 4.1 onwards)
- virtual swap files
- NUMA
There are also various mechanisms for controlling memory allocations to VMs;
- reservations and limits
- shares – disk, CPU and memory
- resource pools (in clusters)
Disable unnecessary devices in the VM settings (floppy drive, USB controllers, extra NICs etc) as they have a memory overhead.
It’s hard to know what to cover in this objective as performance tuning often implies troubleshooting (note the recommended reading of Performance Troubleshooting!) hence there’s a significant overlap with the troubleshooting section. Luckily there are plenty of excellent resources in the blogosphere and from VMware so it’s just a case of reading and practicing.
Knowledge
- Identify appropriate BIOS and firmware setting requirements for optimal ESX/ESXi Host performance
- Identify appropriate ESX driver revisions required for optimal ESX/ESXi Host performance
- Recall where to locate information resources to verify compliance with VMware and third party vendor best practices
Skills and Abilities
- Tune ESX/ESXi Host and Virtual Machine memory configurations
- Tune ESX/ESXi Host and Virtual Machine networking configurations
- Tune ESX/ESXi Host and Virtual Machine CPU configurations
- Tune ESX/ESXi Host and Virtual Machine storage configurations
- Configure and apply advanced ESX/ESXi Host attributes
- Configure and apply advanced Virtual Machine attributes
- Tune and optimize NUMA controls
Tools & learning resources
Identify BIOS and firmware settings for optimal performance
This will vary for each vendor but typical things to check;
- Power saving for the CPU.
- Hyperthreading – should be enabled
- Hardware virtualisation (Intel VT, EPT etc) – required for EVC, Fault Tolerance etc
NOTE: You should also enable the ‘No Execute’ memory protection bit.
- NUMA settings (node interleaving for the DL385, for instance). Normally disabled – check Frank Denneman’s post.
- WOL for NIC cards (used with DPM)
Identify appropriate ESX driver revisions required for optimal host performance
I guess they mean the HCL. Let’s hope you don’t need an encyclopaedic knowledge of driver version histories!
Tune ESX/i host and VM memory configurations
Read this great series of blog posts from Arnim Van Lieshout on memory management – part one, two and three. And as always the Frank Denneman post.
Check your Service Console memory usage using esxtop.
Hardware assisted memory virtualisation
Check this is enabled (per VM). Edit Settings -> Options -> CPU/MMU Virtualisation;

Enabling h/w CPU/memory assist for a VM
NOTE: VMware strongly recommend you use large pages in conjunction with hardware assisted memory virtualisation. See section 3.2 for details on enabling large memory pages. However enabling large memory pages will negate the efficiency of TPS so you gain performance at the cost of higher memory usage. Pick your poison…(and read this interesting thread on the VMware forums)
This section overlaps with objectives 1.1 (Advanced storage management) and 1.2 (Storage capacity) but covers the multipathing functionality in more detail.
Knowledge
- Explain the Pluggable Storage Architecture (PSA) layout
Skills and Abilities
- Install and Configure PSA plug‐ins
- Understand different multipathing policy functionalities
- Perform command line configuration of multipathing options
- Change a multipath policy
- Configure Software iSCSI port binding
Tools & learning resources
- Product Documentation
- vSphere Client
- vSphere CLI
- VMware KB articles
Understanding the PSA layout
The PSA layout is well documented here and here. The PSA is for block level protocols (FC and iSCSI) – it isn’t used for NFS.

Terminology;
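- MPP = Multipathing Plugin. The NMP is VMware’s default MPP; third party MPPs (EMC PowerPath/VE for example) can be installed alongside or instead of it
- NMP = Native Multipathing Plugin = one or more SATP + one or more PSP
- SATP = Storage Array Type Plugin (the ‘traffic cop’ – monitors path state and handles failover for a given array type)
- PSP = Path Selection Plugin (the ‘driver’ – chooses which path each I/O goes down)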
There are four possible pathing policies;
- MRU = Most Recently Used. Typically used with active/passive (low end) arrays.
- Fixed = The path is fixed, with a ‘preferred path’. On failover the alternative paths are used, but when the original path is restored it again becomes the active path.
- Fixed_AP = new to vSphere 4.1. This enhances the ‘Fixed’ pathing policy to make it applicable to active/passive arrays and ALUA capable arrays. If no user preferred path is set it will use its knowledge of optimised paths to set preferred paths.
- RR = Round Robin
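The difference between Fixed and MRU is easiest to see when a path fails and then comes back. The sketch below is a simplified model of the behaviour described above, not VMware’s implementation: the function names and path names are invented, and Round Robin is reduced to a plain rotation across live paths.

```python
from itertools import cycle


def select_fixed(paths_up, preferred, current):
    """Fixed: use the preferred path whenever it is up, otherwise any live path."""
    if preferred in paths_up:
        return preferred
    if current in paths_up:
        return current
    return paths_up[0] if paths_up else None


def select_mru(paths_up, current):
    """MRU: stay on the current path until it fails; never fail back on its own."""
    if current in paths_up:
        return current
    return paths_up[0] if paths_up else None


def round_robin(paths_up):
    """RR: rotate I/O across all live paths in turn."""
    return cycle(paths_up)


if __name__ == "__main__":
    path_a, path_b = "vmhba1:C0:T0:L0", "vmhba2:C0:T0:L0"  # illustrative names

    # Path A fails: both policies move to path B.
    fixed = select_fixed([path_b], preferred=path_a, current=path_a)
    mru = select_mru([path_b], current=path_a)
    print("after failure :", fixed, mru)

    # Path A is restored: Fixed fails back, MRU stays where it is.
    fixed = select_fixed([path_a, path_b], preferred=path_a, current=fixed)
    mru = select_mru([path_a, path_b], current=mru)
    print("after restore :", fixed, mru)

    rr = round_robin([path_a, path_b])
    print("round robin   :", [next(rr) for _ in range(4)])
```

The automatic failback in Fixed is why it is normally avoided on active/passive arrays, where flipping back to the original path can bounce a LUN between controllers.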
One way to think of ALUA is as a form of ‘auto negotiate’. The array communicates with the ESX host and lets it know the available path to use for each LUN, and in particular which is optimal. ALUA tends to be offered on midrange arrays which are typically asymmetric active/active rather than symmetric active/active (which tend to be even more expensive). Determining whether an array is ‘true’ active/active is not as simple as you might think! Read Frank Denneman’s excellent blogpost on the subject. Our Netapp 3000 series arrays are asymmetric active/active rather than ‘true’ active/active.
Knowledge
- Explain relationship between vDS and logical vSSes
Skills and Abilities
- Understand the use of command line tools to configure appropriate vDS settings on an ESX/ESXi host
- Determine use cases for and apply Port Binding settings
- Configure Live Port Moving
- Given a set of network requirements, identify the appropriate distributed switch technology to use
- Use command line tools to troubleshoot and identify configuration items from an existing vDS
Tools & learning resources
Relationship between vSS and vDS
Both standard (vSS) and distributed (vDS) switches can exist at the same time – indeed there’s good reason to use this ‘hybrid’ mode.
You can view the switch configuration on a host (both vSS and dvS) using esxcfg-vswitch -l. It won’t show the ‘hidden’ switches used under the hood by the vDS although you can read more about those in this useful article at RTFM or at Geeksilver’s blog.
Command line configuration of a vDS
The command line is pretty limited when it comes to vDS. Useful commands;
- esxcfg-vswitch
- esxcfg-vswitch -P vmnic0 -V 101 <dvSwitch> (link a physical NIC to a vDS)
- esxcfg-vswitch -Q vmnic0 -V 101 <dvSwitch> (unlink a physical NIC from a vDS)
- esxcfg-vswif -l | -d (list or delete a service console)
- esxcfg-nics
- net-dvs
NOTE: net-dvs can be used for diagnostics although it’s an unsupported command. It’s located in /usr/lib/vmware/bin. Use of this command is covered in section 6.4 Troubleshooting Network connectivity.
NOTE: esxcfg-vswitch can ONLY be used to link and unlink physical adaptors from a vDS. Use this to fix faulty network configurations. If necessary create a vSS switch and move your physical uplinks across to get your host back on the network. See VMwareKB1008127 or this blogpost for details.
Identify configuration items from an existing vDS
You can use esxcfg-vswitch -l to show the dvPort assigned to a given pNIC and dvPortGroup.
See the Troubleshooting Network connectivity section for more details.
Knowledge
- Identify VMware NIC Teaming policies
- Identify common network protocols
Skills and Abilities
- Understand the NIC Teaming failover types and related physical network settings
- Determine and apply Failover settings
- Configure explicit failover to conform with VMware best practices
- Configure port groups to properly isolate network traffic
Tools & learning resources
Identify, understand, and configure NIC teaming
The five available policies are;
- Route based on virtual port ID (default)
- Route based on IP Hash (MUST be used with static Etherchannel – no LACP). No beacon probing.
- Route based on source MAC address
- Route based on physical NIC load (vSphere 4.1 only)
- Explicit failover
NOTE: These only affect outbound traffic. Inbound load balancing is controlled by the physical switch.
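To see why IP hash can spread a single VM’s traffic across uplinks while port ID cannot, here is a deliberately simplified model of the two policies. Treat the arithmetic as an assumption for illustration (a modulo on the port ID, and an XOR of source and destination IP followed by a modulo); the real calculation is internal to the vmkernel, and the function and vmnic names are my own.

```python
import ipaddress


def uplink_by_port_id(port_id, uplinks):
    """Port ID policy (simplified): each virtual port is pinned to one uplink."""
    return uplinks[port_id % len(uplinks)]


def uplink_by_ip_hash(src_ip, dst_ip, uplinks):
    """IP hash policy (simplified): the uplink depends on both ends of the conversation."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


if __name__ == "__main__":
    uplinks = ["vmnic0", "vmnic1"]

    # The same virtual port always leaves via the same uplink.
    print(uplink_by_port_id(7, uplinks))

    # One VM talking to two destinations can use two different uplinks.
    print(uplink_by_ip_hash("10.0.0.10", "10.0.0.20", uplinks))
    print(uplink_by_ip_hash("10.0.0.10", "10.0.0.21", uplinks))
```

With port ID (or source MAC) a single vNIC never gets more than one uplink’s worth of bandwidth; with IP hash it can, but only when it is talking to multiple destinations.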