While upgrading my home lab recently I found myself reconsidering the scale-up vs scale-out argument. There are countless articles about building your own home lab with whitebox hardware, but is there a good alternative to the accepted ‘two whiteboxes and a NAS’ scenario that’s so common for entry-level labs? I’m studying for the VCAP5-DCD, so while the ‘up vs out’ discussion is a well-trodden path there’s value (for me at least) in covering it again.
There are two main issues with many lab (and production) environments, mine included:
- Memory is a bottleneck, doubly so in labs using low-end hardware – the vCenter appliance defaults to 8GB, as does vShield Manager, so anyone wanting to play with vCloud (for example) needs a lot of RAM.
- Affordable yet performant shared storage is also a challenge – I’ve used both consumer NAS (from 2 to 5 bays) and ZFS based appliances but I’m still searching for more performance.
In an enterprise environment there are a variety of solutions to these challenges – memory density is increasing (up to 512GB per blade in the latest UCS servers for example) and on the storage front SSDs and flash memory have spurred a wave of innovation. In particular Fusion-IO have had great success with their flash memory devices, which reduce the burden on shared storage while dramatically increasing performance. I was after something similar, but without the budget.
When I built my newest home lab server, the vHydra, I used a dual socket motherboard to maximise the possible RAM (up to 256GB) and local SSDs to supplement my shared storage. This has allowed me to solve the two issues above – I have a single server which can host a larger number of VMs with minimal reliance on my shared storage. The concept is the same as what solutions like Fusion-IO aim for in production environments, but mine isn’t particularly scalable. In fact it doesn’t really scale at all – I’ll have to revert to centralised storage if I buy more servers. Nor does it have any resilience – the ESXi server itself isn’t clustered and the storage is a single point of failure as there’s no RAID. It is cheap, however, and for lab testing I can live with those compromises. None of this is remotely new of course – Simon Gallagher’s vTardis has been using these same concepts to provide excellent lab solutions for years. Is this really a poor man’s Fusion-IO? There’s nothing like the performance and nothing like the budget, but the objectives are the same – although to be honest it’s probably a slightly trolling blog title. I won’t do it again. Promise!
If you’re thinking of building a home lab from scratch consider buying a single large server with local SSD storage instead of multiple smaller servers with shared storage. You can always scale out later, or wait for Ceph or HDFS to eliminate the need for centralised storage at all…
Tip: It’s worth bearing in mind the 32GB limit on the free version of ESXi – unless you’re a vExpert or they reinstate the VMTN subscription you’ll be stuck with 60 day eval editions if you go above 32GB (or buying a licence!).
Is performant a word?
In the first part of this series I introduced vCOps and its requirements before covering the new features in part two. This final blogpost covers the capacity features (available in the Advanced and higher editions) along with pricing information and my conclusions.
The previous trial I used didn’t include the capacity planning elements so I was keen to try this out. I’d used CapacityIQ previously (although only briefly) and found it useful but combined with the powerful analytics in vCOps it promises to be an even more compelling solution. VMware have created four videos with Ben Scheerer from the vCOps product team – they’re focused on capacity but if you’ve watched Kit Colbert’s overview much of it will be familiar:
UPDATE APRIL 2012 – VMware have just launched 2.5 hrs of free training for vCOps!
If you don’t have time to watch the videos and read the documentation (section 4 in the Advanced Getting Started guide), here are the key takeaways:
- Capacity information is integrated throughout the product although modelling is primarily found under the ‘Planning’ view. Almost every view has some capacity information included either via the dynamic thresholds (which indicate the standard capacity used) or popup graphs of usage and trending.
- Storage is now included in the capacity calculations (an improvement over CapacityIQ) resulting in a more complete analysis. Datastores are now shown in the Operations view although if you’re like me and use NFS direct to the guest OS it’s not going to be as comprehensive as using block protocols.
- The capacity tools require more tailoring to your environment than the performance aspects, but provide valuable information.
- With vCOps you can view both existing and predicted capacity, and you can model changes like adding hosts or VMs.
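To make the ‘how much capacity is left?’ idea above concrete, here’s a back-of-envelope sketch of the kind of maths involved. This is my own hypothetical illustration, not vCOps’s actual algorithm – the function name, the 10% headroom figure and the sample numbers are all invented:

```python
# Hypothetical sketch of a 'VMs remaining' capacity estimate -
# NOT how vCOps actually models capacity, just the basic idea.

def vms_remaining(host_capacity_gb, used_gb, avg_vm_footprint_gb, headroom=0.10):
    """Estimate how many more 'average' VMs fit before memory runs out,
    keeping a safety buffer (headroom) in reserve."""
    usable = host_capacity_gb * (1 - headroom)       # capacity minus buffer
    free = max(usable - used_gb, 0)                  # never negative
    return int(free // avg_vm_footprint_gb)          # whole VMs only

# e.g. a 96GB host with 60GB in use and a 4GB average VM footprint
print(vms_remaining(96, 60, 4))  # → 6
```

Real capacity tools also factor in CPU, storage, peaks vs averages and trend data, which is exactly why the tailoring mentioned above matters.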
In part one of Using vCenter Operations I covered what the product does along with the different versions available and deployment considerations. In this post I’ll delve into what’s new and improved and in the final part I’ll cover capacity features, product pricing, and my overall conclusions. I had intended to cover the configuration management and application dependency features too but it’s such a big product I’ll have to write another blogpost or I’ll never finish!
Introductory learning materials
UPDATE APRIL 2012 – VMware have just launched 2.5 hrs of free training for vCOps.
Deep dive learning materials
What’s new and improved in vCOps
Monitoring is a core feature and for some people the only one they’re concerned with. As your infrastructure grows in size and complexity, so does the need for a tool that combines compute, network, and storage in real time. Here are my key takeaways:
- There’s a new dashboard screen which shows health (immediate issues), risks (upcoming issues) and efficiency (opportunities for improvement) on a single screen. The dashboard can provide a high-level view of your infrastructure and works nicely on a plasma screen as your ‘traffic light’ view of the virtual world (and the physical one if you go with Enterprise+). The dashboard can also be targeted at the datacenter, cluster, host or VM level, which I found very useful, although you can only customise the dashboard in the Enterprise editions. There is still the Operations view (the main view in vCOps v1), which now also includes datastores. This view scales extremely well – even if you have thousands of VMs and datastores across multiple vCenters they can all be displayed on a single screen.
NOTE: If you find some or all of your datastores show up as grey with no data (as mine did) there is a hotfix available via VMware support.
At VMworld 2011 in Copenhagen VMware unveiled a significant revamp of their management suites, including a new version of vCenter Operations Manager (v5 to align with the vSphere release). vCenter Operations is now a suite of tools which includes vCenter Configuration Manager, the new vCenter Infrastructure Navigator (which I’ll cover in a later blogpost) and vCenter CapacityIQ (which is now fully integrated into vCOps, the standalone CapacityIQ is now end of life).
Although announced at VMworld it wasn’t publicly available until Jan 2012 when VMware formally launched vCOps v5. Coming less than a year after the release of the first version it’s apparent that VMware see this as an important product which is evolving fast. Steve Herrod, VMware’s CTO, stated recently at the Italian VMUG (around the 5 minute mark) that vCOps ‘is becoming the most adopted new technology that VMware has ever had’. The vCenter Operations suite is still aimed at infrastructure monitoring as opposed to application monitoring (despite the addition of Infrastructure Navigator) – VMware’s solutions aimed at the application tier belong to the vFabric suite. For a good overview of where vCOps and vFabric Hyperic fit into VMware’s cloud suite read Dave Hill’s blogpost on the subject.
If you aren’t familiar with vCenter Operations, here are the kinds of problems it aims to address:
- Is your virtual infrastructure healthy?
- What serious problems should I address immediately?
- Is the workload in my environment normal?
- Am I using the resources in my environment efficiently?
- How long do I have before resources run out?
- What impact did a recent change have?
A few people have already posted articles which I’d recommend reading:
With v1.0 I concluded that it was a great product but there were a few reasons why it wasn’t for me, primarily the lack of email notifications and the pricing. In this post I’ll cover the requirements and deployment considerations for the new version and in part two I’ll cover day-to-day use and new features. The final part will cover the capacity features along with pricing information and my conclusions.
UPDATE APRIL 2012 – VMware have just launched 2.5 hrs of free training for vCOps.
- Understand the DRS slot-size algorithm and its impact on migration recommendations
- Identify tools needed for monitoring capacity planning
- Identify performance metrics related to resource contention and saturation
Skills and Abilities
- Predict when additional ESX/ESXi Host, network or storage resources will be required by observing an existing environment
- Determine when to expand or contract provisioned Virtual Machine resources based upon observed Virtual Machine utilization
- Interpret performance metrics from vCenter to properly size the environment
Again there is a considerable overlap between this objective and the others in section three – the goal of understanding the DRS slot-size is an exact duplicate from section 3.3!
DRS slot size algorithm and its impact on migration recommendations
This was covered in section 3.3. You can always reread the DRS deepdive at Yellow Bricks.
Identify tools needed for monitoring capacity planning
- vCenter Performance Charts
- vCenter Storage views
- esxtop (particularly in batch or replay mode)
- Third party tools (not likely in VCAP-DCA exam though)
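As an aside on esxtop’s batch mode: `esxtop -b` writes perfmon-style CSV, which makes long captures easy to post-process for capacity trending. Here’s a minimal sketch in Python – the hostname `esx01` and the counter name are illustrative only, as real column names depend on your host and esxtop version:

```python
import csv, io

# esxtop -b output: row 1 is quoted counter names (perfmon style),
# subsequent rows are timestamped samples. Inline sample data here
# stands in for a real capture file; the counter name is illustrative.
sample = io.StringIO(
    '"(PDH-CSV 4.0)","\\\\esx01\\Physical Cpu(_Total)\\% Util Time"\n'
    '"01/01/2012 10:00:00","42.5"\n'
    '"01/01/2012 10:00:10","55.0"\n'
)

reader = csv.reader(sample)
header = next(reader)
# find the column for the counter we care about
col = header.index("\\\\esx01\\Physical Cpu(_Total)\\% Util Time")
values = [float(row[col]) for row in reader]
print(sum(values) / len(values))  # average over the capture → 48.75
```

The same approach works for any counter in the capture – memory, network or storage – which is handy when you’re gathering the baseline data this objective asks about.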
Consider SCSI reservations per LUN and the number of VMs per LUN, plus adaptive vs predictive LUN sizing.
Predict when additional ESX/ESXi Host, network or storage resources will be required by observing an existing environment
Refer to section 3.1 for the metrics to check. Ballpark guidelines:
- Memory – how much is in the host compared to active memory used? Factor in reservations etc
- Network – any dropped packets? Might imply greater bandwidth required…
- CPU – check for long term patterns using Performance Charts.
- I/O – high latency or lack of capacity are the main indicators to look for
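All of those ballpark checks feed the same question – when do I run out? As a hypothetical illustration (my own sketch, and deliberately naive – real capacity tools model growth far more carefully), a simple linear trend over historical readings gives a rough answer:

```python
# Naive linear-trend projection of memory exhaustion - an invented
# back-of-envelope sketch, not any vendor's actual algorithm.

def weeks_until_exhausted(samples_gb, capacity_gb):
    """samples_gb: weekly average memory usage readings, oldest first.
    Returns estimated weeks until capacity is hit, or None if flat/shrinking."""
    n = len(samples_gb)
    xbar = (n - 1) / 2
    ybar = sum(samples_gb) / n
    # least-squares slope (GB per week) over the sample index
    slope = sum((i - xbar) * (y - ybar) for i, y in enumerate(samples_gb)) / \
            sum((i - xbar) ** 2 for i in range(n))
    if slope <= 0:
        return None  # no growth, no exhaustion forecast
    return (capacity_gb - samples_gb[-1]) / slope

# four weekly readings growing ~5GB/week against a 96GB host
print(weeks_until_exhausted([60, 65, 70, 75], 96))  # → 4.2
```

Even a crude projection like this is enough to justify ordering hardware before you hit the wall, which is the point of this exam objective.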
Interpret performance metrics from vCenter to properly size the environment
Be aware of what the various metrics actually show you. For example, what’s the difference between Host Memory and Guest Memory in the screenshot below? The answers can be found in VMworld session TA8129, Beginners guide to performance management.
vCenter and esxtop present statistics differently. While esxtop tends to display a more immediately useful figure (%RDY, for example), the equivalent vCenter counter is often a summation in milliseconds which needs converting based on the sample interval.
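That interval-based conversion is worth memorising for CPU ready in particular. A small sketch – note the 20 second interval applies to real-time charts only, with historical rollups using longer intervals:

```python
def cpu_ready_percent(summation_ms, interval_s=20):
    """Convert vCenter's CPU Ready summation (milliseconds of ready time
    accumulated over one sample interval) into the percentage figure
    esxtop shows directly as %RDY. Real-time charts sample every 20s;
    historical rollups use longer intervals (e.g. 300s or 1800s)."""
    return summation_ms / (interval_s * 1000) * 100

# 1000ms of ready time in a 20s real-time sample
print(cpu_ready_percent(1000))        # → 5.0 (%RDY)
# the same 5% would appear as a much larger summation at a 300s rollup
print(cpu_ready_percent(6000, 300))   # → 2.0
```

So a CPU Ready summation that looks alarmingly large in a historical chart may be perfectly healthy once converted – always check which interval the chart is using.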
Remember that vCenter summary statistics can sometimes mislead – memory per host looks fine in the screenshot above but you might find NUMA locality is low (for example).