Using vCenter Operations v5 – Capacity features and conclusions (3/3)

In the first part of this series I introduced vCOps and its requirements before covering the new features in part two. This final blogpost covers the capacity features (available in the Advanced and higher editions) along with pricing information and my conclusions.

The previous trial I used didn’t include the capacity planning elements so I was keen to try this out. I’d used CapacityIQ previously (although only briefly) and found it useful, but combined with the powerful analytics in vCOps it promises to be an even more compelling solution. VMware have created four videos with Ben Scheerer from the vCOps product team – they’re focused on capacity but if you’ve watched Kit Colbert’s overview much of it will be familiar.

UPDATE APRIL 2012 – VMware have just launched 2.5 hrs of free training for vCOps!

If you don’t have time to watch the videos and read the documentation (section 4 in the Advanced Getting Started guide), here are the key takeaways;

  • Capacity information is integrated throughout the product, although modelling is primarily found under the ‘Planning’ view. Almost every view has some capacity information included, either via the dynamic thresholds (which indicate the standard capacity used) or popup graphs of usage and trending.
  • Storage is now included in the capacity calculations (an improvement over CapacityIQ), resulting in a more complete analysis. Datastores are now shown in the Operations view, although if you’re like me and use NFS direct to the guest OS it’s not going to be as comprehensive as using block protocols.
  • The capacity tools require more tailoring to your environment than the performance aspects, but they provide valuable information.
  • With vCOps you can view both existing and predicted capacity, and you can model changes like adding hosts or VMs.

Capacity Monitoring

You can see visual representations of capacity remaining using the Analysis views. Setting the focus to ‘Capacity’ shows the following three views;

  • Cluster capacity remaining (sized by workload, grouped by datacenter)
  • Host capacity remaining (sized by workload, grouped by datacenter/cluster)
  • VM capacity remaining (sized by workload, grouped by cluster/host)
Analysis view of host capacity

There are over 20 predefined views which let you focus on CPU, memory, network, and storage, and you can also create powerful custom views. The views use both colour and size to represent metrics from the thousands collected (or created) by vCOps analytics (as I did with my vmnic2 example in part two).

Calculating capacity is not quite as simple as measuring performance, and consequently there are more configurable options. For starters, if you’ve enabled HA on your clusters, should you report on the unused capacity excluding what’s reserved for admission control, or on the total physical capacity of the cluster? Given that capacity is typically measured over a period of time rather than right now, what time period is relevant? What would you consider a maximum safe usage for your infrastructure – would 100% really be desirable or do you need to leave some headroom for risk avoidance? All these factors need to be considered and vCOps lets you configure them (it also ships with some defaults);

Options for what’s considered ‘usable’ capacity
Specifying relevant time periods
Options for undersized and oversized VMs
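
To make the ‘usable capacity’ idea concrete, here’s a minimal sketch of how those options combine – the percentages and the calculation are my own illustration, not the formula vCOps actually uses.

```python
# Rough illustration (not vCOps' actual formula) of how the configurable
# options shrink raw cluster capacity down to 'usable' capacity.

def usable_capacity(total_ghz: float,
                    ha_reserved_pct: float = 25.0,  # e.g. admission control on a 4-host cluster
                    headroom_pct: float = 10.0) -> float:
    """Return the capacity you'd plan against once HA and headroom are removed."""
    after_ha = total_ghz * (1 - ha_reserved_pct / 100)
    return after_ha * (1 - headroom_pct / 100)

# A 4-host cluster: 4 hosts x 2 sockets x 6 cores x 2.9GHz = 139.2GHz raw CPU
print(f"Usable: {usable_capacity(139.2):.1f} GHz of 139.2 GHz raw")
```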

In my opinion the defaults for over- and undersized VMs are rather unrealistic. This VMware blog details how the metrics are calculated, but in summary;

  • A VM is classed as undersized if CPU/memory usage exceeds 70% of the configured amount for 1% of the monitored time interval
  • A VM is classed as oversized if CPU/memory usage drops below 30% of the configured amount for 1% of the monitored time interval

If you have VMs which are only active during working hours, for example, they may get categorised as oversized because they’re idle at night (the default time interval is four weeks and the default is to monitor 24×7). I adjusted mine to require the VM to be in breach of the thresholds for 30% of the time. You could also adjust the time period to office hours, depending on the usage profile of your infrastructure.
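
To illustrate why the defaults bite, here’s a rough sketch of the classification logic as I understand it from that VMware blog – the code and sample data are mine, not VMware’s, so treat it as an approximation.

```python
# Sketch of the over/undersized classification described above: a VM only
# needs to breach a threshold for a small fraction of the monitored interval.

def classify(samples, high=70.0, low=30.0, breach_pct=1.0):
    """samples: utilisation % readings across the monitored interval."""
    pct_above = sum(1 for s in samples if s > high) / len(samples) * 100
    pct_below = sum(1 for s in samples if s < low) / len(samples) * 100
    if pct_above >= breach_pct:
        return "undersized"
    if pct_below >= breach_pct:
        return "oversized"
    return "right-sized"

office_hours = [60.0] * 8   # hourly samples, 09:00-17:00, steadily busy
overnight    = [5.0] * 16   # idle for the other two-thirds of the day

print(classify(office_hours + overnight))                   # 'oversized' with the 1% default
print(classify(office_hours + overnight, breach_pct=30.0))  # still 'oversized' - it idles ~67% of the day
print(classify(office_hours))                               # 'right-sized' if only office hours are monitored
```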

Interestingly, the vCOps Analytics and UI VMs were both flagged as oversized despite being left at the vApp defaults! 🙂

Modelling the addition of 30 VMs to a cluster

Using the Planning view you can model (capacity plan) using ‘What If’ scenarios. This allows you to simulate adding or removing hosts and VMs, optionally using existing VMs as a ‘template’ for sizing and usage information. Unfortunately this is where being able to target VM folders would be ideal – I wanted to simulate adding an extra dev/test environment of around 90 VMs, and when only ten VMs are presented per screen you end up with lots of mouse clicks.

No automated tool will solve all your problems out of the box. The effectiveness of capacity planning will depend on your environment usage and configuration – as the help file states ‘Data that has a high degree of variability can distort trends’. In my case VM creation (and usage) varies considerably – we might create two VMs per week for a month and then get asked for another development environment which means another 90 VMs the following week. These peaks and troughs can be analysed over the long term but they tend to play havoc with any short term reports.
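
A throwaway example (my own numbers, nothing to do with vCOps’ analytics engine) shows how a single spike can skew a naive linear trend:

```python
# Weekly VM creation: 2/week for seven weeks, then a 90-VM project lands.
weeks = list(range(1, 9))
new_vms = [2, 2, 2, 2, 2, 2, 2, 90]

# Naive least-squares slope (extra VMs per week) over the whole period.
n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(new_vms) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, new_vms))
         / sum((x - mean_x) ** 2 for x in weeks))

print(f"Trend suggests ~{slope:.1f} extra VMs per week")      # ~7.3/week
print(f"Median week is just {sorted(new_vms)[n // 2]} VMs")   # 2
```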

Capacity Reporting

There are thirteen reports available which can be targeted against a given vCenter, datacenter, cluster, host, datastore, or VM object (although not all reports are available for all objects). Those who’ve used CapacityIQ before will recognise most of them! Reports can be run interactively or scheduled (daily, weekly or monthly) and can be automatically emailed. Most run very quickly (under 20 seconds when used with an estate of over 800 VMs). Some of the thresholds (what qualifies as under- or oversized, idle, etc.) can be adjusted via the vCOps Configuration screen. I couldn’t get the scheduled emails to work but I suspect that’s more likely to be an issue with my environment than the product.

  • Virtual Machine Capacity Overview – overview of used and remaining capacity, time remaining, and overall capacity efficiency
  • Capacity Inventory and Optimization – lists of hosts or clusters showing used and remaining VM capacity
  • Virtual Machine Optimization – summary of idle, powered off, undersized and oversized VMs
  • Idle Virtual Machines – list of all idle VMs
  • Oversized Virtual Machines – list of all oversized VMs
  • Undersized Virtual Machines – list of all undersized VMs
  • Powered off Virtual Machines – list of all powered off VMs
  • Host Utilization – distribution of average capacity utilization across deployed hosts
  • Configured Host Capacity – distribution of configured capacity across deployed hosts
  • Cluster or Host Capacity inventory – lists of clusters or hosts showing remaining VM capacity
  • Virtual Machine List – list of related VMs
  • Datastore Capacity Utilization – utilization report for the datastore
  • Waste, Stress and Capacity report – trend of waste and stress and summary of capacity for VMs

One of the reports I found immediately useful was the VM Capacity Overview report. This gives you average CPU usage, active memory, I/O usage and many more stats across your entire estate. These sorts of statistics are often required for P2V projects – it’s just a shame that vCOps can only generate them for servers which are already virtual! They’re still very useful for future capacity planning however, and as interesting figures to educate your management team.

The ‘All Metrics’ view showing multiple graphs for network throughput on a given vmnic

As an alternative to these prepared reports you can export detailed information from the ‘All Metrics’ view. This can be exported in CSV or PNG format, for any metric gathered from vSphere (plus the vCOps analytic metrics), and for various time periods from the last hour to the last year. These views are extremely flexible – you can overlay up to three separate time periods and you can do side-by-side comparisons for as many metrics/time periods as you can fit on a screen. The only downside is that these can’t be scheduled or emailed, so you’ll have to generate them interactively. The real limitation here is your knowledge of vSphere and the available metrics – if you know where to look I’m sure you could find anything!
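
Once a metric is exported you can of course slice it yourself. The sketch below assumes a simple two-column timestamp/value CSV with no header row and a hypothetical filename, which may not match the exact file vCOps produces – check the real export before relying on it.

```python
# Compare this week's average against last week's for a metric exported
# from the 'All Metrics' view. Assumes rows of "timestamp,value" with no
# header and an ISO-format timestamp - the layout is an assumption.
import csv
from datetime import datetime, timedelta

def period_average(path, start, end):
    """Average the metric value between two datetimes (start inclusive)."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        for timestamp, value in csv.reader(f):
            ts = datetime.fromisoformat(timestamp)
            if start <= ts < end:
                total += float(value)
                count += 1
    return total / count if count else 0.0

now = datetime.now()
this_week = period_average("vmnic2_usage.csv", now - timedelta(days=7), now)
last_week = period_average("vmnic2_usage.csv", now - timedelta(days=14), now - timedelta(days=7))
print(f"vmnic2 this week: {this_week:.0f} KBps vs last week: {last_week:.0f} KBps")
```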

Reports are new to vCOps (but come from CapacityIQ) and I found them slightly underwhelming;

  • Some reports are a bit thin on detail – the Virtual Machine Optimization report sounds useful (‘a summary of idle, powered off, oversized and undersized VMs’) but only consists of five lines – the number of VMs in total, oversized, undersized, powered off, and idle. You can run other reports to drill into greater detail but I’m not sure there’s much value in this particular report. The Oversized VMs report covers both vCPU and memory but could have been clearer (there’s a %optimal column but it wasn’t clear whether it referred to vCPU, vRAM, or an aggregate of both).
  • You can’t create custom reports so if it’s not covered in the standard reports you’ll have to export from the All Metrics screen.
  • You can’t target VM folders so I can’t report on the logical organisation of my virtual infrastructure, only the physical one, which is a shame.

Pricing

The high cost of vCOps was my biggest complaint with the first version, and while the product has improved considerably the pricing is still a major barrier. It’s still priced per VM (except the Enterprise+ version) so for my estate of 900 VMs I’m looking at the following list costs (and this is for 12×5 support not 24×7);

  • Standard edition (monitoring) = 36 x 25-VM packs @ £1,260 = £45,360
  • Advanced edition (monitoring, capacity) = 36 x 25-VM packs @ £3,157 = £113,652
  • Enterprise edition (monitoring, capacity, configuration and chargeback) = 36 x 25-VM packs @ £4,875 = £175,500

I’m not familiar with the following products and they’re not ‘apples to apples’ comparisons, but let’s compare those prices to some popular competitors’ products (all with one year’s 12×5 support) – the arithmetic behind both licensing models is sketched after the list;

  • Veeam ONE (monitoring, capacity and configuration management) = 56 dual-socket hosts x £430/socket = £48,160 (£430 per socket is an average of the two-tier pricing based on cores per socket)
  • vFoglight Professional (monitoring, capacity, chargeback) = 56 dual-socket hosts x £435/socket = £48,720 (these prices might have changed now that Quest have purchased vKernel)
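
If you want to plug in your own estate, the arithmetic behind both licensing models is trivial – this is just the list pricing quoted above, so negotiated pricing will obviously differ.

```python
# List-price arithmetic from the figures above: per-VM packs (vCOps)
# versus per-socket licensing (Veeam ONE / vFoglight).
import math

vms, hosts, sockets_per_host = 900, 56, 2

def per_vm_cost(pack_price, pack_size=25):
    return math.ceil(vms / pack_size) * pack_price

def per_socket_cost(socket_price):
    return hosts * sockets_per_host * socket_price

print(f"vCOps Standard   £{per_vm_cost(1260):>8,}")    # £45,360
print(f"vCOps Advanced   £{per_vm_cost(3157):>8,}")    # £113,652
print(f"vCOps Enterprise £{per_vm_cost(4875):>8,}")    # £175,500
print(f"Veeam ONE        £{per_socket_cost(430):>8,}")  # £48,160
print(f"vFoglight Pro    £{per_socket_cost(435):>8,}")  # £48,720
```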

The prices are also significantly higher than the management solutions we’ve bought previously from other vendors such as NetApp (Operations Manager) and SolarWinds (Orion), despite the fact that in each case they cover the same scale of infrastructure which vCOps would manage. vCOps isn’t comprehensive enough to replace those tools either, so it’s an extra expense. Another minor concern is the cost of the storage required for the metrics. Even with the ‘small’ deployment and up to 1500 VMs, six months’ data retention is just under 1TB.

In his review David Davis concluded that vCOps had ‘nice introductory pricing, based on a per VM basis, that makes it look attractive’. I can’t say I agree with that! Bernd Herzog over at the Virtualization Practice has a great article about VMware’s future direction where he highlights one of the problems with vCOps licensing. He rightly points out that one of the primary benefits of vCOps is managing capacity and hopefully allowing you to increase your VM density, BUT if you do this you’ll have to pay VMware even more to license vCOps for the increase in VMs.

Let’s do a worked example. For 900 VMs I’d be paying around £68k to add the capacity planning features (over and above the Standard edition). If this helped me double my consolidation ratio it’d be a good ROI, right? Unfortunately I’d have to pay another £113k to increase my vCOps licensing to cater for the potential size of my estate, and that’s before you consider the vRAM implications. As an alternative I could spend £175k on extra infrastructure and use a competitor’s socket-based solution where the costs don’t scale per VM.
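
Spelling that out with the list prices quoted earlier (again, before any negotiation):

```python
# The density trap: the capacity features cost the Advanced-minus-Standard
# delta, but any extra VMs you squeeze in need their own vCOps licences too.
import math

standard_pack, advanced_pack = 1260, 3157          # 25-VM pack list prices
packs = lambda vm_count: math.ceil(vm_count / 25)

capacity_uplift = packs(900) * (advanced_pack - standard_pack)
print(f"Advanced over Standard for 900 VMs: £{capacity_uplift:,}")   # £68,292

second_900 = packs(900) * advanced_pack
print(f"Licensing the next 900 VMs at Advanced: £{second_900:,}")    # £113,652
```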

I’m the virtualization evangelist in my company but on this basis vCOps is a tough sell. It all comes down to the value you place on the availability and efficiency of your infrastructure. If you’re a bank and every minute of downtime costs millions it’s probably easy to justify. How much time would your technical teams waste diagnosing issues and producing reports, both of which also have cost implications? As I stated in my initial review of vCOps last year, though, the final price you pay is often open to negotiation. Let’s hope so!

Conclusions

vCOps is a cool product which gives you a better understanding of your infrastructure. I’d recommend everyone install the trial because within those 60 days you’ll learn a lot and probably fix issues you didn’t even know you had. Given the simplicity of the install and the lack of configuration required you have nothing to lose and much to gain, and it’s free! Note that the EULA for the 60-day trial does specify a ‘nonproduction environment’!

Almost every issue I noted with the first release has been fixed – email notifications, multiple vCenter support, availability monitoring, the ability to clear alerts and set manual thresholds, and reporting in various formats. There’s a lot of functionality within vCOps (which takes time to learn and understand), and if you opt for the higher editions which include the CapacityIQ and Configuration Manager functionality this becomes a beast of a product. While the vApp deployment is straightforward, taming performance, capacity, and configuration control is never trivial even with the best tools. Don’t expect to master it overnight.

The product has markedly improved since the first release, but along with the comprehensive feature set and patented analytics comes a crazy price tag which I think will limit adoption. My vCOps trial has been like test driving a Ferrari – you love the experience but realise you can’t afford it on a permanent basis. I’d love to have vCOps in my toolkit – it would answer 95% of the ‘is there a performance issue with x/y/z’ or ‘do we have enough capacity to add another 50 VMs’ questions, and with minimal effort. If you have an enterprise agreement with VMware, or your job is simply to say ‘we need this tool’ and let someone else do the commercials, happy days! I’ll be sending this to my manager with a ‘recommend we buy’ tag but I fully expect a shocked expression when he sees the price.

Further Reading

vCOPS in 5 mins – an amusing overview from VMware! (marketing)

VMware’s vCOPS Evaluation Center

VMware Forums for vCOps

vCOps learning resources (Gregg Robertson)

5 Minutes with vCenter Operations Manager 5 (Bob Plankers)

vKernel’s vOps moves to counter vCenter Operations

SearchServerVirtualization’s summary of virtual monitoring solutions (inc vCOps, Veeam Monitor and vFoglight)

Kendrick Coleman’s blogpost about using the vCloud adapter with vCOps

VMworld 2011 sessions (login required);

  • CIM2452 – VMware vCenter Operations DeepDive (Kit Colbert). NOTE: This covers v1 not v5 but the fundamentals still apply.
  • CIM2285 – Automated Infrastructure and Operations Management with VMware vCenter Operations

The VMware Communities roundtable #178 about Capacity Management with vCOPS (56 mins)

The VMware Communities roundtable #172 about vCOPS (with Kit Colbert, Matt Cowger and Bob Plankers) (56 mins)

The VMware Communities roundtable #166 about vCOPS and Infrastructure Navigator (58 mins)

Some sample dashboard views courtesy of @3cVguy

Using vCOPS with vCloud Director

Twitter people;

vCenterOps
