As the complexity of virtual infrastructures increases they’re becoming harder to manage using conventional monitoring tools, which were built with a more static environment in mind. In March 2011 VMware released the vCenter Operations product (vCOPS) to address this pain point. I’ve been running the 60-day trial at my company and now that the trial’s ending it’s time to share my thoughts.
What is vCOPS?
To quote the product page at VMware;
VMware vCenter Operations uses patented analytics and powerful visualizations to automate performance, capacity and configuration management. It collects and analyzes performance data, correlates abnormalities and identifies the root cause of building performance problems. VMware vCenter Operations provides capacity management to optimize resource usage and policy-based configuration management to assure compliance and eliminate sprawl and configuration drift. (emphasis my own)
The key differentiator is this promise to learn and understand the context of multiple metrics (CPU, memory, storage and network) and provide root cause analysis without you needing to manually define thresholds, benchmarks etc. Bear in mind that vCOPS is an infrastructure monitoring solution rather than application layer (which is more the domain of VMware’s AppSpeed, Quest’s solutions or ManageEngine’s Application Manager). I’m not the first blogger to cover this product so here’s some reading to get you up to speed;
- This great article by Bernd Harzog covers all editions of vCOPS along with some great analysis of its strengths and weaknesses.
- Eric Sloof’s blogpost describes how the product works in a bit more detail.
- Kendrick Coleman’s post covers his own thoughts with some interesting comments.
- Chris Dearden has written an interesting post on monitoring including where he sees vCOPS fitting in.
While technically a ‘v1’ release the product comes from VMware’s purchase of Integrien (in August 2010) where it was originally marketed as VMAlive. Integrien have been working on the patented algorithms for several years so while the integration and VMware branding are new the guts of the product are not. VMware have published some YouTube videos or you can listen to VM Communities podcast #119 to get an overview of what vCOPS can offer.
As this graphic shows vCOPS is available in three editions. The Standard version is the one I’ve been trialing because;
a) it’s the entry level and least costly
b) VMware are pushing this quite hard. We were contacted and offered a trial along with technical support and it’s also been featured at VMUG sessions (including a working lab at the LonVMUG)
c) the Advanced and Enterprise editions include extra functionality (capacity management and configuration management respectively) but cost and complexity to implement increase accordingly and they weren’t in scope for my needs. vCOPS Advanced is roughly 2.5 times more expensive compared to Standard – I didn’t even ask about Enterprise….
Install and initial configuration
According to the documentation vCOPS ‘integrates’ with vCenter, which poses a few questions. It means unless you have a lab to use for evaluation (and that would need enough hosts and VMs to generate meaningful metrics) you’re installing a ‘trial’ into your production system. Luckily the ‘integration’ merely implies that it pulls statistics out of your existing vCenter database (more on this later) so the touchpoints and potential impact are minimal. Unfortunately from a GUI point of view the integration is also minimal – vCOPS uses a web portal to display all its information and you get an icon in the ‘Solutions’ section which allows you to open a browser window inside vCenter. You can’t right click an object (VM, host, cluster etc) and get relevant vCOPS statistics so it’s largely marketing spiel!
vCOPS requires vCenter 4.1 and by default is configured with 8GB RAM – not good for home labs but not a major issue for most production environments. The underlying host running the vCOPS appliance must be ESX 4.0U2 or greater but it’s able to monitor older hosts.
On the plus side it’s VERY easy to install and deploy as it’s bundled as an appliance (installation in detail at vSpecialist.co.uk);
- Install the appliance giving the usual information – IP address, DNS etc
- Register with the vCenter server – two accounts needed, one to register the vCOPS and another used for collection of metrics. Registration needs admin, collection can be less privileged depending on what you want to see. vCOPS uses AD integrated authentication.
- License vCOPS in vCenter. The freely available trial lasts for 60 days which gives plenty of time to investigate what it offers.
Beyond the above steps very little configuration is needed (or even possible). You can choose a light or dark colour scheme and strangely you can pick your navigation of choice – breadcrumb or dropdown menu. This underlines a slight weakness in my opinion – the web interface isn’t entirely consistent which makes navigation confusing. Sometimes a right click is available (unusual for a web application) and sometimes it isn’t.
A good place to start is with the official Evaluator’s guide. This takes you through the initial dashboard, explains the concepts of workload, health, and capacity and how to navigate between the various screens. Once up and running it’s very impressive how much information you’re presented with very quickly (within a few hours) although the dynamic thresholds can take 14 days to fully establish (according to the Install and configure guide on VMware’s website). You can also read the vCOPS Unplugged blog series for explanations of how the product works, although to date there’s only one post.
I quickly found myself using vCOPS for performance queries and it came up trumps on multiple occasions. Compared to using the built-in vCenter performance charts vCOPS’ advantage is that it combines all facets of performance – compute, network, storage etc – rather than needing the admin to correlate the various metrics. It also does a great job of analysing the stack from top to bottom and making it easy to see at a glance (the view is very similar to vKernel’s vOPS though I’m not sure who came up with it first). Not only will it indicate issues per VM but it’ll indicate if the underlying host, cluster, or even datacenter has associated issues. As I mentioned at the start the algorithms are what distinguish this product – if network usage is typically high for a given VM (during a daily backup, for example) it won’t constantly alert you as it understands this is ‘normal’ behaviour.
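To make the idea of a learned baseline a bit more concrete, here’s a minimal sketch in Python. To be clear, this is NOT how vCOPS does it (the real analytics are patented and I have no visibility into them); it’s just the simplest version of the concept I could think of, where ‘normal’ is learned separately for each hour of the day. The function names, the sample figures and the three-standard-deviations tolerance are all my own assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(samples):
    """Group historical (hour_of_day, value) samples and return
    {hour: (mean, standard deviation)} for each hour with at least two samples."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {
        hour: (mean(values), stdev(values))
        for hour, values in by_hour.items()
        if len(values) >= 2
    }


def is_abnormal(baseline, hour, value, tolerance=3.0):
    """Return True when value lies more than `tolerance` standard deviations
    from what is usual for that hour. Hours with no history are never flagged."""
    if hour not in baseline:
        return False
    avg, spread = baseline[hour]
    # max() guards against a zero spread when the history is perfectly flat
    return abs(value - avg) > tolerance * max(spread, 1e-9)


# Hypothetical history: network usage (Mbps) is high at 02:00 every night
# because of the backup, and low at 14:00.
history = [(2, 900), (2, 950), (2, 920), (14, 40), (14, 55), (14, 45)]
baseline = build_baseline(history)

print(is_abnormal(baseline, 2, 930))   # nightly backup traffic
print(is_abnormal(baseline, 14, 930))  # the same traffic mid-afternoon
```

With this made-up history the 02:00 reading is treated as normal while the identical reading at 14:00 is flagged, which is the behaviour a fixed threshold can’t give you.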
This is also where I was less impressed with vCOPS – it won’t alert you to anything via email, pager, SMS etc. This means that unless you’ve got the console open you won’t be aware of any issues until your end users tell you – which is the problem I needed this tool to prevent! I know my daily workload and I’m too busy to spend my time monitoring a console – I need notification when things happen and preferably with enough warning to implement a fix before the end user is affected. I found this limited my use of the product significantly – most of my other management tools notify me if something’s amiss which means I can operate on a ‘no news is good news’ philosophy.
Another thing it doesn’t do is availability monitoring (although it doesn’t claim to either). I found this out the hard way when five VMs dropped off the network due to a faulty configuration change. Even though the network I/O changed significantly (ie dropped to zero) there were no warnings from vCOPS because from its perspective there was no bottleneck, so the VMs were still green. You could find yourself in the same position if a VM was powered off accidentally – while it’ll show as grey in the console you won’t really be aware of it due to the lack of notification!
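For what it’s worth, the kind of check that would have caught my five VMs is not complicated. The sketch below is purely illustrative and has nothing to do with vCOPS itself: it assumes you already have recent network throughput samples per VM from somewhere (the dictionary layout, the VM names and the three-sample window are all hypothetical) and simply flags any VM whose traffic has flatlined at zero.

```python
def find_silent_vms(throughput_by_vm, window=3):
    """Return the names of VMs whose last `window` network samples are all zero.

    throughput_by_vm maps a VM name to a list of throughput samples, oldest
    first. VMs with fewer than `window` samples are skipped.
    """
    silent = []
    for name, samples in throughput_by_vm.items():
        recent = samples[-window:]
        if len(recent) == window and all(value == 0 for value in recent):
            silent.append(name)
    return sorted(silent)


# Hypothetical samples (KBps); in practice they would come from whatever
# exports your vCenter statistics.
samples = {
    "web01": [120, 130, 0, 0, 0],        # dropped off the network
    "db01": [300, 280, 310, 295, 305],   # healthy
    "new01": [0, 0],                     # too little history to judge
}
print(find_silent_vms(samples))
```

A check like this is no substitute for proper availability monitoring, but it shows that ‘traffic dropped to zero’ is a signal worth alerting on even when nothing is bottlenecked.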
I mentioned earlier that vCOPS relies on the metrics it extracts from vCenter, but that has its downsides. Our standard platform is still ESX v4.01 which means we don’t get NFS storage monitoring as it was one of the new features of vSphere v4.1. This meant our vCOPS wasn’t able to ‘see’ one of the big four (CPU, memory, storage, network) which resulted in some performance issues being missed. This isn’t a shortcoming of the product but it’s something worth bearing in mind.
There is no ability to export reports beyond taking screenshots. This means the tool may only be seen by the administrators – management will get no visibility into what it offers. This increases the ‘cost’ of maintaining infrastructure because I’m going to have to spend time producing metrics for management.
A few other limitations which frustrated me;
- Doesn’t scale well with Linked Mode – you need a separate instance per vCenter which also means multiple dashboards, reporting, administration etc.
- You can’t override or clear an alert manually. If you make a scheduled change (say you consolidate your ESX hosts to save licensing, which then results in higher utilisation) it’ll take a while to calculate a new baseline, during which time you’ll get red status. Not ideal if you have multiple teams as you have to keep saying ‘ignore that for the time being’. This undermines belief in the product (it’s a bit like ‘cry wolf’) and I worry it’ll start to be ignored if this happens too often.
- I couldn’t find any way to delegate different views to a different team. Maybe that’s asking too much from both an entry level version and a ‘first’ release but for any medium sized company this is often a desired feature.
As noted by others vCOPS is not a cheap product and it’s competing in a crowded space. It’s priced on a per VM basis (rather than per socket) and bought in batches of 25 – in the UK list price is around £925/25VMs, plus support and subscription. For an infrastructure of 750 VMs it’s therefore over £30,000 and while it’s possible to partially license your infrastructure (see these VMware community posts here and here) it’s an inelegant workaround. At that price you have to wonder whether buying another few hosts to minimise contention might work out cheaper and provide better value for your company (that may not help if I/O is your limiting factor obviously!).
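If you want to plug in your own VM count, the pack-based pricing is easy to work through. This is just a sketch using the list prices quoted in this post (£925 per 25 VMs, plus the £231 per pack support figure I use in the comparison further down); the function name is my own and your actual quote will almost certainly differ.

```python
import math

PACK_SIZE = 25          # VMs per licence pack (from the post)
PACK_PRICE = 925        # GBP list price per pack (from the post)
SUPPORT_PER_PACK = 231  # GBP support per pack, the figure quoted further down


def vcops_list_cost(vm_count, include_support=True):
    """Return (packs, total GBP) to cover vm_count VMs at list price."""
    packs = math.ceil(vm_count / PACK_SIZE)  # a part-filled pack still costs a full pack
    per_pack = PACK_PRICE + (SUPPORT_PER_PACK if include_support else 0)
    return packs, packs * per_pack


print(vcops_list_cost(750, include_support=False))  # (30, 27750)
print(vcops_list_cost(750))                         # (30, 34680)
```

On these assumptions the licences alone for 750 VMs come to £27,750, and it’s once support is added that the total passes the £30,000 mark.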
NOTE: For some reason my vCOPS install shows only 253 VMs as licensed despite not putting any restrictions in place. I’m not sure if it’s affecting the product’s accuracy or is simply an oversight in the licensing but something to watch out for!
The automated nature of vCOPS is its distinguishing feature but it’s not alone even in that niche marketplace. Netuitive have been around for a few years and offer a similar (and I believe more widely adopted) solution. I think Netuitive are aiming at the Enterprise market only (3000VMs+) but unfortunately I was unable to find pricing information (nothing on their website, no response to my phonecall/voicemail, not even a response to my twitter enquiry. FAIL!) so I can’t comment.
Despite not being an ‘apples to apples’ match I also costed up Veeam’s Monitor and vKernel’s vOPS Performance Analyzer for my infrastructure (both list prices with no discounts including one year’s support);
- Veeam Monitor v5 = 50 dual socket hosts x £190/socket = £19000
- vKernel vOPS Performance Analyzer v4 = 50 dual socket hosts x £185/socket = £18500
- VMware vCenter Operations = 32 x £925/25VMs = £29600 + £7398 (support @ £231/25VMs) = £36998
As you can see the figures are pretty stark! These products are NOT an exact match (they don’t have the same focus on intelligent algorithms) so doing a direct price comparison is disingenuous but they do compete in the same space and offer some of the same functionality.
Obviously these two alternative products use a different pricing model, sticking with the per socket costing previously used by most VMware products. This means that if you’re running a very low consolidation ratio (and hence many hosts) these solutions could theoretically be more expensive. See this blogpost from VKernel on why they dislike per-VM licensing. Obviously a common aim of solutions like these is to ensure maximum usage of the underlying hosts resulting in a higher consolidation ratio – which will cost more to license….
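To put a rough number on where the two pricing models cross over, here’s a back-of-the-envelope sketch using only the list prices from the comparison above. It ignores the rounding up to whole packs of 25 and any discounts, and the function names are my own, so treat the result as indicative only.

```python
def per_socket_cost(hosts, sockets_per_host, price_per_socket):
    """Total GBP for a product licensed per socket."""
    return hosts * sockets_per_host * price_per_socket


def breakeven_vms_per_socket(price_per_socket, price_per_vm):
    """VMs per socket at which per-VM and per-socket pricing cost the same."""
    return price_per_socket / price_per_vm


# Figures from the list above; 925 + 231 is the per-pack price plus support.
vcops_per_vm = (925 + 231) / 25
veeam_total = per_socket_cost(50, 2, 190)
vkernel_total = per_socket_cost(50, 2, 185)

print(veeam_total, vkernel_total)
print(round(vcops_per_vm, 2))
print(round(breakeven_vms_per_socket(190, vcops_per_vm), 1))
print(round(breakeven_vms_per_socket(185, vcops_per_vm), 1))
```

On these figures the break-even works out at roughly four VMs per socket: below that the per socket products would be the dearer option, above it the per VM model costs more, and most production clusters will be running well above it.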
To be fair it’s a first release and I’m sure it’ll improve with future releases. This post might come across as negative because I found quite a few features missing but my overall opinion is largely positive. As an administrator I’d like to have this tool available to me, but I don’t control the purse strings. We have a plethora of monitoring solutions at my place right up to the big boys like Oracle Enterprise Manager – that certainly does many of the things missing in vCOPS but the price tag is in another league altogether and you need specialist skillsets to maintain it! Speak to your VMware account manager and see what deals are on offer – at list price it may not compete so well but who pays list price?
If you’re not convinced you can check out these alternative solutions but first you should read this summary of monitoring solutions from Bernd Harzog so you know what you’re comparing against;