Late last week I joined an illustrious line of community bloggers, vendors, and authors by having a ‘chinwag’ with Mike Laverick. Anyone who knows Mike knows that a quick chat can easily last an hour for all the right reasons – he’s passionate about VMware and technology in general and good at presenting complex ideas in an easily understood manner. I guess that’s why he recently became a senior cloud evangelist for VMware! We discussed a few topics which are close to my heart at the moment;
Storage Field Day
You can listen to the audio (MP3 or the iPod/iPad friendly M4V) or watch the YouTube video. As time is limited on the actual chinwag I thought I’d offer a few additional thoughts on a couple of the topics we discussed.
Oracle and converged infrastructure
I didn’t want to get embroiled in a discussion about Oracle’s support stance on VMware as that’s been covered many times before, but it’s definitely still a barrier. Some of our Oracle team have peddled the ‘it’s not supported’ argument to senior management and even though I’ve clarified the ‘supported vs certified’ distinction it’s a difficult perception to alter. Every vendor wants to promote its own solutions so you can’t really blame Oracle, but it sure is frustrating!
Of more interest to me is where converged infrastructure is going. As we discussed on the chinwag, Oracle are an interesting use case for converged infrastructure (or engineered systems, pick your terminology of choice) because their offering includes the application tier. Most other converged offerings (VCE, FlexPod, vStart and even hyperconverged solutions like Nutanix) tend to stop at the hypervisor, thus providing an abstraction layer that you can run whatever workload you like on. Oracle (with the possible exception of IBM?) may be unique in owning the entire stack, from the hardware (storage, networking and compute) through the hypervisor and up to their crown jewels, the Oracle database and applications. This gives them a position of strength to negotiate from even when certain layers are weak in comparison to ‘best of breed’, as is the case with OracleVM. Archie Hendryx explores this in his blogpost, although I think he undersells the advantage Oracle have of owning a tier 1 application – Dell’s vStart or VCE’s vBlock may offer competition from an infrastructure perspective but my company don’t run any Dell or VCE applications. If you’re not Oracle how do you compete with this? You team up to provide a ‘virtual stack’ optimised for various workloads – today VDI is the most common (see reference architectures from Nexenta, Nimble Storage et al). As the market for converged infrastructure grows I think we’ll see more of these ‘vertical’ stack style offerings.
After I described my problem getting vCD tabled as a viable technology for lab management, Mike rightly pointed out that many people are using vCD in test and dev – maybe more than in production. I agree with Mike but suspect that most are using dev/test as a POC for a production private cloud, not as a purpose-built lab management environment. I didn’t get time to discuss a couple of other points, both of which complicate the introduction of vCD even if you have an existing VMware environment;
Introducing vCD (or any cloud solution for that matter) is potentially a much bigger change compared to the initial introduction of server virtualisation. In the latter the changes mainly impacted the infrastructure teams although provisioning, purchasing, networks and storage were all impacted. If you’re intending to deliver test/dev environments you’re suddenly incorporating your applications too, potentially including the whole development/delivery lifecycle. If you go the whole hog to self-service then you potentially include an even larger part of the business right up to the end users. That’s a very disruptive change for some ‘infrastructure guy’ to be proposing!
vCD recommends Enterprise+ licencing, which means I have to argue for the highest licencing level for test/dev even if I don’t have it in production.
It’s that time of year when I book the next London VMUG session into my calendar, and rather than my usual ‘here’s the agenda, you should go’ blogpost I thought I’d recap what the last year has delivered. If this doesn’t convince you that there’s value in attending a free event where you could have learnt all the topics listed below as well as networked with your peers, then nothing will.
If there’s a topic you’d like covered or if you’d like to present something yourself, get in touch with the organising committee. I’m planning to present at one of next year’s VMUG sessions (it’s about time!) because it’s a user group and real world experience can be gold dust for others to learn from. I’m told we’re a friendly audience!
A thought provoking session on DevOps and service management from Alex Smith. This topic was probably out of many people’s comfort zones but it’s good to learn something new and challenge your thinking.
I could mention the giveaways (iPad, Fusion-IO card, t-shirts, AppleTV etc) and the free beers afterwards, the fact we had at least five VCDXs presenting, and the live labs from EMC, VMTurbo, Embotics etc, but you’re already sold, right?
In this article I’m going to talk about Zerto, a data protection company specialising in virtualized and cloud infrastructures who I recently saw as part of Storage Field Day #2. They’ve presented twice before at Tech Field Days (as part of their launch in June 2011 and Feb 2012) so I was interested to see what new developments (if any) were in store for us. In their own words;
Zerto provides large enterprises with data replication solutions designed specifically for virtualized infrastructure and the cloud. Zerto Virtual Replication is the industry’s first hypervisor-based replication solution for tier-one applications, replacing traditional array-based BC/DR solutions that were not built to deal with the virtual paradigm.
When I first heard the above description I couldn’t help but think of VMware’s SRM product, which has been available since June 2008. Zerto’s carefully worded statement is correct in that SRM relies on storage array replication for maximum functionality, but I still think it’s slightly disingenuous. To be fair, VMware are equally disingenuous when they claim “the only truly hypervisor level replication engine available today” for their vSphere Replication technology – marketing will be marketing! Later in this article I’ll clarify the differences between these products but let’s start by looking at what Zerto offers.
Zerto offer a product called Zerto Virtual Replication which integrates with vCenter to replicate your VMware VMs to one or more sites in a simple and easy to use manner. Since v2.0 was released on July 30th 2012 it supports replication to various clouds along with advanced features such as multisite replication and vCloud Director compatibility. Zerto are on an aggressive release schedule given that the initial release (which won ‘Best of Show’ at VMworld 2011) was only a year earlier, but in a fast moving market that’s a good thing. For an entertaining 90 second introduction which explains what it offers better than I could, check out the video below from the company’s website;
Just as server virtualization opened up possibilities by abstracting the guest OS from the underlying hardware so data replication can benefit from moving ‘up the stack’ away from the storage array hardware and into the hypervisor. The extra layer of abstraction lifts certain constraints related to the storage layer;
Array agnostic – you can replicate between dissimilar storage arrays (for example NetApp at one end and EMC at the other). For both cloud and DR scenarios this could be a ‘make or break’ distinction compared to traditional array replication, which requires similar systems at both ends. In fact you can replicate to local storage if you want – if you’re one of the growing number of believers in the NoSAN movement that could be useful…
Storage layout agnostic – because you choose which VMs to replicate rather than which volume/LUN on the array you’re less constrained when designing or maintaining your storage layout. When replicating you can also change between thin and thick provisioning, or from SAN to NAS, or from one datastore layout to another. A typical use case might be to replicate from thick at the source to thin provisioning at the DR location for example. There is a definite trend towards VM-aware storage and ditching LUN constraints – you see it with VMware’s vVols, storage arrays like Tintri and storage hypervisors like Virsto so having the same liberating concept for DR makes a lot of sense.
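To make the LUN-versus-VM distinction concrete, here’s a minimal Python sketch. It models a hypothetical inventory in which each VM lives on a single LUN (the VM names, LUN names and the one-datastore-per-VM simplification are all assumptions for illustration) and compares what ends up being replicated when you protect at LUN level versus VM level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    datastore: str  # simplification: each VM's disks live on a single LUN/datastore


def lun_based_scope(vms, protected_luns):
    """Array-style replication: every VM on a replicated LUN comes along."""
    return {vm.name for vm in vms if vm.datastore in protected_luns}


def vm_based_scope(vms, protected_vms):
    """Hypervisor-style replication: only the VMs you select are replicated."""
    return {vm.name for vm in vms if vm.name in protected_vms}


# Hypothetical inventory: the important VMs share LUNs with less important ones.
inventory = [
    VM("sql01", "LUN-A"),
    VM("web01", "LUN-A"),
    VM("test99", "LUN-A"),
    VM("app01", "LUN-B"),
]

wanted = {"sql01", "app01"}
luns_needed = {vm.datastore for vm in inventory if vm.name in wanted}

by_lun = lun_based_scope(inventory, luns_needed)
by_vm = vm_based_scope(inventory, wanted)

print(sorted(by_lun))          # every VM on LUN-A and LUN-B
print(sorted(by_vm))           # just the two VMs we asked for
print(sorted(by_lun - by_vm))  # VMs replicated only because of where they live
```

With LUN-granular protection you either accept the extra replication traffic and DR capacity for web01 and test99, or you redesign your storage layout around your DR requirements; VM-granular selection removes that coupling.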
Zerto goes beyond just being ‘storage agnostic’ and offers further flexibility;
Replicate VMs from vCD to vSphere (or vice versa). vCD to vCD is also supported. This is impressive stuff as it understands the Organization Networks, vApp containers etc and creates whatever’s needed to replicate the VMs.
vSphere version agnostic – for example use vSphere 4.1 at one end and vSphere 5.0 at the other. For large companies, which typically lag behind on upgrades, this could be the prime reason to adopt Zerto.
With any replication technology bandwidth, latency and WAN utilisation are concerns. Zerto uses virtual appliances on the source and destination hosts (combined with some VMware API calls, not a driver as this article states) and therefore isn’t dependent on changed block tracking (CBT), is storage protocol agnostic (ie you can use FC, iSCSI or NFS for your datastores) and offers compression and optimisation to boot. Zerto provide a profiling tool to ‘benchmark’ the rate of change per VM before you enable replication, thus allowing you to predict your replication bandwidth requirements. Storage I/O Control (SIOC) is not supported today, although Zerto are implementing their own functionality to allow you to limit replication bandwidth. Today that limit is set on a ‘per site’ basis and there’s no scheduling facility, so you can’t set different limits during the day or at weekends.
VMware’s vSphere is the only hypervisor supported today, although we were told the roadmap includes others (but no date was given). With Hyper-V v3 getting a good reception I’d expect to see support for it sooner rather than later, and that could open up some interesting options.
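As a rough illustration of how per-VM change-rate figures turn into a bandwidth estimate, here’s a small Python sketch. The change rates, the 2:1 compression ratio and the 25% headroom are all assumed values, and this isn’t Zerto’s profiling tool; it simply converts an average hourly change rate into megabits per second.

```python
def required_mbps(change_mb_per_hour, compression_ratio=1.0, headroom=1.25):
    """Estimate the WAN bandwidth (Mbit/s) needed to keep up with replication.

    change_mb_per_hour: dict of VM name -> average changed data in MB per hour
    compression_ratio:  e.g. 2.0 means traffic is halved on the wire (assumed)
    headroom:           multiplier leaving spare capacity for bursts (assumed)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    total_mb_per_hour = sum(change_mb_per_hour.values())
    on_wire_mb_per_hour = total_mb_per_hour / compression_ratio
    mbit_per_second = on_wire_mb_per_hour * 8 / 3600  # MB/h -> Mbit/s
    return mbit_per_second * headroom


# Hypothetical profiling results for three VMs.
profile = {"sql01": 3600, "web01": 900, "app01": 1800}
print(f"{required_mbps(profile, compression_ratio=2.0):.2f} Mbit/s")
```

With these made-up numbers the estimate works out at 8.75 Mbit/s. Averages hide peaks (overnight batch jobs, for example), so in practice you’d probably want to size against the busiest periods the profiler reports.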
Zerto’s Virtual Replication vs VMware’s SRM
Let’s revisit that claim that Zerto is the “industry’s first hypervisor-based replication solution for tier-one applications”. With the advent of vSphere 5.1 VMware now have two solutions which could be compared to Zerto – vSphere Replication and SRM. The former is bundled free with vSphere but is not comparable – it’s quite limited (no orchestration, testing, reporting or enterprise-class DR functions) and only really intended for data protection, not full DR. SRM on the other hand is very much competition for Zerto, although for comparable functionality you require array level replication.
When I mentioned SRM to the Zerto guys they were quick to say it’s an apples-to-oranges comparison which to a point is true – with Zerto you specify individual or groups of VMs to replicate whereas with SRM you’re still stuck specifying volumes or LUNs at array level. Both products have their respective strengths but there’s a large overlap in functionality and many people will want to compare them. SRM is very well known and has the advantage of VMware’s backing and promotion – having a single ‘throat to choke’ is an attractive proposition for many. I’m not going to list the differences because others have already done all the hard work;
NOTE: Since the above comparison was written SRM v5.1 has added support for vSphere Essentials Plus, but everything else remains accurate.
Looking through the comparisons with SRM there are quite a few areas where Zerto has an advantage, although to put this in context check out the pricing comparison at the end of this article;
RPO in the low seconds rather than 15 minutes
Compression of replication traffic
No resync required after host failures
Cloning of the DR VMs for testing
Point in time recovery (up to a max of 5 days)
The ability to flag a VMDK as a pagefile disk. In this instance it will be replicated once (and then stopped) so that during recovery a disk is mounted but no replication bandwidth is required. SRM can’t do this and it’s very annoying!
vApps supported (and automatically updated when the vApp changes)
vCloud Director compatibility
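Point-in-time recovery from a replication journal boils down to picking the newest consistent checkpoint at or before the moment you want to roll back to, provided it’s still inside the retention window. Here’s a toy Python model of that selection, assuming a sorted list of checkpoint timestamps; the 6-hourly spacing is purely for readability and this isn’t Zerto’s journal format.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

MAX_JOURNAL = timedelta(days=5)  # maximum point-in-time history, per the list above


def pick_checkpoint(checkpoints, target, now):
    """Return the latest checkpoint at or before `target`.

    checkpoints: sorted list of datetimes at which consistent points were recorded.
    Returns None if `target` lies outside the journal window or no checkpoint
    exists at or before it.
    """
    oldest_allowed = now - MAX_JOURNAL
    if target < oldest_allowed or target > now:
        return None
    # Keep only checkpoints still inside the window (the list stays sorted).
    window = [c for c in checkpoints if oldest_allowed <= c <= now]
    i = bisect_right(window, target)
    return window[i - 1] if i else None


# Hypothetical journal: a checkpoint every 6 hours for the last 7 days.
now = datetime(2012, 10, 10, 12, 0)
checkpoints = sorted(now - timedelta(hours=h) for h in range(0, 24 * 7, 6))

# Roll back to just before a bad change made 8 hours ago.
print(pick_checkpoint(checkpoints, now - timedelta(hours=8), now))
# Six days ago is outside the 5-day window.
print(pick_checkpoint(checkpoints, now - timedelta(days=6), now))
```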
If you already have storage array replication then you’ll probably want to evaluate both Zerto and SRM. If you don’t have (or don’t want the cost of) array replication, or want the flexibility of specifying everything in the hypervisor, then Zerto is likely to be the best solution.
DR to the Cloud (DRaaS)
Of particular interest to some customers, and a huge win for Zerto, is the ability to recover to the cloud. Building on the flexibility to replicate to any storage array and to abstract the underlying storage layout, Zerto lets you replicate to any provider who’s signed up to its solution. Multisite and multitenancy functionality were introduced in v2.0 and today there are over 30 cloud providers signed up, including some of the big guys like Terremark, Colt, and Bluelock. Zerto have tackled the challenge of a single appliance (providers obviously wouldn’t want to run one per customer) providing secure multi-tenant replication, with resource management included.
Often it comes down to price – you can have the best solution in the market, but if you’re charging the most then people will expect the best. Zerto are targeting the enterprise so maybe it shouldn’t be a surprise that they’re also priced at the top end of the market. The table below shows list pricing for SRM (both Standard and Enterprise edition) and Zerto;
SRM Standard: $195 per VM
SRM Enterprise: $495 per VM
Zerto Virtual Replication: $745 per VM
As you can see Zerto costs a significant premium over SRM. When making that comparison you may need to factor in the cost of storage array replication, as SRM using vSphere Replication is severely limited. These are all list prices so get your negotiating hat on! We were told that Zerto were seeing good adoption from all sizes of customer, from 15 VMs through to service providers.
I’ve not used SRM in production since the early v1.0 days and I’ve not used Zerto in production either, so my thoughts are based purely on what I’ve read and been shown. I was very impressed with Zerto’s solution, which certainly looks very polished and clearly trumps SRM in a few areas – which is why I took the time to investigate and write up my findings in this blogpost. From the simple and quick appliance-based installation (which was shown to us in a live demo) through to the GUI and even the pricing model, Zerto’s aim is to keep things simple and it looks as if they’ve succeeded (despite quite a bit of complexity under the hood). If you’re in the market for a DR solution take time to review the comparison with SRM above and see which fits your requirements and budget. Given how comprehensive the feature set is, I wouldn’t be surprised to see Zerto come out on top over SRM for many customers despite VMware’s backing for SRM and the cost differential.
Multi-hypervisor management could be a ‘killer feature’ for Zerto. It would distinguish the product for the foreseeable future (I’d be surprised to see this in VMware’s roadmap anytime soon despite their more hypervisor friendly stance), but it needs to happen before VMware bake comparable functionality into the SRM product. Looking at the way VMware are increasingly bundling software to leverage the base vSphere product, there’s a risk that SRM features work their way down the stack and into lower priced SKUs – good for customers but a challenge for Zerto. There are definitely intriguing possibilities though – how about replicating from VMware to Hyper-V for example? As the use of cloud infrastructure increases, the ability to run across heterogeneous infrastructures will become key and Zerto have a good start in this space with their DRaaS offering. If you don’t want to wait and you’re interested in multi-hypervisor management (and conversion) today, check out Hotlink (thanks to my fellow SFD#2 delegates for that tip).
I see a slight challenge in Zerto targeting the enterprise specifically. Typically these larger companies will already have storage array replication and are more likely to have a mixture of virtual and physical servers, so they will still need array functionality for physical applications. This erodes the value proposition for Zerto. Furthermore, if you have separate storage and virtualisation teams then moving replication away from the storage array could break accepted processes, not to mention put noses out of joint! Replication at the storage array is a well accepted and mature technology whereas virtualisation solutions still have to prove themselves in some quarters. In contrast VMware’s SRM may be seen to offer the best of both worlds by offering the choice of hypervisor and/or array replication – albeit with a significantly less powerful replication engine (if using vSphere Replication) and with the aforementioned constraints around replicating LUNs rather than VMs. Zerto also face the usual challenge of convincing enterprises that as a ‘startup’ they’re able to provide the expected level of support – for an eloquent answer to this read ‘Small is beautiful’ by Sudheesh Nair on the Nutanix blog (who face the same challenges).
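To see how much array replication can shift the comparison, here’s a short Python sketch reading the pricing table as SRM Standard $195, SRM Enterprise $495 and Zerto $745 per VM (list prices). The 50-VM estate and the $15,000 array replication figure are invented for illustration; real array replication licensing varies widely and usually isn’t priced per VM.

```python
# Per-VM list prices from the pricing table in this post (USD).
LIST_PRICE_PER_VM = {
    "SRM Standard": 195,
    "SRM Enterprise": 495,
    "Zerto Virtual Replication": 745,
}


def total_cost(product, vm_count, array_replication_cost=0):
    """Licence cost for `vm_count` VMs plus any separate array replication cost."""
    return LIST_PRICE_PER_VM[product] * vm_count + array_replication_cost


def breakeven_array_cost(srm_edition, vm_count):
    """Array replication spend at which SRM plus array replication matches Zerto."""
    return (LIST_PRICE_PER_VM["Zerto Virtual Replication"]
            - LIST_PRICE_PER_VM[srm_edition]) * vm_count


vms = 50  # hypothetical estate size
print(total_cost("Zerto Virtual Replication", vms))
print(total_cost("SRM Enterprise", vms, array_replication_cost=15_000))
print(breakeven_array_cost("SRM Enterprise", vms))
```

In this made-up scenario Zerto comes to $37,250 against $39,750 for SRM Enterprise plus array replication, and the break-even array replication spend is $12,500; above that, Zerto’s list price premium disappears.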
Disclosure: the Storage Field Day #2 event is sponsored by the companies we visit, including flight and hotel, but we are in no way obligated to write (either positively or negatively) about the sponsors.
While upgrading my home lab recently I found myself reconsidering the scale up vs scale out argument. There are countless articles about building your own home lab and whitebox hardware but is there a good alternative to the accepted ‘two whiteboxes and a NAS’ scenario that’s so common for entry level labs? I’m studying for the VCAP5-DCD so while the ‘up vs out’ discussion is a well trodden path there’s value (for me at least) in covering it again.
There are two main issues with many lab (and production) environments, mine included;
Memory is a bottleneck, doubly so in labs using low end hardware – the vCenter appliance defaults to 8GB, as does vShield Manager, so anyone wanting to play with vCloud (for example) needs a lot of RAM.
Affordable yet performant shared storage is also a challenge – I’ve used both consumer NAS (from 2 to 5 bays) and ZFS based appliances but I’m still searching for more performance.
When I built my newest home lab server, the vHydra, I used a dual socket motherboard to maximise the possible RAM (up to 256GB) and used local SSDs to supplement my shared storage. This has allowed me to solve the two issues above – I have a single server which can host a larger number of VMs with minimal reliance on my shared storage. The concepts are the same as those behind solutions like Fusion-IO in production environments, but mine isn’t particularly scalable. In fact it doesn’t really scale at all – I’ll have to revert to centralised storage if I buy more servers. Nor does it have any resilience – the ESXi server itself isn’t clustered and the storage is a single point of failure as there’s no RAID. It is cheap however, and for lab testing I can live with those compromises. None of this is remotely new of course – Simon Gallagher’s vTardis has been using these same concepts to provide excellent lab solutions for years. Is this really a poor man’s Fusion-IO? There’s nothing like the performance and nothing like the budget, and while the objectives are the same, to be honest it’s probably a slightly trolling blog title. I won’t do it again. Promise!
If you’re thinking of building a home lab from scratch, consider buying a single large server with local SSD storage instead of multiple smaller servers with shared storage. You can always scale out later, or wait for Ceph or HDFS to eliminate the need for centralised storage altogether…
Tip: It’s worth bearing in mind the 32GB limit on the free version of ESXi – unless you’re a vExpert or they reinstate the VMTN subscription you’ll be stuck with 60 day eval editions if you go above 32GB (or buying a licence!).
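A quick back-of-an-envelope check shows how fast a vCloud lab eats RAM and where the free-licence ceiling bites. The Python sketch below uses the 8GB defaults for the vCenter appliance and vShield Manager mentioned earlier; the vCD cell and nested ESXi sizes and the 64GB host are assumptions for illustration.

```python
FREE_ESXI_RAM_LIMIT_GB = 32  # free ESXi licence limit, per the tip above


def lab_memory_check(host_ram_gb, vm_ram_gb):
    """Compare the RAM configured for lab VMs with a single host's capacity."""
    needed = sum(vm_ram_gb.values())
    return {
        "needed_gb": needed,
        # Conservative: ignores memory overcommit, TPS and ballooning.
        "fits_without_overcommit": needed <= host_ram_gb,
        "free_esxi_ok": host_ram_gb <= FREE_ESXI_RAM_LIMIT_GB,
    }


# Hypothetical vCloud lab; only the two 8GB appliance defaults come from the post.
lab = {
    "vCenter appliance": 8,
    "vShield Manager": 8,
    "vCD cell": 4,
    "nested ESXi x2": 16,
}
print(lab_memory_check(host_ram_gb=64, vm_ram_gb=lab))
```

Even this modest lab needs more than 32GB before overcommit, which is why a big single host quickly pushes you past what the free licence allows.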
I recently had to complete an external audit of our VMware estate and thought it might be useful to others to know what the process entails, what you’ll need to provide to the auditors, and a few issues that I wasn’t aware of beforehand around licencing compliance. The initial approach by the auditor will describe the overall process and expected timelines (which will vary based on the size of your company).
There are two main steps in the process – self disclosure and discovery;
Self disclosure is where you detail your use of VMware software including vCenters, ESX/ESXi hosts, VMs, and licences. In our case this was collated into an Excel spreadsheet provided by the auditor (the deployment detail workbook). You’ll also have to answer some high level questions about your company (such as how many locations you have), how you audit internally (how you track licences – third party tools, vCenter etc), when you initially deployed VMware in your company, and some info about your contacts for the audit. How you collect this information is up to you but there are a few good choices;
Export data from vCenter using the GUI
Export data from vCenter using PowerCLI scripts
Use third party tools.
I used a mixture of RVTools (which is a handy and free download) and PowerCLI scripts. The native ‘Export’ feature in vCenter isn’t very flexible (there’s no way to export all the MAC addresses of VMs, for example), and while RVTools came close it didn’t provide everything I needed either. I needed host uptime, and while RVTools does show the last reboot time I still had to translate that into days; it also didn’t cover licencing for each host (which I could have got from vCenter). I’ve included the script I ran at the end of this post in case it’s of use to someone else.
Validation. Once the disclosure is completed the auditor will want to ‘validate’ the information – auditor talk for “are you telling the truth, the whole truth, and nothing but the truth?”! This can be done in a variety of ways depending on the size of your estate, location, the auditor etc. It could include using your in-house auditing tools (Centennial for example), data from directories like Active Directory, or a scan of your network switches for a list of VMware MAC addresses (prefixes 00.05.69, 00.0C.29, 00.1C.14, as well as the more commonly known 00.50.56). The latter was the approach we took due to a mixed Linux/Windows estate and the auditors’ preference. NOTE: you’ll do the actual collection of all data, not the auditors, even if they’re onsite.
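The boot-time-to-days conversion is easy to do outside PowerCLI too. Here’s a minimal Python sketch that works on a CSV export; the `Host`/`BootTime` column names and the timestamp format are hypothetical (not RVTools’ actual layout) and this isn’t the script mentioned above.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp format for the exported last-boot column; adjust it to
# match whatever your export actually contains.
BOOT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def uptime_days(boot_time_text, as_of):
    """Whole days between the last boot and `as_of` (partial days truncated)."""
    boot = datetime.strptime(boot_time_text, BOOT_TIME_FORMAT)
    return (as_of - boot).days


def host_uptimes(csv_text, as_of):
    """Map host name -> uptime in days from a CSV with hypothetical
    'Host' and 'BootTime' columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Host"]: uptime_days(row["BootTime"], as_of) for row in reader}


export = """Host,BootTime
esx01,2012-09-01 12:00:00
esx02,2012-09-30 08:00:00
"""
print(host_uptimes(export, as_of=datetime(2012, 10, 1, 9, 0, 0)))
```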
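Filtering a switch MAC address table for those prefixes is easy to script. A minimal Python sketch, assuming you’ve already dumped the MAC table to a list of strings; the sample entries are hypothetical, and because separator styles vary by switch vendor everything is normalised first.

```python
import re

# VMware OUIs listed above, normalised to lower-case hex without separators.
VMWARE_OUIS = {"000569", "000c29", "001c14", "005056"}


def normalise_mac(mac):
    """Strip '.', ':' or '-' separators and lower-case the hex digits."""
    return re.sub(r"[^0-9a-fA-F]", "", mac).lower()


def vmware_macs(mac_table):
    """Return the normalised MACs whose first three bytes are a VMware OUI."""
    found = set()
    for mac in mac_table:
        norm = normalise_mac(mac)
        if len(norm) == 12 and norm[:6] in VMWARE_OUIS:
            found.add(norm)
    return found


# Hypothetical entries as different switches might print them.
switch_dump = [
    "0050.56ab.cdef",     # Cisco-style dotted notation
    "00:0C:29:12:34:56",  # colon-separated
    "00-1b-21-aa-bb-cc",  # a non-VMware NIC
]
print(sorted(vmware_macs(switch_dump)))
```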
In an ideal world the information collected in this step matches up nicely with the information you’ve disclosed – any discrepancies will need investigating and explaining. A few things that caught me out here;
Ensure you keep track of any changes to the VMware environment after the audit process kicks off (this is an audit requirement). Some of my discrepancies were because another admin had decommissioned some VMs after my initial disclosure so they flagged up as ‘missing’. Simple to explain, but time consuming to track down! This could be a real challenge in a larger environment.
Remember that VMkernel ports also have VMware MAC addresses, not just the VMs. I spent a while trying to find ‘phantom’ VMs before tracking down the issue. RVTools shows these in a separate tab so you’ll need to export both.
Even if you’re over entitled (you have more licences than you’re using) you’ll probably have to justify it, just to be sure you’re not hiding some part of your installation.
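The discrepancy hunt above is essentially a set comparison. A minimal Python sketch, assuming both sides have already been normalised to the same MAC string format (all addresses below are made up):

```python
def reconcile(disclosed_vm_macs, vmkernel_macs, discovered_macs):
    """Compare disclosed MACs with MACs discovered on the network.

    All inputs are sets of normalised MAC strings (lower-case, no separators).
    """
    disclosed = disclosed_vm_macs | vmkernel_macs
    return {
        # On the network but not in the disclosure: needs explaining to the auditor.
        "undisclosed": discovered_macs - disclosed,
        # Disclosed but not seen: powered off, decommissioned since disclosure, etc.
        "not_seen": disclosed_vm_macs - discovered_macs,
    }


disclosed_vms = {"005056000001", "005056000002", "005056000003"}
vmkernel = {"005056700001"}
discovered = {"005056000001", "005056000002", "005056700001", "000c29aaaaaa"}

print(reconcile(disclosed_vms, vmkernel, discovered))
# Leaving out the VMkernel ports turns them into 'phantom' VMs:
print(reconcile(disclosed_vms, set(), discovered)["undisclosed"])
```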
During my recent build of the vHydra server I found myself rather frustrated with Supermicro for a couple of reasons.
Firstly their UK distribution doesn’t seem to be working particularly well as there’s a two week wait for most parts which are apparently shipped from the US on demand. There are UK based resellers (I tried www.boston.co.uk) but even then some parts still have a long lead time (around a week) and I found them to be expensive compared to alternative web based vendors.
Secondly their technical support was somewhat lacking. Once I’d built the server I found I was getting an overvoltage warning on the second, empty, CPU socket. As I was planning on populating this socket (once the second CPU and heatsink arrived, another three-week wait) I was keen to know if this was a false positive or whether the board should be returned as faulty.
I emailed Supermicro technical support who went through the usual information gathering – firmware, BIOS, motherboard details etc. They identified that the IPMI firmware was out of date.
Having recently upgraded my home lab’s storage I decided it was also time to upgrade my aging hosts which date back to 2007. They’ve done well to survive and still be useful(ish) five years later but they’re maxed out at 8GB RAM and it’s becoming increasingly difficult to do anything with that. I briefly considered adding SSDs as host cache but that doesn’t address some of their other shortcomings such as no support for Fault Tolerance, VMDirectPath or any type of KVM functionality.
A quick look around the blogosphere revealed a few common options;
The problem for me was that these solutions all maxed out at 16 or 32GB RAM per host, a limitation of the single socket Xeon architecture. That’s a lot of memory for a home lab server today, but to ensure that this server can last five years I really wanted more scalability. I wasn’t too fussed about noise as I use my cellar for my lab, and power consumption was a secondary concern. The server features of the Supermicro boards appeal to me (and many Supermicro motherboards are compatible with vSphere) so I browsed their range looking for the one that best met my requirements. My final parts list ended up as;
This year my VMworld experience started in a more relaxed fashion than previously as I flew in ahead of time on the Sunday night. After checking in to my hotel and getting my orientation in the city I headed (along with LonVMUG’s Luke Munro) to the vRockstar party at the Hard Rock Cafe organised by Marco Broeken and Patrick Redknapp. This coincided nicely with ‘El Classico’ when the two giants of Spanish football, Real Madrid and Barcelona, play each other in the Spanish league. This ensured the Hard Rock Cafe was rammed full so it was a good thing they’d reserved an area for us. Food, (free) drink, and good conversation – thanks for organizing a great start to VMworld guys!
Next day registration at the conference venue was very quick partly because it was partner day and the masses had yet to arrive. There was some misleading information about the HOL being closed although after a quick Twitter shoutout to John Troyer that was quickly remedied. As I’m a customer not a partner I didn’t have access to the partner breakout sessions so I figured my day was going to be a mixture of labs and people networking. Compared to Copenhagen the weather was a distinct improvement, hovering around 25 degrees and quite humid, although inside the air conditioning kept everyone cool.
The Keynotes and announcements
Tuesday signaled the first day of the main conference, when all 7000 attendees turned up. The day started with the keynote from Pat Gelsinger and Steve Herrod and was largely a repeat of the US keynote with a few notable exceptions which I’ll cover later. For those that haven’t seen the US keynotes, here are the highlights;
there is a new vCloud Suite which bundles many of the VMware products together in a more compelling and cost effective package
vRAM is no more (cost is now per socket)
the launch of vSphere 5.1
new certification tracks including a vCloud track
VMware always like to hold back some product launches so that VMworld Europe has something to get excited about. Here’s a summary of the announcements at Barcelona;
With the swift integration of the DynamicOps technology VMware obviously want to manage heterogeneous clouds, having spent the last five years saying there was no demand. Should we take this as an indirect endorsement of Hyper-V?
If you’re in the market to take a VMware certification exam, there’s some good news – provided you’re quick. For the next couple of days (while VMworld Barcelona is running, Oct 9th-11th 2012) you can book the VCP and VCAP exams at a cool 50% off. For VCP that’s a saving of approx £50 and more like £200 for the VCAP exams! If you want to blitz some of the new certification tracks recently announced you’re not limited to just one – study your little legs off and you could save even more by taking multiple exams….
The codes you need to register with are;
VMWBAR50 – for the VCP exams (VCP-DV, VCP-DT, VCP-Cloud etc)
ADVBAR50 – for the VCAP exams (VCAP-DCA, VCAP-DCD etc)
You MUST book the exam while VMworld Barcelona is running. You don’t have to be attending the conference, it’s just the period of time the offer is valid.
You MUST take the exam by the end of the year.
What are you waiting for? Head over to VMware Certification and get registered, certification junkies!
While working recently on an ADFS federation solution I came across a Microsoft ‘feature’ which doesn’t seem to be well known and which caused me to deliver my project a week late. It often manifests itself via failed logins and affects many products which integrate with AD such as Sharepoint, Office365, OWA, and of course ADFS. This is very much one of those ‘document it here for future reference’ posts but hopefully it’ll help spread the word and maybe save someone else the pain I felt!
To describe how the ‘feature’ affects ADFS you need to understand the communication flow when a federation request is processed. The diagram below (from an MSDN article on using ADFS in Identity solutions) shows a user (the web browser) connecting to a service (the ASP.NET application although it could be almost any app) which uses ADFS federation to determine access;
Communication flow using federated WebSSO
Summarising the steps;
The user browses to the web application (step 1)
The web app redirects the user to ADFS (steps 2, 3)
ADFS attempts to authenticate the user, usually against Active Directory (step 4)
ADFS generates a token (representing the user’s authentication) which is passed back to the user, who then presents it to the app and is given access (steps 5, 6, 7)
My problem was that while some users were being logged into the web application OK, some were failing and I couldn’t work out why. Diagnosing issues in federation can be tricky as by its nature it often involves multiple parties/companies. The web application company were saying their application worked fine, both redirecting users and processing the returned tokens. The users were entering their credentials and being authenticated against our internal Active Directory. ADFS logs showed that tokens were being generated and sent to the web app. Hmm.
Digging deeper I found that the AD username (the UPN to be precise) being passed into the token generation process within ADFS was occasionally incorrect. The user would type their username into the web form (and be authenticated) but when ADFS tried to generate claims for this user via an LDAP lookup it used an incorrect UPN and hence failed. It seemed as if the Windows authentication process was returning incorrect values to ADFS. This stumped me for a while – how can something as simple and mature as AD authentication go wrong?
Of course it’s not going wrong, it’s working as designed. It transpires there’s an LSA cache on domain member servers. Where the AD values have changed recently (the default is to cache for 7 days) the AD authentication process can return the original, rather than the updated, values to the calling application. A simple change such as someone getting married and having their AD account updated with their married name could therefore break any dependent applications. Details of this cache can be found in MS KB article 946358, along with the priceless statement “This behaviour may prevent the application from working correctly”. No kidding! This impacted my project more than most because the AD accounts are created programmatically via a web portal and updated later by some scripts. The high rate of change means they’re more susceptible to having old values cached.
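To see why a rename can break things, here’s a toy Python model of a time-based lookup cache sitting in front of a directory. It’s deliberately simplified and hypothetical: the 7-day TTL mirrors the default quoted above, the account and UPNs are made up, and it isn’t a model of Windows internals. It just shows a cached UPN still being returned after the directory has changed.

```python
from datetime import datetime, timedelta


class LookupCache:
    """Toy time-based cache in front of a directory lookup.

    Entries are served until they expire, even if the directory has changed
    in the meantime. Illustrative only; see the KB article for how the real
    LSA cache is configured.
    """

    def __init__(self, directory, ttl):
        self.directory = directory  # dict: account id -> current UPN
        self.ttl = ttl
        self._entries = {}          # account id -> (upn, cached_at)

    def lookup(self, account_id, now):
        cached = self._entries.get(account_id)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # may be stale
        upn = self.directory[account_id]
        self._entries[account_id] = (upn, now)
        return upn


directory = {"user-1104": "jane.smith@example.com"}  # hypothetical account
cache = LookupCache(directory, ttl=timedelta(days=7))
t0 = datetime(2012, 9, 3, 9, 0)

first = cache.lookup("user-1104", t0)
directory["user-1104"] = "jane.jones@example.com"  # name change in AD
next_day = cache.lookup("user-1104", t0 + timedelta(days=1))
next_week = cache.lookup("user-1104", t0 + timedelta(days=8))
print(first, next_day, next_week)
```

The day after the change the old UPN is still returned, which is exactly the kind of mismatch that makes the LDAP claims lookup in ADFS fail.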
This might seem like a niche problem but it also impacts implementations of Sharepoint, OWA, Project server, and Office365 – any product that relies on AD for authentication. These products can be integrated with AD to facilitate single sign on but if you make frequent changes to AD the issues above can occur.
How can I diagnose this issue?
The symptoms will vary between products but thankfully Microsoft have some great documentation on ADFS. The troubleshooting guide details how to enable the advanced ADFS logs via Event Viewer; once you’ve got those, check for Event ID 139. The event details show the actual contents of the authentication token so you can check the UPN and ensure it’s what you expect. If it isn’t, follow the instructions in the KB article to disable or fine tune the cache retention period on the domain member server (ie the ADFS server, not the AD server).
These rants and raves are solely my opinion and do not reflect the opinions of my employers.
Any of my code, configuration references, or suggestions should be researched and verified in a lab environment before attempting in a production environment.
Agreement to use any of my code or recommendations removes me from any liability as such....and I shamelessly stole this disclaimer from Jase McCarty's site!