Federated login failures – the LSA cache
While working recently on an ADFS federation solution I came across a Microsoft ‘feature’ which doesn’t seem to be well known and which caused me to deliver my project a week late. It often manifests itself via failed logins and affects many products which integrate with AD such as Sharepoint, Office365, OWA, and of course ADFS. This is very much one of those ‘document it here for future reference’ posts but hopefully it’ll help spread the word and maybe save someone else the pain I felt!
To describe how the ‘feature’ affects ADFS you need to understand the communication flow when a federation request is processed. The diagram below (from an MSDN article on using ADFS in Identity solutions) shows a user (the web browser) connecting to a service (the ASP.NET application although it could be almost any app) which uses ADFS federation to determine access;
Summarising the steps;
- The user browses to the web application (step 1)
- The web app redirects the user to ADFS (step 2,3)
- ADFS attempts to authenticate the user, usually against Active Directory (step 4)
- ADFS generates a token (representing the users authentication) which is passed back to the user who then presents it to the app and is given access (steps 5,6,7)
My problem was that while some users were being logged into the web application OK, some were failing and I couldn’t work out why. Diagnosing issues in federation can be tricky as by its nature it often involves multiple parties/companies. The web application company were saying their application worked fine, both redirecting users and processing the returned tokens. The users were entering their credentials and being authenticated against our internal Active Directory. ADFS logs showed that tokens were being generated and sent to the web app. Hmm.
Digging deeper I found that the AD username (the UPN to be precise) being passed into the token generation process within ADFS was occasionally incorrect. The user would type their username into the web form (and be authenticated) but when ADFS tried to generate claims for this user via an LDAP lookup it used an incorrect UPN and hence failed. It seemed as if the Windows authentication process was returning incorrect values to ADFS. This stumped me for a while – how can something as simple and mature as AD authentication go wrong?
Of course it’s not going wrong, its working as designed. It transpires there’s an LSA cache on domain member servers. On occasions where the AD values have changed recently (the default is to cache for 7 days) it can result in the original, rather than the updated, values being returned to the calling application by the AD authentication process. A simple change such as someone getting married and having their AD account updated with their married name could therefore break any dependant applications. Details of this cache can be found in MS KB article 946358, along with the priceless statement “This behaviour may prevent the application from working correctly“. No kidding! This impacted my project more than most because the AD accounts are created programmatically via a web portal and updated later by some scripts. The high rate of change means they’re more susceptible to having old values cached.
This might seem like a niche problem but it also impacts implementations of Sharepoint, OWA, Project server, and Office365 – any product that relies on AD for authentication. These products can be integrated with AD to facilitate single sign on but if you make frequent changes to AD the issues above can occur.
How can I diagnose this issue?
The symptoms will vary between products but thankfully Microsoft have some great documentation on ADFS. The troubleshooting guide details how to enable the advanced ADFS logs via Event Viewer- when you’ve got those check for Event ID 139. The event details shows the actual contents of the authentication token so you can check the UPN and ensure it’s what you expect. If not follow the instructions in the KB article to disable or fine tune the cache retention period on the domain member server (ie the ADFS server, not the AD server).