Sunday, December 13, 2009

ColdFusion shared scopes and race conditions

In a recent Google Group thread discussing the Singleton design pattern, Phillip Senn asked whether the application scope needs to be locked in onApplicationStart. Ray Camden answered with the following:
You don't need to lock if you let them run "normally." If you run the methods manually (many people will add a call to onApplicationStart inside their onRequestStart if a URL param exist) then you may need a lock. Of course, you only need a lock if you actually care about a race condition. 99% of the code I see in application startup just initializes a set of variables.
Ray's comment about not needing to lock when onApplicationStart is invoked implicitly during the application start event is completely correct. However, things get a little more complicated when you directly invoke the same method. Not only does it create race conditions, but the race conditions it creates are often not easily solved by cflock.

Let me explain with a story. Once upon a time a CF coder wrote some application initialization code that looked like this:
application.settings = {};
application.settings.title = "My application";
application.settings.foo = 42;
application.settings.bar = 60;
application.settings.baz = false;
To allow these settings to be reloaded, the coder added the following to onRequestStart:
<cfif StructKeyExists(url,"init") and url.init eq "abracadabra">
<cfset onApplicationStart() />
</cfif>
This worked fine in development and testing and so was deployed to production.

The application was successful and its usage grew. The original coder moved on to other projects and the maintenance was handed off to a new coder. Many months later the new coder noticed something. Occasionally, when the maintainer used the init=abracadabra parameter to reset the application variables, he would see errors pop up in the error log at the same time. The errors looked something like this:
Element BAR is undefined in APPLICATION.SETTINGS
The source of the error was not isolated to a single place in the code; it occurred in many different places in the application. The element also changed: sometimes it was BAR, other times it was BAZ or FOO or TITLE. The only thing in common to all the errors was APPLICATION.SETTINGS.

The coder couldn't see any pattern to the errors and so wrote them off as an anomaly: either an obscure bug in ColdFusion or some server misconfiguration. The errors continued, but not frequently enough for end users to really notice and complain.

Some time later, a project to add enhancements was approved for the application and another coder was brought in to assist the maintainer. The new coder was a bit of a "guru" and knew something about race conditions. When the code guru saw the strange errors he quickly identified the source of the problem: the initialization code contained a race condition.

By initializing the application.settings variable with an empty structure, there was a short period of time when the structure did not contain the title, foo, bar, or baz members. This is not a problem during the application start event because the event occurs before any normal request processing, but it is a problem if the same code is invoked directly. If any other request thread tries to access one of those members after the re-init has created the structure but before it has assigned a value to that member, an error will occur.
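
To make the window concrete, here is the original initialization code again, annotated with one possible interleaving of two request threads. The thread labels and the exact point at which the second request runs are assumptions chosen for illustration; in practice the gap can fall between any two of the assignments, which is why the missing element varied from error to error.

```cfml
<cfscript>
// Thread A: the re-init request, running onApplicationStart() directly
application.settings = {};                      // the shared struct is now empty
application.settings.title = "My application";
application.settings.foo = 42;

// Thread B: an ordinary request happens to run at this moment and reads
// application.settings.bar, which has not been assigned yet:
//   Element BAR is undefined in APPLICATION.SETTINGS

application.settings.bar = 60;
application.settings.baz = false;               // only now is the struct complete
</cfscript>
```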

So how did the guru fix the race condition? The naïve approach would be to simply put a cflock around the call:

<cfif StructKeyExists(url,"init") and url.init eq "abracadabra">
<cflock scope="application" type="exclusive" timeout="60">
<cfset onApplicationStart() />
</cflock>
</cfif>

However, making this work would also require a read lock around every single access to the application.settings structure! A painful approach to say the least.
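
As a rough sketch of what that would mean, every read of a setting anywhere in the application would have to be wrapped like this. The variable name pageTitle and the 10-second timeout are made up for the example.

```cfml
<!--- Hypothetical reader: the read-only lock waits while the exclusive
      lock around onApplicationStart() is held, so the reader never sees
      a half-built settings struct. --->
<cflock scope="application" type="readonly" timeout="10">
    <cfset pageTitle = application.settings.title />
</cflock>
```

Miss the lock on even one access and the race is back for that line of code.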

The guru knew a better way. He made some simple changes to the initialization code:

var settings = {};

settings.title = "My application";
settings.foo = 42;
settings.bar = 60;
settings.baz = false;

application.settings = settings;

Voila! By initializing the settings structure in a local variable and only assigning it to the application scope after it is fully initialized, the original race condition disappeared. No locking required!
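
For context, here is a minimal sketch of how the fixed code might sit inside a tag-based Application.cfc. The application name and the overall layout are assumptions; the story only shows the function bodies.

```cfml
<cfcomponent output="false">

    <!--- Hypothetical application name --->
    <cfset this.name = "myApplication" />

    <cffunction name="onApplicationStart" returntype="boolean" output="false">
        <!--- Build the struct in a function-local variable that no other
              request can see. --->
        <cfset var settings = {} />

        <cfset settings.title = "My application" />
        <cfset settings.foo = 42 />
        <cfset settings.bar = 60 />
        <cfset settings.baz = false />

        <!--- Publish the finished struct with a single assignment. --->
        <cfset application.settings = settings />

        <cfreturn true />
    </cffunction>

    <cffunction name="onRequestStart" returntype="boolean" output="false">
        <cfargument name="targetPage" type="string" required="true" />

        <!--- Manual re-init, unchanged from the original code --->
        <cfif StructKeyExists(url, "init") and url.init eq "abracadabra">
            <cfset onApplicationStart() />
        </cfif>

        <cfreturn true />
    </cffunction>

</cfcomponent>
```

Other requests now see either the old structure or the new one, never a partially filled one. Code that reads several settings in one request can go a step further and copy application.settings into a local variable first, so that all of its reads come from the same structure even if a re-init swaps in a new one halfway through.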

The fix was deployed and no trace of the error was ever found again.

Some time later, another coder noticed that he was seeing a similar problem in his application, except his errors were in the session scope and they occurred much more frequently.

The CF guru rolled his eyes. "Here we go again," he thought to himself...


6 comments:

  1. ah, and this looks like a CF9 friendly way of doing it too.

  2. This is a great post. I love how stories can help get a point across!

  3. A very nice approach. If you were going to do the same thing on the Application scope itself, I suppose you could just use structAppend(). This would still leave some threading issues, but would cut down significantly.

  4. Another example of specialist knowledge is what Sean pointed out in the original "ColdFusion Design Patterns site - Singleton" thread. Variables that you don't think of as being shared can be, and therefore no longer thread safe.

    As you guys are CF gurus it's easy enough to remember this stuff and plan for it, but as a member of a team of developers with different skill levels and different levels of caring, it's too easy for the specialist knowledge to not be present.

    As a CF programmer I want to see CF thrive and having popular sites leave CF *because* they became popular and were using "good enough" code puts a downer on my CF appreciation.

  5. Dang would like to read this article, but can't seem to get the Read More to expand!
