In the first post, Ray Camden commented that he thought there might be a race condition in the array loop in Ben's code sample. I later chimed in, expressing my opinion that the race condition was not in the array loop but in the ID generator, and the latter could be fixed with an AtomicInteger object. Ben followed up by saying a named cflock would also work, and I replied that it would interesting to compare the performance of the two techniques.
Any of you who are followers of Ben Nadel's blog know that he is an irredeemable empiricist. He rarely accepts theories on how programs work without first testing them with his own experiments. His intellectual curiosity and honesty makes his blog a joy to read, and I find there is always something to be learned from his experiments.
Apparently my final comment on Ben's caching post caught his attention, as he made it into the topic of a new blog post on AtomicInteger. It was awesome to see him put enough thought into the topic to whip up a performance test and then share his thoughts on the topic and the test results on his blog.
While I enjoyed reading his new blog post, I decided that his experiment needed to be taken a little further. There were two issues that Ben's experiment did not address:
- The experiment was single-threaded, so it did not test the performance of the counters when shared by multiple threads;
- The experiment did not compare the thread-safety of the two approaches against the control case (no locking).
Ben had already done the heavy lifting, so all I had to do was add the thread code and the no-locking case. I changed the tests so that each test created 10 threads, and each thread incremented the shared counter 100,000 times. The correct final result of each test would then be 10 * 100,000 = 1,000,000. However if the counter experienced any race conditions, some of the increments would be lost and the final result would be less than 1,000,000.
Since blogger.com does not offer a good way to include long code snippets in a blog post, I decided to publish the code on pastebin. You can check it out on http://cfm.pastebin.com/f18eee642
The performance part of my new test matched Ben's results:
Named CFLOCK Test: 22,807 ms
AtomicInteger Test: 2,403 ms
No-Locking Test: 2,574 ms
I was a little surprised that the no-locking test was slightly slower than the AtomicInteger test. I can only guess that AtomicInteger offers a speed benefit over ColdFusion's ++ operator that outweighs its thread-safety overhead.
The thread-safety part of my new test, on the other hand, completely blew my mind:
Expected final counter value: 1,000,000
Named CFLOCK final counter value: 1,000,000
AtomicInteger final counter value: 1,000,000
No-Locking final counter value: 497,246
The final counter value of the thread-unsafe test was half that of the thread-safe tests. That means that with 10 concurrent threads, approximately 50% of the shared counter increments were lost due to race conditions!!! It turns out experiencing race conditions with the ColdFusion ++ operator was much more likely than what I originally thought. Empirical testing FTW!
Thanks again to Ben Nadel for inspiring me to take this investigative journey and share it with others.
HOLY COW! I am shocked at how far off the no-locking route was! Amazing.
ReplyDeleteDennis, thanks as always for the inspiration and for taking it to the next level. Crazy stuff!