Monday, August 16, 2010

12 Ways Software Errors Differ from Spelling Errors

[Note: I posted the following on my work team's blog earlier today. I started off with only 7 ways at noon, but came up with 5 more by the end of the day.]

Spelling errors are a simple, often humorous form of error that is popular for introducing the topic of production errors. However, I find that spelling errors and software errors differ in some very important ways.

The following is my own list of important differences between software errors and spelling errors.
  1. Software errors are usually dynamic (the software does something wrong) while spelling errors are always static (the spelling is wrong).
  2. Spelling is unambiguous: either a word is in the dictionary, or it is not.
  3. Spelling errors are trivial to reproduce.
  4. Once identified, spelling errors are usually easy to recognize as such.
  5. If different stakeholders disagree on the spelling of a word, you can resolve the conflict by going to the dictionary.
  6. You can usually ignore requests from a customer to spell a word incorrectly (after politely referring them to a dictionary, of course).
  7. The cost of correcting a spelling error in a draft document does not increase with the amount of time it spent there.
  8. After carefully correcting a spelling error, you can be supremely confident that the spelling of that one word is now correct.
  9. Repeated misspellings of the same word can be easily corrected with the "Replace All" word processor feature.
  10. It is easy to correct a spelling error without creating more spelling errors.
  11. Correcting a single spelling error does not cause a collection of seemingly unrelated spelling errors to vanish without a trace.
  12. There are many spell checking programs available to detect all your spelling errors for you. Software defect detection, however, is an undecidable problem: there cannot ever exist a program that will detect all software defects.
Can you think of any others? Post them in the comments!

P.S. This blog post has been certified by a spell checker as 100% free of spelling errors :-)

Saturday, February 13, 2010

Excellence is a better goal than best

One aphorism frequently cited in English is "Perfect is the enemy of the good". Its basic message is clear: perfectionism may initially result in improvements, but it eventually becomes counterproductive. Unfortunately, it is often misused in the workplace as an excuse for mediocrity: it becomes another way of saying "'good enough' is good enough".

Apart from the above-mentioned misuse, the truth of the saying seems fairly uncontroversial. After all, in a world of mortals perfection is never truly attainable. However a few months ago I decided to look for the origins of this saying and discovered a hidden controversy behind it, and with it a more subtle and deeper truth.

A Google search revealed that the quote originated with the French philosopher Voltaire. As you might expect, the English quote is a translation. The original words were "Le mieux est l'ennemi du bien." I do not know French, but apparently the phrase "le mieux" (translated as "perfect" in the popular aphorism) is more literally translated as "the better" or "the best". As is common in such situations, debates abound over what Voltaire meant by his words and which translation most accurately conveys that meaning.

I prefer the translation "The best is the enemy of the good". It may sound very similar to the common translation, but there is a subtle difference. While perfection is clearly unattainable, "the best" appears to be more within reach. However it is this attainability that makes it more dangerous than perfectionism.

When we strive for quality (good), we frequently encounter choices on how to proceed. At the beginning it is usually easy to identify the "better" choice; as progress is made, however, this becomes more difficult. The difficulty arises because at higher quality levels the choices tend to trade off one aspect of quality for another, so value judgments must be made as to which trade-off is the "better" one to make. Such value judgments are by their nature subjective, and if there is disagreement a lot of energy can be expended in resolving the differences of opinion. In my experience, such disagreements almost always occur before "the best" is reached.

So if mediocrity is unacceptable, perfection is unattainable, and "the best" is too costly, what should we aim for? My answer is excellence. Once we learn to recognize excellence in our own work and the work of our peers, it becomes more productive to stop at "excellent" than to continue towards "best".

All this hit me recently while working on Model-Glue. I have been contributing to the Model-Glue project for a few months now and was recently accepted as a member of the development team. The two members I have been working the most with are Dan Wilson and Ezra Parker. Both are skilled coders and I am grateful for them giving me the chance to be the worst in their band.

We are working on finalizing a maintenance release, and Ezra was assigned to fix an outstanding bug. He implemented a proposed fix and posted it with a request for feedback. I was a bit busy at the time, so it was a few days before I replied with an alternative approach I had in mind that addressed a concern of his. I also offered to implement my idea if he were interested. In his response to my reply Ezra said:
If you'd like to create another patch for this ticket, feel free. I'm all for going with whatever fix works best, so if your idea turns out to be the better one, then we should probably implement it. That said, I performed some load testing tonight, and I'm not currently convinced that the attached patch causes any decrease in real-world performance.
Ezra wrote a good implementation, created unit tests for it which his fix passed, and performed load testing that confirmed that his fix offered good performance. In short, his work was excellent.

Sorry Ezra, but I must disagree with you on going with whatever fix works best. I've decided to not implement my idea. Why should I mess with your excellence?

P.S. Excellence is not always the best or most important goal. Less is often a more important goal than excellence. Sometimes worse is better. And as strange as it may sound, sometimes sucks is a better release point than excellent (an idea Google seems to have embraced).

Wednesday, January 6, 2010

Thread-safety of integer counters in ColdFusion

I was quite floored this morning when I discovered that Ben Nadel had taken some comments I made regarding thread safety on his blog post on ColdFusion 9 caching and used them to write an entirely new blog post on the AtomicInteger Java object.

In the first post, Ray Camden commented that he thought there might be a race condition in the array loop in Ben's code sample. I later chimed in, expressing my opinion that the race condition was not in the array loop but in the ID generator, and that the latter could be fixed with an AtomicInteger object. Ben followed up by saying a named cflock would also work, and I replied that it would be interesting to compare the performance of the two techniques.
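The kind of fix I had in mind is easiest to show in plain Java. This is a sketch of my own, not Ben's actual code; the class and method names here are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdGenerator {
    // A plain "id++" is a read-modify-write sequence: two threads can read
    // the same value and hand out duplicate IDs. AtomicInteger performs the
    // whole sequence as a single atomic operation.
    private final AtomicInteger counter = new AtomicInteger(0);

    public int nextId() {
        // Atomically increments and returns the new value, so every
        // caller receives a unique ID even under concurrency.
        return counter.incrementAndGet();
    }

    public static void main(String[] args) {
        IdGenerator gen = new IdGenerator();
        System.out.println(gen.nextId()); // prints 1
        System.out.println(gen.nextId()); // prints 2
    }
}
```

The same idea applies inside ColdFusion, since a java.util.concurrent.atomic.AtomicInteger instance can be created with createObject and shared in an application-scoped variable.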

Any of you who are followers of Ben Nadel's blog know that he is an irredeemable empiricist. He rarely accepts theories on how programs work without first testing them with his own experiments. His intellectual curiosity and honesty make his blog a joy to read, and I find there is always something to be learned from his experiments.

Apparently my final comment on Ben's caching post caught his attention, as he made it into the topic of a new blog post on AtomicInteger. It was awesome to see him put enough thought into the topic to whip up a performance test and then share his thoughts on the topic and the test results on his blog.

While I enjoyed reading his new blog post, I decided that his experiment needed to be taken a little further. There were two issues that Ben's experiment did not address:
  1. The experiment was single-threaded, so it did not test the performance of the counters when shared by multiple threads;
  2. The experiment did not compare the thread-safety of the two approaches against the control case (no locking).
Ben had already done the heavy lifting, so all I had to do was add the thread code and the no-locking case. I changed the tests so that each test created 10 threads, and each thread incremented the shared counter 100,000 times. The correct final result of each test would then be 10 * 100,000 = 1,000,000. However if the counter experienced any race conditions, some of the increments would be lost and the final result would be less than 1,000,000.
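The structure of that experiment translates directly into plain Java, which makes the lost-update failure easy to reproduce outside ColdFusion. This is my own sketch of the test design described above, not the pastebin code:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterRace {
    static int unsafeCounter = 0;                              // no locking
    static AtomicInteger safeCounter = new AtomicInteger(0);   // atomic

    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 10, INCREMENTS = 100_000;

        Thread[] threads = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < INCREMENTS; j++) {
                    unsafeCounter++;               // lost updates possible
                    safeCounter.incrementAndGet(); // never loses an update
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join(); // wait for all threads to finish

        System.out.println("Expected:      " + THREADS * INCREMENTS);
        System.out.println("AtomicInteger: " + safeCounter.get());
        System.out.println("No locking:    " + unsafeCounter);
    }
}
```

On a multi-core machine the AtomicInteger total is always exactly 1,000,000, while the unsynchronized total typically comes up short; the exact shortfall varies from run to run.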

Since blogger.com does not offer a good way to include long code snippets in a blog post, I decided to publish the code on pastebin. You can check it out at http://cfm.pastebin.com/f18eee642

The performance part of my new test matched Ben's results:

Named CFLOCK Test: 22,807 ms
AtomicInteger Test: 2,403 ms
No-Locking Test: 2,574 ms

I was a little surprised that the no-locking test was slightly slower than the AtomicInteger test. I can only guess that AtomicInteger offers a speed benefit over ColdFusion's ++ operator that outweighs its thread-safety overhead.

The thread-safety part of my new test, on the other hand, completely blew my mind:

Expected final counter value: 1,000,000
Named CFLOCK final counter value: 1,000,000
AtomicInteger final counter value: 1,000,000
No-Locking final counter value: 497,246

The final counter value of the thread-unsafe test was half that of the thread-safe tests. That means that with 10 concurrent threads, approximately 50% of the shared counter increments were lost due to race conditions! It turns out that race conditions with the ColdFusion ++ operator are much more likely than I originally thought. Empirical testing FTW!

Thanks again to Ben Nadel for inspiring me to take this investigative journey and share it with others.