Java News from Monday, January 3, 2005

One advantage of taking a week off, disconnected from your usual net sources, is that it helps you see things in new ways. I had a couple of minor revelations over the Christmas break. The first was that spam and worm droppings have become completely debilitating for dialup users (which I was temporarily for the week). The only practical solution is to filter on the server side rather than the client as I had been doing. The second revelation was that spam and Microsoft Outlook worms are really two separate problems and require two separate solutions.

Filtering the spam just required making some small changes in my spamassassin configuration here on IBiblio. However, spamassassin (like many other Bayesian filters) does a surprisingly bad job of handling worm droppings, especially content-free e-mails that come with very small to non-existent message text accompanied by large attachments. You'd think this would be a red flag for spam filters, but it doesn't seem to be. Neither spamassassin, Mozilla, nor Eudora has ever reliably identified such e-mails as spam, despite extensive training.

For a long time I'd just been manually deleting these, but this past week that no longer proved feasible, so I did a little research with Google and came up with vsnag, a procmail filter specifically tuned to weed out worm droppings and nothing else. This product is a godsend! It almost completely eliminated the hundreds of direct Microsoft worm attacks and the bounce messages from worms that forged my e-mail address. If your ISP isn't using it already, install it yourself. It's really, really worth it.

Besides struggling with dialup connections, each Christmas break I try to read at least one "important" book. Some years it's something technical like Bertrand Meyer's Object-Oriented Software Construction (fabulous book, by the way, a classic of object oriented programming, and someday I'll get past chapter 7). Some years it's a large novel like Cryptonomicon. This year I thought the book was going to be Douglas Hofstadter's Le Ton Beau de Marot, but it turned out to be How to Lie With Statistics.

This 142-page pamphlet is a brilliant exposition of just how warped most statistical "facts" really are. I knew all the math already, and I'd encountered all of these issues in the past, but I've never seen them put together in one place so concisely and entertainingly. The book is 50 years old, and some of the examples seem a little dated, but the basic points are as relevant today as they were a half-century ago. Almost every statistic you read is cherry picked to prove what the author wants to prove. To really understand what's going on in the world you must be conversant with the basic language of statistics, and must carefully evaluate each statistic to see what it really means. More often than not you'll find it doesn't come close to proving what the purporter claims it proves.

I doubt Cafe au Lait readers are afraid of a little math; but even if you are, don't worry. The math in this book is minimal, really nothing more than the difference between mean, median, and mode. These are clearly explained, along with why you should care about the difference. Just this morning I noticed a headline in the New York Times that's using the mean when it should be using the median. I wouldn't have caught this two weeks ago. You certainly don't need to know about Lebesgue measure, cumulative distribution functions, convolutions, or any of the other fancy stuff Prof. Bhattacharjee tried to stuff into my head in my one formal course in statistics. High school Algebra I is more than sufficient. This book should be required reading in every high school civics and freshman poli sci class in the country. If you haven't read it yet, read it, even if, like me, you think you know the math already. The math is easy, but the application of the math to gulling the public is astonishing. This is an exceptional book. Don't wait. Read it before the next time you open a newspaper or listen to the nightly news. It will change the way you see the world.
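
To make the mean versus median point concrete, here is a minimal sketch in Java. The salary figures are invented purely for illustration (they are not from the book or from the Times headline), and the class and method names are my own. It shows how a single outlier drags the mean far away from what a typical value looks like, while the median barely notices.

```java
import java.util.Arrays;

class MeanVsMedian {

    // Arithmetic mean: the sum of the values divided by how many there are.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Median: the middle value of the sorted data, or the average of the
    // two middle values when the count is even.
    static double median(double[] values) {
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical salaries: four ordinary employees and one executive.
        double[] salaries = {30000, 32000, 35000, 38000, 500000};
        System.out.println("mean   = " + mean(salaries));   // prints "mean   = 127000.0"
        System.out.println("median = " + median(salaries)); // prints "median = 35000.0"
    }
}
```

A report that says the "average" salary here is 127,000 is arithmetically correct and still tells you almost nothing about what four of the five people earn.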

David Hovemeyer and Bill Pugh have posted FindBugs 0.8.6, an automated open source tool for finding potential bugs in Java code. This release includes several new detectors including:

catch blocks that may inadvertently catch runtime exceptions
objects that are instantiated based on classes that only have static methods and fields
calls to Thread.interrupted() in a non-static context
calls to Object.equals()
applets that call methods in the constructor referring to the AppletStub
some cases of infinite recursion
dead stores to local variables
calls to toUpperCase() and toLowerCase() without specifying a locale
new Foo().getClass()
new Thread()
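
As a rough illustration of the kind of code three of these detectors are aimed at, here is a small sketch. The class and method names are hypothetical, the examples are mine rather than FindBugs test cases, and I make no claim about exactly which lines FindBugs 0.8.6 would flag. It covers locale-sensitive case conversion, an overly broad catch block, and a dead store to a local variable.

```java
import java.util.Locale;

class DetectorExamples {

    // Case conversion depends on the locale. In Turkish, uppercase "I"
    // lowercases to the dotless "\u0131", not to "i".
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // catch (Exception e) is broader than intended: besides the expected
    // NumberFormatException it also swallows the NullPointerException
    // thrown by s.trim() when s is null.
    static int parseOrZeroBroad(String s) {
        try {
            return Integer.parseInt(s.trim());
        } catch (Exception e) {
            return 0;
        }
    }

    // Catching only the expected exception lets the null bug surface.
    static int parseOrZero(String s) {
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Dead store: the first assignment is overwritten before it is read.
    static int deadStore(int x) {
        int result = x * 2; // never read
        result = x + 1;
        return result;
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(lower("TITLE", Locale.ENGLISH)); // prints "title"
        System.out.println(lower("TITLE", turkish));        // prints "t\u0131tle" (dotless i)
        System.out.println(parseOrZeroBroad(null));         // prints "0", hiding the null
    }
}
```

Calling parseOrZero(null) instead throws a NullPointerException, which is usually what you want: the bug shows up where it happens rather than silently turning into a zero.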

Other new features include the ability to filter warnings by bug category, Java 1.5 annotations that suppress FindBugs warnings, and cut, copy, and paste in the Swing GUI. As usual, I tested this release on the latest XOM code base. None of the new detectors found anything real, but the old detectors did find one local variable I could eliminate along with the method call that initialized it and one unused private method, both of which had snuck into the code since the last time I ran FindBugs. False positives were few, mostly involving places where I was deliberately catching and ignoring an exception. Java 1.4 or later is required. FindBugs is published under the LGPL.

I've posted a new release candidate of XOM that includes the fixes from my FindBugs run, as well as new README and LICENSE files, and an improved Ant build file that only compiles the servlet samples if the servlet classes are found somewhere in the classpath. The API and behavior are unchanged. If nobody spots any major problems in this release (what people have been finding lately has mostly been packaging issues) I'll probably release 1.0 in a few days.