Java News from Saturday, June 2, 2007

The Apache JAMES team has posted Apache Mime4J 0.3, an open source Java library for parsing e-mail messages.

mime4j provides a parser, MimeStreamParser, for e-mail message streams in plain RFC 822 and MIME format. The parser uses a callback mechanism to report parsing events such as the start of an entity header or the start of a body. If you are familiar with the SAX XML parser interface, you should have no problem getting started with mime4j.
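
To make the callback style concrete, here is a minimal, self-contained sketch of an event-driven message parser. It is not mime4j's API: the EventHandler interface, the parse method, and the CallbackParsingSketch class are hypothetical names invented for illustration, and the toy parser only splits headers from the body at the first blank line (no folded headers, no multipart handling).

```java
import java.util.ArrayList;
import java.util.List;

class CallbackParsingSketch {

    /** Hypothetical callback interface, loosely modeled on SAX-style handlers. */
    interface EventHandler {
        void startHeader();
        void field(String name, String value);
        void endHeader();
        void body(String text);
    }

    /** Toy parser: reports header fields up to the first blank line, then the body. */
    static void parse(String message, EventHandler handler) {
        String[] lines = message.split("\r?\n", -1);
        int i = 0;
        handler.startHeader();
        for (; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                continue; // skip malformed header lines instead of failing
            }
            handler.field(lines[i].substring(0, colon).trim(),
                          lines[i].substring(colon + 1).trim());
        }
        handler.endHeader();
        StringBuilder body = new StringBuilder();
        for (i = i + 1; i < lines.length; i++) {
            body.append(lines[i]);
            if (i < lines.length - 1) {
                body.append('\n');
            }
        }
        handler.body(body.toString());
    }

    /** Example handler that simply records each event as a string. */
    static List<String> collectEvents(String message) {
        final List<String> events = new ArrayList<String>();
        parse(message, new EventHandler() {
            public void startHeader() { events.add("startHeader"); }
            public void field(String name, String value) { events.add("field " + name + "=" + value); }
            public void endHeader() { events.add("endHeader"); }
            public void body(String text) { events.add("body " + text.trim()); }
        });
        return events;
    }

    public static void main(String[] args) {
        String msg = "From: a@example.com\r\nSubject: Hello\r\n\r\nHi there\r\n";
        for (String event : collectEvents(msg)) {
            System.out.println(event);
        }
    }
}
```

As with SAX, the parser owns the control flow and the application only reacts to events, so nothing needs to be held in memory beyond what the handler chooses to keep.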

The parser only deals with the structure of the message stream. It won't do any decoding of base64 or quoted-printable encoded header fields and bodies. This is intentional: the parser should provide only the most basic functionality needed to build more complex parsers. However, mime4j does include facilities to decode bodies and fields, and the Message class described below handles decoding of fields and bodies transparently.
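
For readers unfamiliar with the two transfer encodings mentioned here, the following sketch shows what decoding a body involves. It uses only the JDK (java.util.Base64 requires Java 8 or later) and does not use mime4j's own decoding facilities; the TransferDecodingSketch class and its method names are hypothetical, and the quoted-printable decoder is a simplified one that assumes ASCII input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class TransferDecodingSketch {

    /** Decodes a base64 body; the MIME decoder ignores embedded line breaks. */
    static byte[] decodeBase64(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    /** Simplified quoted-printable decoder: handles =XX escapes and soft line breaks. */
    static byte[] decodeQuotedPrintable(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '=') {
                out.write(c);
                i++;
            } else if (s.startsWith("\r\n", i + 1)) {
                i += 3; // soft line break: "=" followed by CRLF
            } else if (s.startsWith("\n", i + 1)) {
                i += 2; // soft line break with a bare LF
            } else if (i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                out.write(hi * 16 + lo);
                i += 3;
            } else {
                out.write('='); // malformed escape: keep the character as-is
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = decodeBase64("SGVsbG8s\r\nIHdvcmxkIQ==");
        System.out.println(new String(a, StandardCharsets.UTF_8));
        byte[] b = decodeQuotedPrintable("caf=C3=A9 =\r\nau lait");
        System.out.println(new String(b, StandardCharsets.UTF_8));
    }
}
```

Keeping this step separate from structural parsing, as the paragraph above describes, means a caller that only needs headers or message structure never pays for decoding large bodies.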

The parser has been designed to be extremely tolerant of messages that violate the standards. It has been tested against a large corpus (more than 5,000) of e-mail messages, using the widely used Perl MIME::Tools parser as a benchmark. mime4j and MIME::Tools rarely differ (on fewer than 25 of those 5,000 messages). When they do (which only occurs for illegally formatted spam messages), we think mime4j does a better job.

mime4j can also be used to build a tree representation of an e-mail message using the Message class. Using this facility, mime4j automatically handles the decoding of fields and bodies and uses temporary files for large attachments. This representation is similar to the one constructed by the JavaMail APIs but is more tolerant of messages that violate the standards.