Java News from Wednesday, January 28, 2004

Today I've been working on some example code covering JSSE, specifically the SSLSocket class. I started with a simple example program I published in Java Network Programming a few years ago to connect to an HTTPS server. The program didn't work at first because of some expired certificates, but that was easily fixed by upgrading to Java 1.4.2_03. Then things got weird.

Once the program would agree to talk to the server, it began returning data that looked like this:

HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Wed, 28 Jan 2004 18:13:08 GMT
Content-type: text/html
nnCoection: close
31-Dec-2010 00:00:00 GMT; path=/
Transfer-Encoding: chunked

b6b
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>...
</HTML>
0

Beyond this weird data, the server did not close the connection, nor did it provide a Content-length header to allow the client to figure out when to close the connection as RFC 2818 requires. It took me quite a while to convince myself nothing in my code was mucking up the data. However, once I was fairly sure the data Java was producing was indeed what the server was sending, the next question was why the server was sending this.
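One way to take your own code out of the picture is to read the header block byte for byte, with no charset decoder in between, and then look at which framing headers the server actually sent. The following is a minimal sketch under that assumption; the class and method names are hypothetical and this is not the program from the book. Each byte becomes one char (equivalent to ISO-8859-1), and header names are lower-cased because HTTP header names are case-insensitive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderDump {

    // Reads one line terminated by LF (a preceding CR is dropped). Each byte
    // becomes one char, as in ISO-8859-1, so no charset decoder touches it.
    // Returns null at end of stream.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return (b == -1 && sb.length() == 0) ? null : sb.toString();
    }

    // Skips the status line, then collects "Name: value" lines up to the
    // blank line that ends the header block. Lines without a colon are skipped.
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        readLine(in); // status line, e.g. "HTTP/1.1 200 OK"
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        // Header block copied (abridged) from the response above.
        String raw = "HTTP/1.1 200 OK\r\n"
                + "Server: Netscape-Enterprise/6.0\r\n"
                + "Content-type: text/html\r\n"
                + "nnCoection: close\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "\r\n"
                + "b6b\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        Map<String, String> headers = readHeaders(in);
        System.out.println("content-length: " + headers.get("content-length"));
        System.out.println("transfer-encoding: " + headers.get("transfer-encoding"));
        System.out.println("next body line: " + readLine(in));
    }
}
```

Against a live server you would pass the socket's input stream instead, ideally wrapped in a BufferedInputStream, since single-byte reads straight from a socket are slow. After readHeaders returns, the stream is positioned at the first byte of the body.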

The first weird part was the "b6b" before the HTML document. At first I thought this was an encoding issue (e.g. I was trying to read UTF-8 data as ASCII, or something like that) but I ruled that out by getting the same characters when I read everything raw without decoding it. I tried a couple of different HTTPS servers and noticed that they behaved similarly. However, instead of b6b the second server sent the garbage data "ca". Eventually it occurred to me that these might be hexadecimal digits. Since there wasn't any Content-length header from either server, I surmised this might be a Content-length. I rewrote my code to read the first line of the HTTP body as a hexadecimal content-length, and lo and behold it worked! The numbers matched the actual body length and I was able to close the connection at the right place.
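The guess above amounts to reading the first body line as a base-16 number, which in Java is a single call to Integer.parseInt with radix 16: "b6b" is 2923 and "ca" is 202. A minimal sketch follows; the class and method names are hypothetical. It also drops anything after a ';' on the line, because HTTP/1.1 allows an optional extension to follow the number.

```java
public class ChunkSize {

    // Parses a size line such as "b6b" or "ca" as a hexadecimal byte count.
    // Anything after ';' is an optional extension and is ignored here.
    static int parseChunkSize(String line) {
        int semi = line.indexOf(';');
        String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
        return Integer.parseInt(hex, 16); // throws NumberFormatException if not hex
    }

    public static void main(String[] args) {
        System.out.println(parseChunkSize("b6b")); // 2923
        System.out.println(parseChunkSize("ca"));  // 202
        System.out.println(parseChunkSize("0"));   // 0, the final line in the dump above
    }
}
```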

Next I tried adding "Connection: close" to the request headers my program sent to the server. One of the servers I was testing stopped sending a hexadecimal content length in the response body, but the other didn't. Both, however, now sent a Connection: close line in the server's MIME header.
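For anyone who wants to repeat the experiment, here is a minimal sketch of sending a request with "Connection: close" over an SSLSocket and dumping the raw reply. The original program isn't reproduced in this entry, so this is not that code; the class name, the buildRequest helper, and the default host (one of the servers mentioned in this entry) are assumptions for illustration. Note that a bare SSLSocket does not check the server's host name by default, so the sketch turns that check on with SSLParameters before the handshake starts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HttpsGet {

    // Builds a minimal HTTP/1.1 GET request. "Connection: close" asks the
    // server to close the connection once the response has been sent.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "www.verisign.com";
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            // Verify that the certificate matches the host name.
            SSLParameters params = socket.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(params);

            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, "/").getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Dump the raw reply; ISO-8859-1 maps each byte to exactly one char.
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                System.out.print(new String(buffer, 0, n, StandardCharsets.ISO_8859_1));
            }
        }
    }
}
```

Because the request asks for Connection: close, the read loop ends when the server closes its side. Without that header, an HTTP/1.1 server may keep the connection open, and the loop would block waiting for more data.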

Two questions remain:

  1. Where is this scheme for specifying the Content-length with hexadecimal data at the beginning of the HTTP body documented? I certainly didn't find it in the HTTP 1.1 or HTTPS specifications. Is this even legal? Where did it come from?
  2. Why are servers deliberately misspelling "Connection: close"? Googling around revealed that a couple of other users had noticed this, but nobody seemed to have an answer.

If it helps, one server was https://www.amazon.com/. One was https://www.usps.com/. However, not all servers behave this way. https://www.verisign.com/, for example, does not send a Content-length in the response body and closes the connection in the normal way from the server side. Any ideas?

OK. I've got the answer to question 1. It seems the transfer encoding is chunked (note the Transfer-Encoding: chunked header in the response above). My mistake was thinking this must have something to do with HTTPS. It doesn't. The same issue could arise with plain vanilla HTTP. Still unanswered: why is Amazon misspelling "Connection: close"?
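Chunked transfer coding is defined in HTTP/1.1 (RFC 2616, section 3.6.1). The body arrives as a series of chunks, each a hexadecimal size line followed by that many bytes and a CRLF, and it ends with a zero-size chunk (the "0" line in the response above) plus optional trailer headers and a blank line. Reading only the first size line works only when the server happens to send the whole body as a single chunk. Here is a minimal decoder sketch; the class and method names are hypothetical, and it assumes the stream is already positioned just past the response headers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecoder {

    // Reads a CRLF- (or bare LF-) terminated line, one char per byte.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new EOFException("stream ended mid-line");
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    // Decodes "size CRLF data CRLF" chunks until the zero-size chunk,
    // then skips any trailer lines up to the terminating blank line.
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');  // drop any chunk extension
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) break;              // last chunk
            byte[] data = new byte[size];
            int off = 0;
            while (off < size) {               // read() may return fewer bytes
                int n = in.read(data, off, size - off);
                if (n == -1) throw new EOFException("stream ended mid-chunk");
                off += n;
            }
            body.write(data);
            readLine(in);                      // CRLF that follows the chunk data
        }
        String trailer;
        do {
            trailer = readLine(in);
        } while (!trailer.isEmpty());
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String wire = "5\r\nHello\r\n7;ext=1\r\n, world\r\n0\r\n\r\n";
        byte[] body = decode(new ByteArrayInputStream(wire.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(new String(body, StandardCharsets.ISO_8859_1)); // Hello, world
    }
}
```

In practice java.net.HttpURLConnection (and HttpsURLConnection) undo chunking for you; a decoder like this is only needed when, as in this example program, you talk to the socket directly.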


R. Rawson-Tetley has posted SwingWT 0.78, an open source, "100% pure Java library which very closely resembles the interface of Swing. The difference is that instead of using the Swing library, it drives native peer widgets from SWT" (the Eclipse GUI toolkit). With this library, Java/Swing applications can be compiled natively under Linux using gcj. It also allows Swing apps to use native widgets. This version is much more compatible with existing Swing apps. SwingWT is dual licensed under the Common Public License and the LGPL.


Hugues Pisapia and Marc Gimpel have posted JSpeex 0.9.2, an open source "Java port of the Speex speech codec (Open Source/Free Software patent-free audio compression format designed for speech). It provides both the decoder and the encoder in pure Java, as well as a JavaSound SPI." 0.9.2 adds support for Speex wave file decoding and fixes some bugs. JSpeex is published under a BSD license.