Networks, Secrets, and Beans

Welcome to the fourth issue of Cafe Au Lait. In this issue I talk about the projects that have kept me busy since the last issue of this newsletter, way back in November, and explain why mirroring web sites is a fundamentally bad idea.

Letters to Cafe Au Lait

This section is for reader comments and questions. There weren't any significant letters this time. Let's see if we can get some dialog going, so if you'd like to send a "letter to the editor" for the next issue, email it to cal@listserv.oit.unc.edu. If you're writing to me about Cafe Au Lait, and don't want your letter published, please tell me. You can also let me know if you'd rather be anonymous. However, if you'd like your email address published with your letter, please tell me. I will not publish your email address without a specific request. As usual, all letters may be edited for clarity and brevity; and I reserve the right to decide whether to print a particular letter.

What's been going on in my life

It's been more than half a year since the last issue of this newsletter. The fact is that my time has been extremely limited, and this publication has fallen by the wayside as I attempted to keep my various web sites up to date and meet various book deadlines. I'm debating some different formats for this newsletter in the future that may allow it to come out on a slightly more regular basis.

I have been updating the Cafe Au Lait web site on an almost daily basis (though the training and trade show pages are falling a little behind), and it's become a popular source of the latest Java information. One thought that's crossed my mind is changing this newsletter from the current occasional, long and rambling missive to a quick and dirty daily listing covering just the news of the day, such as that currently posted in the news section of Cafe Au Lait. This wouldn't mean I'd stop posting news on the web site, just that you'd have an option to have it pushed into your mailbox instead of checking the site. If I do this, the newsletter would become slightly less formal than it is now, more like getting an email from me every day (or at least every weekday). If this sounds like a particularly good or bad idea to you, let me know. If I do decide to move ahead with this plan, I'll be sure to give everyone plenty of warning so those who don't want this daily will have time to unsubscribe.

Although the Cafe Au Lait newsletter has been less than punctual, a lot of other projects have moved along. Since you last heard from me I've published two more books and have several more under development. I've started lecturing in the computer science department of Polytechnic University, and I'm developing online courses for Digital Think. I've also begun putting some of my less computer-oriented writing on the web.

Introduction to Java Programming

I've introduced a course titled "Introduction to Java Programming" at Polytechnic University in Brooklyn. That course started with the material in the Java Developer's Resource (JDR), but moved it up to Java 1.1 and added several topics including network programming. This course is taught at the graduate level, and like the JDR it assumes substantial prior experience with programming. However, there were a number of juniors and seniors in the course as well, and for the most part they didn't have any problems as a group.

Notes from the course are now available online at http://sunsite.unc.edu/javafaq/course/ and make an excellent supplement to the JDR for those of you moving to Java 1.1. I'm scheduled to teach the course again in the second summer session and in the Fall so I'll be updating those notes regularly. Tar files of the complete set of notes are available to any teacher adopting the JDR as a text for a Java class. (Please note: A tar file is not available to individuals for self-study. These notes are intended for use in a class with an instructor and are not meant as a stand-alone tutorial. I recommend individuals purchase The Java Developer's Resource or some other good book.)

Java Network Programming

O'Reilly published my second book, Java Network Programming, in February. Most of what is new and exciting about Java centers around new kinds of dynamic, networked applications, and Java Network Programming shows you how to write them. This book combines a general introduction to application layer network programming with complete coverage of Java's networking classes. No prior experience with network programming is assumed. Among other topics you'll learn about Sockets, Server Sockets, UDP and TCP traffic, Internet addresses, URLs, HTTP, HTML, and how all of these are handled in Java.

Manning Publications has also recently published a book titled Java Network Programming. It's not a bad book, and is surprisingly orthogonal to mine. About 2/3 of that book is streams and encryption, which I only touch on. My book covers servlets, applets, multicast sockets, and Java 1.1, which that book doesn't discuss in any depth. The matching titles appear to be just unlucky choices. Both publishers went with the most obvious title they could think of. However, the cover of the Manning book has a big fish, and looks suspiciously like an O'Reilly book. Don't be fooled. The real O'Reilly book has a gyroscope on the cover.

I've put the examples from Java Network Programming online at Cafe Au Lait at ftp://sunsite.unc.edu/pub/multimedia/.languages/java/javafaq/javanetexamples.tar.gz. O'Reilly also has the table of contents and index on their web site. The online examples correct several mistakes in the book, and revise the examples from Chapter 15, the Java Server API, so they work with the beta versions of the Java Web Server. I haven't yet tested them with the release version of the Java Servlet Development Kit (JSDK), but if you have any troubles let me know and I'll check them out.

Java Network Programming is $34.95, ISBN number 1-56592-227-1, and is available now from any bookstore that stocks computer books including amazon.com and cbooks.

Java Secrets

My latest book, Java Secrets, was released this week by IDG Books, and is available now from amazon.com and soon from cbooks and any other bookstore that stocks computer books.

Why I Wrote Java Secrets

There are more than a hundred books about Java on bookstore shelves today, and at least ninety of them are completely predictable and more or less interchangeable. It's as if they had all been written from the same outline but by different authors.

Each book begins with a chapter about what's special about Java and how it differs from other programming languages. Each book shows how to write Hello World and other command line applications to teach Java's syntax. There is a chapter or two on object-oriented programming, a chapter on threads, a chapter on exceptions, and a few chapters on the AWT. I know. I wrote one of these books.

Java Secrets is different. It starts where the other books stop. This book assumes you already know Java's syntax and what an object is. This book assumes you're comfortable with the AWT. Instead of rehashing these topics, this book delves into the parts of Java that are not documented by Sun, that are not generally accessible to anyone with a web browser, and that are not already in a hundred other books.

I had some reservations about writing this book. I still do. This is a dangerous book. It reveals knowledge that can easily be abused. Improper use of the secrets revealed herein can easily tie Java programs to specific platforms or implementations. As a longtime Mac user I know the agony of watching all the best software come out on Windows first and the Mac much later, if at all. I do not want to extend this trend to Java-based software.

Nonetheless I have come to the conclusion that a book like this is necessary if Java is to move out of its niche of creating applets for web pages and into the broader software development market. There are many applications for which Java is ideal, but which cannot be written without more information than Sun has chosen to reveal. Among other things these include stand-alone executable applications. HotJava and javac are such applications so it must be possible to write them, but until now Sun has not revealed how. This book reveals that secret among others.

However, rationalize though I might (and I'm quite good at rationalizing, I admit), the real reason I wrote this book is that it seemed like a neat thing to do at the time. This is far and away the most exciting book I've ever written. The sheer number of "Aha!" experiences I've had while researching and writing it is phenomenal. I hope you'll get the same feeling while reading it. I know the information I present here will be misused. I accept that. Still, I firmly believe that in the long run more knowledge is a good thing, dangerous though it may be; and that secrets are meant to be revealed.

What's In This Book?

There are three different ways a Java program can become dangerous. It can rely on the internal structure of Java objects; it can use classes it isn't supposed to know about; or it can be platform specific. This book covers all three.

After a brief introduction, Part One begins with six chapters on Java internals. The reader will learn how objects and primitive data types are laid out in memory, how arguments are passed to and values returned from methods, what a variable really is, and more. Java's implementation of arrays and Strings will be explored. Different possible thread models and garbage collection algorithms are discussed and compared, shedding some light on why Java uses the data structures and algorithms it does and why it sometimes behaves in unexpected ways. This is all tied to the Java .class file format in two chapters that teach the reader how to read and disassemble Java byte code. Finally you'll learn how an applet runs and what really happens when a web browser loads an applet.

Part Two delves into the sun classes, a group of undocumented packages that add considerable power to Java programs. The following are just a few of the undocumented classes that will be covered in this section:

More Layout Managers
Communicating with ftp, mail and news servers
Data encoding and decoding
Character set conversion
Protocol and content handlers

As you can see Sun has hidden a lot of functionality inside the sun classes. This book reveals it.

Part Three explores the possibilities opened by platform dependent code. It demonstrates how to call the native API and how to create stand-alone executable programs.

Finally the CD includes the source code from the book, the JDK 1.1 for Solaris and Windows, and a useful assortment of Java hacking tools including a full version of the payware Java decompiler WingDIS, version 2.0.3.

A Few Caveats

This is not an introductory book. It is for the programmer who has learned enough about Java to be frustrated by its limitations. You should have a solid grasp of the fundamentals of both the Java language and the AWT, including advanced topics like threads. Although every effort has been made to make this book accessible to as broad a range of readers as possible, it does require more of its reader than most books on the market.

On the other hand this book does not assume prior experience with assembly language, Java byte code, compiler design, or even pointers. In fact this book may serve as a first taste of some of these to a reader who's never seen them before, in Java or any other language. Low-level programmers who are familiar with pointers, assembly language and compiler design should nonetheless find the discussion of Java's implementation of these topics to be useful. They'll simply find the book easier going than a programmer encountering these topics for the first time.

Bugs

This book is so far out on the bleeding edge, I've got a personal account rep at the New York Blood Bank. I've done my best to try to provide useful and accurate information. All the code in this book has been verified on at least one virtual machine (VM). Most of the code has been tested on two or more. However, because Java runs on so many different platforms and because it is changing in Internet time, it is impossible to be completely precise and accurate in all instances. Furthermore, precisely because the material in this book is secret, it's been extremely hard to verify.

I am certain there are mistakes here. In fact, I'm sure there are some real whoppers. (The first one's on the back cover. Somehow the system requirements were mistakenly listed as a "PC with Office 97 running Windows 95". That's totally untrue. Any Java 1.0 or later platform should be OK, though Java 1.1 would be extremely helpful. I mostly wrote this book on a Macintosh PowerBook and a Solaris SparcStation. Most of the screen shots were captured on the Mac.) Please, please use this information carefully and read it with a critical eye. If you do find mistakes or inaccuracies, let me know by sending email to elharo@sunsite.unc.edu, and I'll correct them in future editions. I will also post corrections and updates on my web site at http://sunsite.unc.edu/javafaq/secrets/ so you may wish to look there first before sending me email.

Java Secrets lists for $59.99 but amazon.com has it in stock for 20% off, that is $48.00. The ISBN number is 0-76458-007-8, and it should be available soon from any bookstore that stocks computer books.

Java Streams

I've been developing a course covering most of the java.io package in Java 1.1 for Digital Think, a purveyor of online courses. Input and Output with Java Streams teaches Java programmers how to read and write data to files, System.out, and System.in using streams, readers, and writers. The course is divided into four modules:

Stream Basics
Advanced Streams
Files and File Dialogs
Readers and Writers

After completing this course, you'll be able to:

Read and write data in a variety of formats
Create, move, delete, choose, read, and write files
Allow users to enter data from the command line

The course is mostly complete, and we'll be beta testing it between July 22 and August 10, 1997. We're currently recruiting beta testers who'd be interested in taking this course, and providing feedback. Testers will need approximately 20 hours over that time period to take the course (one of the things we're testing is exactly how long the course takes) and will be asked to provide ongoing feedback (what's working, what needs more explanation, etc.) as well as brief weekly progress reports.

Ideal beta testers should be comfortable with basic Java syntax including arrays, strings, and primitive data types. In particular they should understand how Java deals with ints, bytes, and arrays of those types. Course students should also be comfortable with classes, objects, and methods, particularly constructors and toString() methods. Students should be able to write character mode Java applications with main() methods. Finally, they should know the basics of the AWT including windows, dialogs, and Java 1.1 event handling. The Java Development Kit (JDK) 1.1 or later or a comparable development platform is required.

If you'd like to be a beta tester and take the course for free in exchange for your feedback, please contact Jim Tushinski at jimt@digitalthink.com.

Online Journal

Although I make my living as a writer, I'm not yet to the level of an Isaac Asimov or a Norman Mailer where I can write about anything that crosses my mind and expect somebody to print it. In fact I'm still finding it difficult to get anything not related to Java published (Quantum Mechanics for Dummies, anyone?) so I've started putting some of my more random musings on the web in Rusty's Online Journal. Topics currently addressed include:

None of this has anything to do with Java, but if like me your interests extend beyond the narrow realms of the Internet, you may find some of this amusing.

Corrections

In Chapter 5 of the JDR, Booleans and Flow Control, I really messed up the details of the switch statement (pp. 133-134). The labels of a case statement can only be literals or final static int fields. They cannot be variables or expressions as I incorrectly claimed in the book. This has to do with how Java compiles a switch statement for the virtual machine. The actual numeric values of the case labels are embedded in the byte code. This makes switch statements much more efficient than they otherwise would be. (Interested readers can find a few more details in Java Secrets or The Java Virtual Machine Specification.) Here's how it should read:

Java has a shorthand for these types of multiple if statements, the switch-case statement. Here's how you'd write the above using a switch statement:
switch (x) {
  case 0: 
    // do thing 0...;
    break;
  case 1: 
    // do thing 1...;
    break;
  case 2: 
    // do thing 2...;
    break;
  case 3: 
    // do thing 3...;
    break;
  default: 
    // do thing 4...;
}
In this fragment x must be a variable or expression that can be cast to an int without loss of precision. This means the variable must be or the expression must return an int, byte, short or char. x is compared with the value of each of the case statements in succession. The case labels themselves must be literals, as in this fragment, or final static int fields; they cannot be variables or expressions.
Once a case statement is matched all executable statements following it are executed including those in subsequent, unmatched case statements. This can trigger decidedly unexpected behavior. Therefore it's common practice to include the break statement at the end of each case block. If the breaks weren't included in the above code fragment and case 1 were matched, then not only thing 1 but also thing 2, thing 3, and thing 4 would be performed. It's important to remember that the switch statement doesn't end when one case is matched and its action performed. The program does not look for additional matches; it simply keeps executing the statements that follow unless specifically told to break.
Finally if no cases are matched, the default action is triggered.

It is not true that multiple case statements are matched as I claimed in the book. Thanks are due to Bob Follek for catching these mistakes.
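
For readers who'd like to see the corrected behavior in action, here is a minimal sketch. The class, method, and constant names are made up for illustration and do not come from the JDR. It assumes nothing beyond the core language: the case labels are final static int fields, and the second method omits the breaks to show fall-through.

```java
class SwitchDemo {

  // Case labels must be compile-time constants: literals or
  // final static int fields like these (illustrative names).
  static final int ZERO = 0;
  static final int ONE = 1;
  static final int TWO = 2;

  // With a break after each case, exactly one branch runs.
  static String withBreaks(int x) {
    String result = "";
    switch (x) {
      case ZERO:
        result += "thing 0 ";
        break;
      case ONE:
        result += "thing 1 ";
        break;
      case TWO:
        result += "thing 2 ";
        break;
      default:
        result += "default ";
    }
    return result.trim();
  }

  // Without breaks, execution starts at the matched label and
  // falls through every statement that follows it.
  static String withoutBreaks(int x) {
    String result = "";
    switch (x) {
      case ZERO:
        result += "thing 0 ";
      case ONE:
        result += "thing 1 ";
      case TWO:
        result += "thing 2 ";
      default:
        result += "default ";
    }
    return result.trim();
  }

  public static void main(String[] args) {
    System.out.println(withBreaks(1));     // prints "thing 1"
    System.out.println(withoutBreaks(1));  // prints "thing 1 thing 2 default"
    System.out.println(withoutBreaks(7));  // prints "default"
  }
}
```

If you remove the final modifier from one of the constants, the compiler should reject the corresponding case label, which is the restriction I got wrong in the book.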

Why I Hate Mirrors

My Java FAQ list and tutorial were some of the first sources of Java information on the net, almost two years ago. As such they were spread far and wide, with little control on my part.

Every couple of months I go on a mirror hunt to try to find and eliminate illegal copies. Generally I go to AltaVista or some other search engine and search for a phrase that's unlikely to be found in any document that isn't the one I'm looking for. Then I painstakingly follow each of the links, figure out who's likely responsible for the site, and send them a polite email asking them to remove the mirror copy and replace it with a link to the official site instead.

Most of the time the webmaster is apologetic. (In many cases they've even forgotten the file exists on their site, and they're often surprised that I found it. Few people realize the comprehensiveness of a good search engine.) On rare occasions I have to send a second, not so nice email explaining that they're violating the law and insisting that they remove the files. On even rarer occasions I actually have to get my attorney to send a cease and desist letter, or even commence legal action. (There's one obstinate site in France that doesn't believe I'll actually follow through on international legal action. They're about to find out just how seriously I take this.)

This is a huge hassle, and I really wish I didn't have to do it. Given that I mostly just give this information away, and don't even sell advertising on my site, some have asked why I go to so much trouble. There are a number of reasons. Most importantly:

Mirror sites are out of date and inaccurate.
Mirrors make it harder to sell content.
Mirrors don't mirror enough.
The real solution is bandwidth.

Let's investigate these issues in turn.

Mirror sites are out of date and inaccurate.

The worst problem with mirror sites is that they're out of date. Most people copy the file once or even twice, then forget about it. There are still copies of my Java tutorial on the net that cover Java 1.0 alpha! I get email from people asking me about mistakes in my work that I fixed months, even years ago! In some cases it's not even clear to me what a correspondent is talking about because I don't realize they're using a copy from an outdated mirror site.

One of the unique features of Cafe Au Lait is that I often update the news at the site several times a day. However, the official mirrors only copy the files once a day, at most. In some cases they've gone months without an update. The only way I can hope to keep my site reasonably current and accurate is if it exists at one centralized location.

Mirrors make it harder to sell content.

I know at least six thousand unique people a day visit my site, but the real number could be much higher. I get no stats from the various mirror sites, both legal and illegal. There are other problems here, such as proxy servers like AOL's; but mirroring is likely the largest. Although I don't sell advertising on my site, hit counts at my sites help me market myself for all sorts of jobs ranging from author to webmaster. The higher and the more accurate the hit figures I can provide for Cafe Au Lait, the more likely I am to be able to convince a magazine that they should give me a column, or to convince a commercial site that they need to hire me to revamp their interface.

This affects the site and its readers too. The more people who use Cafe Au Lait, the more companies make sure that I get the necessary information about their products, and the more I can pass that along to the visitors to the site.

Mirrors don't mirror enough.

Many of my pages contain internal links to images and other pages on the same site. When just a few files are copied rather than the entire site, these links break; and guess who gets to deal with the email reporting the problems? (I'll give you a hint. It's not the person who illegally appropriated the pages.)

Modern web sites are quite complex entities, and merely saving a page out of a browser and then ftping it onto your web server is not enough. Doing a real mirror is a complex and error-prone operation that requires human intervention and care. It requires, among other things, making sure that not only the files themselves but their relation to the rest of the web site are copied. It also requires that many server config files be identical. This is an especially large problem for sites that mirror multiple other sites, since two sites they mirror may require different server configurations. For example, one may expect that directory indexing is turned on while another expects that it is turned off.

There are several legal mirrors of Cafe Au Lait at the other sunsites. I'm not fond of them, and I recommend people use the main site at UNC instead, but I accept them as a band-aid solution given the current speed of international links. Nonetheless many of them have problems with CGI scripts, with internal links that work at sunsite but fail at other sites because they've moved the root of the site document tree, and with a few other things.
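
To make the moved-document-root problem concrete, here is a rough sketch using java.net.URI to resolve links the way a browser would. The mirror host, the /mirrors/cal/ prefix, and the image path are hypothetical, chosen only for illustration; they don't describe any actual mirror. The point is that a link beginning with a slash is resolved against the server root, so it stops working as soon as a mirror stores the site under a different directory, while a page-relative link survives the move.

```java
import java.net.URI;

class MirrorLinks {

  // Resolve a link against the URL of the page that contains it.
  static String resolve(String pageUrl, String link) {
    return URI.create(pageUrl).resolve(link).toString();
  }

  public static void main(String[] args) {
    // Hypothetical locations of the same page on the main site
    // and on a mirror that keeps the copy under /mirrors/cal/.
    String original = "http://sunsite.unc.edu/javafaq/course/index.html";
    String mirror =
        "http://mirror.example.org/mirrors/cal/javafaq/course/index.html";

    // Two ways of writing the same link (hypothetical image path).
    String rootRelative = "/javafaq/images/logo.gif";
    String pageRelative = "../images/logo.gif";

    // Both forms agree on the main site:
    // http://sunsite.unc.edu/javafaq/images/logo.gif
    System.out.println(resolve(original, rootRelative));
    System.out.println(resolve(original, pageRelative));

    // On the mirror the root-relative link skips the /mirrors/cal/
    // prefix and points at a file that was never copied there:
    // http://mirror.example.org/javafaq/images/logo.gif
    System.out.println(resolve(mirror, rootRelative));

    // The page-relative link still lands inside the mirrored tree:
    // http://mirror.example.org/mirrors/cal/javafaq/images/logo.gif
    System.out.println(resolve(mirror, pageRelative));
  }
}
```

This only covers plain links. CGI scripts and server configuration differences are separate problems that no amount of careful linking fixes.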

The real solution is bandwidth.

The web is not meant to be a replacement for FTP, email, or CD-ROMs. The web is designed to make it extremely easy for the user to go to the content rather than the content to go to the user. Anything that attempts to work around this fundamental underpinning of the web (including some much-hyped push technologies) is doomed to failure.

Increased bandwidth is the real solution to the problems that mirroring attempts to solve. This needs to happen in a number of places, all of which currently present bottlenecks. ISDN, cable modems, ADSL, multiplexed POTS lines, or equivalent technologies have to be brought to the end user. Modems are no longer enough. New international cables need to be laid. Intranational backbone capacity must be expanded too, and peering should be enforced on the corporate greedheads at UUNet and elsewhere who built businesses on taxpayer dollars, and now want to pull up the ladder behind them. The government should build the next generation Internet for scientists and education, so researchers can once again pass important scientific data back and forth without getting stuck behind ads for Skinny Dip Thigh Cream. IP address blocks need to be aggregated into larger groups to simplify routing tables, and the routers themselves need to be replaced with faster models.

With the exception of enforced peering, all of this is happening at varying rates of speed. One of the interesting effects of the phenomenal growth of the Internet is that multiple bottlenecks have developed simultaneously. Nonetheless, this is a temporary problem. The bandwidth is coming, and the growth is starting to level off, at least in the United States. The rest of the world is following the same curve to the same ultimately stable state, just starting from different initial conditions. Mirroring is at best a solution to a temporary problem, but one which unless carefully controlled presents problems of its own that will exist well past the time when its usefulness has expired.

I've described the problems (and some of the solutions) based on my experience with Cafe Au Lait and a few other sites. However these problems are hardly unique to me. Some of my colleagues have encountered even worse, for example people who not only copy their sites but also strip out their names and replace them with their own. Anyone who's this big of a scumbag has given up any presumption of innocent infringement, and deserves to get hit with a copyright infringement lawsuit for statutory and compensatory damages.

I hope I've convinced you that mirroring runs at cross-purposes with the web, and perhaps explained why I try to be so careful about which sites I allow to mirror Cafe Au Lait. The right solutions to the problem of low bandwidth aren't here yet, but they are coming; and when they get here, I'd rather not have to spend years cleaning up broken mirrors.

Subscription Instructions

To subscribe to this newsletter send email to listproc@listserv.oit.unc.edu from the account you want to receive mail from with the following text only in the BODY of your message (NOT the subject.)

SUBSCRIBE cal FirstName LastName

You should of course replace FirstName and LastName with your real first name and last name though I won't be particularly bothered if you wish to use an alias.

To unsubscribe from the list send email to listproc@listserv.oit.unc.edu from the account you wish to unsubscribe with the following line in the BODY of your message (NOT the subject.)

unsubscribe cal

To get more information on how to use this service, please send the command HELP in a line by itself in a mail message to listproc@listserv.oit.unc.edu.

You MUST follow these instructions. The list owner will not subscribe or unsubscribe you manually if you do not follow these instructions. Requests to do so will be ignored. Repeated requests will get you dumped in my kill file.

Back Issues