Cell Phones
Pagers
Alarm Watches
etc.
and set your notebook's volume to zero
Internet Addresses
URLs
CGI
Understand basic Java syntax and I/O
Have a user's view of the Internet
No prior network programming experience
Applets may:
send data to the code base
receive data from the code base
Applets may not:
send data to hosts other than the code base
receive data from hosts other than the code base
Hosts
Internet Addresses
Ports
Protocols
Devices connected to the Internet are called hosts
Most hosts are computers, but hosts also include routers, printers, fax machines, soda machines, bat houses, etc.
Every host on the Internet is identified by a unique, four-byte Internet Protocol (IP) address.
This is written in dotted quad format like 199.1.32.90 where each byte is an unsigned integer between 0 and 255.
There are about four billion unique IP addresses, but they aren't very efficiently allocated
IPv6 expands the address space to 2^128 addresses
Numeric addresses are mapped to names like www.blackstar.com or star.blackstar.com by DNS.
Each site runs domain name server software that translates names to IP addresses and vice versa
DNS is a distributed system
The java.net.InetAddress class represents an IP address.
It converts numeric addresses to host names and host names to numeric addresses.
It is used by other network classes like Socket and ServerSocket to identify hosts.
There are no public InetAddress() constructors. Arbitrary addresses may not be created.
All addresses that are created must be checked with DNS.
public static InetAddress getByName(String host) throws UnknownHostException
For example,
try {
InetAddress utopia = InetAddress.getByName("utopia.poly.edu");
InetAddress duke = InetAddress.getByName("128.238.2.92");
// work with the objects...
}
catch (UnknownHostException e) {
System.err.println(e);
}
public static InetAddress[] getAllByName(String host) throws UnknownHostException
public static InetAddress getLocalHost() throws UnknownHostException
public boolean isMulticastAddress()
public String getHostName()
public byte[] getAddress()
public String getHostAddress()
public int hashCode()
public boolean equals(Object o)
public String toString()
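As a minimal sketch of the accessors listed above, the following class looks up a numeric address and prints it in several forms. The class name InetAddressDemo and the helper dottedQuad() are made up for illustration; only the InetAddress calls come from java.net.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class; not part of java.net.
class InetAddressDemo {

    // Formats raw address bytes as a dotted quad. Java bytes are signed,
    // so each one is masked with 0xFF to recover its 0-255 value.
    static String dottedQuad(byte[] address) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < address.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(address[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnknownHostException {
        // A numeric address is used so the example does not depend on a name lookup.
        InetAddress duke = InetAddress.getByName("128.238.2.92");
        System.out.println(duke.getHostAddress());         // string form
        System.out.println(dottedQuad(duke.getAddress())); // same value from the raw bytes
        System.out.println(duke.isMulticastAddress());
    }
}
```

The masking step matters because getAddress() returns signed bytes, so 199 comes back as -57 unless it is converted.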
In general a host has only one Internet address
This address is subdivided into 65,536 ports
Ports are logical abstractions that allow one host to communicate simultaneously with many other hosts
Many services run on well-known ports. For example, http tends to run on port 80
A protocol defines how two hosts talk to each other.
The daytime protocol, RFC 867, specifies an ASCII representation for the time that's legible to humans.
The time protocol, RFC 868, specifies a binary representation for the time that's legible to computers.
There are thousands of protocols, standard and non-standard
Requests For Comment
Document how much of the Internet works
Various status levels from obsolete to required to informational
TCP/IP, telnet, SMTP, MIME, HTTP, and more
IETF is based on "rough consensus and running code"
W3C tries to run ahead of implementation
IETF is an informal organization open to participation by anyone
W3C is a vendor consortium open only to companies
HTTP
HTML
XML
RDF
MathML
SMIL
P3P
A URL, short for "Uniform Resource Locator", is a way to unambiguously identify the location of a resource on the Internet.
http://java.sun.com/
file:///Macintosh%20HD/Java/Docs/JDK%201.1.1%20docs/api/java.net.InetAddress.html#_top_
http://www.macintouch.com:80/newsrecent.shtml
ftp://ftp.info.apple.com/pub/
mailto:elharo@metalab.unc.edu
telnet://utopia.poly.edu
ftp://mp3:mp3@138.247.121.61:21000/c%3a/stuff/mp3/
http://elharo@java.oreilly.com/
http://metalab.unc.edu/nywc/comps.phtml?category=Choral+Works
the protocol, aka scheme
the authority
user info:
user name
password
host name or address
port
the path, a.k.a. file
the ref, a.k.a. section or anchor
the query string
A URL object represents a URL.
The URL class contains methods to
create new URLs
parse the different parts of a URL
get an input stream from a URL so you can read data from a server
get content from the server as a Java object
Content and protocol handlers separate the data being downloaded from the protocol used to download it.
The protocol handler negotiates with the server and parses any headers. It gives the content handler only the actual data of the requested resource.
The content handler translates those bytes into a Java object like an InputStream or ImageProducer.
When the virtual machine creates a URL object, it looks for a protocol handler that understands the protocol part of the URL such as "http" or "mailto".
If no such handler is found, the constructor throws a MalformedURLException.
The exact protocols that Java supports vary from implementation to implementation though http and file are supported pretty much everywhere. Sun's JDK 1.1 understands ten:
file
ftp
gopher
http
mailto
appletresource
doc
netdoc
systemresource
verbatim
There are four (six in 1.2) constructors in the java.net.URL class.
public URL(String u) throws MalformedURLException
public URL(String protocol, String host, String file) throws MalformedURLException
public URL(String protocol, String host, int port, String file) throws MalformedURLException
public URL(URL context, String url) throws MalformedURLException
public URL(String protocol, String host, int port, String file, URLStreamHandler handler) throws MalformedURLException
public URL(URL context, String url, URLStreamHandler handler) throws MalformedURLException
An absolute URL like http://www.poly.edu/fall97/grad.html#cs can be passed to the first constructor as a single string:
try {
URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
}
catch (MalformedURLException e) {}
You can also construct the URL by passing its pieces to the constructor, like this:
try {
URL u = new URL("http", "www.poly.edu", "/schedule/fall97/bgrad.html#cs");
// work with the URL
}
catch (MalformedURLException e) {}
try {
URL u = new URL("http",
"www.poly.edu", 8000, "/fall97/grad.html#cs");
// work with the URL
}
catch (MalformedURLException e) {}
Many HTML files contain relative URLs.
Consider the page http://metalab.unc.edu/javafaq/index.html
On this page a link to "books.html" refers to http://metalab.unc.edu/javafaq/books.html.
The fourth constructor creates URLs relative to a given URL. For example,
try {
URL u1 = new URL("http://metalab.unc.edu/index.html");
URL u2 = new URL(u1, "books.html");
}
catch (MalformedURLException e) {}
This is particularly useful when parsing HTML.
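A small sketch of how relative links resolve against the page that contains them, using the constructor that takes a context URL. The class name RelativeUrlDemo and the method resolve() are illustrative, not part of java.net. Note that recent JDKs mark the URL constructors as deprecated, which is why the warning is suppressed here.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; not part of java.net.
class RelativeUrlDemo {

    // Resolves a link found on a page against that page's own URL,
    // using the URL(URL context, String spec) constructor.
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    static String resolve(String page, String link) {
        try {
            URL context = new URL(page);
            return new URL(context, link).toString();
        } catch (MalformedURLException e) {
            return null; // the page or the link could not be parsed
        }
    }

    public static void main(String[] args) {
        String page = "http://metalab.unc.edu/javafaq/index.html";
        System.out.println(resolve(page, "books.html"));           // sibling document
        System.out.println(resolve(page, "/nywc/comps.phtml"));    // path from the server root
        System.out.println(resolve(page, "http://java.sun.com/")); // already absolute
    }
}
```

An absolute link passes through unchanged, so the same call works for every HREF found while parsing a page.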
The java.net.URL class has five methods to split a URL into its component parts. These are:
public String getProtocol()
public String getHost()
public int getPort()
public String getFile()
public String getRef()
try {
URL u = new URL("http://www.poly.edu/fall97/grad.html#cs");
System.out.println("The protocol is " + u.getProtocol());
System.out.println("The host is " + u.getHost());
System.out.println("The port is " + u.getPort());
System.out.println("The file is " + u.getFile());
System.out.println("The anchor is " + u.getRef());
}
catch (MalformedURLException e) { }
Java 1.3 adds three more:
public String getAuthority()
public String getUserInfo()
public String getQuery()
If a port is not explicitly specified in the URL it's set to -1. This means the default port.
If the ref doesn't exist, it's just null, so watch out for NullPointerExceptions.
Better yet, test to see that it's non-null before using it.
If the file is left off completely, e.g. http://java.sun.com, then it's set to "/" (more recent JDKs return the empty string instead, so test for both).
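One way to guard against the -1 port and the null ref is to wrap the getters, as in this sketch. The class UrlPartsDemo and its helpers are hypothetical. getDefaultPort() is not covered in these slides; it was added to java.net.URL in Java 1.4.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; not part of java.net.
class UrlPartsDemo {

    // Returns the explicit port, or the protocol's default when getPort() is -1.
    static int effectivePort(URL u) {
        int port = u.getPort();
        return port != -1 ? port : u.getDefaultPort();
    }

    // Returns the ref, or an empty string when the URL has none.
    static String refOrEmpty(URL u) {
        String ref = u.getRef();
        return ref != null ? ref : "";
    }

    // Converts the checked exception so the demo stays short.
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    static URL parse(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(s, e);
        }
    }

    public static void main(String[] args) {
        URL u = parse("http://www.poly.edu/fall97/grad.html#cs");
        System.out.println("port " + effectivePort(u) + ", ref '" + refOrEmpty(u) + "'");
    }
}
```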
The openStream() method connects to the server specified in the URL and returns an InputStream object fed by the data from that connection.
public final InputStream openStream() throws IOException
Any headers that precede the actual data are stripped off before the stream is opened.
Network connections are less reliable and slower than files. Buffer with a BufferedReader or a BufferedInputStream.
import java.net.*;
import java.io.*;
public class Webcat {
public static void main(String[] args) {
for (int i = 0; i < args.length; i++) {
try {
URL u = new URL(args[i]);
InputStream in = u.openStream();
InputStreamReader isr = new InputStreamReader(in);
BufferedReader br = new BufferedReader(isr);
String theLine;
while ((theLine = br.readLine()) != null) {
System.out.println(theLine);
}
}
catch (IOException e) {
System.err.println(e);
}
}
}
}
What readLine() does:
Sees a carriage return, waits to see if the next character is a line feed before returning
What readLine() should do:
Sees a carriage return, returns immediately, then throws away the next character if it's a line feed
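The "should do" behavior can be sketched with a flag that remembers a pending carriage return. SafeLineReader is a hypothetical class written for this illustration, not a JDK class, and it reads one character at a time for clarity rather than speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical line reader that never waits for the character after a CR.
class SafeLineReader {
    private final Reader in;
    private boolean skipLineFeed = false; // true right after a carriage return

    SafeLineReader(Reader in) {
        this.in = in;
    }

    // Returns the next line without its terminator, or null at end of stream.
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLineFeed) {
                skipLineFeed = false;
                if (c == '\n') continue; // second half of a CR LF pair; discard it
            }
            if (c == '\r') {
                skipLineFeed = true; // return now; any LF is dropped on the next call
                return line.toString();
            }
            if (c == '\n') return line.toString();
            line.append((char) c);
        }
        return line.length() > 0 ? line.toString() : null;
    }

    // Convenience wrapper for trying the reader on an in-memory string.
    static List<String> lines(String text) {
        SafeLineReader reader = new SafeLineReader(new StringReader(text));
        List<String> result = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) result.add(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lines("a\r\nb\rc\nd"));
    }
}
```

Because the line is returned as soon as the carriage return arrives, a server that sends a bare CR and then waits for a reply cannot stall the client.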
import java.net.*;
import java.io.*;
public class Webcat {
  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      try {
        URL u = new URL(args[i]);
        InputStream in = new BufferedInputStream(u.openStream());
        InputStreamReader isr = new InputStreamReader(in, "8859_1");
        int c; // read() returns an int so that -1 can signal end of stream
        while ((c = isr.read()) != -1) {
          System.out.write(c);
        }
        System.out.flush();
      }
      catch (IOException e) {
        System.err.println(e);
      }
    }
  }
}
Common Gateway Interface
A lot is written about writing server side CGI. I'm going to show you client side CGI.
We'll need to explore HTTP a little deeper to do this
The browser requests a page
The server sends the page
Data flows primarily from the server to the client.
There are times when the server needs to get data from the client rather than the other way around. The common way to do this is with a form like this one:
<FORM NAME="seek" METHOD="GET">
<pre>Author: <INPUT TYPE="text" NAME="author" size="20" VALUE="" maxlength="2047"></INPUT>
Title: <INPUT TYPE="text" NAME="title" size="20" VALUE="" maxlength="2047"></INPUT>
</pre>
<INPUT TYPE="submit" VALUE="Submit Query" />
</FORM>
The user types data into a form and hits the submit button.
The browser sends the data to the server using the Common Gateway Interface, CGI.
CGI uses the HTTP protocol to transmit the data, either as part of the query string or as separate data following the MIME header.
When the data is sent as a query string included with the file request, this is called CGI GET.
When the data is sent as data attached to the request following the MIME header, this is called CGI POST
Web browsers communicate with web servers through a standard protocol known as HTTP, an acronym for HyperText Transfer Protocol.
This protocol defines:
how a browser requests a file from a web server
how a browser sends additional data along with the request (e.g. the data formats it can accept),
how the server sends data back to the client
response codes
Client opens a socket to port 80 on the server.
Client sends a GET request including the name and path of the file it wants and the version of the HTTP protocol it supports.
The client sends a MIME header.
The client sends a blank line.
The server sends a MIME header
The server sends the data in the file.
The server closes the connection.
GET /javafaq/images/cup.gif
Connection: Keep-Alive
User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
Host: www.oreilly.com:80
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
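The client side of this exchange can be sketched with a plain socket. The class HttpGetRequest and its methods are hypothetical. The request asks for HTTP/1.0 so that the server closes the connection when it is done, and only the header of the response is collected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing the raw text of a GET request.
class HttpGetRequest {

    // Builds the request line, two header lines, and the blank line that ends the header.
    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Accept: */*\r\n"
             + "\r\n";
    }

    // Sends the request and returns the response header, one field per line.
    static String fetchHeader(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(build(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder header = new StringBuilder();
            String line;
            // The header ends at the first blank line.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                header.append(line).append('\n');
            }
            return header.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(build("www.oreilly.com", "/javafaq/images/cup.gif"));
        if (args.length == 2) {
            // Only touches the network when a host and a path are supplied.
            System.out.print(fetchHeader(args[0], 80, args[1]));
        }
    }
}
```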
MIME is an acronym for "Multipurpose Internet Mail Extensions".
an Internet standard defined in RFCs 2045 through 2049
originally intended for use with email messages, but has been adopted for use in HTTP.
When the browser sends a request to a web server, it also sends a MIME header.
MIME headers contain name-value pairs, essentially a name followed by a colon and a space, followed by a value.
Connection: Keep-Alive
User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
Host: www.digitalthink.com:80
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
When a web server responds to a web browser it sends a MIME header along with the response that looks something like this:
Server: Netscape-Enterprise/2.01
Date: Sat, 02 Aug 1997 07:52:46 GMT
Accept-ranges: bytes
Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
Content-length: 2810
Content-type: text/html
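Since each header line is a name, a colon, and a value, splitting a header into fields takes only a few lines. MimeHeaderParser is a hypothetical helper. It splits on the first colon only, because values such as dates contain colons of their own, and it keeps names exactly as sent even though real lookups should ignore case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; a sketch, not a complete MIME parser.
class MimeHeaderParser {

    // Splits lines of the form "Name: value" into a map, preserving their order.
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // skip blank or malformed lines
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "Server: Netscape-Enterprise/2.01\r\n"
                      + "Date: Sat, 02 Aug 1997 07:52:46 GMT\r\n"
                      + "Content-length: 2810\r\n"
                      + "Content-type: text/html\r\n";
        for (Map.Entry<String, String> field : parse(header).entrySet()) {
            System.out.println(field.getKey() + " -> " + field.getValue());
        }
    }
}
```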
CGI GET data is sent in URL encoded query strings
a query string is a set of name=value pairs separated by ampersands
Author=Sadie, Julie&Title=Women Composers
separated from rest of URL by a question mark
UTF-8
Alphanumeric ASCII characters (a-z, A-Z, and 0-9) and the $-_.!*'(), punctuation symbols are left unchanged.
The space character is converted into a plus sign (+).
Other characters (e.g. &, =, ^, #, %, {, and so on) are translated into a percent sign followed by the two hexadecimal digits corresponding to their numeric value.
The comma is ASCII character 44 (decimal) or 2C (hex). Therefore if the comma appears as part of a URL it is encoded as %2C.
The query string "Author=Sadie, Julie&Title=Women Composers" is encoded as:
Author=Sadie%2C+Julie&Title=Women+Composers
The java.net.URLEncoder class contains a single static method which encodes strings in x-www-form-urlencoded format
URLEncoder.encode(String s)
String qs = "Author=Sadie, Julie&Title=Women Composers";
String eqs = URLEncoder.encode(qs);
System.out.println(eqs);
This prints:
Author%3dSadie%2c+Julie%26Title%3dWomen+Composers
String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
eqs += "&";
eqs += "Title=";
eqs += URLEncoder.encode("Women Composers");
This prints the properly encoded query string:
Author=Sadie%2c+Julie&Title=Women+Composers
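The piece-by-piece approach generalizes to any number of fields. This sketch assumes a hypothetical QueryStringBuilder class and uses the two-argument encode(String, String) method added in Java 1.4, which names the character encoding explicitly. Recent JDKs emit uppercase hex digits (%2C), which servers treat the same as the lowercase form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class; not part of java.net.
class QueryStringBuilder {

    // Takes name, value, name, value, ... and encodes each piece separately,
    // so the = and & separators themselves are never escaped.
    static String build(String... namesAndValues) {
        StringBuilder qs = new StringBuilder();
        try {
            for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
                if (qs.length() > 0) qs.append('&');
                qs.append(URLEncoder.encode(namesAndValues[i], "UTF-8"));
                qs.append('=');
                qs.append(URLEncoder.encode(namesAndValues[i + 1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM is required to support UTF-8
        }
        return qs.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("Author", "Sadie, Julie", "Title", "Women Composers"));
    }
}
```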
In Java 1.2 the java.net.URLDecoder class contains a single static method which decodes strings in x-www-form-urlencoded format
URLDecoder.decode(String s)
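Decoding runs the same steps in reverse: split on the ampersands, split each pair on the first equals sign, then decode the pieces. QueryStringParser is a hypothetical class, and the sketch uses the two-argument decode(String, String) method from Java 1.4 rather than the one-argument form shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of java.net.
class QueryStringParser {

    // Splits an encoded query string into decoded name/value pairs, in order.
    static Map<String, String> parse(String queryString) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            for (String pair : queryString.split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // Decode only after splitting, so an escaped & or = stays inside its field.
                fields.put(URLDecoder.decode(name, "UTF-8"), URLDecoder.decode(value, "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM is required to support UTF-8
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("Author=Sadie%2c+Julie&Title=Women+Composers"));
    }
}
```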
String eqs = "Author=" + URLEncoder.encode("Sadie, Julie");
eqs += "&";
eqs += "Title=";
eqs += URLEncoder.encode("Women Composers");
try {
URL u = new URL("http://www.superbooks.com/search.cgi?" + eqs);
InputStream in = u.openStream();
//...
}
catch (IOException e) {
  //...
}
The java.net.URLConnection class is an abstract class that handles communication with different kinds of servers like ftp servers and web servers.
Protocol specific subclasses of URLConnection handle different kinds of servers.
By default, connections to HTTP URLs use the GET method.
Can send output as well as read input
Can post data to CGIs
Can read headers from a connection
The URL is constructed.
The URL's openConnection() method creates the URLConnection object.
The parameters for the connection and the request properties that the client sends to the server are set up.
The connect() method makes the connection to the server. (optional)
The response header information is read using getHeaderField().
The data is read from an InputStream.
Data may be read from the connection in one of two ways
raw by using the input stream returned by getInputStream()
through a content handler with getContent().
Data can be sent to the server using the output stream provided by getOutputStream().
try {
URL u = new URL("http://www.sdexpo.com/");
URLConnection uc = u.openConnection();
uc.connect();
InputStream in = uc.getInputStream();
// read the data...
}
catch (IOException e) {
  //...
}
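The same sequence works for any protocol Java has a handler for. To keep this sketch runnable without a network, it reads a temporary file through a file: URL. The class UrlConnectionRead and its methods are hypothetical, and the temporary file exists only for the demonstration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of java.net.
class UrlConnectionRead {

    // Opens a connection, connects, and reads everything the stream provides.
    static String readAll(URL u) throws IOException {
        URLConnection uc = u.openConnection();
        uc.connect();
        try (InputStream in = new BufferedInputStream(uc.getInputStream())) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) buffer.write(b);
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    // Writes a temporary file, reads it back through a file: URL, and cleans up.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("urlconnection", ".txt");
            try {
                Files.write(tmp, "hello from a file: URL\n".getBytes(StandardCharsets.ISO_8859_1));
                return readAll(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Passing an http URL to readAll() instead exercises the HTTP protocol handler with no change to the reading code.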
The getHeaderField(String name) method returns the string value of a named header field.
Names are case-insensitive.
If the requested field is not present, null is returned.
String lm = uc.getHeaderField("Last-modified");
The keys of the header fields are returned by the getHeaderFieldKey(int n) method.
The first field is 1.
If a numbered key is not found, null is returned.
You can use this in combination with getHeaderField() to loop through the complete header
String key = null;
for (int i = 1; (key = uc.getHeaderFieldKey(i)) != null; i++) {
System.out.println(key + ": " + uc.getHeaderField(key));
}
Utility methods that read a named header and convert its value into an int and a long respectively.
public int getHeaderFieldInt(String name, int Default)
public long getHeaderFieldDate(String name, long Default)
The long returned by getHeaderFieldDate() can be converted into a Date object using a Date() constructor like this:
long ms = uc.getHeaderFieldDate("Last-modified", 0);
Date lm = new Date(ms);
These return the values of six particularly common header fields:
public int getContentLength()
public String getContentType()
public String getContentEncoding()
public long getExpiration()
public long getDate()
public long getLastModified()
try {
URL u = new URL("http://www.sdexpo.com/");
URLConnection uc = u.openConnection();
uc.connect();
String key=null;
for (int n = 1; (key = uc.getHeaderFieldKey(n)) != null; n++) {
System.out.println(key + ": " + uc.getHeaderField(key));
}
}
catch (IOException e) {
System.err.println(e);
}
Similar to reading data from a URLConnection.
First inform the URLConnection that you plan to use it for output
Before getting the connection's input stream, get the connection's output stream and write to it.
Commonly used to talk to CGIs that use the POST method
Construct the URL.
Call the URL's openConnection() method to create the URLConnection object.
Pass true to the URLConnection's setDoOutput() method
Create the data you want to send, preferably as a byte array.
Call getOutputStream() to get an output stream object.
Write the byte array created in step 4 onto the stream.
Close the output stream.
Call getInputStream() to get an input stream object. Read from it as usual.
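The steps above can be sketched as follows. FormPoster, formData(), and post() are hypothetical names, and the field names are borrowed from the sample request in these slides. Running main() with no arguments only prints the encoded form data; a target URL has to be supplied on the command line before anything is sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of java.net.
class FormPoster {

    // Step 4: encode name, value, name, value, ... into a byte array.
    static byte[] formData(String... namesAndValues) {
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(namesAndValues[i], "UTF-8"));
                sb.append('=');
                sb.append(URLEncoder.encode(namesAndValues[i + 1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM is required to support UTF-8
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Steps 2 and 3, then 5 through 8.
    static String post(URL u, byte[] data) throws IOException {
        URLConnection uc = u.openConnection();
        uc.setDoOutput(true);
        try (OutputStream out = uc.getOutputStream()) {
            out.write(data);
        } // leaving the block closes the output stream
        try (InputStream in = uc.getInputStream()) {
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) response.write(b);
            return new String(response.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = formData("username", "Sadie, Julie", "realname", "Women Composers");
        System.out.println(new String(data, StandardCharsets.US_ASCII));
        if (args.length > 0) {
            // Step 1: the URL to post to is supplied by the caller.
            System.out.println(post(URI.create(args[0]).toURL(), data));
        }
    }
}
```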
A typical POST request to a CGI looks like this:
POST /cgi-bin/booksearch.pl HTTP/1.0
Referer: http://www.macfaq.com/sampleform.html
User-Agent: Mozilla/3.01 (Macintosh; I; PPC)
Content-length: 48
Content-type: application/x-www-form-urlencoded
Host: utopia.poly.edu:56435

username=Sadie%2C+Julie&realname=Women+Composers
the POST line
a MIME header which must include
content type
content length
a blank line that signals the end of the MIME header
the actual data of the form, encoded in x-www-form-urlencoded format.
A URLConnection for an http URL will set up the request line and the MIME header for you as long as you set its doOutput field to true by invoking setDoOutput(true).
If you also want to read from the connection, you should set doInput to true with setDoInput(true) too.
URLConnection uc = u.openConnection();
uc.setDoOutput(true);
uc.setDoInput(true);
The request line and MIME header are sent as soon as the URLConnection connects.
Then getOutputStream() returns an output stream on which you can write the x-www-form-urlencoded name-value pairs.
java.net.HttpURLConnection is an abstract subclass of URLConnection that provides some additional methods specific to the HTTP protocol.
URL connection objects that are returned by an http URL will be instances of java.net.HttpURLConnection.
A typical HTTP response from a web server begins like this:
HTTP/1.0 200 OK
Server: Netscape-Enterprise/2.01
Date: Sat, 02 Aug 1997 07:52:46 GMT
Accept-ranges: bytes
Last-modified: Tue, 29 Jul 1997 15:06:46 GMT
Content-length: 2810
Content-type: text/html
The getHeaderField() and getHeaderFieldKey() methods don't return the HTTP response code
After you've connected, you can retrieve the numeric response code--200 in the above example--with the getResponseCode() method and the message associated with it--OK in the above example--with the getResponseMessage() method.
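A client usually cares more about the class of a response code than its exact value. The sketch below, with a hypothetical ResponseCodes class, groups codes by their first digit and shows how the two methods combine into a status string. The constants come from HttpURLConnection itself.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

// Hypothetical helper class; not part of java.net.
class ResponseCodes {

    // Groups a numeric response code by its hundreds digit.
    static String category(int code) {
        if (code >= 100 && code < 200) return "informational";
        if (code >= 200 && code < 300) return "success";
        if (code >= 300 && code < 400) return "redirect";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "unknown";
    }

    // Rebuilds something like "200 OK" from a connected HttpURLConnection.
    static String status(HttpURLConnection huc) throws IOException {
        return huc.getResponseCode() + " " + huc.getResponseMessage();
    }

    public static void main(String[] args) {
        System.out.println(category(HttpURLConnection.HTTP_OK));
        System.out.println(category(HttpURLConnection.HTTP_MOVED_TEMP));
        System.out.println(category(HttpURLConnection.HTTP_NOT_FOUND));
    }
}
```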
Java 1.0 only supports GET and POST requests to HTTP servers
Java 1.1/1.2 supports GET, POST, HEAD, OPTIONS, PUT, DELETE, and TRACE.
The request method is chosen with the setRequestMethod(String method) method.
A java.net.ProtocolException, a subclass of IOException, is thrown if an unknown method is specified.
The getRequestMethod() method returns the string form of the request method currently set for the URLConnection. GET is the default method.
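Setting and reading the request method happens before any connection is made, so it can be tried without touching the network. In this sketch the class RequestMethodDemo and the method tryMethod() are hypothetical, and "FETCH" is a deliberately invalid method name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

// Hypothetical demo class; not part of java.net.
class RequestMethodDemo {

    // Tries to set the given method and returns whichever method is then in effect.
    static String tryMethod(String method) {
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection huc = (HttpURLConnection)
                URI.create("http://www.amnesty.org/").toURL().openConnection();
            try {
                huc.setRequestMethod(method);
            } catch (ProtocolException e) {
                // Unknown method: the connection keeps its previous setting.
            }
            return huc.getRequestMethod();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryMethod("HEAD"));  // a supported method
        System.out.println(tryMethod("FETCH")); // rejected, so the default remains
    }
}
```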
The disconnect() method of the HttpURLConnection class closes the connection to the web server.
Needed for HTTP/1.1 Keep-alive
try {
  URL u = new URL("http://www.amnesty.org/");
  HttpURLConnection huc = (HttpURLConnection) u.openConnection();
  huc.setRequestMethod("PUT");
  huc.setDoOutput(true); // required before calling getOutputStream()
  huc.connect();
  OutputStream os = huc.getOutputStream();
  // put the data...
  os.close();
  int code = huc.getResponseCode();
  if (code >= 200 && code < 300) {
    // the server accepted the data
  }
  huc.disconnect();
}
catch (IOException e) {
  //...
}
The boolean usingProxy() method returns true if web connections are being funneled through a proxy server, false if they're not.
Most web servers can be configured to automatically redirect browsers to the new location of a page that's moved.
To redirect browsers, a server sends a 300 level response and a Location header that specifies the new location of the requested page.
GET /~elharo/macfaq/index.html HTTP/1.0
HTTP/1.1 302 Moved Temporarily
Date: Mon, 04 Aug 1997 14:21:27 GMT
Server: Apache/1.2b7
Location: http://www.macfaq.com/macfaq/index.html
Connection: close
Content-type: text/html
<HTML><HEAD>
<TITLE>302 Moved Temporarily</TITLE>
</HEAD><BODY>
<H1>Moved Temporarily</H1>
The document has moved
<A HREF="http://www.macfaq.com/macfaq/index.html">here</A>.<P>
</BODY></HTML>
HTML is returned for browsers that don't understand redirects, but most modern browsers jump straight to the page specified in the Location header instead.
Because redirects can change the site a user is connecting to without their knowledge, redirects are not arbitrarily followed by URLConnections.
The HttpURLConnection.setFollowRedirects(true) method says that connections will follow redirect instructions from the web server.
Untrusted applets are not allowed to set this.
HttpURLConnection.getFollowRedirects() returns true if redirect requests are honored, false if they're not.
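To inspect a redirect rather than follow it, the setting can also be changed for a single connection. This sketch assumes the per-instance setInstanceFollowRedirects() method, which was added in Java 1.3 and is not covered in these slides. RedirectDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical demo class; not part of java.net.
class RedirectDemo {

    // Creates (but does not connect) a connection that will not follow redirects.
    static HttpURLConnection openWithoutRedirects(String url) {
        try {
            HttpURLConnection huc =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            huc.setInstanceFollowRedirects(false); // affects this connection only
            return huc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Connects and returns the Location header for a 300 level response, else null.
    static String redirectTarget(HttpURLConnection huc) throws IOException {
        int code = huc.getResponseCode();
        if (code >= 300 && code < 400) {
            return huc.getHeaderField("Location");
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Only touches the network when a URL is supplied.
            HttpURLConnection huc = openWithoutRedirects(args[0]);
            System.out.println(redirectTarget(huc));
            huc.disconnect();
        }
    }
}
```

Reading the Location header yourself lets the program decide whether the new site is acceptable before connecting to it.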
Java Network Programming, Second Edition
Elliotte Rusty Harold
O'Reilly & Associates, 2000
ISBN: 1-565-92870-9
Java I/O
O'Reilly & Associates, 1999
ISBN 1-56592-485-1
Web Client Programming online course
This presentation: http://www.ibiblio.org/javafaq/slides/sd2000east/urls/