Displaying HTML
Parsing HTML
Element names may be upper case, lower case, or mixed case.
Attribute values may or may not be quoted.
If they are quoted, either single or double quotes may be used.
The < sign may be escaped as &lt; or it may just be left in the file raw.
The <P> tag may be used to begin or end a paragraph.
Closing </P>, </LI>, and </TD> tags may or may not be used.
Tags may or may not overlap.
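To see how Swing's own parser copes with this looseness, here is a minimal sketch that parses a deliberately messy fragment and records the ALIGN value of each <P> start tag. It assumes a current JDK, where the public ParserDelegator() constructor installs the default HTML DTD itself. The class and method names (ParagraphAlignCollector, collect) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Records the ALIGN attribute of every <P> start tag the parser reports.
class ParagraphAlignCollector extends HTMLEditorKit.ParserCallback {
    private final List<Object> aligns = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (tag == HTML.Tag.P) {
            // Attribute names arrive as HTML.Attribute constants, whatever their case in the source
            aligns.add(attributes.getAttribute(HTML.Attribute.ALIGN));
        }
    }

    static List<Object> collect(String html) {
        ParagraphAlignCollector callback = new ParagraphAlignCollector();
        try {
            // Assumes a JDK whose ParserDelegator() constructor sets up the default DTD
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.aligns;
    }

    public static void main(String[] args) {
        // Upper and lower case tags, unquoted and single-quoted values, no </P> tags
        System.out.println(collect("<P ALIGN=center>First<p align='left'>Second"));
    }
}
```

Both paragraphs should come back as P tags with their ALIGN values, even though one is written in upper case without quotes and the other in lower case with single quotes.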
HTML can be used to label components.
The javax.swing.text.html package allows you to display basic HTML in your JFC-based applications.
The javax.swing.text.html.parser package can be used to read HTML documents, in more or less their full non-standard atrocity.
JButton jb = new JButton("<html><b><i>Hello World!</i></b></html>");
In early Swing releases, upper case HTML doesn't work:
JButton jb = new JButton("<HTML><B><I>Hello World!</I></B></HTML>");
On the other hand, Sun has no qualms with malformed HTML that omits the end tags like this:
JButton jb = new JButton("<html><b><i>Hello World!");
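Swing decides whether to render a label or button string as HTML by looking only at its first few characters. The sketch below calls javax.swing.plaf.basic.BasicHTML.isHTMLString(), the check the basic look-and-feel delegates use. The wrapper name rendersAsHtml is hypothetical, and the exact rule (for instance, how it treats case) may differ between Java releases.

```java
import javax.swing.plaf.basic.BasicHTML;

class HtmlLabelCheck {
    // Hypothetical wrapper: true if Swing would treat this component text as HTML.
    static boolean rendersAsHtml(String text) {
        return BasicHTML.isHTMLString(text);
    }

    public static void main(String[] args) {
        // End tags are optional, so this still counts as HTML
        System.out.println(rendersAsHtml("<html><b><i>Hello World!"));
        // Without the <html> prefix the markup is shown literally
        System.out.println(rendersAsHtml("<b><i>Hello World!</i></b>"));
        // The prefix must be the very first thing in the string
        System.out.println(rendersAsHtml(" <html><b>Hello</b>"));
    }
}
```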
import java.applet.*;
import javax.swing.*;
public class HTMLLabelApplet extends JApplet {
public void init() {
JLabel theText = new JLabel(
"<html>Hello! This is a multiline label with <b>bold</b> "
+ "and <i>italic</i> text. <P> "
+ "It can use paragraphs, horizontal lines, <hr> "
+ "<font color=red>colors</font> "
+ "and most of the other basic features of HTML 3.2</html>");
this.getContentPane().add(theText);
}
}
Almost all HTML 3.2 tags are supported, at least partially, including IMG and the various table tags.
The only completely unsupported HTML 3.2 tags are:
<APPLET>
<PARAM>
<MAP>
<AREA>
<LINK>
<SCRIPT>
<STYLE>
Frames aren't supported
Relative URLs aren't resolved
javax.swing.JEditorPane
A complete HTML 3.2 renderer that can handle frames, forms, hyperlinks, and parts of CSS Level 1
Also supports plain text and RTF
public JEditorPane()
public JEditorPane(URL initialPage) throws IOException
public JEditorPane(String url) throws IOException
public JEditorPane(String mimeType, String text)
public void setPage(URL page) throws IOException
public void setPage(String url) throws IOException
public void setText(String text)
import javax.swing.text.*;
import javax.swing.*;
import java.io.*;
import java.awt.*;
public class OReillyHomePage {
public static void main(String[] args) {
JEditorPane jep = new JEditorPane();
jep.setEditable(false);
try {
jep.setPage("http://www.oreilly.com");
}
catch (IOException e) {
jep.setContentType("text/html");
jep.setText("<html>Could not load http://www.oreilly.com </html>");
}
JScrollPane scrollPane = new JScrollPane(jep);
JFrame f = new JFrame("O'Reilly & Associates");
// Next line requires Java 1.3
f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
f.getContentPane().add(scrollPane);
f.setSize(512, 342);
f.setVisible(true);
}
}
Link action
JavaScript
Flash
XML
QuickTime
etc.
public JEditorPane(URL u) throws IOException
JFrame f = new JFrame("O'Reilly & Associates");
f.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
try {
URL u = new URL("http://www.oreilly.com");
JEditorPane jep = new JEditorPane(u);
jep.setEditable(false);
JScrollPane scrollPane = new JScrollPane(jep);
f.getContentPane().add(scrollPane);
}
catch (IOException e) {
f.getContentPane().add(
new JLabel("Could not load http://www.oreilly.com"));
}
f.setSize(512, 342);
f.setVisible(true);
public JEditorPane(String url) throws IOException
try {
JEditorPane jep = new JEditorPane("http://www.oreilly.com");
jep.setEditable(false);
JScrollPane scrollPane = new JScrollPane(jep);
f.getContentPane().add(scrollPane);
}
catch (IOException e) {
f.getContentPane().add(
new JLabel("Could not load http://www.oreilly.com"));
}
public JEditorPane(String mimeType, String text)
First argument is MIME type
Second argument is actual HTML as a String
JEditorPane jep = new JEditorPane("text/html",
"<html><h1>Hello World!</h1> <h2>Goodbye World!</h2></html>");
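Passing the MIME type first makes the pane install an HTML editor kit, which parses the string into an HTMLDocument model instead of storing it verbatim. This sketch uses a hypothetical class name, EditorPaneModel. It reads the plain text back out of that model to show that the markup was interpreted.

```java
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

class EditorPaneModel {
    // Builds a pane from an HTML string and returns the plain text
    // that the HTML kit stored in the pane's Document.
    static String plainText(String html) {
        JEditorPane jep = new JEditorPane("text/html", html);
        Document doc = jep.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><h1>Hello World!</h1> <h2>Goodbye World!</h2></html>";
        JEditorPane jep = new JEditorPane("text/html", html);
        System.out.println(jep.getContentType());                   // MIME type of the installed kit
        System.out.println(jep.getDocument().getClass().getName()); // model class built by that kit
        System.out.println(plainText(html));                        // text only, tags removed
    }
}
```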
import javax.swing.text.*;
import javax.swing.*;
import java.io.*;
import java.awt.*;
public class Fibonacci {
public static void main(String[] args) {
StringBuffer result =
new StringBuffer("<html><body><h1>Fibonacci Sequence</h1><ol>");
long low = 0;
long high = 1;
for (int i = 0; i < 50; i++) {
result.append("<li>");
result.append(low);
long temp = high;
high = low + high;
low = temp;
}
result.append("</ol></body></html>");
JEditorPane jep = new JEditorPane("text/html", result.toString());
jep.setEditable(false);
JScrollPane scrollPane = new JScrollPane(jep);
JFrame f = new JFrame("Fibonacci Sequence");
f.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
f.getContentPane().add(scrollPane);
f.setSize(512, 342);
f.setVisible(true);
}
}
When the user clicks on a link in a non-editable JEditorPane, the pane fires a HyperlinkEvent
The event is passed to any registered HyperlinkListener objects
public interface HyperlinkListener extends EventListener {
public void hyperlinkUpdate(HyperlinkEvent e);
}
The HyperlinkEvent object passed to this method contains the URL of the event, which is returned by its getURL() method:
public URL getURL()
HyperlinkEvents are fired not just when the user clicks the link but also when the mouse enters or exits the link area. Thus you'll want to check the type of the event with the getEventType() method before changing the page:
public HyperlinkEvent.EventType getEventType()
This will return one of the three mnemonic constants
HyperlinkEvent.EventType.EXITED
HyperlinkEvent.EventType.ENTERED
HyperlinkEvent.EventType.ACTIVATED
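A listener can react to all three event types, not just ACTIVATED. The hypothetical LinkStatusTracker below remembers the link under the mouse, as a status bar might, and also the last link clicked. Its main() method builds HyperlinkEvents by hand to simulate what a JEditorPane would fire, so it runs without a window or network access.

```java
import java.net.MalformedURLException;
import java.net.URL;
import javax.swing.event.HyperlinkEvent;
import javax.swing.event.HyperlinkListener;

// Hypothetical listener that distinguishes all three HyperlinkEvent types.
class LinkStatusTracker implements HyperlinkListener {
    private String hovered = "";  // link currently under the mouse, or "" if none
    private URL lastActivated;    // last link the user clicked, or null

    @Override
    public void hyperlinkUpdate(HyperlinkEvent evt) {
        HyperlinkEvent.EventType type = evt.getEventType();
        if (type == HyperlinkEvent.EventType.ENTERED) {
            hovered = String.valueOf(evt.getURL());
        } else if (type == HyperlinkEvent.EventType.EXITED) {
            hovered = "";
        } else if (type == HyperlinkEvent.EventType.ACTIVATED) {
            // A browser would call pane.setPage(evt.getURL()) here
            lastActivated = evt.getURL();
        }
    }

    String getHovered() { return hovered; }
    URL getLastActivated() { return lastActivated; }

    // Wraps the checked MalformedURLException so callers need no try block.
    static URL url(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    public static void main(String[] args) {
        LinkStatusTracker tracker = new LinkStatusTracker();
        URL u = url("http://www.oreilly.com/");
        // Simulate the mouse entering a link, clicking it, and leaving it
        tracker.hyperlinkUpdate(new HyperlinkEvent(tracker, HyperlinkEvent.EventType.ENTERED, u));
        System.out.println("hovering over: " + tracker.getHovered());
        tracker.hyperlinkUpdate(new HyperlinkEvent(tracker, HyperlinkEvent.EventType.ACTIVATED, u));
        tracker.hyperlinkUpdate(new HyperlinkEvent(tracker, HyperlinkEvent.EventType.EXITED, u));
        System.out.println("last clicked: " + tracker.getLastActivated());
    }
}
```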
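import javax.swing.*;
import javax.swing.event.*;
import java.io.*;
public class LinkFollower implements HyperlinkListener {
private JEditorPane pane;
public LinkFollower(JEditorPane pane) {
this.pane = pane;
}
public void hyperlinkUpdate(HyperlinkEvent evt) {
// Ignore ENTERED and EXITED events; follow the link only when it is clicked
if (evt.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
try {
pane.setPage(evt.getURL());
}
catch (IOException e) {
// Report the failure instead of silently swallowing it
System.err.println("Could not load " + evt.getURL() + ": " + e);
}
}
}
}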
import javax.swing.text.*;
import javax.swing.*;
import java.io.*;
import java.awt.*;
public class SimpleWebBrowser {
public static void main(String[] args) {
// get the first URL
String initialPage = "http://metalab.unc.edu/javafaq/";
if (args.length > 0) initialPage = args[0];
// set up the editor pane
JEditorPane jep = new JEditorPane();
jep.setEditable(false);
jep.addHyperlinkListener(new LinkFollower(jep));
try {
jep.setPage(initialPage);
}
catch (IOException e) {
System.err.println("Usage: java SimpleWebBrowser url");
System.err.println(e);
System.exit(-1);
}
// set up the window
JScrollPane scrollPane = new JScrollPane(jep);
JFrame f = new JFrame("Simple Web Browser");
f.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
f.getContentPane().add(scrollPane);
f.setSize(512, 342);
f.setVisible(true);
}
}
public void read(InputStream in, Object document) throws IOException
in is the stream from which the HTML will be read.
document should be an instance of javax.swing.text.html.HTMLDocument. (You can use another type, but if you do, the JEditorPane will treat the stream as plain text rather than HTML.)
Although you could use the no-args HTMLDocument() constructor to create the HTMLDocument object, the document it creates is missing a lot of style details. Instead, let a javax.swing.text.html.HTMLEditorKit create the document for you.
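The following minimal sketch lets the kit build the document. It reads an HTML string straight into a document created by HTMLEditorKit.createDefaultDocument(), using the kit's read(Reader, Document, int) method and no JEditorPane at all. The class name KitDocumentDemo and the sample HTML are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class KitDocumentDemo {
    // Lets the kit create the document (with its default styles) and then fill it.
    static HTMLDocument load(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0); // parse into the document at offset 0
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // Returns the document's plain text content.
    static String text(Document doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HTMLDocument doc = load("<html><body><h1>Macfaq</h1><p>Hello</p></body></html>");
        System.out.println(text(doc));
    }
}
```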
You get an HTMLEditorKit by passing the MIME type you want to edit (text/html in this case) to the JEditorPane's getEditorKitForContentType() method like this:
EditorKit htmlKit = jep.getEditorKitForContentType("text/html");
Finally, before reading from the stream you have to use the JEditorPane's setEditorKit() method to install a javax.swing.text.html.HTMLEditorKit:
jep.setEditorKit(htmlKit);
For example,
JEditorPane jep = new JEditorPane();
jep.setEditable(false);
EditorKit htmlKit = jep.getEditorKitForContentType("text/html");
HTMLDocument doc = (HTMLDocument) htmlKit.createDefaultDocument();
jep.setEditorKit(htmlKit);
try {
URL u = new URL("http://www.macfaq.com");
InputStream in = u.openStream();
jep.read(in, doc);
}
catch (IOException e) {
System.err.println(e);
}
JScrollPane scrollPane = new JScrollPane(jep);
JFrame f = new JFrame("Macfaq");
f.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
f.getContentPane().add(scrollPane);
f.setSize(512, 342);
f.setVisible(true);
Book tickers
Web Whacker
Search Engine or other robot
Many other uses
The javax.swing.text.html and javax.swing.text.html.parser packages include classes that do most of the hard work for you.
They're primarily intended for the internal use of HotJava and the JEditorPane class. Consequently, they can be a little tricky to get at.
The HTML parsing class is the inner class javax.swing.text.html.HTMLEditorKit.Parser:
public abstract static class HTMLEditorKit.Parser extends Object
The concrete subclass is javax.swing.text.html.parser.ParserDelegator:
public class ParserDelegator extends HTMLEditorKit.Parser
ParserDelegator looks for five things in the document:
start tags
end tags
empty tags
text
comments
Every time the parser sees one of these five items, it invokes the corresponding callback method in a particular instance of the javax.swing.text.html.HTMLEditorKit.ParserCallback class.
To parse an HTML file you write a subclass of HTMLEditorKit.ParserCallback that responds to text and tags as you desire. Then you pass an instance of your subclass to the HTMLEditorKit.Parser's parse() method along with the Reader from which the HTML will be read:
public void parse(Reader in, HTMLEditorKit.ParserCallback callback,
boolean ignoreCharacterSet) throws IOException
HTMLEditorKit.Parser is an abstract class, so it can't be instantiated directly.
Its subclass, javax.swing.text.html.parser.ParserDelegator, is concrete. However, before you can use it you have to configure it with a DTD using the static methods setDefaultDTD() and createDTD():
protected static void setDefaultDTD()
protected static DTD createDTD(DTD dtd, String name)
The DTD class has a protected constructor and many protected methods that subclasses can use to build a DTD from scratch, but this is an API only an SGML expert could be expected to use.
Instead, read the HTML DTD from a text file using the DTDParser class.
But there is no DTDParser class!
Use the getParser() method of HTMLEditorKit instead, which returns a ParserDelegator after properly initializing the DTD for HTML 3.2:
protected HTMLEditorKit.Parser getParser()
But this method is protected. Therefore, subclass HTMLEditorKit and override it with a public version.
import javax.swing.text.html.*;
public class ParserGetter extends HTMLEditorKit {
// purely to make this method public
public HTMLEditorKit.Parser getParser(){
return super.getParser();
}
}
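On current JDKs the detour through getParser() is optional, because ParserDelegator also has a public no-args constructor that installs the default DTD. The sketch below, with hypothetical class names, obtains a parser both ways. On a recent JDK both should turn out to be ParserDelegator instances.

```java
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class ParserSources {
    // The ParserGetter trick: widen the protected getParser() to public.
    static class ParserGetter extends HTMLEditorKit {
        @Override
        public HTMLEditorKit.Parser getParser() {
            return super.getParser();
        }
    }

    public static void main(String[] args) {
        HTMLEditorKit.Parser viaKit = new ParserGetter().getParser();
        // Assumption: a recent JDK, whose public ParserDelegator() constructor sets up the DTD
        HTMLEditorKit.Parser direct = new ParserDelegator();
        System.out.println("via HTMLEditorKit: " + viaKit.getClass().getName());
        System.out.println("constructed directly: " + direct.getClass().getName());
    }
}
```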
public abstract void parse(Reader input, HTMLEditorKit.ParserCallback callback, boolean ignoreCharSet) throws IOException
The ParserCallback class is a public inner class inside javax.swing.text.html.HTMLEditorKit.
public static class HTMLEditorKit.ParserCallback extends Object
It has a single public, no-args constructor:
public HTMLEditorKit.ParserCallback()
However, you probably won't use this directly because the standard implementation of this class does nothing. It exists to be subclassed. It has six callback methods that do nothing. You will override these methods to respond to specific items seen in the input stream as the document is parsed.
public void handleText(char[] text, int position)
public void handleComment(char[] text, int position)
public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position)
public void handleEndTag(HTML.Tag tag, int position)
public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position)
public void handleError(String errorMsg, int position)
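There's also a flush() method you use to perform any final cleanup. It is meant to be invoked once after the whole document has been parsed. However, the parser itself may not call it (the Outliner example below works around this), so call it yourself if you depend on it.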
public void flush() throws BadLocationException
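A logging callback makes it easy to see which callback fires for which construct. This minimal sketch assumes the JDK's ParserDelegator, and the class name CallbackLogger is hypothetical. It deliberately does not depend on flush() being called. Expect some extra entries too, such as start and end tags for the implied html and body elements the parser infers.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Logs each callback as a short line so you can see which method fires for what.
class CallbackLogger extends HTMLEditorKit.ParserCallback {
    private final List<String> log = new ArrayList<>();

    @Override public void handleText(char[] text, int position) {
        log.add("text: " + new String(text));
    }
    @Override public void handleComment(char[] text, int position) {
        log.add("comment: " + new String(text).trim());
    }
    @Override public void handleStartTag(HTML.Tag tag, MutableAttributeSet a, int position) {
        log.add("start: " + tag);
    }
    @Override public void handleEndTag(HTML.Tag tag, int position) {
        log.add("end: " + tag);
    }
    @Override public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet a, int position) {
        log.add("simple: " + tag);
    }
    @Override public void handleError(String errorMsg, int position) {
        log.add("error: " + errorMsg);
    }

    static List<String> run(String html) {
        CallbackLogger logger = new CallbackLogger();
        try {
            new ParserDelegator().parse(new StringReader(html), logger, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger.log;
    }

    public static void main(String[] args) {
        for (String line : run("<p>Hi<!-- note --><br></p>")) {
            System.out.println(line);
        }
    }
}
```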
import javax.swing.text.html.*;
import java.io.*;
public class TagStripper extends HTMLEditorKit.ParserCallback {
private Writer out;
public TagStripper(Writer out) {
this.out = out;
}
public void handleText(char[] text, int position) {
try {
out.write(text);
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
}
// Begin by retrieving a parser using the ParserGetter class:
ParserGetter kit = new ParserGetter();
HTMLEditorKit.Parser parser = kit.getParser();
// Next, construct an instance of your callback class like this:
HTMLEditorKit.ParserCallback callback
= new TagStripper(new OutputStreamWriter(System.out));
// Then get a stream you can read the HTML document from. For example,
try {
URL u = new URL("http://www.oreilly.com");
InputStream in = u.openStream();
InputStreamReader r = new InputStreamReader(in);
// Finally, pass the Reader and HTMLEditorKit.ParserCallback to the
// HTMLEditorKit.Parser's parse() method, like this:
parser.parse(r, callback, false);
}
catch (IOException e) {
System.err.println(e);
}
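To experiment without a network connection, you can use this self-contained variant of the steps above. It parses a String through a StringReader and collects the stripped text in a StringWriter. ParserGetter and TagStripper are reproduced as nested classes so the block compiles on its own. StripDemo and strip() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.swing.text.html.HTMLEditorKit;

class StripDemo {
    // Same trick as ParserGetter: expose the protected getParser().
    static class ParserGetter extends HTMLEditorKit {
        @Override
        public HTMLEditorKit.Parser getParser() {
            return super.getParser();
        }
    }

    // Same idea as TagStripper: copy text, ignore tags and comments.
    static class TagStripper extends HTMLEditorKit.ParserCallback {
        private final Writer out;
        TagStripper(Writer out) { this.out = out; }

        @Override
        public void handleText(char[] text, int position) {
            try {
                out.write(text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    static String strip(String html) {
        StringWriter out = new StringWriter();
        try {
            HTMLEditorKit.Parser parser = new ParserGetter().getParser();
            parser.parse(new StringReader(html), new TagStripper(out), false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<H1> Here's the Title </H1>\n<P> Here's the text </P>"));
    }
}
```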
Parsing takes place in a separate thread
White space is stripped.
<H1> Here's the Title </H1>
<P> Here's the text </P>
What actually comes out of the tag stripper is:
Here's the TitleHere's the text
The single exception is the PRE element, which maintains all white space in its contents unedited.
Short of implementing your own parser, I don't know of any way to retain all the stripped space. But you can include the minimum necessary line breaks and white space by looking at the tags as well as the text. Generally you expect a single break in HTML when you see one of these tags:
<BR>
<LI>
<TR>
You expect a double break (paragraph break) when you see one of these tags:
<P>
</H1> </H2> </H3> </H4> </H5> </H6>
<HR>
<DIV>
</UL> </OL> </DL>
To include line breaks in the output you have to look at each tag as it's processed and determine whether it falls in one of these sets. This is straightforward because the first argument passed to each of the tag callback methods is an HTML.Tag object.
HTML.Tag is a public inner class in the javax.swing.text.html.HTML class.
public static class HTML.Tag extends Object
It has these four methods:
public boolean isBlock()
public boolean breaksFlow()
public boolean isPreformatted()
public String toString()
The breaksFlow() method returns true if the tag should cause a single line break.
The isBlock() method returns true if the tag should cause a double line break.
The isPreformatted() method returns true if the tag indicates that white space should be preserved.
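LineBreakingTagStripper applies these methods with isBlock() taking priority over breaksFlow(). This sketch isolates that rule in a small helper so you can check how individual tags are classified. The names BreakRules and lineBreaksFor are hypothetical.

```java
import javax.swing.text.html.HTML;

class BreakRules {
    // Number of line breaks to emit for a tag: isBlock() is checked before breaksFlow().
    static int lineBreaksFor(HTML.Tag tag) {
        if (tag.isBlock()) return 2;
        if (tag.breaksFlow()) return 1;
        return 0;
    }

    public static void main(String[] args) {
        HTML.Tag[] tags = {HTML.Tag.P, HTML.Tag.BR, HTML.Tag.B, HTML.Tag.PRE};
        for (HTML.Tag tag : tags) {
            System.out.println(tag + ": " + lineBreaksFor(tag) + " break(s), preformatted="
                + tag.isPreformatted());
        }
    }
}
```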
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
import java.io.*;
import java.net.*;
public class LineBreakingTagStripper extends HTMLEditorKit.ParserCallback {
private Writer out;
private String lineSeparator;
public LineBreakingTagStripper(Writer out) {
this(out, System.getProperty("line.separator", "\r\n"));
}
public LineBreakingTagStripper(Writer out, String lineSeparator) {
this.out = out;
this.lineSeparator = lineSeparator;
}
public void handleText(char[] text, int position) {
try {
out.write(text);
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
public void handleEndTag(HTML.Tag tag, int position) {
try {
if (tag.isBlock()) {
out.write(lineSeparator);
out.write(lineSeparator);
}
else if (tag.breaksFlow()) {
out.write(lineSeparator);
}
}
catch (IOException e) {
System.err.println(e);
}
}
public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes,
int position) {
try {
if (tag.isBlock()) {
out.write(lineSeparator);
out.write(lineSeparator);
}
else if (tag.breaksFlow()) {
out.write(lineSeparator);
}
else {
out.write(' ');
}
}
catch (IOException e) {
System.err.println(e);
}
}
}
You determine the type of a tag by comparing it against these 73 mnemonic constants from the HTML.Tag class:
HTML.Tag.A
HTML.Tag.ADDRESS
HTML.Tag.APPLET
HTML.Tag.AREA
HTML.Tag.B
HTML.Tag.BASE
HTML.Tag.BASEFONT
HTML.Tag.BIG
HTML.Tag.BLOCKQUOTE
HTML.Tag.BODY
HTML.Tag.BR
HTML.Tag.CAPTION
HTML.Tag.CENTER
HTML.Tag.CITE
HTML.Tag.CODE
HTML.Tag.DD
HTML.Tag.DFN
HTML.Tag.DIR
HTML.Tag.DIV
HTML.Tag.DL
HTML.Tag.DT
HTML.Tag.EM
HTML.Tag.FONT
HTML.Tag.FORM
HTML.Tag.FRAME
HTML.Tag.FRAMESET
HTML.Tag.H1
HTML.Tag.H2
HTML.Tag.H3
HTML.Tag.H4
HTML.Tag.H5
HTML.Tag.H6
HTML.Tag.HEAD
HTML.Tag.HR
HTML.Tag.HTML
HTML.Tag.I
HTML.Tag.IMG
HTML.Tag.INPUT
HTML.Tag.ISINDEX
HTML.Tag.KBD
HTML.Tag.LI
HTML.Tag.LINK
HTML.Tag.MAP
HTML.Tag.MENU
HTML.Tag.META
HTML.Tag.NOFRAMES
HTML.Tag.OBJECT
HTML.Tag.OL
HTML.Tag.OPTION
HTML.Tag.P
HTML.Tag.PARAM
HTML.Tag.PRE
HTML.Tag.SAMP
HTML.Tag.SCRIPT
HTML.Tag.SELECT
HTML.Tag.SMALL
HTML.Tag.STRIKE
HTML.Tag.S
HTML.Tag.STRONG
HTML.Tag.STYLE
HTML.Tag.SUB
HTML.Tag.SUP
HTML.Tag.TABLE
HTML.Tag.TD
HTML.Tag.TEXTAREA
HTML.Tag.TH
HTML.Tag.TR
HTML.Tag.TT
HTML.Tag.U
HTML.Tag.UL
HTML.Tag.VAR
HTML.Tag.IMPLIED
HTML.Tag.COMMENT
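Because the parser passes you these shared constants, you can compare them with ==, as Outliner does with a chain of if statements. You can also key a map on them. Here is a minimal sketch of the map approach for heading levels, with hypothetical names. HTML.getTag() looks up the constant for a tag name and returns null for names it doesn't know.

```java
import java.util.HashMap;
import java.util.Map;
import javax.swing.text.html.HTML;

class HeadingLevels {
    // Maps each heading constant to its level; every other tag maps to 0.
    private static final Map<HTML.Tag, Integer> LEVELS = new HashMap<>();
    static {
        HTML.Tag[] headings = {HTML.Tag.H1, HTML.Tag.H2, HTML.Tag.H3,
                               HTML.Tag.H4, HTML.Tag.H5, HTML.Tag.H6};
        for (int i = 0; i < headings.length; i++) {
            LEVELS.put(headings[i], i + 1);
        }
    }

    static int levelOf(HTML.Tag tag) {
        return LEVELS.getOrDefault(tag, 0);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"h1", "h4", "p", "blink"}) {
            System.out.println(name + " -> " + levelOf(HTML.getTag(name)));
        }
    }
}
```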
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
import java.io.*;
import java.net.*;
import java.util.*;
public class Outliner extends HTMLEditorKit.ParserCallback {
private Writer out;
private int level = 0;
private boolean inHeader=false;
private static String lineSeparator
= System.getProperty("line.separator", "\r\n");
public Outliner(Writer out) {
this.out = out;
}
public void handleStartTag(HTML.Tag tag,
MutableAttributeSet attributes, int position) {
int newLevel = 0;
if (tag == HTML.Tag.H1) newLevel = 1;
else if (tag == HTML.Tag.H2) newLevel = 2;
else if (tag == HTML.Tag.H3) newLevel = 3;
else if (tag == HTML.Tag.H4) newLevel = 4;
else if (tag == HTML.Tag.H5) newLevel = 5;
else if (tag == HTML.Tag.H6) newLevel = 6;
else return;
this.inHeader = true;
try {
if (newLevel > this.level) {
for (int i =0; i < newLevel-this.level; i++) {
out.write("<ul>" + lineSeparator + "<li>");
}
}
else if (newLevel < this.level) {
for (int i =0; i < this.level-newLevel; i++) {
out.write(lineSeparator + "</ul>" + lineSeparator);
}
out.write(lineSeparator + "<li>");
}
else {
out.write(lineSeparator + "<li>");
}
this.level = newLevel;
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
public void handleEndTag(HTML.Tag tag, int position) {
if (tag == HTML.Tag.H1 || tag == HTML.Tag.H2
|| tag == HTML.Tag.H3 || tag == HTML.Tag.H4
|| tag == HTML.Tag.H5 || tag == HTML.Tag.H6) {
inHeader = false;
}
// work around bug in the parser that fails to call flush
if (tag == HTML.Tag.HTML) this.flush();
}
public void handleText(char[] text, int position) {
if (inHeader) {
try {
out.write(text);
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
}
public void flush() {
try {
while (this.level-- > 0) {
out.write(lineSeparator + "</ul>");
}
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
public static void main(String[] args) {
ParserGetter kit = new ParserGetter();
HTMLEditorKit.Parser parser = kit.getParser();
try {
URL u = new URL(args[0]);
InputStream in = u.openStream();
InputStreamReader r = new InputStreamReader(in);
HTMLEditorKit.ParserCallback callback = new Outliner
(new OutputStreamWriter(System.out));
parser.parse(r, callback, false);
}
catch (IOException e) {
System.err.println(e);
}
catch (ArrayIndexOutOfBoundsException e) {
System.out.println("Usage: java Outliner url");
}
}
}
D:\JAVA\JNP2\examples\08>java Outliner http://metalab.unc.edu/xml/
<ul>
<li> Cafe con Leche XML News and Resources<ul>
<li><ul>
<li>XML Overview
<li>Random Notes
<li>Specifications
<li>Books
<li>XML Resources
<li>Development Tools<ul>
<li>Validating Parsers
<li>Non-validating Parsers
<li>Online Validators and Syntax Checkers
<li>Formatting Engines
<li>Browsers
<li>Class Libraries
<li>Editors
<li>XLL
<li>XML Applications
<li>External Sites
</ul>
</ul>
<li>Quote of the Day
<li>Today's News
<li>Recommended Reading
<li>Recent News</ul>
</ul>
The second argument to the handleStartTag() and handleSimpleTag() callback methods is an instance of the javax.swing.text.MutableAttributeSet interface, which allows you to see what attributes are attached to a particular tag.
public abstract interface MutableAttributeSet extends AttributeSet
The AttributeSet interface declares these methods:
public int getAttributeCount()
public boolean isDefined(Object name)
public boolean containsAttribute(Object name, Object value)
public boolean containsAttributes(AttributeSet attributes)
public boolean isEqual(AttributeSet attributes)
public AttributeSet copyAttributes()
public Enumeration getAttributeNames()
public Object getAttribute(Object name)
public AttributeSet getResolveParent()
Given an AttributeSet, this method prints the attributes in name=value format:
private void listAttributes(AttributeSet attributes) {
Enumeration e = attributes.getAttributeNames();
while (e.hasMoreElements()) {
Object name = e.nextElement();
Object value = attributes.getAttribute(name);
System.out.println(name + "=" + value);
}
}
Although the argument and return types of these methods are mostly declared in terms of java.lang.Object, in practice all values are instances of java.lang.String, while all names are instances of the public inner class javax.swing.text.html.HTML.Attribute, whose constants are:
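As a sketch of putting the attribute names to work, the hypothetical LinkExtractor below collects the HREF of every <A> start tag. It resolves each one against a base URL with new URL(base, href), the same technique PageSaver uses. Anchors without an HREF, such as <A NAME=...>, are skipped. It assumes the JDK's ParserDelegator, and the names and sample URLs are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical link extractor: absolute URLs of all <A HREF=...> tags, in document order.
class LinkExtractor extends HTMLEditorKit.ParserCallback {
    private final URL base;
    private final List<String> links = new ArrayList<>();

    LinkExtractor(URL base) { this.base = base; }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (tag != HTML.Tag.A) return;
        Object href = attributes.getAttribute(HTML.Attribute.HREF); // null for <A NAME=...>
        if (href == null) return;
        try {
            links.add(new URL(base, href.toString()).toString()); // resolve relative links
        } catch (MalformedURLException e) {
            System.err.println("Skipping bad link: " + href);
        }
    }

    static List<String> extract(String html, String baseUrl) {
        try {
            LinkExtractor callback = new LinkExtractor(new URL(baseUrl));
            new ParserDelegator().parse(new StringReader(html), callback, true);
            return callback.links;
        } catch (IOException e) { // includes MalformedURLException for a bad base URL
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<a href='news.html'>News</a> <A NAME=top>Top</A>"
            + " <a HREF=\"http://www.example.com/\">Home</a>";
        for (String link : extract(html, "http://www.example.com/dir/page.html")) {
            System.out.println(link);
        }
    }
}
```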
HTML.Attribute.ACTION
HTML.Attribute.ALIGN
HTML.Attribute.ALINK
HTML.Attribute.ALT
HTML.Attribute.ARCHIVE
HTML.Attribute.BACKGROUND
HTML.Attribute.BGCOLOR
HTML.Attribute.BORDER
HTML.Attribute.CELLPADDING
HTML.Attribute.CELLSPACING
HTML.Attribute.CHECKED
HTML.Attribute.CLASS
HTML.Attribute.CLASSID
HTML.Attribute.CLEAR
HTML.Attribute.CODE
HTML.Attribute.CODEBASE
HTML.Attribute.CODETYPE
HTML.Attribute.COLOR
HTML.Attribute.COLS
HTML.Attribute.COLSPAN
HTML.Attribute.COMMENT
HTML.Attribute.COMPACT
HTML.Attribute.CONTENT
HTML.Attribute.COORDS
HTML.Attribute.DATA
HTML.Attribute.DECLARE
HTML.Attribute.DIR
HTML.Attribute.DUMMY
HTML.Attribute.ENCTYPE
HTML.Attribute.ENDTAG
HTML.Attribute.FACE
HTML.Attribute.FRAMEBORDER
HTML.Attribute.HALIGN
HTML.Attribute.HEIGHT
HTML.Attribute.HREF
HTML.Attribute.HSPACE
HTML.Attribute.HTTPEQUIV
HTML.Attribute.ID
HTML.Attribute.ISMAP
HTML.Attribute.LANG
HTML.Attribute.LANGUAGE
HTML.Attribute.LINK
HTML.Attribute.LOWSRC
HTML.Attribute.MARGINHEIGHT
HTML.Attribute.MARGINWIDTH
HTML.Attribute.MAXLENGTH
HTML.Attribute.METHOD
HTML.Attribute.MULTIPLE
HTML.Attribute.N
HTML.Attribute.NAME
HTML.Attribute.NOHREF
HTML.Attribute.NORESIZE
HTML.Attribute.NOSHADE
HTML.Attribute.NOWRAP
HTML.Attribute.PROMPT
HTML.Attribute.REL
HTML.Attribute.REV
HTML.Attribute.ROWS
HTML.Attribute.ROWSPAN
HTML.Attribute.SCROLLING
HTML.Attribute.SELECTED
HTML.Attribute.SHAPE
HTML.Attribute.SHAPES
HTML.Attribute.SIZE
HTML.Attribute.SRC
HTML.Attribute.STANDBY
HTML.Attribute.START
HTML.Attribute.STYLE
HTML.Attribute.TARGET
HTML.Attribute.TEXT
HTML.Attribute.TITLE
HTML.Attribute.TYPE
HTML.Attribute.USEMAP
HTML.Attribute.VALIGN
HTML.Attribute.VALUE
HTML.Attribute.VALUETYPE
HTML.Attribute.VERSION
HTML.Attribute.VLINK
HTML.Attribute.VSPACE
HTML.Attribute.WIDTH
The MutableAttributeSet interface adds six methods to add and remove attributes from the set:
public void addAttribute(Object name, Object value)
public void addAttributes(AttributeSet attributes)
public void removeAttribute(Object name)
public void removeAttributes(Enumeration names)
public void removeAttributes(AttributeSet attributes)
public void setResolveParent(AttributeSet parent)
Again, the values are strings and the names are HTML.Attribute objects.
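Outside the parser you can create and modify attribute sets yourself. javax.swing.text.SimpleAttributeSet is Swing's general-purpose MutableAttributeSet implementation. Here is a minimal sketch, where the class name and the helpers linkAttributes() and describe() are hypothetical.

```java
import java.util.Enumeration;
import javax.swing.text.AttributeSet;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

class AttributeSetDemo {
    // Builds the kind of attribute set an <A HREF=... TARGET=...> tag might carry.
    static MutableAttributeSet linkAttributes(String href, String target) {
        MutableAttributeSet attrs = new SimpleAttributeSet();
        attrs.addAttribute(HTML.Attribute.HREF, href);
        attrs.addAttribute(HTML.Attribute.TARGET, target);
        return attrs;
    }

    // Renders the set as space-separated name=value pairs, in enumeration order.
    static String describe(AttributeSet attrs) {
        StringBuilder sb = new StringBuilder();
        Enumeration<?> names = attrs.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            if (sb.length() > 0) sb.append(' ');
            sb.append(name).append('=').append(attrs.getAttribute(name));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MutableAttributeSet attrs = linkAttributes("index.html", "_top");
        System.out.println(attrs.getAttributeCount());            // 2
        System.out.println(attrs.isDefined(HTML.Attribute.HREF)); // true
        attrs.removeAttribute(HTML.Attribute.TARGET);
        System.out.println(describe(attrs));                      // href=index.html
    }
}
```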
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
import java.io.*;
import java.net.*;
import java.util.*;
public class PageSaver extends HTMLEditorKit.ParserCallback {
private Writer out;
private URL base;
public PageSaver(Writer out, URL base) {
this.out = out;
this.base = base;
}
public void handleStartTag(HTML.Tag tag,
MutableAttributeSet attributes, int position) {
try {
out.write("<" + tag);
this.writeAttributes(attributes);
// for the <APPLET> tag we may have to add a codebase attribute
if (tag == HTML.Tag.APPLET
&& attributes.getAttribute(HTML.Attribute.CODEBASE) == null) {
String codebase = base.toString();
if (codebase.endsWith(".htm") || codebase.endsWith(".html")) {
codebase = codebase.substring(0, codebase.lastIndexOf('/'));
}
out.write(" codebase=\"" + codebase + "\"");
}
out.write(">");
out.flush();
}
catch (IOException e) {
System.err.println(e);
e.printStackTrace();
}
}
public void handleEndTag(HTML.Tag tag, int position) {
try {
out.write("</" + tag + ">");
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
private void writeAttributes(AttributeSet attributes)
throws IOException {
Enumeration e = attributes.getAttributeNames();
while (e.hasMoreElements()) {
Object name = e.nextElement();
String value = (String) attributes.getAttribute(name);
try {
if (name == HTML.Attribute.HREF || name == HTML.Attribute.SRC
|| name == HTML.Attribute.LOWSRC
|| name == HTML.Attribute.CODEBASE ) {
URL u = new URL(base, value);
out.write(" " + name + "=\"" + u + "\"");
}
else {
out.write(" " + name + "=\"" + value + "\"");
}
}
catch (MalformedURLException ex) {
System.err.println(ex);
System.err.println(base);
System.err.println(value);
ex.printStackTrace();
}
}
}
public void handleComment(char[] text, int position) {
try {
out.write("<!-- ");
out.write(text);
out.write(" -->");
out.flush();
}
catch (IOException e) {
System.err.println(e);
}
}
public void handleText(char[] text, int position) {
try {
out.write(text);
out.flush();
}
catch (IOException e) {
System.err.println(e);
e.printStackTrace();
}
}
public void handleSimpleTag(HTML.Tag tag,
MutableAttributeSet attributes, int position) {
try {
out.write("<" + tag);
this.writeAttributes(attributes);
out.write(">");
}
catch (IOException e) {
System.err.println(e);
e.printStackTrace();
}
}
public static void main(String[] args) {
for (int i = 0; i < args.length; i++) {
ParserGetter kit = new ParserGetter();
HTMLEditorKit.Parser parser = kit.getParser();
try {
URL u = new URL(args[i]);
InputStream in = u.openStream();
InputStreamReader r = new InputStreamReader(in);
String remoteFileName = u.getFile();
if (remoteFileName.endsWith("/")) {
remoteFileName += "index.html";
}
if (remoteFileName.startsWith("/")) {
remoteFileName = remoteFileName.substring(1);
}
File localDirectory = new File(u.getHost());
while (remoteFileName.indexOf('/') > -1) {
String part = remoteFileName.substring(0, remoteFileName.indexOf('/'));
remoteFileName = remoteFileName.substring(remoteFileName.indexOf('/')+1);
localDirectory = new File(localDirectory, part);
}
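// mkdirs() returns false if the directory already exists, so test for it afterwards
localDirectory.mkdirs();
if (localDirectory.isDirectory()) {
File output = new File(localDirectory, remoteFileName);
FileWriter out = new FileWriter(output);
HTMLEditorKit.ParserCallback callback = new PageSaver(out, u);
parser.parse(r, callback, false);
// close the file so all buffered output reaches the disk
out.close();
}
r.close();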
}
catch (IOException e) {
System.err.println(e);
e.printStackTrace();
}
}
}
}
Java Network Programming, Second Edition
Elliotte Rusty Harold
O'Reilly & Associates, 2000
ISBN: 1-56592-870-9
This presentation: http://metalab.unc.edu/javafaq/slides/sd2000east/webclient/