Retaining Line Breaks

Short of implementing your own parser, I don't know of any way to retain all the white space the parser strips. But you can include the minimum necessary line breaks and white space by looking at the tags as well as the text. Generally you expect a single break in HTML when you see one of these tags:

<BR>
<LI>
<TR>

You expect a double break (paragraph break) when you see one of these tags:

<P>
</H1> </H2> </H3> </H4> </H5> </H6>
<HR>
<DIV>
</UL> </OL> </DL>

To include line breaks in the output you have to look at each tag as it's processed and determine whether it falls in one of these sets. This is straightforward because the first argument passed to each of the tag callback methods is an HTML.Tag object.
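As a rough sketch of the idea, a callback along the following lines compares each incoming HTML.Tag against the two sets above and appends one or two line breaks accordingly. The class name LineBreakingCallback, the strip() helper, and the choice to collect output in a StringBuilder are assumptions made for illustration. It also assumes that empty tags such as <BR> and <HR> arrive through handleSimpleTag() rather than handleStartTag().

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical class name, chosen for this example only.
class LineBreakingCallback extends HTMLEditorKit.ParserCallback {

    // Collects the text plus the line breaks implied by the tags.
    private final StringBuilder out = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        out.append(data);
    }

    // Start tags: <LI> and <TR> get a single break, <P> and <DIV> a double one.
    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.LI || tag == HTML.Tag.TR) {
            out.append('\n');
        } else if (tag == HTML.Tag.P || tag == HTML.Tag.DIV) {
            out.append("\n\n");
        }
    }

    // End tags: headings and lists are followed by a paragraph break.
    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.H1 || tag == HTML.Tag.H2 || tag == HTML.Tag.H3
                || tag == HTML.Tag.H4 || tag == HTML.Tag.H5 || tag == HTML.Tag.H6
                || tag == HTML.Tag.UL || tag == HTML.Tag.OL || tag == HTML.Tag.DL) {
            out.append("\n\n");
        }
    }

    // Empty tags: <BR> is a single break, <HR> a double one.
    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.BR) {
            out.append('\n');
        } else if (tag == HTML.Tag.HR) {
            out.append("\n\n");
        }
    }

    // Convenience helper (an assumption of this sketch): parse a string
    // and return its text with the line breaks added.
    static String strip(String html) {
        LineBreakingCallback callback = new LineBreakingCallback();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.out.toString();
    }
}
```

Calling LineBreakingCallback.strip("<ul><li>one<li>two</ul>") should put "one" and "two" on separate lines. The tags are compared with == here because the HTML.Tag constants are shared instances; a lookup set would work equally well if the lists grow.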



Copyright 2000 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified January 28, 2000