gnu.xml.util

Class XMLWriter

java.lang.Object
|
+--gnu.xml.util.XMLWriter

All Implemented Interfaces:
ContentHandler, DTDHandler, DeclHandler, LexicalHandler

Known Direct Subclasses:
TextConsumer, XHTMLWriter


public class XMLWriter
extends java.lang.Object
implements ContentHandler, LexicalHandler, DTDHandler, DeclHandler

This class is a SAX handler which writes all its input as a well formed XML or XHTML document. If driven using SAX2 events, this output may include a recreated document type declaration, subject to limitations of SAX (no internal subset exposed) or DOM (the important declarations, with their documentation, are discarded).

By default, text is generated "as-is", but some optional modes are supported. Pretty-printing is supported, to make life easier for people reading the output. XHTML (1.0) output can be made particularly pretty; all the built-in character entities are known. Canonical XML can also be generated, assuming the input is properly formed.


Some of the methods on this class are intended for applications to use directly, rather than as pure SAX2 event callbacks. Some of those methods access the JavaBeans properties (used to tweak output formats, for example canonicalization and pretty printing). Subclasses are expected to add new behaviors, not to modify current behavior, so many such methods are final.

The write*() methods may be slightly simpler for some applications to use than direct callbacks. For example, they support a simple policy for encoding data items as the content of a single element.

To reuse an XMLWriter you must provide it with a new Writer, since this handler closes the writer it was given as part of its endDocument() handling. (XML documents have an end of input, and the way to encode that on a stream is to close it.)


Note that any relative URIs in the source document, as found in entity and notation declarations, ought to have been fully resolved by the parser providing events to this handler. This means that the output text should only have fully resolved URIs, which may not be the desired behavior in cases where later binding is desired.

Note that due to SAX2 defaults, you may need to manually ensure that the input events are XML-conformant with respect to namespace prefixes and declarations. gnu.xml.pipeline.NSFilter is one solution to this problem, in the context of processing pipelines. Something as simple as connecting this handler to a parser might not generate the correct output. Another workaround is to ensure that the namespace-prefixes feature is always set to true, if you're hooking this directly up to some XMLReader implementation.
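The effect of the namespace-prefixes feature can be seen with a small sketch that uses only the JDK's SAX classes. The class name NamespacePrefixesDemo and its helper method are hypothetical, and a plain DefaultHandler stands in for XMLWriter; the point is only to show which attribute names a downstream handler would receive with the feature off and on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; a DefaultHandler stands in for XMLWriter here.
final class NamespacePrefixesDemo {

    // Parses the XML text and returns the qualified names of every attribute
    // that the parser reports through startElement().
    static List<String> attributeNames(String xml, boolean namespacePrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Standard SAX2 feature: when true, xmlns* declarations are reported as attributes.
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", namespacePrefixes);

        List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add(atts.getQName(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p:a xmlns:p=\"urn:x\" id=\"1\"/>";
        System.out.println("feature off: " + attributeNames(xml, false));
        System.out.println("feature on:  " + attributeNames(xml, true));
    }
}
```

With the feature off, a handler that simply echoes the attributes it receives has no xmlns:p declaration to write, which is why output produced that way may not be namespace-conformant.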

Author:
David Brownell
See Also:
gnu.xml.pipeline.TextConsumer

Constructor Summary

XMLWriter()

Constructs this handler with System.out used to write SAX events using the UTF-8 encoding.

XMLWriter(java.io.OutputStream out)

Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called.

XMLWriter(java.io.Writer writer)

Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.

XMLWriter(java.io.Writer writer, java.lang.String encoding)

Constructs a handler which writes all input to the writer, and then closes the writer when the document ends.

Method Summary

void

attributeDecl(java.lang.String eName, java.lang.String aName, java.lang.String type, java.lang.String mode, java.lang.String value)

SAX2: called on attribute declarations

void

characters(char ch[] , int start, int length)

SAX1: reports content characters

void

comment(char ch[] , int start, int length)

SAX2: called when comments are parsed.

void

elementDecl(java.lang.String name, java.lang.String model)

SAX2: called on element declarations

void

endCDATA()

SAX2: called after parsing CDATA characters

void

endDocument()

SAX1: indicates the completion of a parse.

void

endDTD()

SAX2: called after the doctype is parsed

void

endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)

SAX2: indicates the end of an element

void

endEntity(java.lang.String name)

SAX2: called after parsing a general entity in content

void

endPrefixMapping(java.lang.String prefix)

SAX2: ignored.

void

externalEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX2: called on external entity declarations

void

fatal(java.lang.String message, java.lang.Exception e)

Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors.

void

flush()

Flushes the output stream.

void

ignorableWhitespace(char ch[] , int start, int length)

SAX1: reports ignorable whitespace

void

internalEntityDecl(java.lang.String name, java.lang.String value)

SAX2: called on internal entity declarations

boolean

isCanonical()

Returns value of flag controlling canonical output.

boolean

isExpandingEntities()

Returns true if the output will have no entity references; returns false (the default) otherwise.

boolean

isPrettyPrinting()

Returns value of flag controlling pretty printing.

boolean

isXhtml()

Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.

void

notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX1: called on notation declarations

void

processingInstruction(java.lang.String target, java.lang.String data)

SAX1: reports a PI.

void

setCanonical(boolean value)

Sets the output style to be canonicalized.

void

setDocumentLocator(Locator l)

SAX1: provides parser status information

void

setEOL(java.lang.String eolString)

Assigns the line ending style to be used on output.

void

setErrorHandler(ErrorHandler handler)

Assigns the error handler to be used to present most fatal errors.

void

setExpandingEntities(boolean value)

Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.

void

setPrettyPrinting(boolean value)

Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output).

void

setWriter(java.io.Writer writer, java.lang.String encoding)

Resets the handler to write a new text document.

void

setXhtml(boolean value)

Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification.

void

skippedEntity(java.lang.String name)

SAX1: indicates a non-expanded entity reference

void

startCDATA()

SAX2: called before parsing CDATA characters

void

startDocument()

SAX1: indicates the beginning of a document parse.

void

startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX2: called when the doctype is partially parsed. Note that this, like other doctype related calls, is ignored when XHTML is in use.

void

startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)

SAX2: indicates the start of an element.

void

startEntity(java.lang.String name)

SAX2: called before parsing a general entity in content

void

startPrefixMapping(java.lang.String prefix, java.lang.String uri)

SAX2: ignored.

void

unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName)

SAX1: called on unparsed entity declarations

void

write(java.lang.String data)

Writes the string as if characters() had been called on the contents of the string.

void

writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, java.lang.String content)

Writes an element that has content consisting of a single string.

void

writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, int content)

Writes an element that has content consisting of a single integer, encoded as a decimal string.

void

writeEmptyElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)

Writes an empty element.

Constructor Details

XMLWriter

public XMLWriter()

Constructs this handler with System.out used to write SAX events using the UTF-8 encoding. Avoid using this except when you know it's safe to close System.out at the end of the document.


XMLWriter

public XMLWriter(java.io.OutputStream out)

Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called. (Yes it's annoying that this throws an exception -- but there's really no way around it, since it's barely possible a JDK may exist somewhere that doesn't know how to emit UTF-8.)

Parameters:
out

XMLWriter

public XMLWriter(java.io.Writer writer, java.lang.String encoding)

Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, this class will use the specified encoding name in that declaration. If no encoding name is specified, no encoding name will be declared unless this class can otherwise determine the name of the character encoding for this writer.

At this time, only the UTF-8 ("UTF8") and UTF-16 ("Unicode") output encodings are fully lossless with respect to XML data. If you use any other encoding you risk having your data be silently mangled on output, as the standard Java character encoding subsystem silently maps non-encodable characters to a question mark ("?") and will not report such errors to applications.
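The silent substitution described above can be reproduced with the JDK alone. This is a minimal sketch, not part of XMLWriter; the class name LossyEncodingDemo and the roundTrip helper are hypothetical. It writes text through a java.io.OutputStreamWriter in a given encoding and decodes the bytes again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demo class showing how OutputStreamWriter treats non-encodable characters.
final class LossyEncodingDemo {

    // Encodes the text with the named encoding, then decodes the resulting bytes.
    static String roundTrip(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, encoding)) {
            writer.write(text);   // no exception is raised for characters the encoding lacks
        }
        return new String(bytes.toByteArray(), encoding);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";                          // ends in U+00E9
        System.out.println(roundTrip(text, "UTF-8"));       // survives unchanged
        System.out.println(roundTrip(text, "US-ASCII"));    // last character becomes '?'
    }
}
```

No exception or warning is raised in the US-ASCII case, so an application has no direct way to notice the loss.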

For a few other encodings the risk can be reduced. If the writer is a java.io.OutputStreamWriter, and uses either the ISO-8859-1 ("8859_1", "ISO8859_1", etc) or US-ASCII ("ASCII") encodings, content which can't be encoded in those encodings will be written safely. Where relevant, the XHTML entity names will be used; otherwise, numeric character references will be emitted.

However, there remain a number of cases where substituting such entity or character references is not an option. Such references are not usable within a DTD, comment, PI, or CDATA section. Neither may they be used when element, attribute, entity, or notation names have the problematic characters.
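One way to picture the numeric character reference fallback for content characters is the sketch below. It is an illustration under stated assumptions, not the code this class uses: the class name CharRefEscaper and the escape method are hypothetical, encodability is tested with the JDK's CharsetEncoder.canEncode, XHTML entity names are not attempted, and markup characters such as & and < are left alone.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: replaces characters the target charset cannot encode
// with decimal numeric character references. Only valid for element content
// and attribute values, as the text above explains.
final class CharRefEscaper {

    static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);          // handles surrogate pairs
            int width = Character.charCount(codePoint);
            String piece = text.substring(i, i + width);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac";   // U+00E9 and the euro sign U+20AC
        System.out.println(escape(text, StandardCharsets.US_ASCII));
        System.out.println(escape(text, StandardCharsets.ISO_8859_1));
    }
}
```

The same substitution cannot be applied to names, comments, PIs, CDATA sections, or DTD text, since a reference there would be taken literally or would be illegal.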

Parameters:
writer - XML text is written to this writer.
encoding - if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.

XMLWriter

public XMLWriter(java.io.Writer writer)

Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, and this class can determine the name of the character encoding for this writer, that encoding name will be included in the XML declaration.

See the description of the constructor which takes an encoding name for important information about selection of encodings.

Parameters:
writer - XML text is written to this writer.

Method Details

attributeDecl

public final void attributeDecl(java.lang.String eName, java.lang.String aName, java.lang.String type, java.lang.String mode, java.lang.String value)

SAX2: called on attribute declarations

Parameters:
eName
aName
type
mode
value

characters

public final void characters(char ch[] , int start, int length)

SAX1: reports content characters

Parameters:
start
length

comment

public final void comment(char ch[] , int start, int length)

SAX2: called when comments are parsed. When XHTML is used, the old HTML tradition of using comments for inline CSS or for JavaScript code is discouraged. This is because XML processors are encouraged to discard comments, on the grounds that comments are for users (and perhaps text editors), not programs. Instead, use external scripts.

Parameters:
start
length

elementDecl

public final void elementDecl(java.lang.String name, java.lang.String model)

SAX2: called on element declarations

Parameters:
name
model

endCDATA

public final void endCDATA()

SAX2: called after parsing CDATA characters


endDocument

public void endDocument()

SAX1: indicates the completion of a parse. Note that all complete SAX event streams make this call, even if an error is reported during a parse.


endDTD

public final void endDTD()

SAX2: called after the doctype is parsed


endElement

public final void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)

SAX2: indicates the end of an element

Parameters:
uri
localName
qName

endEntity

public final void endEntity(java.lang.String name)

SAX2: called after parsing a general entity in content

Parameters:
name

endPrefixMapping

public final void endPrefixMapping(java.lang.String prefix)

SAX2: ignored.

Parameters:
prefix

externalEntityDecl

public final void externalEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX2: called on external entity declarations

Parameters:
name
publicId
systemId

fatal

protected void fatal(java.lang.String message, java.lang.Exception e)

Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors. It uses locator information for good diagnostics, if available, and gives the application's ErrorHandler the opportunity to handle the error before throwing an exception.

Parameters:
message
e
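The reporting sequence described for fatal() can be sketched with standard SAX types. This is an assumption-laden illustration, not the actual method body: the class FatalSketch and its fields are hypothetical, and only the order of steps (attach locator information, offer the error to the ErrorHandler, then throw) is taken from the description above.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in showing one plausible shape for the fatal() logic.
final class FatalSketch {
    private Locator locator;            // null when no locator was supplied
    private ErrorHandler errorHandler;  // null when the application set none

    void setDocumentLocator(Locator l) { locator = l; }

    void setErrorHandler(ErrorHandler handler) { errorHandler = handler; }

    void fatal(String message, Exception e) throws SAXException {
        SAXParseException x;
        if (locator == null) {
            // No position known: -1 marks line and column as unavailable.
            x = new SAXParseException(message, null, null, -1, -1, e);
        } else {
            // Copies public ID, system ID, line, and column from the locator.
            x = new SAXParseException(message, locator, e);
        }
        if (errorHandler != null) {
            errorHandler.fatalError(x);   // the handler may throw its own exception
        }
        throw x;                          // otherwise the error still propagates
    }
}
```

A handler that returns normally from fatalError() does not suppress the error in this sketch; the exception is thrown regardless.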

flush

public final void flush()

Flushes the output stream. When this handler is used in long lived pipelines, it can be important to flush buffered state, for example so that it can reach the disk as part of a state checkpoint.


ignorableWhitespace

public final void ignorableWhitespace(char ch[] , int start, int length)

SAX1: reports ignorable whitespace

Parameters:
start
length

internalEntityDecl

public final void internalEntityDecl(java.lang.String name, java.lang.String value)

SAX2: called on internal entity declarations

Parameters:
name
value

isCanonical

public final boolean isCanonical()

Returns value of flag controlling canonical output.


isExpandingEntities

public final boolean isExpandingEntities()

Returns true if the output will have no entity references; returns false (the default) otherwise.


isPrettyPrinting

public final boolean isPrettyPrinting()

Returns value of flag controlling pretty printing.


isXhtml

public final boolean isXhtml()

Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.


notationDecl

public final void notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX1: called on notation declarations

Parameters:
name
publicId
systemId

processingInstruction

public final void processingInstruction(java.lang.String target, java.lang.String data)

SAX1: reports a PI. This doesn't check for illegal target names, such as "xml" or "XML", or namespace-incompatible ones like "big:dog"; the caller is responsible for ensuring those names are legal.

Parameters:
target
data

setCanonical

public final void setCanonical(boolean value)

Sets the output style to be canonicalized. Input events must meet requirements that are slightly more stringent than the basic well-formedness ones, and include:

Note that fragments of XML documents, as specified by an XPath node set, may be canonicalized. In such cases, elements may need some fixup (for xml:* attributes and application-specific context).

Parameters:
value
Throws:
java.lang.IllegalArgumentException - if the output encoding is anything other than UTF-8.

setDocumentLocator

public final void setDocumentLocator(Locator l)

SAX1: provides parser status information

Parameters:
l

setEOL

public final void setEOL(java.lang.String eolString)

Assigns the line ending style to be used on output.

Parameters:
eolString - null to use the system default; else "\n", "\r", or "\r\n".

setErrorHandler

public void setErrorHandler(ErrorHandler handler)

Assigns the error handler to be used to present most fatal errors.

Parameters:
handler

setExpandingEntities

public final void setExpandingEntities(boolean value)

Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.

Parameters:
value

setPrettyPrinting

public final void setPrettyPrinting(boolean value)

Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output). Pretty printing enables structural indentation, sorting of attributes by name, line wrapping, and potentially other mechanisms for making output more or less readable.

At this writing, structural indentation and line wrapping are enabled when pretty printing is enabled and the xml:space attribute has the value default (its other legal value is preserve, as defined in the XML specification). The three XHTML element types which use another value are recognized by their names (namespaces are ignored).

Also, for the record, the "pretty" aspect of printing here is more to provide basic structure on outputs that would otherwise risk being a single long line of text. For now, expect the structure to be ragged ... unless you'd like to submit a patch to make this be more strictly formatted!

Parameters:
value
Throws:
java.lang.IllegalStateException - thrown if this method is invoked after output has begun.

setWriter

public final void setWriter(java.io.Writer writer, java.lang.String encoding)

Resets the handler to write a new text document.

Parameters:
writer - XML text is written to this writer.
encoding - if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.
Throws:
java.lang.IllegalStateException - if the current document hasn't yet ended (with endDocument())

setXhtml

public final void setXhtml(boolean value)

Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification. A "transitional" Document Type Declaration (DTD) is placed near the beginning of the output document, instead of whatever DTD would otherwise have been placed there, and XHTML empty elements are printed specially. When writing text in US-ASCII or ISO-8859-1 encodings, the predefined XHTML internal entity names are used (in preference to character references) when writing content characters which can't be expressed in those encodings.

When this option is enabled, it is the caller's responsibility to ensure that the input is otherwise valid as XHTML. Things to be careful of in all cases, as described in the appendix referenced above, include:

Additionally, some of the oldest browsers have additional quirks, to address with guidelines such as:

Also, some characteristics of the resulting output may be a function of whether the document is later given a MIME content type of text/html rather than one indicating XML (application/xml or text/xml). Worse, some browsers ignore MIME content types and prefer to rely on URI name suffixes -- so an "index.xml" could always be XML, never XHTML, no matter its MIME type.

Parameters:
value

skippedEntity

public void skippedEntity(java.lang.String name)

SAX1: indicates a non-expanded entity reference

Parameters:
name

startCDATA

public final void startCDATA()

SAX2: called before parsing CDATA characters


startDocument

public void startDocument()

SAX1: indicates the beginning of a document parse. If you're writing (well formed) fragments of XML, neither this nor endDocument should be called.


startDTD

public final void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)

SAX2: called when the doctype is partially parsed. Note that this, like other doctype related calls, is ignored when XHTML is in use.

Parameters:
name
publicId
systemId

startElement

public final void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)

SAX2: indicates the start of an element. When XHTML is in use, avoid attribute values with line breaks or multiple whitespace characters, since not all user agents handle them correctly.

Parameters:
uri
localName
qName
atts

startEntity

public final void startEntity(java.lang.String name)

SAX2: called before parsing a general entity in content

Parameters:
name

startPrefixMapping

public final void startPrefixMapping(java.lang.String prefix, java.lang.String uri)

SAX2: ignored.

Parameters:
prefix
uri

unparsedEntityDecl

public final void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName)

SAX1: called on unparsed entity declarations

Parameters:
name
publicId
systemId
notationName

write

public final void write(java.lang.String data)

Writes the string as if characters() had been called on the contents of the string. This is particularly useful when applications act as producers and write data directly to event consumers.

Parameters:
data

writeElement

public void writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, int content)

Writes an element that has content consisting of a single integer, encoded as a decimal string.

Parameters:
uri
localName
qName
atts
content
See Also:
writeEmptyElement
startElement

writeElement

public void writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, java.lang.String content)

Writes an element that has content consisting of a single string.

Parameters:
uri
localName
qName
atts
content
See Also:
writeEmptyElement
startElement
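In terms of plain SAX callbacks, the write() and writeElement() conveniences can be thought of roughly as follows. This is a sketch under assumptions, not the class's own code: WriteElementSketch and its nested Recorder are hypothetical names, the helpers take an arbitrary ContentHandler instead of being instance methods, and the Recorder exists only to make the resulting event sequence visible.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helpers expressing the write*() conveniences as SAX events.
final class WriteElementSketch {

    // Same effect as calling characters() on the contents of the string.
    static void write(ContentHandler h, String data) throws SAXException {
        char[] chars = data.toCharArray();
        h.characters(chars, 0, chars.length);
    }

    // An element whose whole content is one string.
    static void writeElement(ContentHandler h, String uri, String localName,
                             String qName, Attributes atts, String content) throws SAXException {
        h.startElement(uri, localName, qName, atts);
        write(h, content);
        h.endElement(uri, localName, qName);
    }

    // An element whose whole content is one integer, encoded in decimal.
    static void writeElement(ContentHandler h, String uri, String localName,
                             String qName, Attributes atts, int content) throws SAXException {
        writeElement(h, uri, localName, qName, atts, Integer.toString(content));
    }

    // Minimal handler that logs events as text; it ignores attributes and does no escaping.
    static final class Recorder extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(qName).append('>');
        }
    }

    public static void main(String[] args) throws SAXException {
        Recorder recorder = new Recorder();
        writeElement(recorder, "", "count", "count", new AttributesImpl(), 42);
        System.out.println(recorder.log);
    }
}
```

Because the helpers only emit ordinary events, any escaping or encoding policy is applied by the receiving handler, exactly as it would be for events coming from a parser.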

writeEmptyElement

public void writeEmptyElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)

Writes an empty element.

Parameters:
uri
localName
qName
atts
See Also:
startElement