gnu.xml.util
Class XMLWriter
java.lang.Object
|
+--gnu.xml.util.XMLWriter
All Implemented Interfaces:
ContentHandler, DTDHandler, DeclHandler, LexicalHandler
Known Direct Subclasses:
TextConsumer, XHTMLWriter
This class is a SAX handler which writes all its input as a well formed
XML or XHTML document. If driven using SAX2 events, this output may
include a recreated document type declaration, subject to limitations
of SAX (no internal subset exposed) or DOM (the important declarations,
with their documentation, are discarded).
By default, text is generated "as-is", but some optional modes
are supported. Pretty-printing is supported, to make life easier
for people reading the output. XHTML (1.0) output can be made
particularly pretty; all the built-in character entities are known.
Canonical XML can also be generated, assuming the input is properly
formed.
Some of the methods on this class are intended for applications to
use directly, rather than as pure SAX2 event callbacks. Some of those
methods access the JavaBeans properties (used to tweak output formats,
for example canonicalization and pretty printing). Subclasses
are expected to add new behaviors, not to modify current behavior, so
many such methods are final.
The write*() methods may be slightly simpler for some
applications to use than direct callbacks. For example, they support
a simple policy for encoding data items as the content of a single element.
To reuse an XMLWriter you must provide it with a new Writer, since
this handler closes the writer it was given as part of its endDocument()
handling. (XML documents have an end of input, and the way to encode
that on a stream is to close it.)
Note that any relative URIs in the source document, as found in
entity and notation declarations, ought to have been fully resolved by
the parser providing events to this handler. This means that the
output text should only have fully resolved URIs, which may not be
the desired behavior in cases where later binding is desired.
Note that due to SAX2 defaults, you may need to manually
ensure that the input events are XML-conformant with respect to namespace
prefixes and declarations. gnu.xml.pipeline.NSFilter is
one solution to this problem, in the context of processing pipelines.
Something as simple as connecting this handler to a parser might not
generate the correct output. Another workaround is to ensure that the
namespace-prefixes feature is always set to true, if you're
hooking this directly up to some XMLReader implementation.
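The effect of the namespace-prefixes feature can be seen with a minimal sketch that uses only the JDK's SAX classes. The class name NamespacePrefixesSketch and the counting handler are illustrative assumptions; the handler stands in for an XMLWriter so the example runs without this package on the classpath. It counts how many xmlns declaration attributes a handler actually receives, which is what a writer needs in order to reproduce the declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespacePrefixesSketch {
    // Standard SAX2 feature identifier.
    static final String NS_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Parses the text and counts the xmlns* attributes reported to the handler.
    static int countXmlnsAttributes(String xml, boolean namespacePrefixes)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(NS_PREFIXES, namespacePrefixes);

        final int[] count = {0};
        // Stand-in for an XMLWriter: any ContentHandler sees the same events.
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    if (atts.getQName(i).startsWith("xmlns")) {
                        count[0]++;
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc xmlns:p=\"urn:example\"><p:item/></doc>";
        System.out.println("feature off: " + countXmlnsAttributes(xml, false));
        System.out.println("feature on:  " + countXmlnsAttributes(xml, true));
    }
}
```

With the JDK's built-in parser, the declaration attribute is only delivered when the feature is set to true; other XMLReader implementations may behave differently, so treat this as a check to run against your own parser.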
- David Brownell
gnu.xml.pipeline.TextConsumer
void | attributeDecl(java.lang.String eName, java.lang.String aName, java.lang.String type, java.lang.String mode, java.lang.String value) |
void | characters(char ch[] , int start, int length) |
void | comment(char ch[] , int start, int length) |
void | elementDecl(java.lang.String name, java.lang.String model) |
void | endCDATA() |
void | endDocument() |
void | endDTD() |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) |
void | endEntity(java.lang.String name) |
void | endPrefixMapping(java.lang.String prefix) |
void | externalEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) |
void | fatal(java.lang.String message, java.lang.Exception e) |
void | flush() |
void | ignorableWhitespace(char ch[] , int start, int length) |
void | internalEntityDecl(java.lang.String name, java.lang.String value) |
boolean | isCanonical() |
boolean | isExpandingEntities() |
boolean | isPrettyPrinting() |
boolean | isXhtml() |
void | notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) |
void | processingInstruction(java.lang.String target, java.lang.String data) |
void | setCanonical(boolean value) |
void | setDocumentLocator(Locator l) |
void | setEOL(java.lang.String eolString) |
void | setErrorHandler(ErrorHandler handler) |
void | setExpandingEntities(boolean value) |
void | setPrettyPrinting(boolean value) |
void | setWriter(java.io.Writer writer, java.lang.String encoding) |
void | setXhtml(boolean value) |
void | skippedEntity(java.lang.String name) |
void | startCDATA() |
void | startDocument() |
void | startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId) |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts) |
void | startEntity(java.lang.String name) |
void | startPrefixMapping(java.lang.String prefix, java.lang.String uri) |
void | unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName) |
void | write(java.lang.String data) |
void | writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, java.lang.String content) |
void | writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, int content) |
void | writeEmptyElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts) |
XMLWriter
public XMLWriter()
Constructs this handler with System.out used to write SAX events
using the UTF-8 encoding. Avoid using this except when you know
it's safe to close System.out at the end of the document.
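If closing System.out is not acceptable, one possible workaround (not part of this class) is to hand the handler a Writer whose close() only flushes. The sketch below uses only java.io; the class name NonClosingWriter and the forSystemOut() helper are hypothetical names introduced for illustration.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: forwards all writes, but turns close() into flush(). */
public class NonClosingWriter extends FilterWriter {
    public NonClosingWriter(Writer target) {
        super(target);
    }

    @Override
    public void close() throws IOException {
        // Push buffered text through, but leave the underlying writer open.
        flush();
    }

    /** Wraps System.out as a UTF-8 writer that survives close(). */
    public static Writer forSystemOut() {
        return new NonClosingWriter(
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Such a writer could then be passed to the XMLWriter(Writer, String) constructor with "UTF-8" as the encoding name, assuming the handler does nothing with the writer at end of document beyond closing it.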
XMLWriter
public XMLWriter(java.io.OutputStream out)
Constructs a handler which writes all input to the output stream
in the UTF-8 encoding, and closes it when endDocument is called.
(Yes it's annoying that this throws an exception -- but there's
really no way around it, since it's barely possible a JDK may
exist somewhere that doesn't know how to emit UTF-8.)
- out
XMLWriter
public XMLWriter(java.io.Writer writer, java.lang.String encoding)
Constructs a handler which writes all input to the writer, and then
closes the writer when the document ends. If an XML declaration is
written onto the output, this class will use the specified encoding
name in that declaration. If no encoding name is specified, no
encoding name will be declared unless this class can otherwise
determine the name of the character encoding for this writer.
At this time, only the UTF-8 ("UTF8") and UTF-16 ("Unicode")
output encodings are fully lossless with respect to XML data. If you
use any other encoding you risk having your data be silently mangled
on output, as the standard Java character encoding subsystem silently
maps non-encodable characters to a question mark ("?") and will not
report such errors to applications.
For a few other encodings the risk can be reduced. If the writer is
a java.io.OutputStreamWriter, and uses either the ISO-8859-1 ("8859_1",
"ISO8859_1", etc) or US-ASCII ("ASCII") encodings, content which
can't be encoded in those encodings will be written safely. Where
relevant, the XHTML entity names will be used; otherwise, numeric
character references will be emitted.
However, there remain a number of cases where substituting such
entity or character references is not an option. Such references are
not usable within a DTD, comment, PI, or CDATA section. Neither may
they be used when element, attribute, entity, or notation names have
the problematic characters.
- writer - XML text is written to this writer.
- encoding - if non-null, and an XML declaration is written,
this is the name that will be used for the character encoding.
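The encoding caveats above can be reproduced with the JDK alone. The following is a minimal sketch, not this class's implementation: writeRaw shows the silent "?" substitution performed by java.io.OutputStreamWriter, and escapeForAscii shows one way numeric character references could be substituted for unencodable characters. The class and method names are assumptions made for illustration, and the escaping deliberately ignores markup characters such as & and <.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Writes text through a US-ASCII writer; unencodable characters are
    // silently replaced, with no error reported to the caller.
    static String writeRaw(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.US_ASCII)) {
            w.write(text);
        }
        return bytes.toString("US-ASCII");
    }

    // Replaces each code point that US-ASCII cannot encode with a numeric
    // character reference. Only valid for element content and attribute
    // values, not for names, comments, PIs, CDATA sections, or DTDs.
    static String escapeForAscii(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";
        System.out.println(writeRaw(text));       // prints caf?
        System.out.println(escapeForAscii(text)); // prints caf&#233;
    }
}
```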
XMLWriter
public XMLWriter(java.io.Writer writer)
Constructs a handler which writes all input to the writer, and then
closes the writer when the document ends. If an XML declaration is
written onto the output, and this class can determine the name of
the character encoding for this writer, that encoding name will be
included in the XML declaration.
See the description of the constructor which takes an encoding
name for important information about selection of encodings.
- writer - XML text is written to this writer.
attributeDecl
public final void attributeDecl(java.lang.String eName, java.lang.String aName, java.lang.String type, java.lang.String mode, java.lang.String value)
SAX2: called on attribute declarations
- eName
- aName
- type
- mode
- value
characters
public final void characters(char ch[] , int start, int length)
SAX1: reports content characters
- start
- length
comment
public final void comment(char ch[] , int start, int length)
SAX2: called when comments are parsed.
When XHTML is used, the old HTML tradition of using comments
for inline CSS or JavaScript code is discouraged.
This is because XML processors are encouraged to discard comments, on
the grounds that they are for users (and perhaps text
editors), not programs. Instead, use external scripts.
- start
- length
elementDecl
public final void elementDecl(java.lang.String name, java.lang.String model)
SAX2: called on element declarations
- name
- model
endCDATA
public final void endCDATA()
SAX2: called after parsing CDATA characters
endDocument
public void endDocument()
SAX1: indicates the completion of a parse.
Note that all complete SAX event streams make this call, even
if an error is reported during a parse.
endDTD
public final void endDTD()
SAX2: called after the doctype is parsed
endElement
public final void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
SAX2: indicates the end of an element
- uri
- localName
- qName
endEntity
public final void endEntity(java.lang.String name)
SAX2: called after parsing a general entity in content
- name
endPrefixMapping
public final void endPrefixMapping(java.lang.String prefix)
SAX2: ignored.
- prefix
externalEntityDecl
public final void externalEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
SAX2: called on external entity declarations
- name
- publicId
- systemId
fatal
protected void fatal(java.lang.String message, java.lang.Exception e)
Used internally and by subclasses, this encapsulates the logic
involved in reporting fatal errors. It uses locator information
for good diagnostics, if available, and gives the application's
ErrorHandler the opportunity to handle the error before throwing
an exception.
- message
- e
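The reporting pattern described here can be sketched with the standard SAX classes. This is an illustration of the general technique, not the actual body of fatal(): the class name FatalSketch is hypothetical, and the locator and handler are passed as arguments here, whereas the real class presumably keeps them as state set through setDocumentLocator and setErrorHandler.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class FatalSketch {
    // Builds an exception carrying location information (when a locator is
    // available), offers it to the error handler, then throws it.
    static void fatal(String message, Exception cause,
                      Locator locator, ErrorHandler handler)
            throws SAXException {
        // A null locator is accepted; line and column are then reported as -1.
        SAXParseException error =
            new SAXParseException(message, locator, cause);
        if (handler != null) {
            // The handler may log the error, or throw an exception of its own.
            handler.fatalError(error);
        }
        throw error;
    }
}
```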
flush
public final void flush()
Flushes the output stream. When this handler is used in long lived
pipelines, it can be important to flush buffered state, for example
so that it can reach the disk as part of a state checkpoint.
ignorableWhitespace
public final void ignorableWhitespace(char ch[] , int start, int length)
SAX1: reports ignorable whitespace
- start
- length
internalEntityDecl
public final void internalEntityDecl(java.lang.String name, java.lang.String value)
SAX2: called on internal entity declarations
- name
- value
isCanonical
public final boolean isCanonical()
Returns value of flag controlling canonical output.
isExpandingEntities
public final boolean isExpandingEntities()
Returns true if the output will have no entity references;
returns false (the default) otherwise.
isPrettyPrinting
public final boolean isPrettyPrinting()
Returns value of flag controlling pretty printing.
isXhtml
public final boolean isXhtml()
Returns true if the output attempts to echo the input following
"transitional" XHTML rules and matching the "HTML Compatibility
Guidelines" so that an HTML version 3 browser can read the output
as HTML; returns false (the default) otherwise.
notationDecl
public final void notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
SAX1: called on notation declarations
- name
- publicId
- systemId
processingInstruction
public final void processingInstruction(java.lang.String target, java.lang.String data)
SAX1: reports a PI.
This doesn't check for illegal target names, such as "xml" or "XML",
or namespace-incompatible ones like "big:dog"; the caller is
responsible for ensuring those names are legal.
- target
- data
setCanonical
public final void setCanonical(boolean value)
Sets the output style to be canonicalized. Input events must
meet requirements that are slightly more stringent than the
basic well-formedness ones, and include:
- Namespace prefixes must not have been changed from those
in the original document. (This may only be ensured by setting
the SAX2 XMLReader namespace-prefixes feature flag;
by default, it is cleared.)
- Redundant namespace declaration attributes have been
removed. (If an ancestor element defines a namespace prefix
and that declaration hasn't been overridden, an element must
not redeclare it.)
- If comments are not to be included in the canonical output,
they must first be removed from the input event stream; this
class emits Canonical XML with comments by default.
- If the input character encoding was not UCS-based, the
character data must have been normalized using Unicode
Normalization Form C. (UTF-8 and UTF-16 are UCS-based.)
- Attribute values must have been normalized, as is done
by any conformant XML processor which processes all external
parameter entities.
- Similarly, attribute value defaulting has been performed.
Note that fragments of XML documents, as specified by an XPath
node set, may be canonicalized. In such cases, elements may need
some fixup (for xml:* attributes and application-specific
context).
- value
java.lang.IllegalArgumentException - if the output encoding
is anything other than UTF-8.
setDocumentLocator
public final void setDocumentLocator(Locator l)
SAX1: provides parser status information
- l
setEOL
public final void setEOL(java.lang.String eolString)
Assigns the line ending style to be used on output.
- eolString - null to use the system default; else
"\n", "\r", or "\r\n".
setErrorHandler
public void setErrorHandler(ErrorHandler handler)
Assigns the error handler to be used to present most fatal
errors.
- handler
setExpandingEntities
public final void setExpandingEntities(boolean value)
Controls whether the output text contains references to
entities (the default), or instead contains the expanded
values of those entities.
- value
setPrettyPrinting
public final void setPrettyPrinting(boolean value)
Controls pretty-printing, which by default is not enabled
(and currently is most useful for XHTML output).
Pretty printing enables structural indentation, sorting of attributes
by name, line wrapping, and potentially other mechanisms for making
output more or less readable.
At this writing, structural indentation and line wrapping are
enabled when pretty printing is enabled and the xml:space
attribute has the value default (its other legal value is
preserve, as defined in the XML specification). The three
XHTML element types which use another value are recognized by their
names (namespaces are ignored).
Also, for the record, the "pretty" aspect of printing here
is more to provide basic structure on outputs that would otherwise
risk being a single long line of text. For now, expect the
structure to be ragged ... unless you'd like to submit a patch
to make this be more strictly formatted!
- value
java.lang.IllegalStateException - thrown if this method is invoked
after output has begun.
setWriter
public final void setWriter(java.io.Writer writer, java.lang.String encoding)
Resets the handler to write a new text document.
- writer - XML text is written to this writer.
- encoding - if non-null, and an XML declaration is written,
this is the name that will be used for the character encoding.
java.lang.IllegalStateException - if the current
document hasn't yet ended (with endDocument())
setXhtml
public final void setXhtml(boolean value)
Controls whether the output should attempt to follow the "transitional"
XHTML rules so that it meets the "HTML Compatibility Guidelines"
appendix in the XHTML specification. A "transitional" Document Type
Declaration (DTD) is placed near the beginning of the output document,
instead of whatever DTD would otherwise have been placed there, and
XHTML empty elements are printed specially. When writing text in
US-ASCII or ISO-8859-1 encodings, the predefined XHTML internal
entity names are used (in preference to character references) when
writing content characters which can't be expressed in those encodings.
When this option is enabled, it is the caller's responsibility
to ensure that the input is otherwise valid as XHTML. Things to
be careful of in all cases, as described in the appendix referenced
above, include:
- Element and attribute names must be in lower case, both
in the document and in any CSS style sheet.
- All XML constructs must be valid as defined by the XHTML
"transitional" DTD (including all familiar constructs,
even deprecated ones).
- The root element must be "html".
- Elements that must be empty (such as <br>)
must have no content.
- Use both lang and xml:lang attributes
when specifying language.
- Similarly, use both id and name attributes
when defining elements that may be referred to through
URI fragment identifiers ... and make sure that the
value is a legal NMTOKEN, since not all such HTML 4.0
identifiers are valid in XML.
- Be careful with character encodings; make sure you provide
a <meta http-equiv="Content-type"
content="text/xml;charset=..." /> element in
the HTML "head" element, naming the same encoding
used to create this handler. Also, if that encoding
is anything other than US-ASCII, make sure that if
the document is given a MIME content type, it has
a charset=... attribute with that encoding.
Additionally, some of the oldest browsers have additional
quirks, to address with guidelines such as:
- Processing instructions may be rendered, so avoid them.
(Similarly for an XML declaration.)
- Embedded style sheets and scripts should not contain XML
markup delimiters: &, <, and ]]> are trouble.
- Attribute values should not have line breaks or multiple
consecutive white space characters.
- Use no more than one of the deprecated (transitional)
<isindex> elements.
- Some boolean attributes (such as compact, checked,
disabled, readonly, selected, and more) confuse
some browsers, since they only understand minimized
versions which are illegal in XML.
Also, some characteristics of the resulting output may be
a function of whether the document is later given a MIME
content type of text/html rather than one indicating
XML (application/xml or text/xml). Worse,
some browsers ignore MIME content types and prefer to rely on URI
name suffixes -- so an "index.xml" could always be XML, never
XHTML, no matter its MIME type.
- value
skippedEntity
public void skippedEntity(java.lang.String name)
SAX2: indicates a non-expanded entity reference
- name
startCDATA
public final void startCDATA()
SAX2: called before parsing CDATA characters
startDocument
public void startDocument()
SAX1: indicates the beginning of a document parse.
If you're writing (well formed) fragments of XML, neither
this nor endDocument should be called.
startDTD
public final void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
SAX2: called when the doctype is partially parsed
Note that this, like other doctype related calls, is ignored
when XHTML is in use.
- name
- publicId
- systemId
startElement
public final void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)
SAX2: indicates the start of an element.
When XHTML is in use, avoid attribute values with
line breaks or multiple whitespace characters, since
not all user agents handle them correctly.
- uri
- localName
- qName
- atts
startEntity
public final void startEntity(java.lang.String name)
SAX2: called before parsing a general entity in content
- name
startPrefixMapping
public final void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
SAX2: ignored.
- prefix
- uri
unparsedEntityDecl
public final void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName)
SAX1: called on unparsed entity declarations
- name
- publicId
- systemId
- notationName
write
public final void write(java.lang.String data)
Writes the string as if characters() had been called on the contents
of the string. This is particularly useful when applications act as
producers and write data directly to event consumers.
- data
writeElement
public void writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, int content)
Writes an element that has content consisting of a single integer,
encoded as a decimal string.
- uri
- localName
- qName
- atts
- content
writeEmptyElement
startElement
writeElement
public void writeElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts, java.lang.String content)
Writes an element that has content consisting of a single string.
- uri
- localName
- qName
- atts
- content
writeEmptyElement
startElement
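Conceptually, the two writeElement() variants bundle a start tag, one run of character data, and an end tag into a single call. The sketch below shows an event sequence that would have that effect on any SAX ContentHandler. It is an assumption about the equivalent events, not the method's actual source, and the names WriteElementSketch, emitElement, and trace are hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class WriteElementSketch {
    // String content: start tag, one characters() call, end tag.
    static void emitElement(ContentHandler h, String uri, String localName,
                            String qName, Attributes atts, String content)
            throws SAXException {
        h.startElement(uri, localName, qName, atts);
        char[] chars = content.toCharArray();
        h.characters(chars, 0, chars.length);
        h.endElement(uri, localName, qName);
    }

    // Integer content is encoded as a decimal string first.
    static void emitElement(ContentHandler h, String uri, String localName,
                            String qName, Attributes atts, int content)
            throws SAXException {
        emitElement(h, uri, localName, qName, atts, Integer.toString(content));
    }

    // Records the events as text so the sequence can be inspected.
    static String trace(int value) throws SAXException {
        final StringBuilder log = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                log.append('<').append(q).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                log.append(ch, start, length);
            }
            @Override
            public void endElement(String u, String l, String q) {
                log.append("</").append(q).append('>');
            }
        };
        emitElement(recorder, "", "count", "count", new AttributesImpl(), value);
        return log.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(trace(42)); // prints <count>42</count>
    }
}
```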
writeEmptyElement
public void writeEmptyElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes atts)
Writes an empty element.
- uri
- localName
- qName
- atts
startElement