net.sf.saxon.java
Class JDK15RegexTranslator
public class JDK15RegexTranslator
This class translates XML Schema regex syntax into JDK 1.5 regex syntax. This differs from the JDK 1.4
translator because JDK 1.5 handles non-BMP characters (wide characters) in places where JDK 1.4 does not,
for example in a range such as [X-Y]. This enables much of the code from the 1.4 translator to be
removed.
Author: James Clark, Thai Open Source Software Center Ltd. See statement at end of file.
Modified by Michael Kay (a) to integrate the code into Saxon, and (b) to support XPath additions
to the XML Schema regex syntax. This version also removes most of the complexities of handling non-BMP
characters, since JDK 1.5 handles these natively.
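The point about ranges such as [X-Y] can be illustrated with plain java.util.regex, independently of Saxon. The following is a minimal sketch, not code from this class: the class name NonBmpRangeDemo and its method are hypothetical, and the range U+10000 to U+10FFFF is chosen only as an example of a range whose endpoints both lie outside the BMP.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it is not part of Saxon.
class NonBmpRangeDemo {

    // Build "[X-Y]" where X is U+10000 and Y is U+10FFFF. Each endpoint is a
    // surrogate pair in the pattern string; java.util.regex (JDK 1.5 and later)
    // reads each pair as a single code point.
    static final Pattern SUPPLEMENTARY = Pattern.compile(
            "[" + new String(Character.toChars(0x10000))
            + "-" + new String(Character.toChars(0x10FFFF)) + "]");

    // True if the whole input is exactly one character outside the BMP.
    static boolean isSingleSupplementaryChar(String s) {
        return SUPPLEMENTARY.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) occupies two Java chars.
        String gClef = new String(Character.toChars(0x1D11E));
        System.out.println(isSingleSupplementaryChar(gClef)); // true
        System.out.println(isSingleSupplementaryChar("A"));   // false
    }
}
```

Because the engine matches by code point, the two-char G clef string is consumed as a single character by the class expression, which is the behaviour the translator relies on.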
static net.sf.saxon.java.JDK15RegexTranslator.CharClass[] | categoryCharClasses - Translates XML Schema and XPath regexes into java.util.regex regexes.
static net.sf.saxon.java.JDK15RegexTranslator.CharClass[] | specialBlockCharClasses - CharClass for each block name in specialBlockNames.
static net.sf.saxon.java.JDK15RegexTranslator.CharClass[] | subCategoryCharClasses
Inherited fields: ALL, NONE, NOT_ALLOWED_CLASS, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, pos, regExp, result, xmlVersion
static void | main(String[] args) - Main method for testing.
static String | translate(CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind) - Translates a regular expression in the syntax of XML Schema Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected boolean | translateAtom()
Inherited methods: absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateAtom, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop
categoryCharClasses
public static final net.sf.saxon.java.JDK15RegexTranslator.CharClass[] categoryCharClasses
Translates XML Schema and XPath regexes into java.util.regex
regexes.
specialBlockCharClasses
public static final net.sf.saxon.java.JDK15RegexTranslator.CharClass[] specialBlockCharClasses
CharClass for each block name in specialBlockNames.
subCategoryCharClasses
public static final net.sf.saxon.java.JDK15RegexTranslator.CharClass[] subCategoryCharClasses
main
public static void main(String[] args)
throws RegexSyntaxException
Main method for testing. Outputs to System.err the Java translation of a supplied
regular expression.
Parameters:
args - command line arguments:
arg[0] a regular expression
arg[1] = xpath to invoke the XPath rules
translate
public static String translate(CharSequence regExp,
int xmlVersion,
boolean xpath,
boolean ignoreWhitespace,
boolean caseBlind)
throws RegexSyntaxException
Translates a regular expression in the syntax of XML Schema Part 2 into a regular
expression in the syntax of java.util.regex.Pattern. The translation
assumes that the string to be matched against the regex uses surrogate pairs correctly.
If the string comes from XML content, a conforming XML parser will automatically
check this; if the string comes from elsewhere, it may be necessary to check
surrogate usage before matching.
Parameters:
regExp - a String containing a regular expression in the syntax of XML Schema Part 2
xmlVersion - set to Configuration.XML10 for XML 1.0 or Configuration.XML11 for XML 1.1
xpath - a boolean indicating whether the XPath 2.0 F+O extensions to the schema regex syntax are permitted
ignoreWhitespace - true if whitespace is to be ignored ('x' flag)
caseBlind - true if case is to be ignored ('i' flag)
Returns:
a JDK 1.5 regular expression
Throws:
RegexSyntaxException - if regExp is not a regular expression in the syntax of XML Schema Part 2, or XPath 2.0, as appropriate
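The description of translate notes that input not coming from an XML parser may need its surrogate usage checked before matching. One way such a check could look is sketched below using only java.lang.Character. The class SurrogateCheck and its method are hypothetical helpers written for illustration; they are not part of Saxon and not necessarily how Saxon performs this check.

```java
// Hypothetical helper class; it is not part of Saxon.
class SurrogateCheck {

    // Returns true if every high surrogate is immediately followed by a low
    // surrogate and no low surrogate appears on its own.
    static boolean hasWellFormedSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must have a low surrogate right after it.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String gClef = new String(Character.toChars(0x1D11E));
        System.out.println(hasWellFormedSurrogates("abc" + gClef));              // true
        System.out.println(hasWellFormedSurrogates(String.valueOf((char) 0xD834))); // false
    }
}
```

A caller would run this check on the input string first and only then match it against the pattern compiled from the translated regex.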