net.sf.saxon.java
Class JDK14RegexTranslator
public class JDK14RegexTranslator
This class translates XML Schema regex syntax into JDK 1.4 regex syntax.
Author: James Clark, Thai Open Source Software Center Ltd. See statement at end of file.
Modified by Michael Kay (a) to integrate the code into Saxon, and (b) to support XPath additions
to the XML Schema regex syntax.
This version of the regular expression translator treats each half of a surrogate pair as a separate
character, translating anything in an XPath regex that can match a non-BMP character into a Java
regex that matches the two halves of a surrogate pair independently. This approach doesn't work
under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
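
The difference between the two regex engines can be seen with a small, hedged illustration that uses only java.util.regex on a Java 5 or later runtime. The class name SurrogatePairDemo and its helper methods are hypothetical and are not part of Saxon. The sketch shows that a non-BMP character occupies two UTF-16 code units but is consumed as one character by the newer engine, which is why a translation that matches the two halves independently no longer applies there.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Saxon.
public class SurrogatePairDemo {
    // U+1D504 (MATHEMATICAL FRAKTUR CAPITAL A), a non-BMP character,
    // written here as its UTF-16 surrogate pair.
    static final String NON_BMP = "\uD835\uDD04";

    // True if the whole string is consumed by a single "any character" atom.
    static boolean matchesAsOneChar(String s) {
        return Pattern.matches(".", s);
    }

    // True if the whole string is consumed by exactly two "any character" atoms.
    static boolean matchesAsTwoChars(String s) {
        return Pattern.matches("..", s);
    }

    public static void main(String[] args) {
        System.out.println("UTF-16 code units: " + NON_BMP.length());
        System.out.println("Code points:       "
                + NON_BMP.codePointCount(0, NON_BMP.length()));
        System.out.println("Matches '.':       " + matchesAsOneChar(NON_BMP));
        System.out.println("Matches '..':      " + matchesAsTwoChars(NON_BMP));
    }
}
```

On a Java 5 or later runtime the string has two code units and one code point, and only the single-dot pattern matches it.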
ALL, NONE, NOT_ALLOWED_CLASS, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, pos, regExp, result, xmlVersion
static void | main(String[] args) - Diagnostic entry point
void | setIgnoreWhitespace(boolean ignore) - Indicate whether whitespace should be ignored
String | translate(CharSequence regExp, int xmlVersion, boolean xpath) - Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected boolean | translateAtom()
absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateAtom, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop
JDK14RegexTranslator
public JDK14RegexTranslator()
Create a regex translator for JDK 1.4
main
public static void main(String[] args)
throws RegexSyntaxException
Diagnostic entry point
args
- argument 1 - XPath regex; argument 2 - xpath|xmlschema
setIgnoreWhitespace
public void setIgnoreWhitespace(boolean ignore)
Indicate whether whitespace should be ignored
ignore
- true if whitespace should be ignored
translate
public String translate(CharSequence regExp,
int xmlVersion,
boolean xpath)
throws RegexSyntaxException
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular
expression in the syntax of java.util.regex.Pattern. The translation
assumes that the string to be matched against the regex uses surrogate pairs correctly.
If the string comes from XML content, a conforming XML parser will automatically
check this; if the string comes from elsewhere, it may be necessary to check
surrogate usage before matching.
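
For strings that do not come from an XML parser, a check along the following lines could be run before matching. This is a minimal sketch, not Saxon code: the class SurrogateCheck and the method hasWellFormedSurrogates are hypothetical names, and the check relies only on Character.isHighSurrogate and Character.isLowSurrogate, which are available from Java 5 onwards.

```java
// Hypothetical helper class, not part of Saxon.
public class SurrogateCheck {

    // Returns true if every high surrogate is immediately followed by a
    // low surrogate and no low surrogate appears on its own.
    static boolean hasWellFormedSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must have a low surrogate right after it.
                if (i + 1 >= s.length()
                        || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair just validated
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasWellFormedSurrogates("abc"));
        System.out.println(hasWellFormedSurrogates("\uD835\uDD04"));
        System.out.println(hasWellFormedSurrogates("\uD835"));
    }
}
```

A caller might reject or repair any input for which the check returns false before passing it to a pattern compiled from the translated regex.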
regExp
- a String containing a regular expression in the syntax of XML Schemas Part 2
xmlVersion
- integer constant indicating XML 1.0 or XML 1.1
xpath
- a boolean indicating whether the XPath 2.0 F+O extensions to the schema
regex syntax are permitted
Returns
- a String containing a regular expression in the syntax of java.util.regex.Pattern
RegexSyntaxException
- if regExp is not a regular expression in the
syntax of XML Schemas Part 2, or XPath 2.0, as appropriate