The Regular-expressions library exports the Regular-expressions
module, which contains various functions that deal with regular
expressions (abbreviated to "regexps"). The module is based on
Perl (version 4), and has the same semantics unless otherwise
noted. The syntax for Perl-style regular expressions can be found
in Perl 5 Desktop Reference.
There are some differences in the way String-extensions handles
regular expressions. The biggest difference is that regular
expressions in Dylan are case insensitive by default. Also, when
given an unparsable regexp, the behavior of String-extensions is
undefined, whereas Perl would give an error message.
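The case-insensitive default can be illustrated with a minimal sketch. It assumes the enclosing module uses Regular-expressions, and it assumes that regexp-position accepts a case-sensitive: keyword argument; that keyword is not described in this section, so check it against the function reference before relying on it.

```dylan
begin
  // Default behavior: matching ignores case, so "IS" is expected
  // to match the "is" inside "This".
  let default-start = regexp-position("This is a string", "IS");

  // Assumption: case-sensitive: #t requests Perl-like matching, in
  // which case "IS" would not match and the result would be #f.
  let strict-start
    = regexp-position("This is a string", "IS", case-sensitive: #t);

  values(default-start, strict-start);
end;
```

Only the first return value of each call is bound here; regexp-position may return additional values that this sketch ignores.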
A regular expression that is grammatically correct may still be
illegal if it contains an infinitely quantified sub-regexp that
may match the empty string. That is, if R is a regexp that can
match the empty string, then any regexp containing R*, R+, or
R{n,} is illegal. In this case, the Regular-expressions library
will signal an <illegal-regexp> error when the regexp is parsed.
Note: Perl also has this restriction, although it isn't mentioned
in Perl 5 Desktop Reference.
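As a rough illustration of the rule above, the sketch below passes a regexp of the form R*, where R is (a*) and can match the empty string. The function name try-illegal-regexp is hypothetical, and the handler assumes that <illegal-regexp> is exported by the Regular-expressions module as a condition class that can be named in an exception clause.

```dylan
define function try-illegal-regexp () => (result)
  block ()
    // "(a*)*" applies * to the group (a*), which can itself match
    // the empty string, so the regexp should be rejected when parsed.
    regexp-position("aaa", "(a*)*");
  exception (err :: <illegal-regexp>)
    // Reached if the library signals <illegal-regexp> as described.
    #"illegal-regexp";
  end block;
end function;
```

Rewriting the pattern so that the quantified part cannot match the empty string, for example "(a+)*" or simply "a*", avoids the error.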
In previous versions of the regular-expressions library, each
basic function had a companion function that would pre-compute
some information needed to use the regular expression. By using
the companion function, one could avoid recomputing the same
information. In the present version, the regular-expressions
library caches this information, so the companion functions are
no longer necessary and should be considered obsolete. However,
they have been kept for backwards compatibility.
Companion functions differ in details, but they all essentially return curried versions of their corresponding basic function. For example, the following two pieces of code yield the same result:
regexp-position("This is a string", "is");
or
let is-finder = make-regexp-positioner("is");
is-finder("This is a string");
Both pieces of code should have roughly the same performance, even if the code is inside a loop. The first is the preferred method of using regexps.