alice library manual.

Alice Project

The Regex structure


________ Synopsis ____________________________________________________

    signature REGEX
    structure Regex : REGEX
  

This structure provides an interface to (a subset of) POSIX-compatible regular expressions.
Note, however, that functions resulting from a partial application, such as match r, cannot be pickled.


________ Import ______________________________________________________

    import structure Regex from "x-alice:/lib/regex/Regex"
    import signature REGEX from "x-alice:/lib/regex/REGEX-sig"

________ Interface ___________________________________________________

    signature REGEX =
    sig
	type match

	infix 2 =~

	exception Malformed
	exception NoSuchGroup

	val match      : string -> string -> match option
	val =~         : string * string -> bool

	val groups     : match -> string vector
	val group      : match * int -> string
	val groupStart : match * int -> int
	val groupEnd   : match * int -> int
	val groupSpan  : match * int -> (int * int)

    end

________ Description _________________________________________________

type match

The abstract type of a match.

exception Malformed

indicates that a regular expression is not well-formed.

exception NoSuchGroup

indicates that an access to a group of a match has failed because no such group exists.

match r s

returns SOME m if r matches s and NONE otherwise. It raises Malformed if r is not a well-formed regular expression.
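The behaviour of match described above can be sketched as follows. This is a minimal illustration only: matchInt, describe and safeMatch are hypothetical names introduced here, not part of the library, and the pattern is the integer-constant expression used elsewhere in this manual.

```sml
import structure Regex from "x-alice:/lib/regex/Regex"

(* Partially apply match to a fixed regular expression.
   matchInt is a hypothetical name; its type is string -> match option. *)
val matchInt = Regex.match "^~?[0-9]+$"

(* Hypothetical helper: report whether s looks like an integer constant. *)
fun describe s =
    case matchInt s of
        SOME _ => s ^ ": integer constant"
      | NONE   => s ^ ": no match"

val a = describe "42"
val b = describe "4x2"

(* Hypothetical helper: treat a malformed regular expression
   as "no match" instead of letting Malformed propagate. *)
fun safeMatch r s =
    Regex.match r s handle Regex.Malformed => NONE
```

Wrapping the fully applied call in the handler, as safeMatch does, catches Malformed regardless of whether it is raised when the regular expression is supplied or when the string is supplied.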

r =~ s

The following equivalence holds:

r =~ s = Option.isSome (match r s)
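The equivalence can be checked directly. The sketch below re-declares the fixity of =~ locally and binds it to Regex.=~, which is one conservative way to obtain the infix operator; the names r, s, viaOperator, viaMatch and agree are illustrative only.

```sml
import structure Regex from "x-alice:/lib/regex/Regex"

(* Make the operator available infix, with the fixity given in REGEX. *)
infix 2 =~
val op =~ = Regex.=~

val r = "^[0-9]+$"
val s = "2005"

(* Both sides of the documented equivalence. *)
val viaOperator = r =~ s
val viaMatch    = Option.isSome (Regex.match r s)

(* By the equivalence above, agree is true. *)
val agree = (viaOperator = viaMatch)
```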

groups m

returns the groups of the given match m as a string vector.

group (m, i)
groupStart (m, i)
groupEnd (m, i)
groupSpan (m, i)

take a match m and a group index i. Each raises NoSuchGroup if i >= Vector.length (groups m) or i < 0.
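A short sketch of group access follows. The helper names groupList, groupOrNone and positions are hypothetical, as are the pattern and the input string. The sketch relies only on the index bounds stated above and makes no assumption about which group a given index denotes.

```sml
import structure Regex from "x-alice:/lib/regex/Regex"

(* All groups of a match as a list, fetched one index at a time. *)
fun groupList m =
    List.tabulate (Vector.length (Regex.groups m),
                   fn i => Regex.group (m, i))

(* Hypothetical helper: NONE instead of the NoSuchGroup exception. *)
fun groupOrNone (m, i) =
    SOME (Regex.group (m, i)) handle Regex.NoSuchGroup => NONE

(* Hypothetical helper: positions of group i as reported by the library. *)
fun positions (m, i) =
    (Regex.groupStart (m, i), Regex.groupEnd (m, i))

val result =
    case Regex.match "([a-z]+)=([0-9]+)" "width=80" of
        SOME m => (groupList m, groupOrNone (m, ~1))
      | NONE   => ([], NONE)
```

Because a negative index is always out of range, the second component of result is NONE whenever the match succeeds.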


________ Example _____________________________________________________

This structure provides pattern matching with POSIX 1003.2 regular expressions.

The form and meaning of Extended regular expressions are described below. Here R and S denote regular expressions; m and n denote natural numbers; L denotes a character list; and d denotes a decimal digit:

    Extended   Meaning
    c          Match the character c
    .          Match any character
    R*         Match R zero or more times
    R+         Match R one or more times
    R|S        Match R or S
    R?         Match R or the empty string
    R{m}       Match R exactly m times
    R{m,}      Match R at least m times
    R{m,n}     Match R at least m and at most n times
    [L]        Match any character in L
    [^L]       Match any character not in L
    ^          Match at string's beginning
    $          Match at string's end
    (R)        Match R as a group; save the match
    \d         Match the same as previous group d
    \\         Match \; similarly for *.[]^$
    \+         Match +; similarly for |?{}()

Some example character lists L:

    [aeiou]        Match vowel: a or e or i or o or u
    [0-9]          Match digit: 0 or 1 or 2 or ... or 9
    [^0-9]         Match non-digit
    [-+*/^]        Match - or + or * or / or ^
    [-a-z]         Match lowercase letter or hyphen (-)
    [0-9a-fA-F]    Match hexadecimal digit
    [[:alnum:]]    Match letter or digit
    [[:alpha:]]    Match letter
    [[:cntrl:]]    Match ASCII control character
    [[:digit:]]    Match decimal digit; same as [0-9]
    [[:graph:]]    Same as [:print:] but not [:space:]
    [[:lower:]]    Match lowercase letter
    [[:print:]]    Match printable character
    [[:punct:]]    Match punctuation character
    [[:space:]]    Match SML #" ", #"\r", #"\n", #"\t", #"\v", #"\f"
    [[:upper:]]    Match uppercase letter
    [[:xdigit:]]   Match hexadecimal digit; same as [0-9a-fA-F]

Remember that backslash (\) must be escaped as "\\" in SML strings.
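To illustrate the escaping rule, the sketch below writes a regular expression containing a single backslash as an SML string literal. The names decimal, twoChars, yes and no are illustrative only.

```sml
import structure Regex from "x-alice:/lib/regex/Regex"

(* The regular expression ^[0-9]+\.[0-9]+$ contains one backslash,
   which makes the dot literal. In SML source the backslash is doubled. *)
val decimal = "^[0-9]+\\.[0-9]+$"

(* The literal "\\." denotes two characters: a backslash and a dot. *)
val twoChars = String.size "\\."

val yes = Regex.match decimal "3.14"   (* the dot is matched literally *)
val no  = Regex.match decimal "3x14"   (* x is not a literal dot *)
```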

In the examples below, s denotes the string to be matched.

Example: Match SML integer constant:
match "^~?[0-9]+$" s

Example: Match SML alphanumeric identifier:
match "^[a-zA-Z][a-zA-Z0-9'_]*$" s

Example: Match SML floating-point constant:
match "^[+~]?[0-9]+(\\.[0-9]+|(\\.[0-9]+)?[eE][+~]?[0-9]+)$" s

Example: Match any HTML start tag; make the tag's name into a group:
match "<([[:alnum:]]+)[^>]*>" s
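The regular expressions above can be combined with the Regex interface as sketched below. The datatype kind and the functions classify and tagGroups are hypothetical and serve only as illustration; match is applied to a regular expression and a string, as described in the interface.

```sml
import structure Regex from "x-alice:/lib/regex/Regex"

val intRe  = "^~?[0-9]+$"
val realRe = "^[+~]?[0-9]+(\\.[0-9]+|(\\.[0-9]+)?[eE][+~]?[0-9]+)$"
val tagRe  = "<([[:alnum:]]+)[^>]*>"

(* Hypothetical classification of a token. *)
datatype kind = INT | REAL | OTHER

fun classify s =
    if Option.isSome (Regex.match intRe s) then INT
    else if Option.isSome (Regex.match realRe s) then REAL
    else OTHER

(* Groups of the first HTML start tag matched in s, if any.
   Which index holds the tag's name is not assumed here. *)
fun tagGroups s = Option.map Regex.groups (Regex.match tagRe s)

val k1 = classify "42"
val k2 = classify "3.14"
val k3 = classify "abc"
```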



last modified 2005/Aug/03 09:17