[FIXME: this is taken from Gary and Mark's quick summaries and should be reviewed and expanded. Rx is pretty stable, so could already be done!]
Guile includes an interface to Tom Lord's Rx library (currently only to POSIX regular expressions). Using the library is a two-step process: first compile a regular expression into an efficient structure, then use that structure in any number of string matches.
For example, given the regular expression `abc.' (which matches any string containing `abc' followed by any single character):
    guile> (define r (regcomp "abc."))
    guile> r
    #<rgx abc.>
    guile> (regexec r "abc")
    #f
    guile> (regexec r "abcd")
    #((0 . 4))
    guile>
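The same two-step pattern can be seen in the POSIX C API that this interface is modelled on. Below is a minimal sketch using the system's `<regex.h>` rather than Rx itself. The helper names `match_span` and `demo_abc` are hypothetical. The pattern is compiled once and then reused for several searches. A successful search reports the same `[start, end)` offsets that Guile prints as `(0 . 4)`.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: search `s` with an already-compiled regex and store
   the [start, end) offsets of the whole match. Returns 1 on a match and
   0 otherwise (the case where Guile's regexec returns #f). */
int match_span(const regex_t *re, const char *s, int *start, int *end)
{
    regmatch_t m[1];                 /* m[0] describes the whole match */

    if (regexec(re, s, 1, m, 0) != 0)
        return 0;
    *start = (int) m[0].rm_so;
    *end = (int) m[0].rm_eo;
    return 1;
}

/* Hypothetical demo mirroring the guile> session above. */
int demo_abc(void)
{
    regex_t re;
    int start, end;

    /* Step 1: compile the pattern once (basic POSIX syntax, no flags). */
    if (regcomp(&re, "abc.", 0) != 0)
        return -1;

    /* Step 2: reuse the compiled structure for any number of searches. */
    if (!match_span(&re, "abc", &start, &end))
        printf("abc  -> no match\n");
    if (match_span(&re, "abcd", &start, &end))
        printf("abcd -> (%d . %d)\n", start, end);

    regfree(&re);                    /* release the compiled structure */
    return 0;
}
```

Calling `demo_abc()` prints `abc  -> no match` and then `abcd -> (0 . 4)`. This matches the `#f` and `#((0 . 4))` results in the Guile session.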
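The definitions of regcomp and regexec are as follows:
regcomp pattern [flags]
Compile the regular expression pattern using POSIX rules. The flags argument is optional. The flags correspond to the POSIX compilation flags. One selects extended rather than basic syntax (POSIX REG_EXTENDED). Another (POSIX REG_NEWLINE) lets anchors match after newline characters in the searched string and prevents . or [^...] from matching newlines.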
The logior procedure can be used to combine multiple flags. The default is POSIX basic syntax, which makes + and ? literals and \+ and \? operators. Backslashes in pattern must be escaped if it is specified in a literal string, e.g., "\\(a\\)\\?".
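The effect of the compilation flags can be checked against the POSIX C `regcomp`. There, flags are combined with bitwise `|`, which plays the role of `logior`. This sketch assumes the Guile flags behave like their POSIX counterparts. The helper `search_with_flags` is hypothetical. For brevity it compiles on every call, although real code should compile once and reuse the result.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of Guile): compile `pattern` with `cflags`,
   report whether it matches anywhere in `s`, then free the compiled form.
   Returns 1 for a match, 0 otherwise, and -1 if the pattern is invalid. */
int search_with_flags(const char *pattern, int cflags, const char *s)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only "did it match?" is needed, not the match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here are some example results:

- In basic syntax, `search_with_flags("a+", 0, "aaa")` returns 0 because `+` is a literal there. With `REG_EXTENDED` the same call returns 1.
- `REG_EXTENDED | REG_ICASE` combines two flags, so `"ABC+"` matches `"xabccc"`.
- Without `REG_NEWLINE`, `"a.b"` matches `"a\nb"`. With it, neither `.` nor `[^x]` matches the newline, and `^b` matches just after it.
- The C literal `"\\(ab\\)\\{2\\}"` shows the same backslash doubling as the Scheme example.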
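regexec regex string [match-pick] [flags]
Match string against the compiled POSIX regular expression regex. The match-pick and flags arguments are optional. The possible flags can be combined using the logior procedure and correspond to the POSIX execution flags. REG_NOTBOL keeps the beginning-of-line operator from matching at the beginning of string, and REG_NOTEOL keeps the end-of-line operator from matching at its end.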
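POSIX `regexec` defines two execution flags, `REG_NOTBOL` and `REG_NOTEOL`. They matter mainly when the string being searched is not the start or end of a real line. A common case is resuming a search part-way through a buffer. This is a minimal sketch in C, assuming Guile's regexec flags map onto these POSIX flags. The helpers `search_exec_flags` and `search_from` are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` (basic syntax) and search `s`,
   passing `eflags` to regexec. Returns 1 on a match, 0 otherwise,
   and -1 if the pattern does not compile. */
int search_exec_flags(const char *pattern, const char *s, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, eflags);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: search `text` starting at `offset`. If the character
   before `offset` is not a newline, that position is not the start of a
   line, so REG_NOTBOL stops ^ from matching there. */
int search_from(const char *text, size_t offset, const char *pattern)
{
    int eflags = (offset > 0 && text[offset - 1] != '\n') ? REG_NOTBOL : 0;
    return search_exec_flags(pattern, text + offset, eflags);
}
```

For example, `search_from("xabc", 1, "^abc")` returns 0. The search starts in the middle of a line, so `^` is not allowed to match. `search_from("x\nabc", 2, "^abc")` returns 1.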
If no match is possible, regexec returns #f. Otherwise match-pick determines the return value:
#t or unspecified: a newly-allocated vector is returned, containing pairs with the indices of the matched part of string and of any matched substrings.
"": a list is returned. Its first element is a nested list containing the matched part of string surrounded by the unmatched parts. The remaining elements are the matched substrings (if any). All returned substrings share memory with string.
#f: regexec returns #t if a match is made, otherwise #f.
vector: the supplied vector is returned, with the first element replaced by a pair containing the indices of the matched portion of string and further elements replaced by pairs containing the indices of matched substrings (if any).
list: a list will be returned, with each member of the list specified by a code in the corresponding position of the supplied list:
  a number: the numbered matching substring (0 for the entire match).
  #\<: the part of string from its beginning to the beginning of the part matched by regex.
  #\>: the part of string from the end of the matched part to the end of string.
  #\c: the "final tag", which seems to be associated with the "cut operator", which doesn't seem to be available through the POSIX interface.
For example, (list #\< 0 1 #\>) returns, in order, the text before the match, the entire match, the first matching substring, and the text after the match. The returned substrings share memory with string.
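In the POSIX C API, the default vector result corresponds to the `regmatch_t` array filled in by `regexec`. Element 0 holds the whole match and later elements hold the parenthesised subexpressions, each as a `rm_so`/`rm_eo` offset pair. The sketch below is a hypothetical C analogue of match-pick `(list #\< 0 1 #\>)`. It returns slices that point into the original string instead of copies, echoing the note that returned substrings share memory with string. The names `slice` and `pick_around` are illustrative assumptions, not part of Guile or Rx.

```c
#include <regex.h>
#include <string.h>

/* A view into the searched string: no copy is made, so like Guile's
   returned substrings it shares memory with the original. */
typedef struct {
    const char *ptr;
    size_t len;
} slice;

/* Hypothetical analogue of match-pick (list #\< 0 1 #\>):
   out[0] = text before the match   (#\<)
   out[1] = the whole match         (0)
   out[2] = first subexpression     (1), {NULL, 0} if it did not take part
   out[3] = text after the match    (#\>)
   Returns 1 on a match, 0 otherwise. */
int pick_around(const regex_t *re, const char *s, slice out[4])
{
    regmatch_t m[2];                 /* m[0]: whole match, m[1]: group 1 */

    if (regexec(re, s, 2, m, 0) != 0)
        return 0;

    out[0].ptr = s;
    out[0].len = (size_t) m[0].rm_so;
    out[1].ptr = s + m[0].rm_so;
    out[1].len = (size_t) (m[0].rm_eo - m[0].rm_so);
    if (m[1].rm_so == -1) {          /* unused or unmatched group */
        out[2].ptr = NULL;
        out[2].len = 0;
    } else {
        out[2].ptr = s + m[1].rm_so;
        out[2].len = (size_t) (m[1].rm_eo - m[1].rm_so);
    }
    out[3].ptr = s + m[0].rm_eo;
    out[3].len = strlen(s + m[0].rm_eo);
    return 1;
}
```

Suppose the pattern `b(c+)d` is compiled with `REG_EXTENDED` and searched against `"abccde"`. The four slices are then `"a"`, `"bccd"`, `"cc"` and `"e"`.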
Here are some other procedures that may be useful when working with regular expressions: