rgx - Regular expressions

Module Description

The rgx module implements regular expressions. It provides words for compiling regular expressions and for matching and searching strings against them. The actual expression building and matching is delegated to the nfe module.


This module uses the following syntax:
.   Match any char [incl. newline]     *   Match zero or more
+   Match one or more                  ?   Match zero or one
|   Match alternatives
( ) Group or subexpression

Backslash characters:
\.  Character .                       \*   Character *
\+  Character +                       \?   Character ?
\|  Character |                       \\   Backslash

\r  Carriage return                   \n   Line feed
\t  Horizontal tab                    \e   Escape

\d  Digit class: [0-9]                \D   Non-digit: [^0-9]
\w  Word class: [0-9a-zA-Z_]          \W   Non-word: [^0-9a-zA-Z_]
\s  Whitespace                        \S   Non-whitespace

Any other backslash sequence simply yields the character that follows the
backslash, but this may change in future versions.

Module Words

Regular expression structure

rgx% ( -- n )
Get the required space for a rgx variable

Regular expression creation, initialisation and destruction

rgx-init ( rgx -- )
Initialise the regular expression

rgx-create ( "<spaces>name" -- ; -- rgx )
Create a named regular expression in the dictionary

rgx-new ( -- rgx )
Create a new regular expression on the heap

rgx-free ( rgx -- )
Free the regular expression from the heap
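
As a minimal sketch of how rgx% and rgx-init might be combined: the snippet below reserves space for a regular expression by hand and then initialises it. The name rgx5 is hypothetical, and the sketch assumes that the size returned by rgx% is in address units, as expected by the standard word allot.

```forth
include ffl/rgx.fs

\ Reserve rgx% address units in the dictionary (rgx5 is a hypothetical name;
\ assumes rgx% returns a size suitable for allot)
create rgx5  rgx% allot

\ Initialise the reserved space before first use
rgx5 rgx-init

\ From here on rgx5 is used like any other regular expression variable
s" a+b" rgx5 rgx-compile [IF]
  .( Expression successfully compiled) cr
[ELSE]
  .( Compilation failed on position:) . cr
[THEN]
```

In most cases rgx-create or rgx-new is the shorter route; reserving the space manually is mainly of interest when the regular expression is embedded in a larger data structure.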

Regular expression words

rgx-compile ( c-addr u rgx -- true | n false )
Compile a pattern as a regular expression; return true on success, or the error offset n and false on failure

rgx-cmatch? ( c-addr u rgx -- flag )
Match a string case-sensitively against the regular expression, return the match result

rgx-imatch? ( c-addr u rgx -- flag )
Match a string case-insensitively against the regular expression, return the match result

rgx-csearch ( c-addr u rgx -- n )
Search a string case-sensitively for the first match of the regular expression, return the offset in the string, or -1 if not found

rgx-isearch ( c-addr u rgx -- n:index )
Search a string case-insensitively for the first match of the regular expression, return the offset in the string, or -1 if not found

rgx-result ( n rgx -- n1 n2 )
Get the match result of the nth group, return the match end n1 and the match start n2 (n2 is on top of the stack)
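
The search and result words can be sketched as follows. This is an illustration only: the names rgx3 and .group are hypothetical, and the sketch assumes that rgx-result may be called after a successful search and that group number 0 is a valid argument, neither of which is stated above.

```forth
include ffl/rgx.fs

rgx-create rgx3

\ Hypothetical helper: print start and end of the nth group.
\ rgx-result leaves the end n1 below the start n2, so the start is printed first.
: .group  ( n rgx -- )
  rgx-result
  ." start: " .  ." end: " . cr
;

\ Compile a pattern with one group: one or more digits
s" (\d+)" rgx3 rgx-compile [IF]
  \ Search for the first run of digits in a test string
  s" abc 123 def" rgx3 rgx-csearch dup -1 = [IF]
    drop .( Not found) cr
  [ELSE]
    .( First match at offset:) . cr
    0 rgx3 .group    \ assumption: 0 is a valid group number
  [THEN]
[ELSE]
  .( Compilation failed on position:) . cr
[THEN]
```

Comparing the search result with -1 before using it keeps the not-found case separate from a valid offset of 0.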

Inspection

rgx-dump ( rgx -- )
Dump the regular expression

Examples

include ffl/rgx.fs

\ Create a regular expression variable rgx1 

rgx-create rgx1

\ Compile a regular expression and check the result

s" ((a*)b)*" rgx1 rgx-compile [IF]
  .( Expression successfully compiled) cr
[ELSE]
  .( Compilation failed on position:) . cr
[THEN]

\ Match a test string case-sensitively
 
s" abb" rgx1 rgx-cmatch? [IF]
  .( Test string matched) cr
[ELSE]
  .( No match) cr  
[THEN]



\ Create a regular expression variable on the heap
 
rgx-new value rgx2

\ Compile a regular expression for matching a float number

s" (\+|-|\s)?\d+(\.\d+)?" rgx2 rgx-compile [IF]
  .( Expression successfully compiled) cr
[ELSE]
  .( Compilation failed on position:) . cr
[THEN]

\ Match a float number

s" -12.47" rgx2 rgx-cmatch? [IF]
  .( Float number matched) cr
[ELSE]
  .( No match) cr
[THEN]

\ Free the variable from the heap

rgx2 rgx-free
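
The examples above only use case-sensitive matching. As a further sketch, the same expression can be tried with both rgx-imatch? and rgx-cmatch? to compare the two. The name rgx4 and the test strings are hypothetical.

```forth
include ffl/rgx.fs

\ Create a regular expression variable rgx4 (hypothetical name)

rgx-create rgx4

s" hello" rgx4 rgx-compile [IF]
  \ Match the same mixed-case string, first ignoring case, then respecting it
  s" HeLLo" rgx4 rgx-imatch? [IF]
    .( Case-insensitive: matched) cr
  [ELSE]
    .( Case-insensitive: no match) cr
  [THEN]

  s" HeLLo" rgx4 rgx-cmatch? [IF]
    .( Case-sensitive: matched) cr
  [ELSE]
    .( Case-sensitive: no match) cr
  [THEN]
[ELSE]
  .( Compilation failed on position:) . cr
[THEN]
```

The expression is compiled once and reused for both matches; case sensitivity is chosen per call, by the matching word, rather than at compile time.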


generated 10-Apr-2008 by ofcfrth-0.5.0