The regular expressions used in searches and segmentation rules are those supported by Java. If you need more specific information, please consult http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html.
You can find simple tutorials on the web (for example, http://www.regular-expressions.info/quickstart.html).
The following construct: | Matches the following sequence: |
Flags |
|
(?i) | Enables case-insensitive matching (by default, the pattern is case-sensitive). |
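As a minimal sketch of the flag above, the following Java snippet compares the same pattern with and without (?i). The class name FlagSketch, the helper found and the sample text are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

class FlagSketch {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Case-sensitive by default: lower-case "segment" does not match "Segment".
        System.out.println(found("segment", "Segment 12"));      // false
        // With (?i) in front, case is ignored.
        System.out.println(found("(?i)segment", "Segment 12"));  // true
    }
}
```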
Characters |
|
x | The character x, except the following... |
\uhhhh | The character with hexadecimal value 0xhhhh |
\t | The tab character ('\u0009') |
\n | The newline (line feed) character ('\u000A') |
\r | The carriage-return character ('\u000D') |
Quotation |
|
\ | Nothing, but quotes the following character. This is required if you want any of the metacharacters !$()*+.<>?[\]^{|} to match as themselves. |
\\ | For example, this is the backslash character |
\Q | Nothing, but quotes all characters until \E |
\E | Nothing, but ends quoting started by \Q |
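The effect of quoting can be illustrated with a short Java sketch. The class name QuotingSketch and the helper found are hypothetical. Note that inside a Java string literal every backslash must itself be doubled, so the regular expression \. is written "\\." in the source code below; when a pattern is typed into a search field rather than into Java code, a single backslash is presumably enough.

```java
import java.util.regex.Pattern;

class QuotingSketch {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, "." stands for any character, so 1.5 also finds "125".
        System.out.println(found("1.5", "125"));            // true
        // Quoted with a backslash, \. matches only a literal dot.
        System.out.println(found("1\\.5", "125"));          // false
        System.out.println(found("1\\.5", "1.5"));          // true
        // \Q...\E quotes every character in between.
        System.out.println(found("\\Q(a+b)\\E", "(a+b)"));  // true
        System.out.println(found("\\Q(a+b)\\E", "aab"));    // false
        // Without quoting, (a+b) is a group containing a+ followed by b.
        System.out.println(found("(a+b)", "aab"));          // true
    }
}
```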
Character classes |
|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z or A through Z, inclusive (range) |
Predefined character classes |
|
. | Any character (except for line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
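The character classes listed above can be tried out with a small Java sketch such as the following. The class name CharClassSketch, the helper firstMatch and the sample texts are hypothetical; the + quantifier used here means "one or more times".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassSketch {
    // Hypothetical helper: the first substring matched by the regex, or null.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d+ finds the first run of digits.
        System.out.println(firstMatch("\\d+", "Chapter 12, page 345"));  // 12
        // [a-zA-Z]+ finds the first run of ASCII letters.
        System.out.println(firstMatch("[a-zA-Z]+", "42 apples"));        // apples
        // [^abc] finds the first character that is not a, b, or c.
        System.out.println(firstMatch("[^abc]", "abcd"));                // d
        // \w+ includes the underscore but stops at the dot.
        System.out.println(firstMatch("\\w+", "file_name.txt"));         // file_name
        // \S+ skips the leading spaces and stops at the next one.
        System.out.println(firstMatch("\\S+", "  two words"));           // two
    }
}
```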
Boundary matchers |
|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
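A brief Java sketch of the boundary matchers follows. The class name BoundarySketch and the helper found are hypothetical, and the sample text is deliberately a single line, so that the beginning and end of the line coincide with the beginning and end of the text.

```java
import java.util.regex.Pattern;

class BoundarySketch {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // \bcat\b finds "cat" only as a whole word.
        System.out.println(found("\\bcat\\b", "the cat sat"));  // true
        System.out.println(found("\\bcat\\b", "concatenate"));  // false
        // \Bcat\B finds "cat" only inside a longer word.
        System.out.println(found("\\Bcat\\B", "concatenate"));  // true
        // ^ and $ anchor the pattern to the start and end of the line.
        System.out.println(found("^the", "the cat sat"));       // true
        System.out.println(found("sat$", "the cat sat"));       // true
        System.out.println(found("^cat", "the cat sat"));       // false
    }
}
```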
Greedy quantifiers |
|
These will match as much as they can. For example, a+ will match aaa in aaabbb. |
|
X? | X, once or not at all |
X* | X, zero or more times |
X+ | X, one or more times |
Reluctant (non-greedy) quantifiers |
|
These will match as little as they can. For example, a+? will match the first a in aaabbb. |
|
X?? | X, once or not at all |
X*? | X, zero or more times |
X+? | X, one or more times |
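The difference between greedy and reluctant quantifiers can be checked with a Java sketch along these lines. It reuses the aaabbb example from the descriptions above and adds a hypothetical tag-like sample text; the class name QuantifierSketch and the helper firstMatch are likewise made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Hypothetical helper: the first substring matched by the regex, or null.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many a's as possible.
        System.out.println(firstMatch("a+", "aaabbb"));          // aaa
        // Reluctant: stops after the first a.
        System.out.println(firstMatch("a+?", "aaabbb"));         // a
        // Greedy .+ runs to the last ">" in the text.
        System.out.println(firstMatch("<.+>", "<b>bold</b>"));   // <b>bold</b>
        // Reluctant .+? stops at the first ">".
        System.out.println(firstMatch("<.+?>", "<b>bold</b>"));  // <b>
    }
}
```

The reluctant form is usually the one wanted when a pattern should stop at the nearest closing delimiter rather than the last one.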
Logical operators |
|
XY | X followed by Y |
X|Y | Either X or Y |
(XY) | XY as a single group |
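To illustrate alternation and grouping, here is a small Java sketch. The class name LogicalSketch, the helper matchesWhole and the sample words are hypothetical. Unlike a search, which looks for the pattern anywhere in the text, the helper requires the whole text to match, which makes the effect of the group easier to see.

```java
import java.util.regex.Pattern;

class LogicalSketch {
    // Hypothetical helper: true only if the regex matches the entire text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // X|Y: either alternative is accepted.
        System.out.println(matchesWhole("cat|dog", "dog"));     // true
        // A group limits the alternation to part of the pattern.
        System.out.println(matchesWhole("gr(a|e)y", "gray"));   // true
        System.out.println(matchesWhole("gr(a|e)y", "grey"));   // true
        System.out.println(matchesWhole("gr(a|e)y", "gruy"));   // false
        // A quantifier after a group applies to the whole group...
        System.out.println(matchesWhole("(ab)+", "ababab"));    // true
        // ...whereas without the group it applies only to the b.
        System.out.println(matchesWhole("ab+", "ababab"));      // false
        System.out.println(matchesWhole("ab+", "abbb"));        // true
    }
}
```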