Regular-expression constructs

The regular expressions used in searches and segmentation rules are those supported by Java. If you need more specific information, please consult http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html.

Simple tutorials are available on the web, for example at http://www.regular-expressions.info/quickstart.html.

Each entry below gives a construct, followed by the sequence it matches.

Flags

(?i) Enables case-insensitive matching (by default, the pattern is case-sensitive).
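
As a rough illustration of the (?i) flag, the following Java sketch compares a search with and without it. The class name CaseFlagDemo and the helper found are hypothetical names chosen for this example only.

```java
import java.util.regex.Pattern;

class CaseFlagDemo {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Without the flag, matching is case-sensitive.
        System.out.println(found("hello", "Hello World"));     // false
        // With (?i) at the start, case is ignored.
        System.out.println(found("(?i)hello", "Hello World")); // true
    }
}
```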

Characters

x The character x, except the following...
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
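
The character escapes above can be tried out with a short Java sketch such as the one below. The class and method names are illustrative assumptions. The doubled backslashes are Java string-literal syntax and are not part of the pattern itself.

```java
import java.util.regex.Pattern;

class CharEscapeDemo {
    // Hypothetical helper: true if the regex matches the whole text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String tabbed = "a\tb"; // the letter a, a tab character, the letter b

        // The tab escape in the pattern matches a literal tab in the text.
        System.out.println(matchesWhole("a\\tb", tabbed));     // true
        // The same tab, written with its four-digit hexadecimal value.
        System.out.println(matchesWhole("a\\u0009b", tabbed)); // true
        // A space is not a tab.
        System.out.println(matchesWhole("a\\tb", "a b"));      // false
        // Carriage return followed by line feed.
        System.out.println(matchesWhole("\\r\\n", "\r\n"));    // true
    }
}
```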

Quotation

\ Nothing, but quotes the following character. This is required if you want any of the metacharacters !$()*+.<>?[\]^{|} to match as themselves.
\\ The backslash character, for example
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends quoting started by \Q
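
The effect of quoting can be sketched in Java as follows. The names QuotingDemo and found are hypothetical. The sample text is an assumption made for illustration.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, + is a quantifier: the pattern means "one or more 1, then 1".
        System.out.println(found("1+1", "1+1"));                      // false
        // Quoting the + makes it match a literal plus sign.
        System.out.println(found("1\\+1", "1+1"));                    // true
        // Everything between \Q and \E is taken literally.
        System.out.println(found("\\Q1+1=2?\\E", "Is 1+1=2? Yes"));   // true
        // A quoted backslash matches a single backslash in the text.
        System.out.println(found("\\\\", "C:\\temp"));                // true
    }
}
```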

Character classes

[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
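
A minimal Java sketch of the three kinds of character class is shown below. The class and helper names are illustrative only.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true if the regex matches the whole text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Simple class: any one of a, b, or c.
        System.out.println(matchesWhole("[abc]", "b"));            // true
        // Negation: any single character except a, b, or c.
        System.out.println(matchesWhole("[^abc]", "b"));           // false
        System.out.println(matchesWhole("[^abc]", "d"));           // true
        // Range: one or more ASCII letters.
        System.out.println(matchesWhole("[a-zA-Z]+", "Regex"));    // true
        System.out.println(matchesWhole("[a-zA-Z]+", "Regex1"));   // false
    }
}
```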

Predefined character classes

. Any character (except for line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
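
To see what the predefined classes pick out of a text, a sketch along the following lines may help. The helper findAll and the sample strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Runs of digits.
        System.out.println(findAll("\\d+", "Order 66, item 7"));  // [66, 7]
        // Runs of word characters (letters, digits, underscore).
        System.out.println(findAll("\\w+", "foo_bar baz-9"));     // [foo_bar, baz, 9]
        // Runs of non-whitespace, separated by spaces or tabs.
        System.out.println(findAll("\\S+", "a  b\tc"));           // [a, b, c]
        // The dot does not match a line terminator by default.
        System.out.println(Pattern.matches(".", "\n"));           // false
    }
}
```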

Boundary matchers

^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
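
The boundary matchers can be illustrated with the Java sketch below. It uses single-line sample texts only, so the distinction between a line and the whole input does not arise. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // \b requires a word boundary on each side of "cat".
        System.out.println(found("\\bcat\\b", "a cat sat"));    // true
        System.out.println(found("\\bcat\\b", "concatenate"));  // false
        // \B requires that there is no word boundary at that position.
        System.out.println(found("\\Bcat\\B", "concatenate"));  // true
        // ^ anchors the match at the beginning, $ at the end.
        System.out.println(found("^cat", "cat nap"));           // true
        System.out.println(found("^cat", "a cat"));             // false
        System.out.println(found("nap$", "cat nap"));           // true
    }
}
```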

Greedy quantifiers

These match as much as they can. For example, a+ matches aaa in aaabbb.
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
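
The greedy behaviour described above can be checked with a small Java sketch. The helper firstMatch and the tag-like sample text are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy + and * take every a available.
        System.out.println(firstMatch("a+", "aaabbb"));          // aaa
        System.out.println(firstMatch("a*", "aaabbb"));          // aaa
        // Greedy ? takes the one a it is allowed.
        System.out.println(firstMatch("a?", "aaabbb"));          // a
        // Greedy .+ runs to the last > in the text, not the first.
        System.out.println(firstMatch("<.+>", "<b>bold</b>"));   // <b>bold</b>
    }
}
```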

Reluctant (non-greedy) quantifiers

These match as little as they can. For example, a+? matches the first a in aaabbb.
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
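
For comparison, here is a sketch of the reluctant forms in Java. Note that X?? and X*? are satisfied by an empty match when nothing forces them to consume a character. The helper name and sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo {
    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Reluctant +? stops after the single a it needs.
        System.out.println(firstMatch("a+?", "aaabbb"));               // a
        // Reluctant *? and ?? are satisfied by zero characters.
        System.out.println(firstMatch("a*?", "aaabbb").isEmpty());     // true
        System.out.println(firstMatch("a??", "aaabbb").isEmpty());     // true
        // Reluctant .+? stops at the first > it can reach.
        System.out.println(firstMatch("<.+?>", "<b>bold</b>"));        // <b>
    }
}
```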

Logical operators

XY X followed by Y
X|Y Either X or Y
(XY) XY as a single group
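
Sequence, alternation, and grouping can be combined as in the following Java sketch. The class name, helper name, and sample words are illustrative assumptions.

```java
import java.util.regex.Pattern;

class LogicalOperatorDemo {
    // Hypothetical helper: true if the regex matches the whole text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Alternation: either cat or dog, but not both in sequence.
        System.out.println(matchesWhole("cat|dog", "dog"));      // true
        System.out.println(matchesWhole("cat|dog", "catdog"));   // false
        // Grouping lets a quantifier apply to the whole of ab.
        System.out.println(matchesWhole("(ab)+", "ababab"));     // true
        // Without the group, + applies to b only.
        System.out.println(matchesWhole("ab+", "ababab"));       // false
        System.out.println(matchesWhole("ab+", "abbb"));         // true
        // A group can limit the scope of an alternation.
        System.out.println(matchesWhole("gr(a|e)y", "grey"));    // true
    }
}
```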
