Main topics:

Regular Expressions (REG)

OCCURRENCE
Ways to use regular expressions
Syntax of POSIX extended regular expressions as used in ARB
Special syntax for search and replace
WARNINGS
BUGS

OCCURRENCE

Many places

Ways to use regular expressions

There are two ways to use regular expressions:

[1]     /Search Regexpr/Replace String/
[2]     /Search Regexpr/

[1] searches the input for occurrences of 'Search Regexpr' and replaces every occurrence with 'Replace String'.

[2] searches the input for the FIRST occurrence of 'Search Regexpr' and returns the found match. If nothing matches, it returns an empty string.

Notes:

You can use regular expressions wherever you can use ACI and SRT expressions.
In some places only [2] is available (e.g. in Search&Query).
Regular expressions are case sensitive by default. To make them case insensitive, append an 'i' to the expression (i.e. '/expr/i' or '/expr/repl/i').
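
The two forms can be approximated in Python as a rough illustration. The function name 'apply_regex_command' is hypothetical, and Python's 're' dialect is not identical to the POSIX extended syntax used in ARB, so treat this as a sketch of the behaviour described above, not as ARB's implementation:

```python
import re

def apply_regex_command(command, text):
    """Emulate '/search/' and '/search/replace/', optionally followed by 'i'."""
    flags = 0
    if command.endswith("/i"):
        # A trailing 'i' switches to case-insensitive matching.
        flags = re.IGNORECASE
        command = command[:-1]
    if len(command) < 2 or not (command.startswith("/") and command.endswith("/")):
        raise ValueError("expected /search/ or /search/replace/")
    # Simplification: split on every '/' that is not preceded by a backslash.
    parts = re.split(r"(?<!\\)/", command[1:-1])
    if len(parts) == 1:
        # Form [2]: return the first match, or an empty string.
        match = re.search(parts[0], text, flags)
        return match.group(0) if match else ""
    if len(parts) == 2:
        # Form [1]: replace every occurrence. The replacement is taken
        # literally here; references like \1 are not handled in this sketch.
        replacement = parts[1]
        return re.sub(parts[0], lambda m: replacement, text, flags=flags)
    raise ValueError("too many '/' separators")

print(apply_regex_command("/fox|dog/cat/", "The quick brown fox jumps over the lazy dog"))
print(apply_regex_command("/PSEU/i", "Pseudomonas"))
```

The first call uses form [1] and replaces both "fox" and "dog"; the second uses form [2] with the 'i' suffix and returns only the matched part of the input.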

Syntax of POSIX extended regular expressions as used in ARB

A regular expression specifies a set of character strings, e.g. the expression '/pseu/i' specifies all strings containing "pseu", "Pseu" or "pSeu" and so on. We say the expression "matches" (a part of) these strings.

Several characters have special meanings in regular expressions. All other characters just match against themselves.

Special characters:

'.'          matches any character (e.g. '/h.s/' matches "has" and "his")
'[xyz]'      matches 'x', 'y' or 'z'
'[a-z]'      matches all lower case letters
'^'          matches the beginning of the string
             (e.g. '/^pseu/i' matches all strings starting with "pseu")
'$'          matches the end of the string
             (e.g. '/cens$/i' matches all strings ending in "cens")

'*'          matches the preceding element zero or more times
             (e.g. '/th*is/' matches "tis", "this", "thhhhhhiss", ..)
'?'          matches the preceding element zero or one time
             (e.g. '/th?is/' matches "tis" or "this", but not "thhis")
'+'          matches the preceding element one or more times
             (e.g. '/th+is/' matches "this" or "thhhis", but not "tis")
'{mi,ma}'    matches the preceding element at least mi and at most ma times
             (e.g. '/th{2,4}is/' matches "thhis", "thhhis" or "thhhhis")

'|'          marks an alternative

Example: '/bacter|spiri/i' matches all strings containing either "bacter" or "spiri".

'()'         marks a subexpression.

Subexpressions can be used to separate alternatives or to mark parts for reference in the replace expression (see section about replacement below).

Examples:

'/bact|spiri.*cens/' matches '/bact/' or '/spiri.*cens/',
whereas '/(bact|spiri).*cens/' matches '/bact.*cens/' or '/spiri.*cens/'.

To match against special characters themselves, escape them using a '\' (e.g. '/\*/' matches the character "*", '/\\/' matches "\")
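
The special characters above behave the same way in Python's 're' module, so the examples can be checked with a short script. This is only a sketch: the helper 'matches' is hypothetical, and Python is used as a stand-in for ARB's POSIX matcher.

```python
import re

def matches(pattern, text, ignore_case=False):
    """Return True if the pattern matches any part of the text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None

# '.' matches any single character.
assert matches(r"h.s", "has") and matches(r"h.s", "his")

# '^' and '$' anchor the match to the beginning or end of the string.
assert matches(r"^pseu", "Pseudomonas", ignore_case=True)
assert not matches(r"^pseu", "Acidopseudo", ignore_case=True)
assert matches(r"cens$", "fluorescens")

# Repetition of the preceding element.
assert matches(r"th*is", "tis") and matches(r"th*is", "thhhis")
assert matches(r"th?is", "this") and not matches(r"th?is", "thhis")
assert matches(r"th+is", "thhhis") and not matches(r"th+is", "tis")
assert matches(r"th{2,4}is", "thhis") and not matches(r"th{2,4}is", "this")

# Without parentheses, '|' splits the whole expression.
assert matches(r"bact|spiri.*cens", "bacteria")
assert not matches(r"(bact|spiri).*cens", "bacteria")
assert matches(r"(bact|spiri).*cens", "bacterium fluorescens")

# Escaped special characters match themselves.
assert matches(r"\*", "a*b") and matches(r"\\", "a\\b")
```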

Character classes:

[...]      is called a character class. It matches against any of the characters
           listed in between the brackets.
[^...]     If the character class starts with '^' it matches against any character
           NOT listed (e.g. '[^78]' matches all but '7' or '8')
[5-9]      When the character class contains a '-', it will be interpreted as
           "range of characters". Here '5-9' is equivalent to '56789'.
           You may mix ranges and single characters,
           e.g. '14-79' is the same as '145679', '7-91-3' is the same as '789123'.

To add special characters to a character class, escape them using '\'.

There are several special predefined character classes like

[:alpha:] = [a-zA-Z]
[:digit:] = [0-9]
[:alnum:] = [[:alpha:][:digit:]]
[:punct:] = Punctuation characters
[:print:] = Visible characters and the space character
[:blank:] = Space and tab
[:space:] = Whitespace characters (including newlines)
[:cntrl:] = Control characters

Use these inside brackets (e.g. '/[[:cntrl:]]//' will remove all control characters).
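
Python's 're' module does not support the predefined POSIX classes, so a sketch in Python has to expand them into explicit ranges first. The table 'POSIX_CLASSES' and the helper 'expand_posix_classes' are hypothetical, cover ASCII only, and leave out [:punct:] and [:print:]:

```python
import re

# Assumed ASCII equivalents of some predefined classes (illustration only).
POSIX_CLASSES = {
    "[:alpha:]": "a-zA-Z",
    "[:digit:]": "0-9",
    "[:alnum:]": "a-zA-Z0-9",
    "[:blank:]": " \\t",
    "[:space:]": " \\t\\n\\r\\f\\v",
    "[:cntrl:]": "\\x00-\\x1f\\x7f",
}

def expand_posix_classes(pattern):
    """Replace each predefined class name with its explicit character list."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace(name, chars)
    return pattern

# '/[[:cntrl:]]//' removes all control characters (the tab is one of them).
cleaned = re.sub(expand_posix_classes("[[:cntrl:]]"), "", "a\x01b\tc\x7f")

# A negated class keeps only what is NOT listed: here, only digits survive.
digits_only = re.sub(expand_posix_classes("[^[:digit:]]"), "", "ab12c3")

# Ranges and single characters can be mixed: '[14-79]' equals '[145679]'.
mixed = re.findall(r"[14-79]", "0123456789")

print(cleaned, digits_only, mixed)
```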

Special syntax for search and replace

Syntax: '/regexp/replace/'

The part of the input string matched by 'regexp' gets replaced by 'replace'.

Simple example:

Input string:    'The quick brown fox jumps over the lazy dog'
Search&replace:  '/fox|dog/cat/'
Result:          'The quick brown cat jumps over the lazy cat'

Additionally, the match (or parts of it) can be referenced in the replace string:

\0        refers to the whole match
\1        refers to the first subexpression
\2        refers to the second subexpression
...
\9        refers to the ninth subexpression

Example using refs:

Input string:    'The quick brown fox jumps over the lazy dog'
Search&replace:  '/(brown|lazy)\s+(fox|dog)/\2 \1/'
Result:          'The quick fox brown jumps over the dog lazy'
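
The references can be tried out in Python, which writes them slightly differently: a bare \0 is not a group reference there, so this sketch rewrites \0 to \9 into Python's \g<0> to \g<9> form first. The helper name 'to_python_replacement' is hypothetical.

```python
import re

def to_python_replacement(replace_string):
    """Rewrite \\0 .. \\9 as Python's unambiguous \\g<0> .. \\g<9> references."""
    return re.sub(r"\\([0-9])", r"\\g<\1>", replace_string)

text = "The quick brown fox jumps over the lazy dog"

# Swap the two subexpressions, as in the example above.
swapped = re.sub(r"(brown|lazy)\s+(fox|dog)", to_python_replacement(r"\2 \1"), text)

# \0 refers to the whole match.
bracketed = re.sub(r"fox|dog", to_python_replacement(r"[\0]"), text)

print(swapped)
print(bracketed)
```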

WARNINGS

An expression like '_*' can match an empty string if used w/o context: at every position where no '_' is present, it matches zero characters.

This makes some replacements difficult, e.g. if you have data containing multiple consecutive '_' characters and you'd like to replace them with a single one. The expression "/_*/_/" does not work as expected and reports an error: "regular expression '_*' matched an empty string".
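
The empty match is easy to reproduce. The sketch below uses Python's 're' module as a stand-in for ARB's matcher. Python does not report an error in this situation; it only shows where the match ends up.

```python
import re

text = "a__b"

# The search starts at the first character. No '_' is there, so '_*'
# succeeds immediately by matching zero characters.
first_match = re.search(r"_*", text)
assert first_match.start() == 0
assert first_match.group(0) == ""

# Requiring at least one underscore ('+') finds the actual run.
run = re.search(r"_+", text)
assert run.group(0) == "__"
```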

A workaround is the following expression:
             "/(_+)([^_]|$)/_\2/"

Other, simpler workarounds use the beginning-of-line and end-of-line operators ('^' and '$'),
e.g. to remove all trailing underscores:
             "/_*$//"

Or all leading underscores:
             "/^_*//"

BUGS

No bugs known
