Regex

All the listed regex examples can be tested with the online regex engine regexpal.

Quick Reference

| Operator | Name | Use |
| --- | --- | --- |
| `\` | Backslash/Escape Character | A backslash is used as the escape character. It turns off the special meaning of the following character (i.e. any character in this table). e.g. `\*` will match `*`, and `\\` will match `\`. |
| `.` | Period | Will match a single instance of any character, except end-of-line. |
| `^` | Caret | Match the start of a line (also see `$`). |
| `$` | Dollar Sign | Match the end of a line (also see `^`). |
| `\|` | Pipe | Match either the regular expression preceding it or the regular expression following it (OR operation). |
| `[]` | Square Brackets | Match any of the characters inside the square brackets (a character set). e.g. `[ade]` will match a, d or e. For a range use `-` (hyphen), e.g. `[0-9]`. |
| `[^]` | Square Brackets With Caret | Match any character except those inside the square brackets (a character set). e.g. `[^ade]` will match anything **EXCEPT** a, d or e. For a range use `-` (hyphen), e.g. `[^0-9]`. |
| `()` | Parentheses | Used to group regular expressions together, and override the standard order of processing of particular operators. This is similar to how parentheses are used in maths. |
| `!` | Exclamation Mark | Not an operator on its own in most regex flavours. It has a special meaning only inside a lookaround group: `(?!...)` (negative lookahead) and `(?<!...)` (negative lookbehind) assert that the enclosed expression does not match at that position. Similar in spirit to `^` inside square brackets, but used outside of them. |
| `?` | Question Mark | Match the preceding expression 0 or 1 times. This is equivalent to saying "this expression is optional". |
| `*` | Asterisk | Match the preceding expression 0 or more times. This is equivalent to saying "this is a greedy expression, but is optional". |
| `+` | Plus Sign | Match the preceding expression 1 or more times. This is equivalent to saying "this is a greedy expression". This is equivalent to using |
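To make the quick reference more concrete, here is a small sketch that exercises several of the operators using Python's built-in re module. The sample strings and variable names are made up for illustration, and other regex engines may differ in small details.

```python
import re

# ? makes the preceding expression optional.
optional = re.findall(r"colou?r", "color colour")

# [] with a range, combined with + (one or more).
digits = re.findall(r"[0-9]+", "a1 b22 c333")

# [^] matches anything except the listed characters (digits and spaces here).
non_digits = re.findall(r"[^0-9 ]+", "a1 b22 c333")

# | matches either alternative.
either = re.findall(r"cat|dog", "cat dog bird")

# \ turns off the special meaning of *.
stars = re.findall(r"\*", "2*3*4")

# ^ and $ anchor the match to the start and end of the line.
anchored = re.search(r"^abc$", "abc") is not None

# () groups an expression so that + applies to the whole group.
grouped = re.fullmatch(r"(ab)+", "ababab") is not None

print(optional, digits, non_digits, either, stars, anchored, grouped)
```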

Preventing Recursive Find And Replace Matches

A Lookahead Example

If you had the function UartComms() and you wanted to find and replace all instances with UartCommsSend(), you would use the following syntax:

Find: UartComms(?!Send)

Replace: UartCommsSend
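The same find and replace can be sketched in Python with re.sub. The function name rename_call and the sample source string are hypothetical, chosen only to show that the negative lookahead leaves already-renamed calls alone.

```python
import re

def rename_call(source: str) -> str:
    # Match "UartComms" only when it is NOT immediately followed by "Send".
    return re.sub(r"UartComms(?!Send)", "UartCommsSend", source)

code = "UartComms(); UartCommsSend();"
once = rename_call(code)
twice = rename_call(once)  # running it again changes nothing
print(once)
```

Because the lookahead rejects any UartComms that is already followed by Send, running the replacement a second time should leave the text unchanged.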

This example uses a regex “lookahead” rather than a “lookbehind”. Notice that the negative lookahead syntax, (?!...), has no angle bracket, whereas the negative lookbehind syntax, (?<!...), includes a <.

A Lookbehind Example

Sometimes when using find and replace you can find yourself in a loop: the text you are replacing with contains the original word, so when “Find Next” is run, it finds the word inside the replacement itself. You can use regex (if your find and replace program supports it) to prevent this from happening. Note that this normally only happens if you iterate through each match by clicking “Replace Next”. Clicking “Replace All” normally overcomes this problem.

For example, say you had the function CommsSend(), and you wished to find and replace all instances of this in your code with UartCommsSend(). With a normal “Replace Next”, this would get you into an infinite loop.

Find: CommsSend

Replace: UartCommsSend

The trick is to use a regex feature called a “lookaround” (a positive or negative lookahead or lookbehind). In this case we need a negative lookbehind, which checks the characters immediately before CommsSend; if they are Uart, it will not create a match, hence fixing the recursive find-replace issue.

Find: (?<!Uart)CommsSend

Replace: UartCommsSend
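The loop can be imitated in Python with a rough model of “Replace Next”: replace one match at a time, always searching from the start of the text. Real editors usually search from the cursor, so this is only a simplified sketch, and replace_one_at_a_time and max_steps are hypothetical names introduced for the illustration.

```python
import re

def replace_one_at_a_time(pattern: str, replacement: str, text: str, max_steps: int = 10):
    """Apply one replacement per step, stopping when nothing matches or max_steps is hit."""
    steps = 0
    while steps < max_steps:
        text, count = re.subn(pattern, replacement, text, count=1)
        if count == 0:
            break
        steps += 1
    return text, steps

source = "CommsSend(); CommsSend();"

# Plain pattern: keeps matching inside its own replacement until max_steps stops it.
naive_text, naive_steps = replace_one_at_a_time(r"CommsSend", "UartCommsSend", source)

# Negative lookbehind: skips anything already prefixed with "Uart", so it terminates.
safe_text, safe_steps = replace_one_at_a_time(r"(?<!Uart)CommsSend", "UartCommsSend", source)

print(naive_steps, safe_steps)
print(safe_text)
```

In this model the plain pattern only stops because of the step limit, while the lookbehind version stops by itself once both calls have been renamed.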

WARNING

I have had issues using quantifiers (such as * and +) inside a lookbehind. Upon adding such operators to character sets (e.g. [a-z]*), the lookbehind fails to match anything. The error message was “Lookbehind requires fixed-width pattern”.
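Python's re module is one engine with this restriction, so it can be used to check whether a given lookbehind is acceptable. This is a minimal sketch: the helper name compiles is hypothetical, and other engines may accept variable-width lookbehinds or word the error differently.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for a variable-width lookbehind, among other pattern errors.
        return False

fixed_width = compiles(r"(?<!Uart)CommsSend")         # lookbehind is always 4 characters
variable_width = compiles(r"(?<![a-z]*)CommsSend")    # lookbehind length is unbounded

print(fixed_width, variable_width)
```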

Finding C/C++ Function Definitions

You can use the following syntax to find C/C++ function definitions based purely on the function name. This does not take into account the name or number of input variables, so in a language which supports function overloading (e.g. C++), this will find all overloads of a certain function. It works by looking for the function name, matching the (, any number of characters and a matching ), then any amount of white space or new lines before a {, then any number of characters, white space or new lines before the closing }.

FuncName\(.*\)\s*{(.*\n)*.*(\n)*}

Replace FuncName with the name of the function you wish to find.
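As a rough sketch of how this kind of search could be run from a script, the Python below builds the pattern from a function name. It assumes the parentheses are escaped as \( and \) so they match literally, and that \s* is used for the optional white space before the {. The helper find_function_definition and the sample C source are hypothetical.

```python
import re

def find_function_definition(name: str, source: str):
    # Function name, a literal "(...)", optional white space, then "{ ... }".
    pattern = re.escape(name) + r"\(.*\)\s*{(.*\n)*.*(\n)*}"
    return re.search(pattern, source)

source = (
    "static int total;\n"
    "\n"
    "int FuncName(int a, int b)\n"
    "{\n"
    "    return a + b;\n"
    "}\n"
)

match = find_function_definition("FuncName", source)
if match:
    print(match.group(0))
```

Because the quantifiers are greedy, the match runs to the last } in the searched text, so this works best when the function of interest is the last block in the text being searched, or when the result is only used to locate where the definition starts.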

File Paths

To match a directory, including the last / of a file path, use (for Windows-style paths separated by \, use \\ in place of /):

(.*/)

This will match C:/test/ in C:/test/reg.exe and root/samples/ in root/samples/filename.txt.

To match the directory, excluding the last / of a file path, use:

.*(?=/)

This will match C:/test in C:/test/reg.exe and root/samples in root/samples/filename.txt.
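Both directory matches can be tried out in Python. This sketch assumes forward-slash paths and uses .*/ for the form that keeps the trailing separator; the two helper function names are hypothetical.

```python
import re

def directory_with_separator(path: str):
    # Greedy .* runs to the last "/", which is included in the match.
    m = re.match(r".*/", path)
    return m.group(0) if m else None

def directory_without_separator(path: str):
    # The lookahead (?=/) requires a "/" next, but does not consume it.
    m = re.match(r".*(?=/)", path)
    return m.group(0) if m else None

for path in ("C:/test/reg.exe", "root/samples/filename.txt"):
    print(directory_with_separator(path), directory_without_separator(path))
```

A path with no / at all (e.g. a bare file name) gives no match for either pattern, so both helpers return None in that case.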

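To match all files, except those that begin with the tilde character (~), use:

^.*(?=/)/[^~]*$

This will match C:/dir/include.txt but NOT C:/dir/~exclude.txt. The ^ and $ anchors force the whole path to match; without them, the expression would still match the C:/dir/ portion of the second path. Note that [^~]* rejects a tilde anywhere in the file name, not only at the start.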
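A quick way to filter a list of paths with this expression is sketched below in Python. It uses fullmatch, which requires the whole path to match, so no explicit ^ and $ are needed in the pattern. The name is_wanted and the sample paths are assumptions for illustration.

```python
import re

# Directory part, a "/", then a file name containing no "~".
NOT_TILDE_FILE = re.compile(r".*(?=/)/[^~]*")

def is_wanted(path: str) -> bool:
    # fullmatch only succeeds if the entire path matches the pattern.
    return NOT_TILDE_FILE.fullmatch(path) is not None

paths = ["C:/dir/include.txt", "C:/dir/~exclude.txt"]
kept = [p for p in paths if is_wanted(p)]
print(kept)
```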

Matching Strings

The following regex will match strings enclosed by double or single quotation marks. Delete either quote character inside the square brackets to exclude that style of string delimiting from the match.

(["'])(?:\\\1|.)*?\1

This will match "test string 1" and 'test string 2' in this is "test string 1" and this is 'test string 2'.
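Here is a small Python sketch of the quoted-string expression in use. The helper find_quoted_strings and the sample text are hypothetical; the second call checks that a backslash-escaped quote inside a string does not end the match early.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close it.
QUOTED = re.compile(r"""(["'])(?:\\\1|.)*?\1""")

def find_quoted_strings(text: str) -> list:
    return [m.group(0) for m in QUOTED.finditer(text)]

text = """this is "test string 1" and this is 'test string 2'"""
found = find_quoted_strings(text)
escaped = find_quoted_strings(r'say "a \"quoted\" word" now')

print(found)
print(escaped)
```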

The following regex matches all spaces, except those that are enclosed in double quotes. Ignore the first and last double quotation marks (they are just there to show you that a space exists at the start), and note that the first character of the expression is a space. The expression can be useful in command-line processing applications.

" (?=[^"]*("[^"]*"[^"]*)*$)"

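As a sketch of the command-line use case, the space-matching expression above can be handed to Python's re.split to break a command line into arguments. One change is assumed here: the inner group is made non-capturing with (?:...), because re.split would otherwise insert the captured text into the result list. The name split_arguments and the sample command are hypothetical.

```python
import re

# A space followed by an even number of double quotes up to the end of the line,
# i.e. a space that is not inside a quoted section.
SPACE_OUTSIDE_QUOTES = re.compile(r' (?=[^"]*(?:"[^"]*"[^"]*)*$)')

def split_arguments(command_line: str) -> list:
    return SPACE_OUTSIDE_QUOTES.split(command_line)

args = split_arguments('copy "my file.txt" dest')
print(args)
```

The quotes stay attached to the quoted argument; stripping them would be a separate step.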
Matching All Printable ASCII Characters

The following matches all printable ASCII characters (which are grouped together on the ASCII table, from number 32 to 126).

[ -~]
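The range works because the space character (32) and the tilde (126) are the first and last printable characters. A short Python sketch, with hypothetical names, that counts the printable characters and uses the negated set to strip everything else:

```python
import re

PRINTABLE = re.compile(r"[ -~]")

def strip_non_printable(text: str) -> str:
    # The negated set [^ -~] matches anything outside codes 32 to 126.
    return re.sub(r"[^ -~]", "", text)

# All 128 ASCII characters, from code 0 to 127.
all_ascii = "".join(chr(code) for code in range(128))
printable_count = len(PRINTABLE.findall(all_ascii))

cleaned = strip_non_printable("abc\x00\tdef\x7f")
print(printable_count, cleaned)
```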

Online Regex Testers

Plenty of online regex testers exist for testing regular expressions against sample text.

My favourite is RegExr by gSkinner. It has powerful features like helpful mouseover tooltips which tell you what each section of your regex string is doing.


Authors

Geoffrey Hunter

Dude making stuff.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
