Special Characters
Special character escapes let you insert control characters and Unicode code points directly into your patterns.
They are useful when you need to match characters that are hard to type, hard to see, or ambiguous in source code (for example line breaks or non-ASCII characters).
\n
Inserts a line feed character (LF, U+000A).
Use this escape whenever your pattern needs to work with line-oriented input.
If you need Windows-style line breaks (CRLF) to behave like a single newline, enable the Flag::CRLF flag when compiling the pattern.
With this flag enabled, the engine treats any \r\n sequence as \n during matching.
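The effect described above can be approximated in an engine that has no such flag by normalizing the input before matching. The sketch below uses Python's standard `re` module rather than this engine, and `normalize_crlf` is a hypothetical helper. It only illustrates the observable behavior and makes no claim about how Flag::CRLF is implemented.

```python
import re

def normalize_crlf(text: str) -> str:
    """Hypothetical helper: collapse every CRLF pair into a single LF."""
    return text.replace("\r\n", "\n")

windows_text = "first\r\nsecond\r\nthird"

# Without normalization, a stray \r stays attached to each matched line.
raw_lines = re.findall(r"[^\n]+", windows_text)

# After normalization, \n alone delimits the lines.
clean_lines = re.findall(r"[^\n]+", normalize_crlf(windows_text))

print(raw_lines)
print(clean_lines)
```

Normalizing up front changes match offsets relative to the original text, which is one reason a built-in flag is usually the more convenient option.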
\r
Inserts a carriage return character (CR, U+000D).
\t
Inserts a horizontal tab character (TAB, U+0009).
\uhhhh
Inserts the Unicode character with the given code point.
hhhh must be exactly four hexadecimal digits (0–9, a–f, A–F). If the resulting value is outside the valid Unicode range, a parse error is raised.
This form is convenient for characters in the Basic Multilingual Plane (BMP).
Examples:
\u0041 inserts A (U+0041)
\u00E9 inserts é (U+00E9)
\u{hh…}
Inserts the Unicode character with the given code point.
hh… is a variable-length hexadecimal number, terminated by }. The value must be within the valid Unicode range, otherwise a parse error is raised.
This form is convenient for code points below U+0100 and beyond U+FFFF.
Examples:
\u{7} inserts BEL (U+0007)
\u{1F600} inserts 😀 (U+1F600)
\u{10FFFF} inserts the highest valid Unicode code point.
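The rules for the braced form (variable-length hexadecimal digits, a closing brace, and a range check) can be sketched in a few lines of Python. This is an illustration of the documented rules only, not the engine's parser. `decode_u_braced` and `MAX_CODE_POINT` are hypothetical names, and the upper bound of U+10FFFF is taken from the last example above.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point

# Literal backslash, "u", then one or more hex digits inside braces.
U_BRACED = re.compile(r"\\u\{([0-9A-Fa-f]+)\}")

def decode_u_braced(escape: str) -> str:
    """Hypothetical helper: turn the text of a braced escape into its character."""
    match = U_BRACED.fullmatch(escape)
    if match is None:
        raise ValueError(f"malformed escape: {escape!r}")
    value = int(match.group(1), 16)
    if value > MAX_CODE_POINT:
        # Stands in for the parse error the documentation describes.
        raise ValueError(f"code point out of range: {value:#x}")
    return chr(value)

print(decode_u_braced(r"\u{7}") == "\x07")
print(decode_u_braced(r"\u{1F600}"))
```

The sketch checks only the upper bound. Whether the engine also rejects other values, such as surrogate code points, is not stated here and is left out of the example.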
Legacy and Compatibility Expressions
The following escape sequences exist for compatibility with other engines. They are supported only as legacy syntax and should not be used for new patterns.
Prefer \n, \r, \t and the Unicode escapes \uhhhh / \u{hh…} instead.
\a
Inserts the bell character (BEL, U+0007).
\cX
Inserts an ASCII control character.
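X must be an ASCII letter (A–Z or a–z). The resulting character is computed from the ASCII code of X as:
X & 0x1F
Because only the low five bits are kept, an uppercase letter and its lowercase counterpart produce the same control character (U+0001 to U+001A).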
Example:
\cJ inserts line feed (LF, U+000A)
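The masking rule can be checked with a short Python sketch. `control_char` is a hypothetical helper that mirrors the documented formula. It is not part of the engine.

```python
def control_char(letter: str) -> str:
    """Hypothetical helper: apply the documented rule, code of X AND 0x1F."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("X must be an ASCII letter (A-Z or a-z)")
    # Keep only the low five bits of the letter's ASCII code.
    return chr(ord(letter) & 0x1F)

# "J" is 0x4A; 0x4A & 0x1F is 0x0A, the line feed.
print(control_char("J") == "\n")
# Lowercase "j" is 0x6A and masks to the same value.
print(control_char("j") == "\n")
```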
\e
Inserts the escape character (ESC, U+001B).
\f
Inserts the form feed character (FF, U+000C).
\N
Matches any character except a line feed (LF, U+000A).
This is a compatibility escape found in some engines. For new patterns, prefer an explicit negated class such as [^\n] when you want “not newline”.
\o{ddd…}
Inserts the character with the given octal code.
ddd… is a variable-length octal number (digits 0–7), terminated by }. The value must be within the valid Unicode range, otherwise a parse error is raised.
\xhh
Inserts the character with the given hexadecimal code.
hh must be exactly two hexadecimal digits (0–9, a–f, A–F). The value must be within the valid Unicode range, otherwise a parse error is raised.
\x{hh…}
Inserts the character with the given hexadecimal code.
hh… is a variable-length hexadecimal number, terminated by }. The value must be within the valid Unicode range, otherwise a parse error is raised.
\Uhhhhhhhh
Inserts the Unicode character with the given code point.
hhhhhhhh must be exactly eight hexadecimal digits (0–9, a–f, A–F). The value must be within the valid Unicode range, otherwise a parse error is raised.
For new patterns, prefer \u{hh…}.
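When migrating older patterns, the legacy character escapes in this section can be rewritten mechanically into the \u{hh…} form. The Python sketch below is one possible approach, not a tool shipped with the engine. `modernize`, `LEGACY`, and `SIMPLE` are hypothetical names, and it assumes that \u{hh…} is accepted wherever the legacy escape was. It leaves \N alone, because \N matches a class of characters rather than inserting one, and it performs no range validation.

```python
import re

# One alternation over every legacy escape. The final "other" branch consumes
# any remaining escape (including an escaped backslash) so it is never misread.
LEGACY = re.compile(
    r"\\(?:"
    r"(?P<simple>[aef])"
    r"|c(?P<ctrl>[A-Za-z])"
    r"|o\{(?P<oct>[0-7]+)\}"
    r"|x\{(?P<xbrace>[0-9A-Fa-f]+)\}"
    r"|x(?P<x2>[0-9A-Fa-f]{2})"
    r"|U(?P<u8>[0-9A-Fa-f]{8})"
    r"|(?P<other>.)"
    r")",
    re.DOTALL,
)

# Code points documented for \a, \e and \f.
SIMPLE = {"a": 0x07, "e": 0x1B, "f": 0x0C}

def modernize(pattern: str) -> str:
    """Hypothetical helper: rewrite legacy character escapes as \\u{...}."""
    def repl(match: re.Match) -> str:
        if match.group("other") is not None:
            return match.group(0)  # not a legacy escape: keep unchanged
        if match.group("simple"):
            value = SIMPLE[match.group("simple")]
        elif match.group("ctrl"):
            value = ord(match.group("ctrl")) & 0x1F
        elif match.group("oct"):
            value = int(match.group("oct"), 8)
        else:
            digits = match.group("xbrace") or match.group("x2") or match.group("u8")
            value = int(digits, 16)
        return "\\u{%X}" % value
    return LEGACY.sub(repl, pattern)

print(modernize(r"\x41\cJ\U0001F600"))
```

Handling every escape in a single pass matters: a pattern such as `\\a` is an escaped backslash followed by a literal a, and it must come through unchanged.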