The Regular Expression Interface
Creating New Instances
Your main entry point into the Erbsland Regular Expression library is the
RegEx class. The very first step is always to
compile a regular expression pattern into a RegEx
instance.
auto reEmail = RegEx::compile(
R"(([a-zA-Z0-9\._%\+\-]+)@([a-zA-Z0-9\.\-]+\.[a-zA-Z]{2,}))"
);
If the pattern contains an error, an
Error exception is thrown with detailed
information about the problem.
When you work with static or hard-coded expressions, it is usually best not to catch this exception and let the application fail fast. This makes it easy to locate faulty patterns during development using a debugger and avoids silently shipping broken expressions.
The instance created by the
compile function stores the compiled
program of your expression, including all data required for efficient matching.
A RegEx instance is thread-safe, which
means you can safely call all of its methods from multiple threads in parallel.
Standard Flags
The compile function accepts an
optional parameter for flags. These flags
mirror common inline modifiers from the regular expression syntax:
IgnoreCase Ignores character case during matching. This is equivalent to placing (?i) at the beginning of a pattern.
Multiline Allows ^ and $ to match at the beginning and end of individual lines. This corresponds to (?m).
DotAll Makes . and \s match line-break characters as well. This equals (?s).
Ascii Restricts certain character classes to the ASCII range only. This matches the behaviour of (?a).
Verbose Enables whitespace, line breaks and comments inside the pattern for better readability. This corresponds to (?x).
All of these flags can be specified either directly in the pattern or via the compile-time flags parameter.
Enable Line-Break Folding
The flag CRLF can only be set at
compile time and enables line-break folding.
When this feature is active, the library treats any sequence of
U+000D followed by U+000A (CRLF) as a single logical new-line
character. As a result, a pattern that matches \n will
automatically match both LF and CRLF.
Although a CRLF sequence behaves like a single line break during matching,
its original representation is preserved internally. If you capture the
sequence, the result will either contain a single line-feed character or two
characters (carriage return and line feed), depending on the original input.
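The folding behaviour described above can be illustrated without the library. The following is a minimal sketch in plain C++, not the library's implementation: the type LineBreak and the function findLogicalLineBreaks are hypothetical names invented for this example. It treats CR followed by LF as one logical line break while keeping the original characters available, which mirrors what a capture would contain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical type for this sketch; not part of the library.
struct LineBreak {
    std::size_t position;       // Offset of the first character of the break.
    std::string_view original;  // "\n" or "\r\n", exactly as found in the input.
};

// Treats CR followed by LF as one logical line break and keeps the
// original characters of each break as a view into the input.
std::vector<LineBreak> findLogicalLineBreaks(std::string_view text) {
    std::vector<LineBreak> result;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            result.push_back({i, text.substr(i, 2)});
            ++i;  // Skip the LF that belongs to this CRLF pair.
        } else if (text[i] == '\n') {
            result.push_back({i, text.substr(i, 1)});
        }
    }
    return result;
}

void usageExample() {
    const std::string text = "a\r\nb\nc";
    const auto breaks = findLogicalLineBreaks(text);
    assert(breaks.size() == 2);            // Two logical line breaks.
    assert(breaks[0].original == "\r\n");  // The first is preserved as two characters.
    assert(breaks[1].original == "\n");    // The second is a single line feed.
}
```

Both entries count as one line break each, yet the first still exposes two characters, just as a captured CRLF sequence would.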
Settings
You can further customize the behaviour of the regular expression parser and
engine using a Settings instance.
This allows you to:
Enable or disable specific syntax features
Tighten internal limits
Configure timeouts for matching operations
These settings are especially useful when your application accepts patterns from untrusted or external sources, such as user-provided configuration files.
Using a Regular Expression Instance
A RegEx instance provides several groups of
matching and transformation functions:
match Matches the pattern at the beginning of a given text.
fullMatch Matches the pattern against the entire text and only succeeds if the whole input matches.
findFirst Searches for the first location in the text where the pattern matches.
findAll Finds all matching locations in the text. These functions return a coroutine-based generator, allowing you to stop iteration early.
collectAll Similar to findAll, but eagerly collects all matches into a std::vector.
replaceAll Replaces all matches in a text using either a replacement pattern or a custom replacement function.
Return Values
All matching methods return one or more instances of a match type wrapped in a shared pointer:
For match, fullMatch and findFirst, a return value of nullptr
indicates that no match was found.
The findAll, collectAll and replaceAll methods never use
nullptr to represent matches. If no matches are found, they instead return
an empty result (for example, an empty generator or an empty std::vector).
This design allows you to distinguish cleanly between no match and an exceptional condition without additional state checks.
The ...View Variants
Many functions also provide a ...View variant. For example,
match has a corresponding
matchView method.
These variants operate exclusively on views of the input text and return views into the original string for all match results. This approach avoids allocations and string copies, making it the most efficient option.
However, you are responsible for ensuring that the original text remains alive for as long as the match object is in use.
Pros:
No unnecessary allocations or string copies
Ideal for nested matching scenarios (e.g. matching inside a captured group)
Cons:
You must ensure that the original text outlives all match results
auto text = std::string("...");
auto reHeader = el::re::RegEx::compile(R"((?i)<h1[^>]*>(.*?)</h1>)");
auto match = reHeader->findFirstView(text);
if (match != nullptr) {
    auto title = std::string(match->contentView(1));
    // ...
}

// !!! BAD EXAMPLE !!!
auto match = reHeader->findFirstView(std::string("..."));
// The matched text no longer exists.
auto title = std::string(match->contentView(1)); // undefined behavior!
If you work with temporary strings or if match results need to outlive the original text, use the non-view variants. These methods create internal copies of the matched text and are safe in such scenarios.
Replacement Patterns
Replacement patterns may contain placeholders of the form {<n>} or
{<name>}, which reference numeric or named capture groups.
To insert literal { or } characters, escape them by doubling:
use {{ or }}.
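To make the placeholder rules concrete, here is a minimal sketch of how such a replacement pattern could be expanded. It is written in plain C++ and is not the library's implementation: the function expandReplacement, its parameters, and the use of standard exceptions are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of the library: expands {n}, {name}, {{ and }}.
// groups[0] is assumed to hold the entire match.
std::string expandReplacement(
    std::string_view pattern,
    const std::vector<std::string> &groups,
    const std::map<std::string, std::string> &named) {
    std::string result;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        // A doubled brace stands for one literal brace.
        if ((c == '{' || c == '}') && i + 1 < pattern.size() && pattern[i + 1] == c) {
            result += c;
            ++i;
            continue;
        }
        if (c == '}') {
            throw std::invalid_argument("unescaped '}' in replacement pattern");
        }
        if (c != '{') {
            result += c;
            continue;
        }
        const auto end = pattern.find('}', i + 1);
        if (end == std::string_view::npos) {
            throw std::invalid_argument("unterminated placeholder");
        }
        const std::string key(pattern.substr(i + 1, end - i - 1));
        const bool numeric = !key.empty() && std::all_of(
            key.begin(), key.end(),
            [](unsigned char ch) { return std::isdigit(ch) != 0; });
        if (numeric) {
            const auto index = std::stoul(key);
            if (index >= groups.size()) {
                throw std::out_of_range("unknown group index");
            }
            result += groups[index];
        } else {
            const auto it = named.find(key);
            if (it == named.end()) {
                throw std::out_of_range("unknown group name");
            }
            result += it->second;
        }
        i = end;  // Continue after the closing brace.
    }
    return result;
}

void usageExample() {
    const std::vector<std::string> groups{"john@example.com", "john", "example.com"};
    const std::map<std::string, std::string> named{{"user", "john"}};
    assert(expandReplacement("{2}: {1}", groups, named) == "example.com: john");
    assert(expandReplacement("{{{user}}}", groups, named) == "{john}");
}
```

The second assertion shows the escaping rule next to a placeholder: the outer doubled braces become literal braces, while the inner {user} is replaced by the named group.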
Error Handling
Errors can occur not only while compiling a pattern, but also during the matching process itself:
If the matched text contains invalid UTF-8 sequences, an Encoding error is thrown.
If a built-in or manually configured resource or time limit is exceeded, a Timeout or Limit error is thrown.
For this reason, we recommend enclosing your matching code in a
try { ... } catch (...) {} block at the earliest convenient location where
you can properly handle these errors.
Parser errors during compilation usually indicate programming mistakes and should typically not be caught. Letting the application fail fast makes such issues easier to detect during development.
auto text = std::string("...");
// Do not catch parser errors here – let the application fail fast.
auto reHeader = el::re::RegEx::compile(R"((?i)<h1[^>]*>(.*?)</h1>)");
std::string title;
try {
    auto match = reHeader->findFirstView(text);
    if (match != nullptr) {
        title = std::string(match->contentView(1));
    }
} catch (const el::re::Error &error) {
    title = std::format("<error: {}>", error);
}
// Process the matching result.
UTF-16 and UTF-32 Strings
Most matching functions in the RegEx API
provide overloads for UTF-16 and UTF-32 encoded strings.
These overloads behave exactly like their UTF-8 counterparts in terms of matching semantics, flags, and error handling. The only difference is the type of match object they return, which corresponds to the underlying string type.
Depending on the input, the methods return:
UTF-16 strings → Match16 or Match16View
UTF-32 strings → Match32 or Match32View
This ensures that positional information and content access are always expressed in units appropriate for the input string, while keeping the overall API consistent across all supported encodings.
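Why the unit of a position matters can be shown with the standard string types alone. The sketch below does not use the library and assumes that "units appropriate for the input string" refers to code units of the respective encoding. The struct Offsets and the function offsetOfB are hypothetical names for this example. It locates the letter b after the emoji U+1F600 in each encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct Offsets {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Returns the offset of 'b' in the text U+1F600 followed by 'b',
// measured in code units of each encoding.
Offsets offsetOfB() {
    // The literal is split so that 'b' is not parsed as part of the hex escape.
    const std::string utf8 = "\xF0\x9F\x98\x80" "b";
    const std::u16string utf16 = u"\U0001F600b";
    const std::u32string utf32 = U"\U0001F600b";
    return {utf8.find('b'), utf16.find(u'b'), utf32.find(U'b')};
}

void usageExample() {
    const auto offsets = offsetOfB();
    assert(offsets.utf8 == 4);   // The emoji occupies four bytes in UTF-8.
    assert(offsets.utf16 == 2);  // It is a surrogate pair in UTF-16.
    assert(offsets.utf32 == 1);  // It is a single code unit in UTF-32.
}
```

The same character sits at three different offsets, so a position is only meaningful together with the string type it was measured against.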
Interfaces and Types
-
class RegEx : public std::enable_shared_from_this<RegEx>
A regular expression.
Public Functions
-
MatchViewPtr matchView(StringView view) const
Try to match this regular expression at the start of the given string view.
Warning
If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while reading the contents of the capture groups. Use match if you want to keep copies of the result in the match.
- Parameters:
view – The string view to match against. Must be valid for the duration of the match.
- Returns:
A match result or nullptr if there was no match.
-
Match16ViewPtr matchView(std::u16string_view view) const
Try to match this regular expression at the start of the given UTF-16 string view.
-
Match32ViewPtr matchView(std::u32string_view view) const
Try to match this regular expression at the start of the given UTF-32 string view.
-
MatchPtr match(const String &str) const
Try to match this regular expression at the start of the given string.
Note
This method creates a copy of the captured matching string. Use matchView for a more memory-efficient variant that only stores references to the original string.
- Parameters:
str – The string to match against.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr match(const std::u16string &str) const
Try to match this regular expression at the start of the given UTF-16 string.
-
Match32Ptr match(const std::u32string &str) const
Try to match this regular expression at the start of the given UTF-32 string.
-
MatchPtr match(const InputPtr &input) const
Try to match this regular expression at the start of the given input.
- Parameters:
input – Your custom input object.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr match(const Input16Ptr &input) const
Try to match this regular expression at the start of the given UTF-16 input.
-
Match32Ptr match(const Input32Ptr &input) const
Try to match this regular expression at the start of the given UTF-32 input.
-
MatchViewPtr fullMatchView(const StringView &str) const
Try to match this regular expression for the full given string view.
Warning
If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while reading the contents of the capture groups. Use fullMatch if you want to keep copies of the result in the match.
- Parameters:
str – The string view to match against. Must be valid for the duration of the match.
- Returns:
A match result or nullptr if there was no match.
-
Match16ViewPtr fullMatchView(const std::u16string_view &str) const
Try to match this regular expression for the full given UTF-16 string view.
-
Match32ViewPtr fullMatchView(const std::u32string_view &str) const
Try to match this regular expression for the full given UTF-32 string view.
-
MatchPtr fullMatch(const String &str) const
Try to match this regular expression for the full given string.
Note
This method creates a copy of the captured matching string. Use fullMatchView for a more memory-efficient variant that only stores references to the original string.
- Parameters:
str – The string to match against.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr fullMatch(const std::u16string &str) const
Try to match this regular expression for the full given UTF-16 string.
-
Match32Ptr fullMatch(const std::u32string &str) const
Try to match this regular expression for the full given UTF-32 string.
-
MatchPtr fullMatch(const InputPtr &input) const
Try to match this regular expression for the full input.
The regular expression must match the full input from start to end to be successful.
- Parameters:
input – Your custom input object.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr fullMatch(const Input16Ptr &input) const
Try to match this regular expression for the full given UTF-16 input.
-
Match32Ptr fullMatch(const Input32Ptr &input) const
Try to match this regular expression for the full given UTF-32 input.
-
MatchViewPtr findFirstView(StringView view) const
Find the first match of this regular expression in the given string view.
Warning
If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while reading the contents of the capture groups. Use findFirst if you want to keep copies of the result in the match.
- Parameters:
view – The string view to search. Must be valid for the duration of the match.
- Returns:
A match result or nullptr if there was no match.
-
Match16ViewPtr findFirstView(std::u16string_view view) const
Find the first match of this regular expression in the given UTF-16 string view.
-
Match32ViewPtr findFirstView(std::u32string_view view) const
Find the first match of this regular expression in the given UTF-32 string view.
-
MatchPtr findFirst(const String &str) const
Find the first match of this regular expression in the given string.
Note
This method creates a copy of the captured matching string. Use findFirstView for a more memory-efficient variant that only stores references to the original string.
- Parameters:
str – The string to search.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr findFirst(const std::u16string &str) const
Find the first match of this regular expression in the given UTF-16 string.
-
Match32Ptr findFirst(const std::u32string &str) const
Find the first match of this regular expression in the given UTF-32 string.
-
MatchPtr findFirst(const InputPtr &input) const
Find the first match of this regular expression in the given input.
- Parameters:
input – Your custom input object.
- Returns:
A match result or nullptr if there was no match.
-
Match16Ptr findFirst(const Input16Ptr &input) const
Find the first match of this regular expression in the given UTF-16 input.
-
Match32Ptr findFirst(const Input32Ptr &input) const
Find the first match of this regular expression in the given UTF-32 input.
-
MatchViewGenerator findAllView(StringView view) const
Find all matches of this regular expression in the given string view.
Warning
If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while reading the contents of the capture groups. Use findAll if you want to keep copies of the result in the match.
- Parameters:
view – The string view to search. Must be valid for the duration of the match.
- Returns:
A generator that yields all matches.
-
Match16ViewGenerator findAllView(std::u16string_view view) const
Find all matches of this regular expression in the given UTF-16 string view.
-
Match32ViewGenerator findAllView(std::u32string_view view) const
Find all matches of this regular expression in the given UTF-32 string view.
-
MatchGenerator findAll(const String &str) const
Find all matches of this regular expression in the given string.
Note
This method creates a copy of the captured matching string. Use findAllView for a more memory-efficient variant that only stores references to the original string.
- Parameters:
str – The string to search.
- Returns:
A generator that yields all matches.
-
Match16Generator findAll(const std::u16string &str) const
Find all matches of this regular expression in the given UTF-16 string.
-
Match32Generator findAll(const std::u32string &str) const
Find all matches of this regular expression in the given UTF-32 string.
-
MatchGenerator findAll(InputPtr input) const
Find all matches of this regular expression in the given input.
- Parameters:
input – Your custom input object.
- Returns:
A generator that yields all matches.
-
Match16Generator findAll(Input16Ptr input) const
Find all matches of this regular expression in the given UTF-16 input.
-
Match32Generator findAll(Input32Ptr input) const
Find all matches of this regular expression in the given UTF-32 input.
-
MatchViewList collectAllView(StringView view) const
Collect all matches of this regular expression in the given string view into a vector.
Warning
If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while reading the contents of the capture groups. Use collectAll if you want to keep copies of the result in the match.
- Parameters:
view – The string view to search. Must be valid for the duration of the match.
- Returns:
A vector containing all matches.
-
Match16ViewList collectAllView(std::u16string_view view) const
Collect all matches of this regular expression in the given UTF-16 string view into a vector.
-
Match32ViewList collectAllView(std::u32string_view view) const
Collect all matches of this regular expression in the given UTF-32 string view into a vector.
-
MatchList collectAll(const String &str) const
Collect all matches of this regular expression in the given string into a vector.
Note
This method creates a copy of the captured matching string. Use collectAllView for a more memory-efficient variant that only stores references to the original string.
- Parameters:
str – The string to search.
- Returns:
A vector containing all matches.
-
Match16List collectAll(const std::u16string &str) const
Collect all matches of this regular expression in the given UTF-16 string into a vector.
-
Match32List collectAll(const std::u32string &str) const
Collect all matches of this regular expression in the given UTF-32 string into a vector.
-
MatchList collectAll(const InputPtr &input) const
Collect all matches of this regular expression in the given input into a vector.
- Parameters:
input – Your custom input object.
- Returns:
A vector containing all matches.
-
Match16List collectAll(const Input16Ptr &input) const
Collect all matches of this regular expression in the given UTF-16 input into a vector.
-
Match32List collectAll(const Input32Ptr &input) const
Collect all matches of this regular expression in the given UTF-32 input into a vector.
-
String replaceAll(StringView view, StringView replacementExpression) const
Replace all matches of this regular expression in the given string view with the replacement string.
Use {n} to refer to captured group contents in the replacement string, where n stands for a decimal number referring to the captured group. Use {0} to refer to the entire match. Use {name} to refer to captured group contents by name in the replacement string. To use { or } in the replacement text, escape them by doubling them: {{ and }}.
- Parameters:
view – The string view to search and replace in.
replacementExpression – The replacement expression.
- Throws:
Error – if the replacement string contains invalid group references.
- Returns:
A new string with all matches replaced.
-
String replaceAll(StringView view, const ReplaceFn &replaceFn) const
Replace all matches of this regular expression with the result of the given replacement function.
- Parameters:
view – The string view to search and replace in.
replaceFn – The replacement function. It must take one parameter with the match (never nullptr) and return the replacement string. Throwing an exception in this function is fine: it safely terminates the replacement process and passes the exception to the caller.
- Returns:
A new string with all matches replaced.
Public Static Functions
-
static auto compile(StringView pattern, Flags flags = {}, Settings settings = {}) -> RegExPtr
Compile a regular expression.
The default behavior is:
^ and $ only match at the beginning and end of the text. Change it with Flag::Multiline.
Text is matched case-sensitively. Change it with Flag::IgnoreCase.
\d, \w and \s match the full Unicode ranges. Change it with Flag::Ascii.
. does not match line-break characters. Change it with Flag::DotAll.
- Parameters:
pattern – The regular expression pattern.
flags – The initial flags for the regular expression.
settings – The settings for compiling the regular expression and for the resulting engine.
- Throws:
Error – If the pattern is invalid.
- Returns:
A shared pointer to the compiled regular expression.
-
static String escapeText(StringView text) noexcept
Escape text to use it as literal text in a regular expression.
- Parameters:
text – The text to escape.
- Returns:
The escaped text.
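As a rough illustration of what escaping means, the following sketch prefixes common regular expression metacharacters with a backslash and checks the result with std::regex. It is not the library's implementation: the function escapeForRegex and its list of special characters are assumptions, and the exact output of escapeText may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical stand-in for illustration; not the library's implementation.
std::string escapeForRegex(std::string_view text) {
    // Characters assumed to carry special meaning in a pattern.
    constexpr std::string_view special = R"(\^$.|?*+()[]{})";
    std::string result;
    result.reserve(text.size() * 2);
    for (const char c : text) {
        if (special.find(c) != std::string_view::npos) {
            result += '\\';
        }
        result += c;
    }
    return result;
}

void usageExample() {
    const std::string literal = "1+1=2 (really?)";
    const std::regex re(escapeForRegex(literal));
    // The escaped pattern matches the literal text exactly...
    assert(std::regex_match(literal, re));
    // ...and no longer treats +, ? or the parentheses as operators.
    assert(!std::regex_match(std::string("11=2 (really)"), re));
}
```

Escaping like this is what you want when user-provided text must be embedded into a larger pattern as a literal.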
-
enum class erbsland::re::Flag : uint8_t
A single flag.
Values:
-
enumerator None
-
enumerator IgnoreCase
Ignore case when matching text.
-
enumerator DotAll
The dot operator also matches newlines.
-
enumerator Ascii
Restrict \w, \d, \s to ASCII-only matching.
-
enumerator Verbose
Ignore spacing in the regular expression.
-
enumerator CRLF
Interpret CRLF line endings like a single LF.
Both characters of such line endings are preserved when capturing text. This works similarly to the folding of multi-code-point Unicode characters. E.g. a¨ is interpreted as ä, but both code points are preserved in the text.