The Regular Expression Interface

Creating New Instances

Your main entry point into the Erbsland Regular Expression library is the RegEx class. The very first step is always to compile a regular expression pattern into a RegEx instance.

auto reEmail = RegEx::compile(
    R"(([a-zA-Z0-9\._%\+\-]+)@([a-zA-Z0-9\.\-]+\.[a-zA-Z]{2,}))"
);

If the pattern contains an error, an Error exception is thrown with detailed information about the problem.

When you work with static or hard-coded expressions, it is usually best not to catch this exception and let the application fail fast. This makes it easy to locate faulty patterns during development using a debugger and avoids silently shipping broken expressions.

The instance created by the compile function stores the compiled program of your expression, including all data required for efficient matching. A RegEx instance is thread-safe, which means you can safely call all of its methods from multiple threads in parallel.

Standard Flags

The compile function accepts an optional parameter for flags. These flags mirror common inline modifiers from the regular expression syntax:

  • IgnoreCase – Ignores character case during matching. This is equivalent to placing (?i) at the beginning of a pattern.

  • Multiline – Allows ^ and $ to match at the beginning and end of individual lines. This corresponds to (?m).

  • DotAll – Makes . and \s match line-break characters as well. This equals (?s).

  • Ascii – Restricts the character classes \d, \w and \s to the ASCII range. This matches the behaviour of (?a).

  • Verbose – Enables whitespace, line breaks and comments inside the pattern for better readability. This corresponds to (?x).

All of these flags can be specified either directly in the pattern or via the compile-time flags parameter.

Enable Line-Break Folding

The CRLF flag can only be set through the flags parameter of compile (there is no inline modifier for it) and enables line-break folding.

When this feature is active, the library treats any sequence of U+000D followed by U+000A (CRLF) as a single logical newline character. As a result, a pattern that matches \n will automatically match both LF and CRLF.

Although a CRLF sequence behaves like a single line break during matching, its original representation is preserved internally. If you capture the sequence, the result will either contain a single line-feed character or two characters (carriage return and line feed), depending on the original input.
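The folding rule can be pictured as a scanner that steps through logical characters. The following standalone sketch is only an illustration, not the library's implementation: the names LogicalChar and foldLineBreaks are hypothetical, and it scans bytes (ASCII-only) for brevity. It treats "\r\n" as one logical line break while still reporting the original byte range, which is why a captured CRLF keeps both characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// One logical character: where it starts in the original text and how many bytes it spans.
struct LogicalChar {
    std::size_t offset;
    std::size_t length;
    bool isNewline;
};

// Hypothetical sketch of CRLF folding, byte-based for brevity:
// "\r\n" becomes a single logical newline whose span still covers both bytes.
// A lone '\r' is treated as an ordinary character here to keep things simple.
std::vector<LogicalChar> foldLineBreaks(std::string_view text) {
    std::vector<LogicalChar> result;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            result.push_back({i, 2, true}); // CRLF folded into one logical unit
            i += 2;
        } else {
            result.push_back({i, 1, text[i] == '\n'});
            i += 1;
        }
    }
    return result;
}
```

For "a\r\nb\nc" the sketch yields five logical characters, and both line breaks are flagged as newlines even though one spans two bytes.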

Settings

You can further customize the behaviour of the regular expression parser and engine using a Settings instance.

This allows you to:

  • Enable or disable specific syntax features

  • Tighten internal limits

  • Configure timeouts for matching operations

These settings are especially useful when your application accepts patterns from untrusted or external sources, such as user-provided configuration files.

Using a Regular Expression Instance

A RegEx instance provides several groups of matching and transformation functions:

  • match – Matches the pattern at the beginning of a given text.

  • fullMatch – Matches the pattern against the entire text and only succeeds if the whole input matches.

  • findFirst – Searches for the first location in the text where the pattern matches.

  • findAll – Finds all matching locations in the text. These functions return a coroutine-based generator, allowing you to stop iteration early.

  • collectAll – Similar to findAll, but eagerly collects all matches into a std::vector.

  • replaceAll – Replaces all matches in a text using either a replacement pattern or a custom replacement function.
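The difference between the lazy findAll and the eager collectAll can be illustrated with the standard library's std::regex. This is purely an analogy, since its API differs from the RegEx class, and the helpers firstMatches and allMatches are hypothetical. Iterating lazily lets you stop after the first few hits, while collecting walks the entire text up front.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Lazy: take matches one at a time and stop once `limit` is reached,
// similar in spirit to breaking out of a loop over findAll.
std::vector<std::string> firstMatches(const std::string &text, const std::regex &re, std::size_t limit) {
    std::vector<std::string> result;
    if (limit == 0) {
        return result;
    }
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re); it != std::sregex_iterator(); ++it) {
        result.push_back(it->str());
        if (result.size() == limit) {
            break; // later matches are never searched for
        }
    }
    return result;
}

// Eager: collect every match into a vector, similar in spirit to collectAll.
std::vector<std::string> allMatches(const std::string &text, const std::regex &re) {
    std::vector<std::string> result;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re); it != std::sregex_iterator(); ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

The lazy form pays only for the matches it actually consumes, which is the practical reason to prefer a generator when you may stop early.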

Return Values

All matching methods return one or more match objects wrapped in a shared pointer, such as MatchPtr or, for the ...View variants, MatchViewPtr.

For match, fullMatch and findFirst, a return value of nullptr indicates that no match was found.

The findAll and collectAll methods never use nullptr to signal a missing match. If no matches are found, they return an empty result instead: an empty generator or an empty std::vector. The replaceAll methods do not return match objects at all; they always return a new String.

This design allows you to distinguish cleanly between no match and an exceptional condition without additional state checks.
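The convention can be sketched with a hypothetical wrapper around std::regex; the names SimpleMatch, findFirstCopy and collectAllCopy are illustrative and not part of the library. The single-result function returns a shared pointer that is nullptr when nothing matches, while the multi-result function returns a vector that is simply empty.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>
#include <vector>

// Hypothetical copying match result: owns the text of each capture group.
struct SimpleMatch {
    std::vector<std::string> groups; // groups[0] is the whole match
};
using SimpleMatchPtr = std::shared_ptr<SimpleMatch>;

// Single result: nullptr means "no match"; real errors would be exceptions.
SimpleMatchPtr findFirstCopy(const std::string &text, const std::regex &re) {
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return nullptr;
    }
    auto result = std::make_shared<SimpleMatch>();
    for (const auto &group : m) {
        result->groups.push_back(group.str());
    }
    return result;
}

// Multiple results: never nullptr entries, just an empty vector when nothing matches.
std::vector<SimpleMatchPtr> collectAllCopy(const std::string &text, const std::regex &re) {
    std::vector<SimpleMatchPtr> result;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re); it != std::sregex_iterator(); ++it) {
        auto match = std::make_shared<SimpleMatch>();
        for (const auto &group : *it) {
            match->groups.push_back(group.str());
        }
        result.push_back(match);
    }
    return result;
}
```

A caller therefore needs only a nullptr check or an emptiness check, and reserves try/catch for genuine error conditions.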

The ...View Variants

Many functions also provide a ...View variant. For example, match has a corresponding matchView method.

These variants operate exclusively on views of the input text and return views into the original string for all match results. This approach avoids allocations and string copies, making it the most efficient option.

However, you are responsible for ensuring that the original text remains alive for as long as the match object is in use.

Pros:

  • No unnecessary allocations or string copies

  • Ideal for nested matching scenarios (e.g. matching inside a captured group)

Cons:

  • You must ensure that the original text outlives all match results

auto text = std::string("...");
auto reHeader = el::re::RegEx::compile(R"((?i)<h1[^>]*>(.*?)</h1>)");

// Good: `text` outlives `match`, so the views stay valid.
auto match = reHeader->findFirstView(text);
if (match != nullptr) {
    auto title = std::string(match->contentView(1));
    // ...
}

// !!! BAD EXAMPLE !!!
// The temporary string is destroyed at the end of this statement,
// so `badMatch` refers to text that no longer exists.
auto badMatch = reHeader->findFirstView(std::string("..."));
auto badTitle = std::string(badMatch->contentView(1)); // undefined behavior!

If you work with temporary strings or if match results need to outlive the original text, use the non-view variants. These methods create internal copies of the matched text and are safe in such scenarios.
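The view-based style, including nested matching inside a captured group, can be mimicked with std::string_view and std::regex. This is an analogy only, and the helper firstGroupView is hypothetical. The inner search runs directly on a view of the outer capture, so no substring is ever copied, and every resulting view stays valid only while the original text is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <string_view>

using ViewMatch = std::match_results<std::string_view::const_iterator>;

// Hypothetical helper: returns a view of capture group `group` of the first
// match in `view`, or std::nullopt. The result points into the caller's text.
std::optional<std::string_view> firstGroupView(std::string_view view, const std::regex &re, std::size_t group) {
    ViewMatch m;
    if (!std::regex_search(view.begin(), view.end(), m, re) || !m[group].matched) {
        return std::nullopt;
    }
    auto offset = static_cast<std::size_t>(m[group].first - view.begin());
    return view.substr(offset, static_cast<std::size_t>(m[group].length()));
}
```

Searching the title view for a number again returns a view into the very same buffer, which is exactly the allocation-free nesting the ...View variants are meant for.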

Replacement Patterns

Replacement patterns may contain placeholders of the form {n} or {name}, which reference numbered or named capture groups. Use {0} to insert the entire match.

To insert literal { or } characters, escape them by doubling: use {{ or }}.
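To make the placeholder rules concrete, here is a minimal standalone sketch of how such a replacement pattern could be expanded. It assumes the captured groups are already available as strings and that named groups map to group indices; the function expandReplacement and its parameters are hypothetical, not the library's API. Like replaceAll, it rejects invalid group references with an exception.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: expands {n}, {name}, {{ and }} in `pattern`.
// `groups[0]` is the whole match; `names` maps group names to indices.
std::string expandReplacement(const std::string &pattern,
                              const std::vector<std::string> &groups,
                              const std::map<std::string, std::size_t> &names) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if ((c == '{' || c == '}') && i + 1 < pattern.size() && pattern[i + 1] == c) {
            out += c; // "{{" or "}}" produces one literal brace
            ++i;
        } else if (c == '{') {
            auto close = pattern.find('}', i + 1);
            if (close == std::string::npos) {
                throw std::invalid_argument("unterminated group reference");
            }
            auto ref = pattern.substr(i + 1, close - i - 1);
            bool numeric = !ref.empty();
            for (char d : ref) {
                numeric = numeric && std::isdigit(static_cast<unsigned char>(d));
            }
            std::size_t index = 0;
            if (numeric) {
                index = std::stoul(ref);
            } else if (auto it = names.find(ref); it != names.end()) {
                index = it->second;
            } else {
                throw std::invalid_argument("unknown group name: " + ref);
            }
            if (index >= groups.size()) {
                throw std::invalid_argument("group index out of range: " + ref);
            }
            out += groups[index];
            i = close; // continue after the closing brace
        } else if (c == '}') {
            throw std::invalid_argument("unmatched '}'");
        } else {
            out += c;
        }
    }
    return out;
}
```

With groups {"john@example.com", "john", "example.com"} and names user → 1, domain → 2, the pattern "{domain}/{1} {{{0}}}" expands to "example.com/john {john@example.com}".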

Error Handling

Errors can occur not only while compiling a pattern, but also during the matching process itself:

  • If the input text contains invalid UTF-8 sequences, an Encoding error is thrown.

  • If a built-in or manually configured resource or time limit is exceeded, a Timeout or Limit error is thrown.

For this reason, we recommend wrapping your matching code in a try/catch block that catches el::re::Error at the earliest point where you can properly handle these errors.

Parser errors during compilation usually indicate programming mistakes and should typically not be caught. Letting the application fail fast makes such issues easier to detect during development.

auto text = std::string("...");

// Do not catch parser errors here – let the application fail fast.
auto reHeader = el::re::RegEx::compile(R"((?i)<h1[^>]*>(.*?)</h1>)");

std::string title;

try {
    auto match = reHeader->findFirstView(text);
    if (match != nullptr) {
        title = std::string(match->contentView(1));
    }
} catch (const el::re::Error &error) {
    title = std::format("<error: {}>", error);
}

// Process the matching result.
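What counts as an invalid UTF-8 sequence can be shown with a small standalone validator. This is only a sketch of the general UTF-8 rules from RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF), not the library's own decoder, and the function isValidUtf8 is hypothetical. It reports problems by returning false instead of throwing an Encoding error.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if `text` is well-formed UTF-8 (rejects overlong encodings,
// UTF-16 surrogates and code points above U+10FFFF).
bool isValidUtf8(std::string_view text) {
    std::size_t i = 0;
    while (i < text.size()) {
        auto byte = static_cast<unsigned char>(text[i]);
        std::size_t length;
        char32_t codePoint;
        if (byte < 0x80) {
            ++i; // plain ASCII
            continue;
        } else if ((byte & 0xE0) == 0xC0) {
            length = 2;
            codePoint = byte & 0x1F;
        } else if ((byte & 0xF0) == 0xE0) {
            length = 3;
            codePoint = byte & 0x0F;
        } else if ((byte & 0xF8) == 0xF0) {
            length = 4;
            codePoint = byte & 0x07;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + length > text.size()) {
            return false; // truncated sequence
        }
        for (std::size_t k = 1; k < length; ++k) {
            auto next = static_cast<unsigned char>(text[i + k]);
            if ((next & 0xC0) != 0x80) {
                return false; // expected a continuation byte
            }
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        static constexpr char32_t minimum[] = {0, 0, 0x80, 0x800, 0x10000};
        if (codePoint < minimum[length] || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            return false; // overlong, out of range, or surrogate
        }
        i += length;
    }
    return true;
}
```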

UTF-16 and UTF-32 Strings

Most matching functions in the RegEx API provide overloads for UTF-16 and UTF-32 encoded strings.

These overloads behave exactly like their UTF-8 counterparts in terms of matching semantics, flags, and error handling. The only difference is the type of match object they return, which corresponds to the underlying string type.

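Depending on the input, the methods return:

  • MatchPtr or MatchViewPtr for UTF-8 strings and inputs

  • Match16Ptr or Match16ViewPtr for UTF-16 strings and inputs

  • Match32Ptr or Match32ViewPtr for UTF-32 strings and inputs

The generator and list types follow the same naming scheme, for example Match16Generator and Match16ViewList.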

This ensures that positional information and content access are always expressed in units appropriate for the input string, while keeping the overall API consistent across all supported encodings.
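Why positions depend on the encoding is easy to see with plain standard-library strings. The same three characters (U+00E4, U+20AC, U+1F600) occupy a different number of code units in UTF-8, UTF-16 and UTF-32, so an offset reported for one encoding cannot be reused for another. The snippet uses only standard C++ types and makes no assumptions about the library.

```cpp
#include <cassert>
#include <string>

// U+00E4, U+20AC and U+1F600 in three encodings.
// Escapes are used so the result does not depend on the source-file encoding.
const std::string utf8 = "\xC3\xA4\xE2\x82\xAC\xF0\x9F\x98\x80"; // 2 + 3 + 4 bytes
const std::u16string utf16 = u"\u00E4\u20AC\U0001F600";          // 1 + 1 + 2 code units
const std::u32string utf32 = U"\u00E4\u20AC\U0001F600";          // 1 + 1 + 1 code units
```

The euro sign, for example, starts at offset 2 in the UTF-8 string but at offset 1 in the UTF-16 and UTF-32 strings.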


Interfaces and Types

class RegEx : public std::enable_shared_from_this<RegEx>

A regular expression.

Public Functions

MatchViewPtr matchView(StringView view) const

Try to match this regular expression at the start of the given string view.

Warning

If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while you read the contents of the capture groups. Use match if the match result should keep its own copy of the text.

Parameters:

view – The string view to match against. Must be valid for the duration of the match.

Returns:

A match result or nullptr if there was no match.

Match16ViewPtr matchView(std::u16string_view view) const

Try to match this regular expression at the start of the given UTF-16 string view.

Match32ViewPtr matchView(std::u32string_view view) const

Try to match this regular expression at the start of the given UTF-32 string view.

MatchPtr match(const String &str) const

Try to match this regular expression at the start of the given string.

Note

This method creates a copy of the captured matching string. Use matchView for a more memory-efficient variant that only stores references to the original string.

Parameters:

str – The string to match against.

Returns:

A match result or nullptr if there was no match.

Match16Ptr match(const std::u16string &str) const

Try to match this regular expression at the start of the given UTF-16 string.

Match32Ptr match(const std::u32string &str) const

Try to match this regular expression at the start of the given UTF-32 string.

MatchPtr match(const InputPtr &input) const

Try to match this regular expression at the start of the given input.

Parameters:

input – Your custom input object.

Returns:

A match result or nullptr if there was no match.

Match16Ptr match(const Input16Ptr &input) const

Try to match this regular expression at the start of the given UTF-16 input.

Match32Ptr match(const Input32Ptr &input) const

Try to match this regular expression at the start of the given UTF-32 input.

MatchViewPtr fullMatchView(const StringView &str) const

Try to match this regular expression for the full given string view.

Warning

If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while you read the contents of the capture groups. Use fullMatch if the match result should keep its own copy of the text.

Parameters:

str – The string view to match against. Must be valid for the duration of the match.

Returns:

A match result or nullptr if there was no match.

Match16ViewPtr fullMatchView(const std::u16string_view &str) const

Try to match this regular expression for the full given UTF-16 string view.

Match32ViewPtr fullMatchView(const std::u32string_view &str) const

Try to match this regular expression for the full given UTF-32 string view.

MatchPtr fullMatch(const String &str) const

Try to match this regular expression for the full given string.

Note

This method creates a copy of the captured matching string. Use fullMatchView for a more memory-efficient variant that only stores references to the original string.

Parameters:

str – The string to match against.

Returns:

A match result or nullptr if there was no match.

Match16Ptr fullMatch(const std::u16string &str) const

Try to match this regular expression for the full given UTF-16 string.

Match32Ptr fullMatch(const std::u32string &str) const

Try to match this regular expression for the full given UTF-32 string.

MatchPtr fullMatch(const InputPtr &input) const

Try to match this regular expression for the full input.

The regular expression must match the full input from start to end to be successful.

Parameters:

input – Your custom input object.

Returns:

A match result or nullptr if there was no match.

Match16Ptr fullMatch(const Input16Ptr &input) const

Try to match this regular expression for the full given UTF-16 input.

Match32Ptr fullMatch(const Input32Ptr &input) const

Try to match this regular expression for the full given UTF-32 input.

MatchViewPtr findFirstView(StringView view) const

Find the first match of this regular expression in the given string view.

Warning

If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while you read the contents of the capture groups. Use findFirst if the match result should keep its own copy of the text.

Parameters:

view – The string view to search. Must be valid for the duration of the match.

Returns:

A match result or nullptr if there was no match.

Match16ViewPtr findFirstView(std::u16string_view view) const

Find the first match of this regular expression in the given UTF-16 string view.

Match32ViewPtr findFirstView(std::u32string_view view) const

Find the first match of this regular expression in the given UTF-32 string view.

MatchPtr findFirst(const String &str) const

Find the first match of this regular expression in the given string.

Note

This method creates a copy of the captured matching string. Use findFirstView for a more memory-efficient variant that only stores references to the original string.

Parameters:

str – The string to search.

Returns:

A match result or nullptr if there was no match.

Match16Ptr findFirst(const std::u16string &str) const

Find the first match of this regular expression in the given UTF-16 string.

Match32Ptr findFirst(const std::u32string &str) const

Find the first match of this regular expression in the given UTF-32 string.

MatchPtr findFirst(const InputPtr &input) const

Find the first match of this regular expression in the given input.

Parameters:

input – Your custom input object.

Returns:

A match result or nullptr if there was no match.

Match16Ptr findFirst(const Input16Ptr &input) const

Find the first match of this regular expression in the given UTF-16 input.

Match32Ptr findFirst(const Input32Ptr &input) const

Find the first match of this regular expression in the given UTF-32 input.

MatchViewGenerator findAllView(StringView view) const

Find all matches of this regular expression in the given string view.

Warning

If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while you read the contents of the capture groups. Use findAll if the match results should keep their own copy of the text.

Parameters:

view – The string view to search. Must be valid for the duration of the match.

Returns:

A generator that yields all matches.

Match16ViewGenerator findAllView(std::u16string_view view) const

Find all matches of this regular expression in the given UTF-16 string view.

Match32ViewGenerator findAllView(std::u32string_view view) const

Find all matches of this regular expression in the given UTF-32 string view.

MatchGenerator findAll(const String &str) const

Find all matches of this regular expression in the given string.

Note

This method creates a copy of the captured matching string. Use findAllView for a more memory-efficient variant that only stores references to the original string.

Parameters:

str – The string to search.

Returns:

A generator that yields all matches.

Match16Generator findAll(const std::u16string &str) const

Find all matches of this regular expression in the given UTF-16 string.

Match32Generator findAll(const std::u32string &str) const

Find all matches of this regular expression in the given UTF-32 string.

MatchGenerator findAll(InputPtr input) const

Find all matches of this regular expression in the given input.

Parameters:

input – Your custom input object.

Returns:

A generator that yields all matches.

Match16Generator findAll(Input16Ptr input) const

Find all matches of this regular expression in the given UTF-16 input.

Match32Generator findAll(Input32Ptr input) const

Find all matches of this regular expression in the given UTF-32 input.

MatchViewList collectAllView(StringView view) const

Collect all matches of this regular expression in the given string view into a vector.

Warning

If you want to access the content of the capture groups, the underlying string must remain valid for the duration of the match and while you read the contents of the capture groups. Use collectAll if the match results should keep their own copy of the text.

Parameters:

view – The string view to search. Must be valid for the duration of the match.

Returns:

A vector containing all matches.

Match16ViewList collectAllView(std::u16string_view view) const

Collect all matches of this regular expression in the given UTF-16 string view into a vector.

Match32ViewList collectAllView(std::u32string_view view) const

Collect all matches of this regular expression in the given UTF-32 string view into a vector.

MatchList collectAll(const String &str) const

Collect all matches of this regular expression in the given string into a vector.

Note

This method creates a copy of the captured matching string. Use collectAllView for a more memory-efficient variant that only stores references to the original string.

Parameters:

str – The string to search.

Returns:

A vector containing all matches.

Match16List collectAll(const std::u16string &str) const

Collect all matches of this regular expression in the given UTF-16 string into a vector.

Match32List collectAll(const std::u32string &str) const

Collect all matches of this regular expression in the given UTF-32 string into a vector.

MatchList collectAll(const InputPtr &input) const

Collect all matches of this regular expression in the given input into a vector.

Parameters:

input – Your custom input object.

Returns:

A vector containing all matches.

Match16List collectAll(const Input16Ptr &input) const

Collect all matches of this regular expression in the given UTF-16 input into a vector.

Match32List collectAll(const Input32Ptr &input) const

Collect all matches of this regular expression in the given UTF-32 input into a vector.

String replaceAll(StringView view, StringView replacementExpression) const

Replace all matches of this regular expression in the given string view with the replacement string.

Use {n} to refer to captured group contents in the replacement string, where n is a decimal number referring to the captured group. Use {0} to refer to the entire match. Use {name} to refer to captured group contents by name. To use { or } in the replacement text, escape them by doubling: {{ and }}.

Parameters:
  • view – The string view to search and replace in.

  • replacementExpression – The replacement expression.

Throws:

Error – if the replacement string contains invalid group references.

Returns:

A new string with all matches replaced.

String replaceAll(StringView view, const ReplaceFn &replaceFn) const

Replace all matches of this regular expression with the result of the given replacement function.

Parameters:
  • view – The string view to search and replace in.

  • replaceFn – The replacement function. It takes one parameter, the match (never nullptr), and returns the replacement string. Throwing an exception in this function is fine: it safely terminates the replacement process and passes the exception on to the caller.

Returns:

A new string with all matches replaced.
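The callback-based variant follows a common pattern: copy the text between matches unchanged and append whatever the callback returns for each match. The following sketch shows that pattern with std::regex standing in for the RegEx class; the function replaceAllWith is hypothetical. Because nothing inside the loop catches exceptions, an exception thrown by the callback aborts the replacement and reaches the caller, mirroring the behaviour described above.

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical analogue of replaceAll(view, replaceFn) built on std::regex.
std::string replaceAllWith(const std::string &text, const std::regex &re,
                           const std::function<std::string(const std::smatch &)> &replaceFn) {
    std::string result;
    auto lastEnd = text.cbegin();
    for (auto it = std::sregex_iterator(text.begin(), text.end(), re); it != std::sregex_iterator(); ++it) {
        const std::smatch &match = *it;
        result.append(lastEnd, match[0].first); // unchanged text before the match
        result += replaceFn(match);             // exceptions propagate to the caller
        lastEnd = match[0].second;
    }
    result.append(lastEnd, text.cend());        // unchanged tail after the last match
    return result;
}
```

Doubling every number with a callback, for example, turns "a1 b22" into "a2 b44".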

Public Static Functions

static auto compile(StringView pattern, Flags flags = {}, Settings settings = {}) -> RegExPtr

Compile a regular expression.

The default behavior is:

  • ^ and $ only match at the beginning and end of the text. Change it with Flag::Multiline.

  • Text is matched case-sensitively. Change it with Flag::IgnoreCase.

  • \d, \w and \s match the full Unicode ranges. Change it with Flag::Ascii.

  • . does not match line-break characters. Change it with Flag::DotAll.

Parameters:
  • pattern – The regular expression pattern.

  • flags – The initial flags for the regular expression.

  • settings – The settings for compiling the regular expression and for the resulting engine.

Throws:

Error – If the pattern is invalid.

Returns:

A shared pointer to the compiled regular expression.

static String escapeText(StringView text) noexcept

Escape text to use it as literal text in a regular expression.

Parameters:

text – The text to escape.

Returns:

The escaped text.
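A typical use of escapeText is embedding user-supplied text in a larger pattern. The exact set of characters the library escapes is not listed here, so the following standalone sketch assumes the usual regular expression metacharacters and checks the result with std::regex. The function escapeLiteral is hypothetical and may differ from the library's escapeText.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical sketch: prefix each metacharacter with a backslash.
// Assumption: this set of metacharacters; the library's own escapeText may differ.
std::string escapeLiteral(std::string_view text) {
    constexpr std::string_view metaCharacters = R"(\^$.|?*+()[]{})";
    std::string result;
    result.reserve(text.size());
    for (char c : text) {
        if (metaCharacters.find(c) != std::string_view::npos) {
            result += '\\';
        }
        result += c;
    }
    return result;
}
```

An escaped "(a.b)" then matches only that exact text and no longer treats the dot as a wildcard.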

enum class erbsland::re::Flag : uint8_t

A single flag.

Values:

enumerator None
enumerator IgnoreCase

Ignore case when matching text.

enumerator Multiline

Match at the beginning and end of each line.

enumerator DotAll

The dot operator also matches newlines.

enumerator Ascii

Restrict \w, \d and \s to ASCII-only matching.

enumerator Verbose

Ignore whitespace in the regular expression and allow comments.

enumerator CRLF

Interpret CRLF line endings like a single LF.

Both characters of such line endings are preserved when capturing text.
This works similarly to the folding of multi-code-point Unicode characters.
For example, a¨ is interpreted as ä, but both code points are preserved in the text.

class Flags : public impl::EnumFlags<Flags, Flag>

Flags for constructing a regular expression object.

Public Functions

inline bool isNone() const noexcept

Test if no flags are set.

inline std::string toString() const

Create a diagnostic string for the flags.