Unsupported Expressions

This page documents syntax constructs that are intentionally not supported by this library.

These omissions are deliberate. They avoid ambiguity, improve Unicode correctness, and guarantee predictable performance characteristics. Where possible, alternatives or recommended approaches are provided.

Numeric and Ambiguous Escapes

\0dd

Zero-prefixed numeric escapes are interpreted inconsistently across regex engines:

  • Some treat them as octal escapes.

  • Others interpret them as numeric backreferences.

To avoid this ambiguity entirely, this engine rejects \0dd syntax.

Use instead:

  • Explicit octal escapes: \o{ddd…}

  • Unicode escapes: \u{hh…}

\ddd

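A plain numeric escape such as \12 is ambiguous in many regex dialects: depending on the engine and on how many groups the pattern defines, it can mean a backreference to group 12 or an octal escape. It also overlaps with replacement-string syntax.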

This engine therefore rejects \ddd escapes outright and requires explicit notation to make intent clear.

Use instead:

  • \o{ddd…}

  • \u{hh…}
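When migrating a pattern from another dialect, a legacy octal escape can be rewritten mechanically. The Python sketch below is illustrative only: the helper name explicit_escapes is hypothetical and not part of this library, and it assumes the digits were meant as an octal code point rather than a backreference.

```python
def explicit_escapes(octal_digits):
    # octal_digits is the digit run after the backslash, e.g. "012" for \012.
    code_point = int(octal_digits, 8)  # raises ValueError for digits 8 or 9
    octal_form = "\\o{" + format(code_point, "o") + "}"
    unicode_form = "\\u{" + format(code_point, "04X") + "}"
    return octal_form, unicode_form


print(explicit_escapes("012"))  # line feed, U+000A
print(explicit_escapes("101"))  # LATIN CAPITAL LETTER A, U+0041
```

If the original escape was meant as a backreference, there is no escape to translate it to; see the Backreferences section for the recommended approach.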

Byte-Oriented and Line-Oriented Escapes

\C

In some engines, \C matches a single raw byte.

This engine operates on Unicode code points decoded from valid UTF-8, not on individual bytes. Matching raw bytes could split multi-byte sequences and produce invalid Unicode, potentially corrupting downstream processing.

For safety and correctness, \C is not supported.

\R

The \R escape usually means “any line break sequence”.

This library normalizes line endings internally. As a result, \R would collapse to \n and behave differently depending on normalization settings.

To avoid surprising behavior, this escape is rejected.

Use instead:

  • \n for normalized newlines

  • The Flag::CRLF flag, which folds CRLF line breaks into LF

  • An explicit alternation such as (\r\n|\n|\r) when working with raw input
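The explicit alternation can be tried out on raw input as follows. This sketch uses Python's standard re module as a stand-in, not this library's API. It assumes leftmost-first alternation, where the order of the branches matters.

```python
import re

# CRLF must come first; otherwise "\r\n" would be seen as two separate breaks.
LINE_BREAK = re.compile(r"\r\n|\n|\r")

raw = "alpha\r\nbeta\ngamma\rdelta"
lines = LINE_BREAK.split(raw)
print(lines)

# With the branches in the wrong order, a CRLF pair yields a spurious empty line.
wrong_order = re.split(r"\r|\n|\r\n", "a\r\nb")
print(wrong_order)
```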

Unicode and Text Segmentation

\X

In some engines, \X matches an extended grapheme cluster.

Correct grapheme segmentation requires full Unicode boundary rules and stateful matching across code points. This is outside the scope of this engine.

Use instead:

  • Explicit Unicode property classes (for example [\p{L}\p{N}])

  • Quantifiers to control cluster size

Backtracking-Dependent Constructs

Backreferences

Backreferences (numeric or named) let a pattern describe languages that are not regular. Matching them generally requires backtracking, which can lead to exponential runtime in the worst case.

This engine prioritizes predictable performance and therefore does not support backreferences.

Use instead:

  • Capture substrings

  • Compare them explicitly in application code
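The capture-and-compare approach might look like the following sketch. It uses Python's standard re module as a stand-in for this library, and the function name matching_tag_pairs is hypothetical.

```python
import re

# With backreferences this would be written as <(\w+)>.*?</\1>.
# Here the closing name is captured separately and compared in code.
TAG_PAIR = re.compile(r"<(\w+)>(.*?)</(\w+)>")


def matching_tag_pairs(text):
    """Yield (tag, body) for pairs whose opening and closing names agree."""
    for m in TAG_PAIR.finditer(text):
        opening, body, closing = m.groups()
        if opening == closing:  # the check a backreference would perform
            yield opening, body


pairs = list(matching_tag_pairs("<b>bold</b> <i>oops</b> <u>under</u>"))
print(pairs)
```

This is not fully equivalent to a backreference: an engine with backreferences would keep looking for a later closing tag with the right name, whereas this sketch simply discards mismatched candidates.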

(?|…)

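Branch-reset groups restart capture numbering in each alternative, so the same group number refers to a different subpattern depending on which alternative matched.

This makes group indices harder to reason about and complicates result handling, so this engine rejects branch-reset groups.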

Use instead:

  • Non-capturing groups

  • Explicit named groups for clarity
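A branch-reset pattern can usually be rewritten with one named group per alternative. The sketch below uses Python's re module and its (?P<name>…) syntax as a stand-in; the named-group syntax of this library may differ, and parse_length is a hypothetical helper.

```python
import re

# Branch-reset form in other engines: (?|(\d+)px|(\d+)em)
# Here each alternative gets its own, explicitly named capture.
LENGTH = re.compile(r"(?:(?P<px>\d+)px|(?P<em>\d+)em)")


def parse_length(text):
    m = LENGTH.fullmatch(text)
    if m is None:
        return None
    if m.group("px") is not None:
        return ("px", int(m.group("px")))
    return ("em", int(m.group("em")))


print(parse_length("12px"))
print(parse_length("3em"))
```

The calling code states which alternative it is reading, so no group index changes meaning between branches.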

Lookahead and Lookbehind

Lookaround assertions require evaluating alternative match paths without consuming input. This often forces backtracking and complicates streaming evaluation.

To keep matching efficient and predictable, lookahead and lookbehind constructs are not supported.

Use instead:

  • Pattern restructuring

  • Additional validation in application code
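Both alternatives can be sketched briefly. The example uses Python's re module as a stand-in for this library, and all names in it are hypothetical. The first part restructures a lookbehind into a capture; the second replaces stacked lookaheads with separate checks in application code.

```python
import re

# Lookbehind form elsewhere: (?<=\$)\d+
# Restructured: match the "$" as well and capture only the digits.
PRICE = re.compile(r"\$(\d+)")


def prices(text):
    return [int(m.group(1)) for m in PRICE.finditer(text)]


# Lookahead form elsewhere: ^(?=.*\d)(?=.*[a-z]).{8,}$
# Restructured: one simple expression per condition, combined in code.
HAS_DIGIT = re.compile(r"\d")
HAS_LOWER = re.compile(r"[a-z]")


def is_acceptable_password(candidate):
    return (
        len(candidate) >= 8
        and HAS_DIGIT.search(candidate) is not None
        and HAS_LOWER.search(candidate) is not None
    )


print(prices("a $5 b $12 c 7"))
print(is_acceptable_password("abcdefg1"))
```

The password check is only roughly equivalent to the lookahead form (for example, it does not treat line breaks specially), which is usually acceptable for validation logic of this kind.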

Recursion and Subroutines

Recursive patterns and subroutine calls can express context-free grammars but introduce unbounded recursion and complex backtracking behavior.

These constructs are rejected to maintain safety and performance guarantees.

Control-Flow and State-Based Syntax

\G

The \G anchor matches at the end of the previous match in some engines.

This engine treats matches as independent operations and does not retain implicit state between searches.

Use instead:

  • Resume matching at a specific offset using the API
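Resuming at an explicit offset covers the typical \G use case, contiguous tokenizing. This page does not show the library's offset API, so the sketch below uses Python's Pattern.match(string, pos) as a stand-in; the tokenize function is hypothetical.

```python
import re

# Optional leading whitespace, then a number or a single operator character.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    tokens = []
    pos = 0
    text = text.rstrip()  # trailing whitespace would otherwise fail to match a token
    while pos < len(text):
        m = TOKEN.match(text, pos)  # anchored at pos, the role \G would play
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()  # explicit state kept by the caller
    return tokens


print(tokenize("12 + (3*4)"))
```

Because the caller owns the offset, each search remains an independent operation and no state is hidden inside the pattern.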

\K

The \K escape resets the start of the reported match mid-pattern.

This primarily exists to support replacement shortcuts in engines without rich capture APIs. In this library, captures are first-class and provide the same functionality without altering match semantics.
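As a rough illustration of how a capture replaces \K, the following sketch uses Python's re module as a stand-in for this library's capture API. The pattern and variable names are hypothetical.

```python
import re

# \K form elsewhere: price:\s*\K\d+  (only the digits are reported)
# Capture form: the digits are group 1, the full match is left untouched.
PRICE = re.compile(r"price:\s*(\d+)")

text = "item A, price: 42 EUR"
m = PRICE.search(text)
digits = m.group(1)
start, end = m.span(1)  # offsets of the capture, not of the whole match

# Replace only the captured part, which is what \K is typically used for.
replaced = text[:start] + "40" + text[end:]
print(digits, replaced)
```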

(?C) / (?Cn)

Callout expressions embed user callbacks into the matching process.

This engine exposes structured debugging and inspection facilities through its API instead, so embedding callbacks into the pattern is unnecessary and unsupported.

(?J)

In some engines, the (?J) flag allows several groups to share one name. Duplicate group names make it unclear which capture is returned and complicate maintenance.

This engine enforces unique group names to keep captures deterministic and readable.

(?U)

In some engines, the (?U) flag inverts quantifier greediness: quantifiers become lazy (ungreedy) by default and greedy only when followed by ?.

Global behavior changes of this kind obscure intent and are easy to confuse with the Unicode-related (?u) flag.

Use instead:

  • Explicit lazy quantifiers where required
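The difference between a greedy quantifier and an explicitly lazy one can be seen in a short sketch. It uses Python's re module as a stand-in and assumes the common ? suffix for lazy quantifiers.

```python
import re

text = "<a><b>"

# Greedy: .+ extends as far as possible, so both tags end up in one match.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? stops at the first ">" that completes a match.
lazy = re.findall(r"<.+?>", text)

print(greedy, lazy)
```

Marking laziness on the individual quantifier keeps the behavior visible at the point where it matters, instead of depending on a flag set elsewhere in the pattern.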

Parser and Engine Directives

(*…)

The (*…) verbs act as parser or engine tuning directives in some regex flavors.

This engine exposes configuration through its API rather than inline syntax, keeping patterns focused on matching logic and portable across environments.

Conditional Patterns

Conditional groups depend on runtime match state (for example whether a group was matched) and introduce hidden control flow into the pattern.

This execution model is not supported.

Use instead:

  • Separate expressions

  • Conditional logic in application code
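A typical conditional, "require a closing parenthesis only if an opening one was matched", can be split into two expressions with the decision made in code. The sketch uses Python's re module as a stand-in, and parse_number is a hypothetical helper.

```python
import re

# Conditional form elsewhere: ^(\()?\d+(?(1)\))$
# Split into one expression per case.
PARENTHESIZED = re.compile(r"\((\d+)\)")
BARE = re.compile(r"\d+")


def parse_number(text):
    m = PARENTHESIZED.fullmatch(text)
    if m is not None:
        return int(m.group(1))
    if BARE.fullmatch(text) is not None:
        return int(text)
    return None  # unbalanced or non-numeric input


print(parse_number("(42)"), parse_number("42"), parse_number("(42"))
```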

Empty Groups and Alternatives

Empty groups such as () or (?:) are not allowed by default.

They match an empty string without performing a meaningful operation and are often the result of a mistake.

If you explicitly need empty groups, enable them using Feature::EmptyGroups.

Empty alternatives are also rejected by default. If required, enable Feature::EmptyAlternatives and ensure that the group still contains at least one non-empty alternative or surrounding structure.

Summary

Unsupported constructs in this engine are not accidental omissions. Each rejected feature would do at least one of the following:

  • introduce ambiguity

  • weaken Unicode guarantees

  • make performance less predictable

  • obscure pattern intent

When a construct is rejected, the parser error typically points toward a clearer and more explicit alternative.