Quantifiers
Quantifiers control how many times the preceding element may occur.
A quantifier always applies to the immediately preceding atom (a literal, character class, escape, or group). To apply a quantifier to more than one element, wrap the sequence in a group and apply the quantifier to that group.
Quantifiers cannot appear at the start of a pattern and always require a preceding element.
Quantifier Flavors
Each quantifier is available in three distinct flavors:
Greedy – matches as much as possible, then backtracks if needed.
Lazy – matches as little as possible, expanding only if required.
Possessive – matches as much as possible and never backtracks.
Lazy and possessive variants are written by appending ? or + directly after the quantifier symbol.
Possessive quantifiers are supported in PCRE- and Java-style engines, but are not
available in Python’s standard re module. Patterns using possessive quantifiers are
therefore not portable to Python.
Use possessive quantifiers when you want to commit to a match and prevent expensive backtracking in ambiguous patterns.
Quantifier Overview
The following table summarizes all supported quantifiers and their available flavors.
Each quantifier applies to the immediately preceding atom and may be written in greedy, lazy, or possessive form.
Meaning |
Greedy |
Lazy |
Possessive |
|---|---|---|---|
Zero or one |
? |
?? |
?+ |
Zero or more |
* |
*? |
*+ |
One or more |
+ |
+? |
++ |
Exactly n |
{n} |
– |
– |
n to m |
{n,m} |
{n,m}? |
{n,m}+ |
At least n |
{n,} |
{n,}? |
{n,}+ |
Lazy and possessive variants are written by appending ? or + directly after the quantifier symbol.
Optional Quantifiers
?
Match zero or one occurrence (greedy).
Use this quantifier to make the preceding element optional.
colou?r
This matches color and colour.
?+
Match zero or one occurrence (possessive).
Once the optional element is consumed, it is never given back during backtracking.
https?+
This matches http or https and prevents backtracking inside the optional s.
??
Match zero or one occurrence (lazy).
The engine first tries to skip the optional element and only consumes it if required for the overall match to succeed.
<[^>]+??>
This allows following tokens to decide where the tag should close.
Repetition Quantifiers
*
Match zero or more occurrences (greedy).
a*
This matches "", "a", "aa", and so on.
*+
Match zero or more occurrences (possessive).
This is useful when you want to consume as much input as possible and explicitly prevent backtracking.
\S*+
This consumes a whole run of non-whitespace characters and never backtracks.
*?
Match zero or more occurrences (lazy).
<.*?>
This matches the shortest possible tag.
One-or-More Quantifiers
+
Match one or more occurrences (greedy).
\d+
This matches one or more digits.
++
Match one or more occurrences (possessive).
\w++
This consumes the longest possible word and never backtracks.
+?
Match one or more occurrences (lazy).
\d+?
This matches as few digits as possible while still allowing a full match.
Counted Quantifiers
{n}
Match exactly n occurrences (greedy).
\d{4}
This matches a four-digit year.
{n,m}
Match at least n and at most m occurrences (greedy).
The engine rejects values where m is smaller than n.
[A-F0-9]{2,4}
This matches 2 to 4 hexadecimal digits.
{n,m}+
Match at least n and at most m occurrences (possessive).
\w{2,5}+
This avoids backtracking within the counted range.
{n,m}?
Match at least n and at most m occurrences (lazy).
.{2,5}?
This expands only as far as necessary.
Open-Ended Counted Quantifiers
{n,}
Match at least n occurrences (greedy).
\w{3,}
This matches a word of length three or more.
{n,}+
Match at least n occurrences (possessive).
\d{1,}+
This consumes all following digits without backtracking.
{n,}?
Match at least n occurrences (lazy).
.{1,}?
This grows to the smallest length that still permits a match.
Possessive Quantifiers
Possessive quantifiers match as much input as possible and never give it back during backtracking.
Once a possessive quantifier has consumed input, that decision is final. If the remainder of the pattern fails, the engine does not retry with a shorter match.
Why Possessive Quantifiers Exist
This engine is based on a Thompson-style NFA implementation. In such an engine:
Greedy quantifiers create multiple possible execution paths.
Backtracking explores these paths when later parts of the pattern fail.
Certain patterns can cause excessive or even catastrophic backtracking.
Possessive quantifiers explicitly remove backtracking states from the automaton. This turns a potentially exponential search into a linear one.
In other words, possessive quantifiers act as a commit point in the pattern.
Practical Example: Avoiding Catastrophic Backtracking
Consider the following pattern:
^(\w+)*$
On long input like "aaaaaaaaaaaa!", this pattern creates a large number of overlapping
paths and may take a very long time to fail.
Using a possessive quantifier fixes the problem:
^(\w++)*$
Here, \w++ consumes the longest possible word fragment and never backtracks.
The failure happens immediately when the ! is encountered.
Practical Example: Token Consumption
Suppose you want to consume a token up to the next delimiter:
\S*:
With a greedy quantifier, the engine may backtrack if later parts fail. With a possessive quantifier:
\S*+:
the token is consumed once, and matching proceeds without reconsidering its length.
This is especially useful in parsers, log processing, and protocol grammars.
Relationship to Atomic Groups
Possessive quantifiers are closely related to atomic groups:
\w++ ≡ (?>\w+)
a*+ ≡ (?>a*)
Both constructs prevent backtracking inside their scope.
Use possessive quantifiers when:
the quantified atom is simple
you want compact syntax
performance and predictability matter
Use atomic groups when:
the quantified expression is complex
you want to commit a larger sub-pattern
In Thompson-style engines, both forms compile down to equivalent automaton structures.
Counted Quantifier Limits
Counted quantifiers accept only decimal digits with no surrounding whitespace.
Numbers may contain at most five digits.
Values are additionally limited by the engine’s maximum quantifier count (default: 10 000).
This is stricter than some engines (PCRE, RE2, Python), which allow larger values. Large ranges should instead be expressed using additional structure, multiple quantifiers, or by adjusting the engine settings.
Legacy and Compatibility Quantifiers
{,m}
Match zero to m occurrences (greedy).
This syntax is supported for compatibility but is not accepted by many other regex engines. For new patterns, prefer the explicit form {0,m}.
a{,3}
This matches "", "a", "aa", and "aaa".
Differences from Common Regex Engines
If you are migrating from regex engines such as PCRE, PCRE2, Java, RE2, or Python’s
re module, you may notice several intentional differences in quantifier behavior.
Possessive Quantifiers
This engine fully supports possessive quantifiers (for example *+, ++, {n,m}+).
Stricter Limits on Counted Quantifiers
Counted quantifiers are intentionally restricted:
numbers may contain at most five digits
values are capped by a configurable maximum (default: 10 000)
Some engines allow arbitrarily large values, which can lead to excessive memory usage or slow compilation. This engine rejects such patterns early.
Summary
Quantifiers in this engine prioritize explicit intent, linear performance, and safe failure modes.