Diagnostics
The Erbsland Regular Expression library includes a low-level assembler and disassembler that allow you to inspect, analyze, and even manually construct the internal program executed by the matching engine.
These tools are primarily intended for diagnostics, debugging, testing, and advanced experimentation. They are not required for normal use of the library, but they provide valuable insight into how patterns are translated into executable instructions.
Disassemble Compiled Patterns
The disassembler can be used to inspect the program generated by the compiler for a given regular expression pattern.
// Compile a pattern, then print its disassembly line by line.
const auto reTag = RegEx::compile(R"((?is)<([a-z]+)([^>]*)>)");
for (const auto &line : diagnostics::Disassembler{reTag}.disassemble()) {
    std::cout << line << '\n';
}
This produces a textual representation of the compiled program, including character classes, control flow, and capture group handling:
; Character classes
; ============================================================================
.section &class
$0000: .class ; [a-z]
$0000: .data $000061-$00007A
; Program
; ============================================================================
.section &program
$0000: 0900003c CI CHAR '<'
$0001: 86000000 START CAPTURE 0
$0002: 0c000000 CI CLASS $0000
$0003: 81000002 00000005 SPLIT $0002, $0005
$0005: a6000000 STOP CAPTURE 0
$0006: 86000001 START CAPTURE 1
$0007: 81000009 0000000b SPLIT $0009, $000B
$0009: 2900003e NOT CI CHAR '>'
$000A: c2000007 JUMP $0007
$000B: a6000001 STOP CAPTURE 1
$000C: 0900003e CI CHAR '>'
$000D: 83000000 MATCH
Reading this output is helpful when:
debugging unexpected matching behaviour,
analyzing performance characteristics,
learning how specific pattern constructs are compiled.
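When reading the hexadecimal column of the listing, it can help to split each word into fields. The sketch below assumes, based only on the listing above, that the upper 8 bits of a word identify the instruction and the lower 24 bits carry its operand (a code point, a capture index, or a jump target). This layout is an inference from the sample output, not a documented format, and `DecodedWord` and `decodeWord` are hypothetical names used for illustration.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of the library.
// The field layout is an assumption inferred from the sample listing.
struct DecodedWord {
    std::uint8_t opcode;    // upper 8 bits of the word
    std::uint32_t operand;  // lower 24 bits of the word
};

// Split one 32-bit word from the hexadecimal column into the two assumed fields.
inline DecodedWord decodeWord(std::uint32_t word) {
    DecodedWord result{};
    result.opcode = static_cast<std::uint8_t>(word >> 24);
    result.operand = word & 0x00FFFFFFu;
    return result;
}
```

Under this assumption, `decodeWord(0x0900003c)` yields the operand `0x3c`, the code point of `'<'`, which agrees with `CI CHAR '<'` at address `$0000`, and `decodeWord(0xc2000007)` yields the operand `7`, which agrees with `JUMP $0007`.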
Write Custom Programs
For advanced use cases, you can also write custom matching programs directly using the assembler.
This allows you to bypass the regular expression syntax entirely and construct a program manually using the engine’s instruction set.
// Assembler source, one instruction per line.
const auto program = std::vector<std::string_view>{
    "; my custom program",
    "loop: CHAR 'a'",
    " SPLIT %loop, %end",
    "end: MATCH",
};
// Assemble the program and run it against a subject string.
auto reCustom = diagnostics::Assembler().compile(program);
auto match = reCustom->match("aaaaaa");
std::cout << "Result: " << match->content() << '\n';
This prints:
Result: aaaaaa
Custom programs are useful for testing the engine, experimenting with new instruction sequences, or creating minimal reproducible examples when investigating bugs.
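To build intuition for how a three-instruction program like the one above can consume a whole run of `a` characters, here is a minimal, self-contained sketch of a toy backtracking interpreter. It is not the library's engine and uses none of its API: the types `Op` and `Instruction` and the functions `run` and `toyProgram` are hypothetical, and the rule that `SPLIT` tries its first target before its second is an assumption chosen because it is consistent with the greedy result shown above.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Toy instruction set mirroring the three mnemonics used in the example.
enum class Op { Char, Split, Match };

struct Instruction {
    Op op;
    char ch;             // character to compare, used by Op::Char
    std::size_t first;   // preferred target, used by Op::Split
    std::size_t second;  // fallback target, used by Op::Split
};

// Execute `program` from instruction `pc` at text offset `pos`.
// Returns the offset where the match ends, or nothing if no path matches.
std::optional<std::size_t> run(const std::vector<Instruction> &program,
                               std::string_view text,
                               std::size_t pc,
                               std::size_t pos) {
    const Instruction &in = program.at(pc);
    switch (in.op) {
    case Op::Char:
        // Consume one character if it matches, otherwise this path fails.
        if (pos < text.size() && text[pos] == in.ch) {
            return run(program, text, pc + 1, pos + 1);
        }
        return std::nullopt;
    case Op::Split:
        // Assumption: try the first target, fall back to the second.
        if (auto result = run(program, text, in.first, pos)) {
            return result;
        }
        return run(program, text, in.second, pos);
    case Op::Match:
        return pos;
    }
    return std::nullopt;
}

// The custom program from the example, with labels resolved to indexes.
std::vector<Instruction> toyProgram() {
    return {
        {Op::Char, 'a', 0, 0},   // loop: CHAR 'a'
        {Op::Split, '\0', 0, 2}, //       SPLIT %loop, %end
        {Op::Match, '\0', 0, 0}, // end:  MATCH
    };
}
```

Calling `run(toyProgram(), "aaaaaa", 0, 0)` in this sketch returns the end offset 6, so the matched content is the full string. For the input `"aab"` it returns 2, and for `"b"` it returns no match because the first `CHAR 'a'` fails.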
Interfaces and Types
class Assembler
An assembler for diagnostics, unit tests, and experiments. Please read the documentation for the full syntax of the assembler language.
Public Functions
class Disassembler
A disassembler for the regular expression engine data.