Diagnostics

The Erbsland Regular Expression library includes a low-level assembler and disassembler that allow you to inspect, analyze, and even manually construct the internal program executed by the matching engine.

These tools are primarily intended for diagnostics, debugging, testing, and advanced experimentation. They are not required for normal use of the library, but they provide valuable insight into how patterns are translated into executable instructions.

Disassemble Compiled Patterns

The disassembler can be used to inspect the program generated by the compiler for a given regular expression pattern.

const auto reTag = RegEx::compile(R"((?is)<([a-z]+)([^>]*)>)");

for (const auto &line : diagnostics::Disassembler{reTag}.disassemble()) {
    std::cout << line << '\n';
}

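This produces a textual representation of the compiled program, including character classes, control flow, and capture group handling. Because the pattern enables the i flag, the character tests are emitted with the CI (case-insensitive) prefix: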

; Character classes
; ============================================================================
.section &class
$0000:                                  .class              ; [a-z]
$0000:                                  .data $000061-$00007A
; Program
; ============================================================================
.section &program
$0000:            0900003c              CI CHAR '<'
$0001:            86000000              START CAPTURE 0
$0002:            0c000000              CI CLASS $0000
$0003:            81000002 00000005     SPLIT $0002, $0005
$0005:            a6000000              STOP CAPTURE 0
$0006:            86000001              START CAPTURE 1
$0007:            81000009 0000000b     SPLIT $0009, $000B
$0009:            2900003e              NOT CI CHAR '>'
$000A:            c2000007              JUMP $0007
$000B:            a6000001              STOP CAPTURE 1
$000C:            0900003e              CI CHAR '>'
$000D:            83000000              MATCH
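Every word in the hexadecimal column of this listing appears to follow the same layout. The high byte selects the instruction, and the low 24 bits carry its operand: a character code, class index, capture index, or jump target. SPLIT uses a second word for its alternative target. This layout is inferred from the listing above, not taken from a specification of the encoding. The sketch below is only a reading aid, and the names DecodedWord and decode are hypothetical.

```cpp
#include <cstdint>

// Assumed layout, inferred from the listing (not documented):
// bits 31-24 identify the instruction, bits 23-0 hold its operand.
struct DecodedWord {
    std::uint8_t code;      // high byte, e.g. 0x09 for "CI CHAR"
    std::uint32_t operand;  // low 24 bits, e.g. 0x3c for '<'
};

// Split one 32-bit program word into instruction byte and operand.
constexpr DecodedWord decode(std::uint32_t word) {
    return DecodedWord{static_cast<std::uint8_t>(word >> 24), word & 0x00FFFFFFu};
}
```

For example, decode(0xc2000007) yields code 0xc2 and operand 7, which matches "JUMP $0007" at address $000A.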

Reading this output is helpful when:

  • debugging unexpected matching behaviour,

  • analyzing performance characteristics,

  • learning how specific pattern constructs are compiled.
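The SPLIT and JUMP instructions in the listing can be traced with a toy backtracking interpreter for exactly that program. This is a hypothetical model written for illustration only, not the library's matching engine. The Op names, the one-slot-per-word layout with a Pad slot for each SPLIT's second word, and the rule that SPLIT tries its first target before the second are all assumptions. The class [a-z] is inlined as a range, and a match is only attempted at position 0.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Toy model of the listed program; NOT the library's engine.
// One array slot per program word, so SPLIT/JUMP targets match the listing.
enum class Op { CiChar, NotCiChar, CiRange, StartCapture, StopCapture, Split, Jump, Match, Pad };

struct Instr {
    Op op;
    char32_t a = 0;  // character, range start, capture index, or first target
    char32_t b = 0;  // range end or second target
};

struct Result {
    bool matched = false;
    std::size_t end = 0;
    std::array<std::size_t, 4> caps{};  // start/end of capture 0, then capture 1
};

constexpr char32_t asciiLower(char32_t c) { return (c >= U'A' && c <= U'Z') ? c + 32 : c; }
constexpr char32_t asciiUpper(char32_t c) { return (c >= U'a' && c <= U'z') ? c - 32 : c; }

// Backtracking interpreter: SPLIT tries its first target and falls back to
// the second; captures live in `state`, which is copied for each branch.
constexpr Result run(const Instr *prog, std::string_view text, std::size_t pc,
                     std::size_t pos, Result state) {
    for (;;) {
        const Instr &ins = prog[pc];
        const bool atEnd = pos >= text.size();
        const char32_t c = atEnd ? 0 : static_cast<unsigned char>(text[pos]);
        switch (ins.op) {
        case Op::CiChar:
            if (atEnd || asciiLower(c) != asciiLower(ins.a)) return Result{};
            ++pos; ++pc;
            break;
        case Op::NotCiChar:
            if (atEnd || asciiLower(c) == asciiLower(ins.a)) return Result{};
            ++pos; ++pc;
            break;
        case Op::CiRange: {
            const bool inLower = asciiLower(c) >= ins.a && asciiLower(c) <= ins.b;
            const bool inUpper = asciiUpper(c) >= ins.a && asciiUpper(c) <= ins.b;
            if (atEnd || !(inLower || inUpper)) return Result{};
            ++pos; ++pc;
            break;
        }
        case Op::StartCapture: state.caps[2 * ins.a] = pos; ++pc; break;
        case Op::StopCapture: state.caps[2 * ins.a + 1] = pos; ++pc; break;
        case Op::Split: {
            const Result first = run(prog, text, ins.a, pos, state);
            if (first.matched) return first;
            pc = ins.b;  // first branch failed: backtrack to the second target
            break;
        }
        case Op::Jump: pc = ins.a; break;
        case Op::Match: state.matched = true; state.end = pos; return state;
        default: return Result{};  // Pad: second word of a SPLIT, never executed
        }
    }
}

// Words $0000-$000D of the listing.
constexpr std::array<Instr, 14> tagProgram{{
    {Op::CiChar, U'<'}, {Op::StartCapture, 0}, {Op::CiRange, U'a', U'z'},
    {Op::Split, 2, 5}, {Op::Pad}, {Op::StopCapture, 0}, {Op::StartCapture, 1},
    {Op::Split, 9, 11}, {Op::Pad}, {Op::NotCiChar, U'>'}, {Op::Jump, 7},
    {Op::StopCapture, 1}, {Op::CiChar, U'>'}, {Op::Match},
}};

constexpr Result matchTag(std::string_view text) {
    return run(tagProgram.data(), text, 0, 0, Result{});
}
```

With this model, matchTag("<DiV class='x'>") matches all 15 characters, capture 0 covers "DiV", and capture 1 covers " class='x'". The SPLIT at $0007 keeps consuming characters until NOT CI CHAR '>' fails, then backtracks to $000B.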

Write Custom Programs

For advanced use cases, you can also write custom matching programs directly using the assembler.

This allows you to bypass the regular expression syntax entirely and construct a program manually using the engine’s instruction set.

const auto program = std::vector<std::string_view>{
    "; my custom program",
    "loop: CHAR 'a'",
    "      SPLIT %loop, %end",
    "end:  MATCH",
};

auto reCustom = diagnostics::Assembler().compile(program);
auto match = reCustom->match("aaaaaa");

std::cout << "Result: " << match->content() << '\n';

This prints:

Result: aaaaaa

Custom programs are useful for testing the engine, experimenting with new instruction sequences, or creating minimal reproducible examples when investigating bugs.
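The %loop and %end operands in the custom program refer to labels that must be turned into addresses. A common way to do this is a two-pass approach. The first pass walks the lines and records the address of each label. The second pass replaces each %name operand with that address. The sketch below models this under stated assumptions and is not the library's Assembler. The functions labelAddress and resolveOperand are hypothetical. The word sizes (SPLIT takes two words, every other instruction takes one) are inferred from the disassembly listing. Only full-line comments starting with ";" are handled.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical label resolution sketch, NOT the library's Assembler.
constexpr std::size_t kNoAddress = static_cast<std::size_t>(-1);

constexpr std::string_view trim(std::string_view s) {
    while (!s.empty() && s.front() == ' ') s.remove_prefix(1);
    while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
    return s;
}

constexpr bool isIdentifier(std::string_view s) {
    if (s.empty()) return false;
    for (const char c : s) {
        const bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') || c == '_';
        if (!ok) return false;
    }
    return true;
}

// Assumption from the listing: SPLIT occupies two words, others one.
constexpr std::size_t wordCount(std::string_view mnemonic) {
    return mnemonic == "SPLIT" ? 2 : 1;
}

// Pass 1: track the current address and return the address of `label`.
template <std::size_t N>
constexpr std::size_t labelAddress(const std::array<std::string_view, N> &lines,
                                   std::string_view label) {
    std::size_t address = 0;
    for (std::string_view line : lines) {
        line = trim(line);
        if (line.empty() || line.front() == ';') continue;  // blank or comment line
        const auto colon = line.find(':');
        if (colon != std::string_view::npos && isIdentifier(line.substr(0, colon))) {
            if (line.substr(0, colon) == label) return address;
            line = trim(line.substr(colon + 1));  // strip the label definition
        }
        if (line.empty()) continue;  // label on a line of its own
        address += wordCount(line.substr(0, line.find(' ')));
    }
    return kNoAddress;
}

// Pass 2, for a single operand: "%name" becomes the label's address.
template <std::size_t N>
constexpr std::size_t resolveOperand(const std::array<std::string_view, N> &lines,
                                     std::string_view operand) {
    operand = trim(operand);
    if (operand.empty() || operand.front() != '%') return kNoAddress;
    return labelAddress(lines, operand.substr(1));
}

constexpr std::array<std::string_view, 4> customProgram{
    "; my custom program",
    "loop: CHAR 'a'",
    "      SPLIT %loop, %end",
    "end:  MATCH",
};
```

Under these assumptions, %loop resolves to $0000 and %end to $0003, because the SPLIT at $0001 occupies two words.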

Interfaces and Types

class Assembler

An assembler for diagnostics, unit tests, and experiments. See the assembler language documentation for the full syntax.

Public Functions

RegExPtr compile(const std::vector<std::string_view> &lines) const

Compile the given assembler program into a regular expression object.

Parameters:

lines – The lines of the assembler code to compile.

Throws:

Error – on any compilation error.

Returns:

The compiled regular expression object.

class Disassembler

A disassembler for the engine data of a compiled regular expression.

Public Functions

explicit Disassembler(const ConstRegExPtr &regEx)

Create a new instance for the given regular expression.

std::vector<std::string> disassemble() const

Disassemble the engine data into human-readable instructions.