How to Use std::u8string as Default

The Erbsland Regular Expression Library supports using either std::string or std::u8string as its default string type.

This page explains how to enable the std::u8string configuration when you integrate the library as a submodule. For the generic submodule integration steps (project layout, linking, etc.), see Integrate the Engine as Submodule.

Enable std::u8string in CMake

The library selects the string type at build time using the CMake option ERBSLAND_RE_U8STRING.

In your top-level CMakeLists.txt, enable the option before you call add_subdirectory(erbsland-re):

<project>/CMakeLists.txt
cmake_minimum_required(VERSION 3.25)
project(ExampleProject)

set(ERBSLAND_RE_U8STRING ON CACHE BOOL "Use std::u8string for all strings" FORCE)
add_subdirectory(erbsland-re)

add_subdirectory(example)

This option causes the library target to export the public compile definition ERBSLAND_RE_USE_U8STRING=1. All targets that link against erbsland-re will then compile with the same string configuration.

What Changes in the API

The public API exposes the configured string types via the aliases in the namespace erbsland::re:

  • String becomes std::u8string

  • StringView becomes std::u8string_view

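Write your own code against these aliases rather than naming std::string or std::u8string directly; this is the recommended way to write code that works with both configurations.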
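
As an illustration, a helper written only against such aliases compiles unchanged in both configurations. The sketch below is self-contained and therefore declares stand-in aliases in a hypothetical namespace example; in a real project, String and StringView would come from the library. The function countAsciiDigits is likewise invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace example {

// Stand-ins for the library aliases, so this sketch compiles on its own.
#if defined(ERBSLAND_RE_USE_U8STRING) && ERBSLAND_RE_USE_U8STRING
using String = std::u8string;
using StringView = std::u8string_view;
#else
using String = std::string;
using StringView = std::string_view;
#endif

// Counts the ASCII digits in a text. The function never names char or
// char8_t directly; it derives the character type from the alias.
constexpr std::size_t countAsciiDigits(StringView text) noexcept {
    using Char = StringView::value_type;
    std::size_t count = 0;
    for (const Char ch : text) {
        if (ch >= static_cast<Char>('0') && ch <= static_cast<Char>('9')) {
            ++count;
        }
    }
    return count;
}

} // namespace example
```

Deriving the character type from StringView::value_type keeps the comparison valid for both char and char8_t.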

String and Character Literals

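In std::u8string mode, string literals must carry the u8 prefix. Such literals are UTF-8 encoded arrays of type const char8_t[N], which do not convert implicitly to std::string or const char*.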

Prefer using the provided macros when you need a literal that should compile in both modes:

using namespace el::re;

auto pattern = ERBSLAND_RE_STRING_LITERAL("\\d+");
auto text = String{ERBSLAND_RE_STRING_LITERAL("abc 123 xyz")};
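
How the library defines its macro is not shown on this page. As a rough sketch of the general technique, a dual-mode literal macro can be built with preprocessor token pasting. The names EXAMPLE_LITERAL, ExampleView and examplePattern below are hypothetical and not part of the library.

```cpp
#include <string_view>

// Hypothetical dual-mode literal macro, similar in spirit to the library's
// macro. In u8 mode, the u8 prefix is pasted onto the literal token.
#if defined(ERBSLAND_RE_USE_U8STRING) && ERBSLAND_RE_USE_U8STRING
#define EXAMPLE_LITERAL(text) u8##text
using ExampleView = std::u8string_view;
#else
#define EXAMPLE_LITERAL(text) text
using ExampleView = std::string_view;
#endif

// The same source line yields a char or a char8_t literal, depending on
// the configuration. The pattern consists of three characters: \ d +
inline constexpr ExampleView examplePattern = EXAMPLE_LITERAL("\\d+");
static_assert(examplePattern.size() == 3);
```

Pasting the prefix only works when the macro argument is a string literal token, so the macro cannot be applied to variables or to the result of another expression.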

If you target only the std::u8string configuration, you can use C++20 u8"..." literals directly:

auto text = std::u8string{u8"abc 123 xyz"};