.. index::
    single: Input

*******************
The Input Interface
*******************

The :cpp:class:`Input` interface allows you to provide custom input sources to the regular expression engine. Because the Erbsland Regular Expression library is based on a Thompson NFA, input is consumed sequentially and processed in a highly efficient streaming fashion. This makes it possible to match patterns not only against in-memory strings, but also against data sources such as files, network streams, or custom iterators.

The input interface is a low-level extension point intended for advanced use cases. If you only need to match against strings, the built-in string overloads are usually the better and simpler choice.

How to Implement Your Source
============================

To implement a custom input source, derive from one of the following classes:

* :cpp:class:`Input` for UTF-8 input
* :cpp:class:`Input16` for UTF-16 input
* :cpp:class:`Input32` for UTF-32 input

The chosen base class determines the character type and the type of match objects returned by the matching API.

Your implementation must override the abstract methods defined by :cpp:class:`InputBase`. These methods form the contract between your input source and the matching engine.

Implementation Requirements
----------------------------

The most important method is :cpp:any:`read`. It is invoked in the hot loop of the matching engine and must therefore be implemented as efficiently as possible.

When implementing an input source, the following rules must be respected:

* :cpp:any:`read` must return the next character together with its position.
* When the end of the input is reached, a zero character must be returned.
* Repeated calls to :cpp:any:`read` after the end of the input must continue to succeed and keep returning a zero character.
* The returned position must advance monotonically and must uniquely identify the character within the input stream.

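
The rules above can be illustrated with a small, library-independent sketch. The names ``SketchCharAndPosition``, ``VectorSourceSketch``, and ``checkReadContract`` are hypothetical stand-ins invented for this illustration. They do not derive from the library's base classes and do not reproduce its actual declarations. Returning the input length as the position once the end is reached is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a "character plus position" result.
// This is NOT the library's declaration; it only mirrors the idea.
struct SketchCharAndPosition {
    char32_t character;
    std::size_t position;
};

// Hypothetical source that hands out the elements of a vector one by one.
class VectorSourceSketch {
public:
    explicit VectorSourceSketch(std::vector<char32_t> data) : _data{std::move(data)} {}

    // Returns the next character together with its position.
    // At the end of the input, every call returns a zero character.
    SketchCharAndPosition read() noexcept {
        if (_index >= _data.size()) {
            // Assumption of this sketch: report the input length as end position.
            return {U'\0', _data.size()};
        }
        const auto position = _index;
        return {_data[_index++], position};
    }

private:
    std::vector<char32_t> _data;
    std::size_t _index{0};
};

// Walks through the rules listed above for a two-character input.
void checkReadContract() {
    VectorSourceSketch source{std::vector<char32_t>{U'a', U'b'}};
    const auto first = source.read();
    const auto second = source.read();
    assert(first.character == U'a' && first.position == 0);
    assert(second.character == U'b' && second.position == 1);
    assert(second.position > first.position); // positions advance
    // Reading past the end keeps succeeding and keeps returning zero.
    for (int i = 0; i < 3; ++i) {
        const auto end = source.read();
        assert(end.character == U'\0');
        assert(end.position == 2);
    }
}
```

The end-of-input check comes first and does no other work, so repeated reads past the end stay cheap and never touch memory outside the vector.
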

If line-break folding (CRLF handling) is enabled for the regular expression, your input source must additionally implement:

* :cpp:any:`peek` to look ahead without consuming input
* :cpp:any:`skip` to advance the input by one character

These methods allow the engine to treat ``CRLF`` sequences as a single logical line break while preserving correct positional information.

Incorrect or incomplete implementations may lead to incorrect matches or undefined behavior.

Example Implementation
======================

The following example shows a complete implementation of a custom input source that reads characters from a ``std::vector``. While simplified, it demonstrates all required methods and lifetime rules.

.. literalinclude:: files/input.hpp
    :language: cpp
    :linenos:

.. button-ref:: error
    :ref-type: doc
    :color: success
    :align: center
    :expand:
    :class: sd-fs-5 sd-font-weight-bold sd-p-2 sd-my-4

    Errors →

Interfaces and Types
====================

.. doxygenclass:: erbsland::re::InputBase
    :members:

.. doxygenclass:: erbsland::re::Input
    :members:

.. doxygenclass:: erbsland::re::Input16
    :members:

.. doxygenclass:: erbsland::re::Input32
    :members:

.. doxygenclass:: erbsland::re::InputForView
    :members:

.. doxygenclass:: erbsland::re::Input16ForView
    :members:

.. doxygenclass:: erbsland::re::Input32ForView
    :members:

.. doxygenstruct:: erbsland::re::CharAndPosition

.. doxygentypedef:: erbsland::re::InputPosition