SourceForge.net Logo

PCRE Performance Project

About

The aim of PCRE-sljit project is speeding up the pattern matching speed of Perl Compatible Regular Expressions library (ftp download). The task is achieved by using sljit, a just-in-time (JIT) compilation library to translate machine code from the internal byte-code representation generated by pcre_compile(). PCRE-sljit offers similar matching speed to DFA based engines (like re2) on common patterns but still keep PERL compatibility (see here).

This work has been released as part of PCRE 8.20 and above. Now (PCRE 8.31) nearly all PCRE features are supported including UTF-8/16 and partial matching.

Usage

Because of compatibility reasons the JIT compilation must be explicitly requested by software. First of all PCRE must be compiled with --enable-jit. (Note: JIT compiler availability can be checked runtime by passing PCRE_CONFIG_JIT to pcre_config().) The next step is performing the JIT compilation by passing PCRE_STUDY_JIT_COMPILE, PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE or PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE to pcre_study(). Regardless of the success of JIT compilation, the returned extra data can be passed to pcre_exec(), which use the machine code when approprite, or fallback to the interpreted code path (Note: a few patterns are not supported by the JIT compiler). For debugging purposes, it is possible to check whether the JIT compilation is successful by passing PCRE_INFO_JIT to pcre_fullinfo(). If the JIT compilation is succesful it is mandatory to use pcre_free_study() to free the machine code. Otherwise pcre_free_study() is optional for compatibility reasons, but I suggest to use it all the time.

Detailed info about the JIT compiler can be found in pcrejit.3, which is part of the PCRE documentation. It is advisable to read the "CONTROLLING THE JIT STACK" section there.

Motivation

The importance of pattern matching has been significantly grown in the last decade. The increased computation power allows running more complex regular expressions in an acceptable time frame. However, this is not the only way to speed up regular expressions. The developers of JavaScript engines realized that their just-in-time compilation engines are able to considerably improve the runtime of pattern matching. Thus, JIT based regular expression engines were developed like irregexp from Google and YARR (Yet Another Regex Runtime) from Apple. However, these engines are heavily tied to their JavaScript engine (V8 from Google and JavaScriptCore from Apple), and other software cannot benefit from this shiny new technology, until now. The aim of this work is to extend a widely used regular expression library (PCRE) with a JIT compiler engine, which can be easly adopted by any software already using PCRE.

How does it work?

First, a regular expression is compiled into an internal representation by pcre_compile(). The internal representation (usually called MIR - Middle Level Representation) is a sequence of byte-codes. Each byte-code is basically a command, which tells the next step for the interpreter inside pcre_exec(). The JIT compiler translates MIR to machine code when the appropriate flags are passed to pcre_study(). The returned pcre_extra data contains a pointer to a machine executable function if the machine code generation was successful.

Why is it faster?

JIT compilers totally eliminate the continual reparsing of MIR. Even if the MIR code is much simpler than the original pattern string, the execution engine is full of ifs and switches, and executing them consumes considerable time. The compiled machine code only contains those machine instructions whose are absolutely necessary for this particular pattern, and nothing more.

This advantage is also a limitation since JIT compiler does not support the changing of compile time flags. One such flag is the newline type (PCRE supports many of them). PCRE allows overwriting the compile time newline flags when pcre_exec() is called. This is not supported by the JIT compiler, and the regular expression will be fallbacks to the interpreter.

Compile time overhead

Deciding when to use or not use JIT compiling is an important question. Since JIT is a heavyweight optimization we should never forget about the compile time overhead. Thus, the total runtime can be bigger if we use the compiled expression only once on a small input.

The following values measured on an Intel 2.67GHz with GCC 4.4.5 in 64 bit mode
Note: ns means nanosecond (10-9), and Int. means interpreter (pcre_exec())

PatternCompile time
(ns)
JIT compile time
(ns)
JIT compile /
compile time
abc264793230.05
a+(b*)[c|d]??e+?9251718618.58
\A.*\K[^a-z!]{8,}\Z9251150112.43
\b((.)\w{3,}\2{0,3})$10572174720.57
(?P<cap>\p{N}+|(?(R)a|(?R)))\113223086923.35
^(?![a-zA-Z]*?[[:alpha:]]++)\P{L&}15202042513.44

Results

The following patterns are searched on a non-utf8 input and an utf8 input. The utf8 input was shorter so it was repeated 4 times (with a simple memcpy()).

The performace is measured on an Intel 2.67GHz with GCC 4.4.5 in both 32 and 64 bit mode. The unit of the runtime values are milliseconds (ms) which is 10-3 second. The PCRE library was compiled with -O3 and -static compiler options to get the best performance form the compiler.

Description of the colums
  • Pattern: the pattern string. The light blue lines are caseless, the light yellow lines are caseful matches.
  • Int. (ms): interpreter (normal pcre_exec(...)) runtime in milliseconds.
  • JIT (ms): JIT compiled machine code runtime in milliseconds.
  • Runtime ratio: Interpreter runtime / JIT runtime.
  • Runtime save %: Runtime saved by the JIT compiler. Equals to 1 - (JIT runtime / Interpreter runtime)

PatternInt.
(ms)
JIT
(ms)
Runtime
ratio
Runtime
save %
how to60401.5033.3%
walk|get|she|make260803.2569.2%
[Ww]alk|[Gg]et|[Ss]he|[Mm]ake2601002.6061.5%
(?:ba|lu|ro)+(?:r|ck)*200603.3370.0%
(?:ba|lu|ro)+(r|ck)*220802.7563.6%
\w+our\w*17603804.6378.4%
(?:^\w+$)|\b(\W)\1+\b18403205.7582.6%
"[A-Z][a-z]*(?:(?:[,\s]|\R)(\s|\R)*[a-z]+){0,4}?[.!]"80402.0050.0%
\b(?:(?=\w+ro\w+)(?=\w+th\w+)\w+hr\w+)\b12803204.0075.0%
\b(?(?=\w+sh)\w+a|\w+uo)\w+\b20204005.0580.2%
\b\WX{0,3}(?:(?:V?I{1,3})|V|IV|IX)\W\b7801804.3376.9%
((\w)+?)\.\b(\s)*?\w48005608.5788.3%
(.{1,3})(\w*?\1)(?2)678016804.0475.2%
-?-?-?-?-?-?-?-?-?-?-?-----------$6401006.4084.4%
Average:14983104.1670.6%
Non-utf8 input on a 32 bit x86 machine.

PatternInt.
(ms)
JIT
(ms)
Runtime
ratio
Runtime
save %
how to60302.0050.0%
walk|get|she|make240803.0066.7%
[Ww]alk|[Gg]et|[Ss]he|[Mm]ake250902.7864.0%
(?:ba|lu|ro)+(?:r|ck)*190802.3857.9%
(?:ba|lu|ro)+(r|ck)*200802.5060.0%
\w+our\w*15904203.7973.6%
(?:^\w+$)|\b(\W)\1+\b15803204.9479.7%
"[A-Z][a-z]*(?:(?:[,\s]|\R)(\s|\R)*[a-z]+){0,4}?[.!]"90303.0066.7%
\b(?:(?=\w+ro\w+)(?=\w+th\w+)\w+hr\w+)\b10203203.1968.6%
\b(?(?=\w+sh)\w+a|\w+uo)\w+\b17104104.1776.0%
\b\WX{0,3}(?:(?:V?I{1,3})|V|IV|IX)\W\b7201704.2476.4%
((\w)+?)\.\b(\s)*?\w43305907.3486.4%
(.{1,3})(\w*?\1)(?2)597015903.7573.4%
-?-?-?-?-?-?-?-?-?-?-?-----------$5801005.8082.8%
Average:13233073.7870.1%
Non-utf8 input on a 64 bit x86 machine.

PatternInt.
(ms)
JIT
(ms)
Runtime
ratio
Runtime
save %
die der20201.00 0.0%
ist|der|die|und100205.0080.0%
\b\w+\b2401601.5033.3%
(?:da|ge|om)+(?:n|me)*80204.0075.0%
\b(?(?=\w+ro)\w+pa|\w+lle)\w+\b6203002.0751.6%
\b(\W)\1+\b|(^(?=.*kl)(?=.*no).{15,40}$)7402003.7073.0%
^.{4,32}(\P{N})\1{2,}.{4,32}(?260604.3376.9%
^(\w{3,})(?!\1).*\h.*\1$502025002.0150.2%
((\w{2,8},?(\P{Z}|\R)){1,2}\.\s?)$578015003.8574.0%
\b\w*?((.){1,3}\w*\2)\w*?(?1)20007802.5661.0%
\w*?(b{2,3})\w*?c13404203.1968.7%
\P{Lu}\P{L&}{0,12}[\s\-]{1,4}..[\P{L}\P{N}]{4}3201202.6762.5%
\b(\B([c-h])\B|[a-z]+?(?1)[a-z])13003204.0675.4%
Average:13704933.0760.1%
Utf8 input on a 32 bit x86 machine.

PatternInt.
(ms)
JIT
(ms)
Runtime
ratio
Runtime
save %
die der10101.00 0.0%
ist|der|die|und100402.5060.0%
\b\w+\b2501202.0852.0%
(?:da|ge|om)+(?:n|me)*80302.6762.5%
\b(?(?=\w+ro)\w+pa|\w+lle)\w+\b5702502.2856.1%
\b(\W)\1+\b|(^(?=.*kl)(?=.*no).{15,40}$)6901803.8373.9%
^.{4,32}(\P{N})\1{2,}.{4,32}(?220603.6772.7%
^(\w{3,})(?!\1).*\h.*\1$441019002.3256.9%
((\w{2,8},?(\P{Z}|\R)){1,2}\.\s?)$494012304.0275.1%
\b\w*?((.){1,3}\w*\2)\w*?(?1)17206302.7363.4%
\w*?(b{2,3})\w*?c12003403.5371.7%
\P{Lu}\P{L&}{0,12}[\s\-]{1,4}..[\P{L}\P{N}]{4}3001003.0066.7%
\b(\B([c-h])\B|[a-z]+?(?1)[a-z])9303003.1067.7%
Average:11863992.8359.9%
Utf8 input on a 64 bit x86 machine.

Conclusion

JIT compiling is a powerful technology, which is able to speed up interpreted execution including Java, JavaScript, ActionScript (with NanoJIT) or regular expression engines.

Contribution

I would like to encourage all of you to join the PCRE Performance Project, and make PCRE faster than ever. My email address is hzmester (at) freemail (dot) hu.

Special thanks

First and foremost, the author thanks Philip Hazel for his continual help and support.

Sorted as familiy name in alphabetic order:
Giuseppe D'Angelo-QRegularExpression in Qt 5 and above.
Victor Julien-Suricata project (http://www.openinfosecfoundation.org/).
Sheri Pierce-Testing and feedback.
Petr Pisar

Last modification: 7.4.2012