Delphi Inspiration

Components and Applications


YuPcre2: Version History

YuPcre2 is a new regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.

YuPcre2 1.11.0 – 8 Oct 2019

  • Fix subject buffer overread in JIT when UTF is disabled and \X or \R has a fixed quantifier greater than 1.
  • Added support for callouts from pcre2_substitute.
  • Fix an xclass matching issue in JIT.
  • Implement the Perl 5.28 experimental alphabetic names for atomic groups and lookaround assertions, for example, (*pla:…) and (*atomic:…). These are characterized by a lower case letter following (*.
  • Implement the new Perl “script run” features (*script_run:…) and (*atomic_script_run:…) aka (*sr:…) and (*asr:…).
  • Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match (including JIT via pcre2_match) and pcre2_dfa_match, but *not* the pcre2_jit_match fast path. Also, when a match fails, set the subject field in the match data to nil for tidiness - none of the substring extractors should reference this after match failure.
  • If a pattern started with a subroutine call that had a quantifier with a minimum of zero, an incorrect “match must start with this character” could be recorded. Example: (?&xxx)*ABC(?<xxx>XYZ) would (incorrectly) expect 'A' to be the first character of a match.
  • The heap limit checking code in pcre2_dfa_match could suffer from overflow if the heap limit was set very large. This could cause incorrect “heap limit exceeded” errors.
  • If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN) followed by ^ it was not recognized as anchored.
  • With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s which are valid in character classes, but not as the end of ranges, were being treated as literals. An example is [_-\s] (but not [\s-_] because that gave an error at the start of a range). Now an “invalid range” error is given independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.
  • PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known escape sequences such as \eX when they appeared invalidly in a character class. Now the option applies only to unrecognized or malformed escape sequences.
  • The pcre2_dfa_match function was incorrectly handling conditional version tests such as (?(VERSION>=0)…) when the version test was true. Incorrect processing or a crash could result.
  • When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group names, as Perl does.
  • Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh} construct.
  • Compile \p{Any} to be the same as . in PCRE2_DOTALL mode, so that it benefits from auto-anchoring if \p{Any}* starts a pattern.
  • Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
  • Improve DIUtils.pas Unicode processing to support Unicode Code Points from $000000 to $10FFFF. Adjust remaining source code accordingly.
  • Update DIUtils Unicode functions to Unicode 12.1.0.
  • Remove include file. Directly link in instead.
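
The "invalid range" rule described above for PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL can be illustrated by analogy. The following is a minimal sketch that uses Python's standard re module, not YuPcre2 or PCRE2; it only shows that a class escape such as \s is not accepted as a range endpoint there either. The helper name is_rejected is hypothetical.

```python
import re

def is_rejected(pattern: str) -> bool:
    """Return True if Python's re module refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False

# A class escape such as \s cannot be the end point of a range ...
range_ending_in_class = is_rejected(r"[_-\s]")
# ... nor the start point of one.
range_starting_with_class = is_rejected(r"[\s-_]")
# Escaping the hyphen makes it a literal, so this class compiles.
escaped_hyphen = is_rejected(r"[_\-\s]")
```

Error messages and offsets differ between engines, so only the accept/reject outcome is comparable.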

YuPcre2 1.10.0 – 7 Mar 2019

  • Fix: TDIRegEx2_8.Replace and TDIRegEx2_16.Replace did not return the start of the string if StartOffset > 0.
  • Adjust TDIRegEx2SearchStream_Enc to DIConverters 1.18.0: Converter functions now use the native unsigned integer type for the length of a string and support strings longer than 2 GB. This change only affects projects using DIConverters 1.18.0.

YuPcre2 1.9.2 – 8 Jan 2019

  • Matching the pattern (*UTF)\C[^\v]+\x80 against an 8-bit string containing multi-code-unit characters caused bad behaviour and possibly a crash.
  • When returning an error from pcre2_pattern_convert, ensure the error offset is set to zero for early errors.
  • Refactored pcre2_dfa_match so that the internal recursive calls no longer use the stack for local workspace and local ovectors. Instead, an initial block of stack is reserved, but if this is insufficient, heap memory is used. The heap limit parameter now applies to pcre2_dfa_match.
  • In pcre2_substitute, with global matching, a pattern that matched an empty string, but never at the starting match offset, was not handled in a Perl-compatible way. The pattern (?<=\G.) is an example of such a pattern. Because \G is in a lookbehind assertion, there has to be a “bumpalong” before there can be a match. The automatic “advance by one character after an empty string match” rule is therefore inappropriate. A more complicated algorithm has now been implemented.
  • When checking to see if a lookbehind is of fixed length, lookaheads were correctly ignored, but quantifiers on lookaheads were not being ignored, leading to an incorrect “lookbehind assertion is not fixed length” error.
  • Updated to Unicode version 11.0.0. As well as the usual addition of new scripts and characters, this involved re-jigging the grapheme break property algorithm because Unicode has changed the way emojis are handled.
  • Fixed an obscure bug that struck when there were two atomic groups not separated by something with a backtracking point. There could be an incorrect backtrack into the first of the atomic groups. A complicated example is (?>a(*:1))(?>b)(*SKIP:1)x|.* matched against “abc”, where the *SKIP shouldn't find a MARK (because it is in an atomic group), but it did.
  • (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.
  • A (*MARK) name was not being passed back for positive assertions that were terminated by (*ACCEPT).
  • Add support for \N{U+dddd}, but only in Unicode mode.
  • Add support for (?^) for unsetting all imnsx options.
  • The PCRE2_EXTENDED (/x) option only ever discarded space characters whose code point was less than 256. Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085, U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by Unicode as “Pattern White Space”. This makes PCRE2 compatible with Perl.
  • In certain circumstances, option settings within patterns were not being correctly processed. For example, the pattern ((?i)A)(?m)B incorrectly matched “ab”. (The (?m) setting lost the fact that (?i) should be reset at the end of its group during the parse process, but without another setting such as (?m) the compile phase got it right.)
  • When serializing a pattern, set the memctl, executable_jit, and tables fields (that is, all the fields that contain pointers) to zeros so that the result of serializing is always the same. These fields are re-set when the pattern is deserialized.
  • In a pattern such as [^\x{100}-\x{ffff}]*[\x80-\xff] which has a repeated negative class with no characters less than 0x100 followed by a positive class with only characters less than 0x100, the first class was incorrectly being auto-possessified, causing incorrect match failures.
  • If the only branch in a conditional subpattern was anchored, the whole subpattern was treated as anchored, when it should not have been, since the assumed empty second branch cannot be anchored. Demonstrated by test patterns such as (?(1)^())b or (?(?=^))b.
  • A repeated conditional subpattern that could match an empty string was always assumed to be unanchored. Now it is checked just like any other repeated conditional subpattern, and can be found to be anchored if the minimum quantifier is one or more.
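
The option-scoping fix above (the ((?i)A)(?m)B case) comes down to one rule: an option set inside a group ends with that group. As a rough illustration only, the sketch below uses Python's re module, which spells a scoped option as (?i:…) and does not accept the PCRE2 form mid-pattern; it is an analogy, not YuPcre2 code.

```python
import re

# The case-insensitive option applies to "A" only; it ends with its group,
# so "B" must still match case-sensitively.
pattern = re.compile(r"((?i:A))(?m:B)")

matches_mixed_case = pattern.fullmatch("aB") is not None
matches_lower_b = pattern.fullmatch("ab") is not None
```

Here "aB" is accepted and "ab" is rejected, which is the behaviour the fixed PCRE2 parser is described as restoring.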

YuPcre2 1.9.1 – 1 Jan 2019

  • Fix TDIRegEx2_16.MatchNext which might not have properly advanced the start offset if the previous match was an empty string.
  • In YuPcre2_RegEx2.pas, replace a few character constants with ordinal constants to work around duplicate case label errors with at least one Delphi 10.3 Rio installation.
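
Why the start offset must advance after an empty match can be seen in a small loop. This is a simplified sketch in Python's re module, not the TDIRegEx2_16.MatchNext implementation; the function name iter_spans is hypothetical, and a UTF-16 implementation would also have to step over a whole surrogate pair rather than a single code unit.

```python
import re

def iter_spans(pattern, text):
    """Yield (start, end) for successive matches, stepping past empty ones."""
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        yield m.span()
        # After an empty match, advance by one character so the loop
        # cannot find the same empty match again.
        pos = m.end() if m.end() > m.start() else m.end() + 1

spans = list(iter_spans(re.compile(r"a*"), "baac"))
```

Without the one-character advance the loop would report the empty match at offset 0 forever. PCRE2's own demo program is more careful: after an empty match it first retries at the same offset with the not-empty-at-start and anchored options, and only then advances.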

YuPcre2 1.9.0 – 24 Dec 2018

  • Support Delphi 10.3 Rio Win32 and Win64.

YuPcre2 1.8.0 – 2 Mar 2018

  • Defined public names for all the pcre2_compile error numbers.
  • When an assertion contained (*ACCEPT) it caused all open capturing groups to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to misbehaviour for subsequent references to groups that started outside the assertion. ACCEPT in an assertion now closes only those groups that were started within that assertion.
  • Although pcre2_jit_match checks whether the pattern is compiled in a given mode, it was also expected that at least one mode is available. This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION when the pattern is not optimized by JIT at all.
  • If a backreference with a minimum repeat count of zero was first in a pattern, apart from assertions, an incorrect first matching character could be recorded. For example, for the pattern (?=(a))\1?b, “b” was incorrectly set as the first character of a match.
  • Characters in a leading positive assertion are considered for recording a first character of a match when the rest of the pattern does not provide one. However, a character in a non-assertive group within a leading assertion such as in the pattern (?=(a))\1?b caused this process to fail. This was an infelicity rather than an outright bug, because it did not affect the result of a match, just its speed. (In fact, in this case, the starting 'a' was subsequently picked up in the study.)
  • Allocate a single callout block on the stack at the start of pcre2_match and set its never-changing fields once only. Do the same for pcre2_dfa_match.
  • Save the extra compile options (set in the compile context) with the compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS to retrieve them.
  • Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new field callout_flags in callout blocks. The bits are set by pcre2_match, but not by JIT or pcre2_dfa_match. These bits are provided to help with tracking how a backtracking match is proceeding.
  • When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT matching (both pcre2_match and pcre2_dfa_match) and the matched string started with the first code unit of a newline sequence, matching failed because it was not tried at the newline.
  • Code for giving up a non-partial match after failing to find a starting code unit anywhere in the subject was missing when searching for one of a number of code units (the bitmap case) in both pcre2_match and pcre2_dfa_match. This was a missing optimization rather than a bug.
  • The JIT compiler has been updated.
  • Avoid pointer overflow for unset captures in pcre2_substring_list_get. This could not actually cause a crash because it was always used in a memcpy() call with zero length.
  • Auto-possessification at the end of a capturing group was dependent on what follows the group (e.g. (a+)b would auto-possessify the a+) but this caused incorrect behaviour when the group was called recursively from elsewhere in the pattern where something different might follow. Iterators at the ends of capturing groups are no longer considered for auto-possessification if the pattern contains any recursions.

YuPcre2 1.7.0 – 16 Aug 2017

  • Implement PCRE2_ENDANCHORED, coEndAnchored, and moEndAnchored.
  • Add an explicit limit on the amount of heap used by pcre2_match, set by pcre2_set_heap_limit, TDIPerlRegEx2_8.HeapLimit, TDIDfaRegEx2_16.HeapLimit, and the pattern start (*LIMIT_HEAP=xxx).
  • Extend auto-anchoring etc. to ignore groups with a zero quantifier and single-branch conditions with a false condition (e.g. DEFINE) at the start of a branch. For example, (?(DEFINE)…)^A and (…){0}^B are now flagged as anchored.
  • Implement PCRE2_EXTENDED_MORE and coExtendedMore, and related /xx and (?xx) features.
  • Implement (?n: for PCRE2_NO_AUTO_CAPTURE and coNoAutoCapture, because Perl now has this.
  • Implement extra compile options in the compile context:
    • PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and coAllowSurrogateEscapes;
    • PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL and coBadEscapeIsLiteral;
    • PCRE2_EXTRA_MATCH_LINE and coMatchLine;
    • PCRE2_EXTRA_MATCH_WORD and coMatchWord.
  • Implement newline type PCRE2_NEWLINE_NUL.
  • A lookbehind assertion that had a zero-length branch caused undefined behaviour when processed by pcre2_dfa_match.
  • The match limit value now also applies to pcre2_dfa_match as there are patterns that can use up a lot of resources without necessarily recursing very deeply.
  • Implement PCRE2_LITERAL and coLiteral.
  • Increased the limit for searching for a “must be present” code unit in subjects from 1000 to 2000 for 8-bit searches, since they are much faster.
  • Arrange for anchored patterns to record and use “first code unit” data, because this can give a fast “no match” without searching for a “required code unit”. Previously only non-anchored patterns did this.
  • Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
  • Update extended grapheme breaking rules to the latest set that are in Unicode Standard Annex #29.
  • Added experimental foreign pattern conversion facilities (pcre2_pattern_convert and friends).
  • If a hyphen that follows a character class is the last character in the class, Perl does not give a warning. PCRE2 now also treats this as a literal.
  • PCRE2 was not throwing an error for [\d-X] (and similar escapes), as is documented.

YuPcre2 1.6.0 – 3 Apr 2017

New features:

  • Support Delphi 10.2 Tokyo Win32 and Win64.
  • The main interpreter, pcre2_match, has been refactored into a new version that does not use recursive function calls (and therefore the stack) for remembering backtracking positions. The new implementation allows backtracking into recursive group calls in patterns, making it more compatible with Perl, and also fixes some other hard-to-do issues.
    • Now that pcre2_match no longer uses recursive function calls (see above), the “match limit recursion” value seems misnamed. It still exists, and limits the depth of tree that is searched. To avoid future confusion, it has been renamed as “depth limit” in all relevant places (TDIRegEx2Base.MatchLimitDepth, PCRE2_INFO_DEPTHLIMIT, PCRE2_CONFIG_DEPTHLIMIT, PCRE2_ERROR_DEPTHLIMIT, pcre2_set_depth_limit, etc.) but the old names are still available for backwards compatibility.
    • PCRE2_CONFIG_STACKRECURSE is no longer used and is deprecated.
  • Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info and the InfoFrameSize property to TDIRegEx2_8 and TDIRegEx2_16.
  • The depth (formerly recursion) limit now applies to DFA matching.
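
The move away from recursive function calls can be pictured with a toy matcher that keeps its backtracking positions in an explicit list instead of on the call stack. This is only an illustration of the general idea, assuming a tiny syntax (literal characters, '.', and a trailing '*' on a single item); it is not how pcre2_match is implemented, and the heap_limit parameter here counts frames rather than kibibytes.

```python
def toy_match(pattern: str, text: str, heap_limit: int = 1000) -> bool:
    """Anchored full match for literals, '.', and 'x*' without recursion."""
    # Each frame is (pattern index, text index). Frames live in a list on
    # the heap, so deep backtracking cannot overflow the call stack.
    stack = [(0, 0)]
    while stack:
        if len(stack) > heap_limit:
            raise MemoryError("heap limit exceeded")
        p, t = stack.pop()
        if p == len(pattern):
            if t == len(text):
                return True
            continue  # pattern used up but text remains: backtrack
        starred = p + 1 < len(pattern) and pattern[p + 1] == "*"
        hit = t < len(text) and pattern[p] in (".", text[t])
        if starred:
            stack.append((p + 2, t))      # fallback: stop repeating here
            if hit:
                stack.append((p, t + 1))  # tried first: consume one more
        elif hit:
            stack.append((p + 1, t + 1))
    return False

deep_ok = toy_match("a*", "a" * 50000, heap_limit=100000)
```

A recursive version of the same matcher would hit Python's recursion limit on the 50000-character subject; with an explicit stack the only bound is the configurable limit, which is the trade-off the entry above describes.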

Bug fixes:

  • In the 32-bit library in non-UTF mode, an attempt to find a Unicode property for a character with a code point greater than 0x10ffff (the Unicode maximum) caused a crash.
  • If a lookbehind assertion that contained a back reference to a group appearing later in the pattern was compiled with the PCRE2_ANCHORED option, undefined actions (often a segmentation fault) could occur, depending on what other options were set. An example assertion is (?<!\1(abc)) where the reference \1 precedes the group (abc).
  • Fix memory leak in pcre2_serialize_decode when the input is invalid.
  • Fix potential nil dereference in pcre2_callout_enumerate if called with a nil pattern pointer.
  • The alternative matching function, pcre2_dfa_match, misbehaved if it encountered a character class with a possessive repeat, for example [a-f]{3}+.

YuPcre2 1.5.0 – 17 Feb 2017

New features:

  • Implemented pcre2_code_copy_with_tables.
  • \g{+<number>} (e.g. \g{+2}) is now supported. It is a “forward back reference” and can be useful in repetitions (compare \g{-<number>}). Perl does not recognize this syntax.


  • When a pattern is too complicated, PCRE2 gives up trying to find a minimum matching length and just records zero. Typically this happens when there are too many nested or recursive back references. If the limit was reached in certain recursive cases it failed to be triggered and an internal error could be the result.
  • The pcre2_dfa_match function now takes note of the recursion limit for the internal recursive calls that are used for lookarounds and recursions within the pattern.
  • Detecting patterns that are too large inside the length-measuring loop saves processing ridiculously long patterns to their end.
  • When autopossessifying, skip empty branches without recursion, to reduce stack usage. Example pattern: X?(R||){3335}.
  • A pattern with very many explicit back references to a group that is a long way from the start of the pattern could take a long time to compile because searching for the referenced group in order to find the minimum length was being done repeatedly. Now up to 128 group minimum lengths are cached and the attempt to find a minimum length is abandoned if there is a back reference to a group whose number is greater than 128. (In that case, the pattern is so complicated that this optimization probably isn't worth it.)

Bug fixes:

  • In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without PCRE2_UCP set, a negative character type such as \D in a positive class should cause all characters greater than 255 to match, whatever else is in the class. There was a bug that caused this not to happen if a Unicode property item was added to such a class, for example [\D\P{Nd}] or [\W\pL].
  • There has been a major re-factoring of pcre2_compile. Most syntax checking is now done in the pre-pass that identifies capturing groups. While doing this, some minor bugs and Perl incompatibilities were fixed, including:
    1. \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead of giving an invalid quantifier error.
    2. {0} can now be used after a group in a lookbehind assertion; previously this caused an “assertion is not fixed length” error.
    3. Perl always treats (?(DEFINE) as a “define” group, even if a group with the name “DEFINE” exists. PCRE2 now does likewise.
    4. A recursion condition test such as (?(R2)…) must now refer to an existing subpattern.
    5. A conditional recursion test such as (?(R)…) misbehaved if there was a group whose name began with “R”.
    6. A hyphen appearing immediately after a POSIX character class (for example [[:ascii:]-z]) now generates an error. Perl does accept this as a literal, but gives a warning, so it seems best to fail it in PCRE.
    7. An empty \Q\E sequence may appear after a callout that precedes an assertion condition (it is, of course, ignored).

      One effect of the refactoring is that some error numbers and messages have changed, and the pattern offset given for compiling errors is not always the right-most character that has been read. In particular, for a variable-length lookbehind assertion it now points to the start of the assertion. Another change is that when a callout appears before a group, the “length of next pattern item” that is passed now just gives the length of the opening parenthesis item, not the length of the whole group. A length of zero is now given only for a callout at the end of the pattern. Automatic callouts are no longer inserted before and after explicit callouts in the pattern.
  • Back references are now permitted in lookbehind assertions when there are no duplicated group numbers (that is, (?| has not been used), and, if the reference is by name, there is only one group of that name. The referenced group must, of course, be of fixed length.
  • Automatic callouts are no longer generated before and after callouts in the pattern.
  • A number of bugs have been mended relating to match start-up optimizations when the first thing in a pattern is a positive lookahead. These all applied only when PCRE2_NO_START_OPTIMIZE was *not* set:
    1. A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed both an initial 'X' and a following 'X'.
    2. Some patterns starting with an assertion that started with .* were incorrectly optimized as having to match at the start of the subject or after a newline. There are cases where this is not true, for example, (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that start with spaces. Starting .* in an assertion is no longer taken as an indication of matching at the start (or after a newline).
  • A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and which started with .* inside a positive lookahead was incorrectly being compiled as implicitly anchored.
  • Fix out-of-bounds read for partial matching of . against an empty string when the newline type is CRLF.
  • The appearance of \p, \P, or \X in a substitution string when PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (nil dereference).
  • If the starting offset was specified as greater than the subject length in a call to pcre2_substitute an out-of-bounds memory reference could occur.
  • Incorrect data was compiled for a pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide characters to match (for example, [\s[:^ascii:]]).
  • The limit in the auto-possessification code that was intended to catch overly-complicated patterns and not spend too much time auto-possessifying was being reset too often, resulting in very long compile times for some patterns. Now such patterns are no longer completely auto-possessified.
  • Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it just wastes time. In the UTF case it can also produce redundant entries in XCLASS lists caused by characters with multiple other cases and pairs of characters in the same “not-x” sublists.
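
The two start-up optimization examples above (patterns that begin with a positive lookahead) can be checked against any backtracking engine that does not apply the faulty optimization. The sketch below uses Python's re module purely as such a reference; the subject strings are made up for illustration.

```python
import re

# First example: a single 'X' satisfies both the lookahead and the literal,
# so one 'X' in the subject is enough.
single_x = re.search(r"(?=.*X)X$", "X")

# Second example: the (empty) match begins after the leading spaces, so a
# leading .* inside an assertion does not imply a match at the line start.
password_like = re.search(r"(?=.*[A-Z])(?=.{8,16})(?!.*[\s])", "  Password1")
match_start = password_like.start() if password_like else None
```

In the second case the negative lookahead rejects offsets 0 and 1 because whitespace still lies ahead, and the first successful start is offset 2.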

YuPcre2 1.4.0 – 31 Jul 2016

New features:

  • Implemented pcre2_code_copy to make a copy of a compiled pattern.
  • Implemented the PCRE2_NO_JIT option for pcre2_match and moNoJit option for TDIRegEx2Base.MatchOptions.
  • Calls to pcre2_get_error_message with error numbers that are never returned by PCRE2 functions were returning empty strings. Now the error code PCRE2_ERROR_BADDATA is returned.
  • Allow \C in lookbehinds and DFA matching in UTF-32 mode.

Bug fixes:

  • Detect unmatched closing parentheses and give the error in the pre-scan instead of later. Previously the pre-scan carried on and could give a misleading incorrect error message. For example, (?J)(?'a'))(?'a') gave a message about invalid duplicate group names.
  • A pattern that included (*ACCEPT) in the middle of a sufficiently deeply nested set of parentheses of sufficient size caused an overflow of the compiling workspace (which was diagnosed, but of course is not desirable).
  • Detect missing closing parentheses during the pre-pass for group identification.
  • Fix a race condition in JIT.
  • Fix register overwrite in JIT when SSE2 acceleration is enabled.

YuPcre2 1.3.0 – 7 May 2016

  • Support Delphi 10.1 Berlin Win32 and Win64.

YuPcre2 1.2.0 – 4 Mar 2016

New features:

  • New option to limit the length of a pattern: TDIRegEx2Base.MaxPatternLength and pcre2_set_max_pattern_length.
  • New option to limit the offset of unanchored matches: TDIRegEx2Base.OffsetLimit and pcre2_set_offset_limit.

Bug fixes:

  • In a character class such as [\W\p{Any}] where both a negative-type escape (“not a word character”) and a property escape were present, the property escape was being ignored.
  • Fixed integer overflow for patterns whose minimum matching length is very, very large.
  • The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling errors or other strange effects if compiled in UCP mode.
  • Adding group information caching improves the speed of compiling when checking whether a group has a fixed length and/or could match an empty string, especially when recursion or subroutine calls are involved.
  • If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all characters with code points greater than 255 are in the class. When a Unicode property was also in the class (if PCRE2_UCP is set, escapes such as \w are turned into Unicode properties), wide characters were not correctly handled, and could fail to match. Negated classes such as [^[:^ascii:]\d] were also not working correctly in UCP mode.
  • If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between an item and its quantifier (for example, A(?#comment)?B) pcre2_compile misbehaved.
  • Similarly, if an isolated \E was present between an item and its quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile misbehaved.
  • The error for an invalid UTF pattern string always gave the code unit offset as zero instead of where the invalidity was found.
  • An empty \Q\E sequence between an item and its quantifier caused pcre2_compile to misbehave when auto callouts were enabled.
  • If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or other verb “name” ended with whitespace immediately before the closing parenthesis, pcre2_compile misbehaved. Example: (*:abc ), but only when both those options were set.
  • In a number of places pcre2_compile was not handling NUL characters correctly.
  • If a pattern that was compiled with PCRE2_EXTENDED started with white space or a #-type comment that was followed by (?-x), which turns off PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again, pcre2_compile assumed that (?-x) applied to the whole pattern and consequently mis-compiled it. The fix for this bug means that a setting of any of the (?imsxU) options at the start of a pattern is no longer transferred to the options that are returned by PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have changed when the effects of those options were all moved to compile time.
  • An escaped closing parenthesis in the “name” part of a (*verb) when PCRE2_ALT_VERBNAMES was set caused pcre2_compile to malfunction.

YuPcre2 1.1.0 – 15 Sep 2015

  • Support Delphi 10 Seattle Win32 and Win64.
  • Match limit check added to recursion.
  • Arrange for the UTF check in pcre2_match and pcre2_dfa_match to look only at the part of the subject that is relevant when the starting offset is non-zero.
  • Improve first character match in JIT with SSE2 on x86.
  • Fixed two assertion failures in JIT.
  • Fixed a corner case of range optimization in JIT.
  • Add the ${*MARK} facility to pcre2_substitute.
  • Implemented PCRE2_ALT_VERBNAMES and coAltVerbnames.
  • Fixed two issues in JIT.

YuPcre2 1.0.1 – 8 Aug 2015

  • Pathological patterns containing many nested occurrences of [: caused pcre2_compile to run for a very long time.
  • A missing closing parenthesis for a callout with a string argument was not being diagnosed, possibly leading to a buffer overflow.
  • A conditional group with only one branch has an implicit empty alternative branch and must therefore be treated as potentially matching an empty string.
  • If (?R was followed by - or +, incorrect behaviour happened instead of a diagnostic.
  • Conditional groups whose condition was an assertion preceded by an explicit callout with a string argument might be incorrectly processed, especially if the string contained \Q.
  • Fix buffer overflow while checking a UTF-8 string if the final multi-byte UTF-8 character was truncated.
  • Finding the minimum matching length of complex patterns with back references and/or recursions can take a long time. There is now a cut-off that gives up trying to find a minimum length when things get too complex.
  • An optimization has been added that speeds up finding the minimum matching length for patterns containing repeated capturing groups or recursions.
  • If a pattern contained a back reference to a group whose number was duplicated as a result of appearing in a (?|…) group, the computation of the minimum matching length gave a wrong result, which could cause incorrect “no match” errors. For such patterns, a minimum matching length cannot at present be computed.
  • Added a check for integer overflow in conditions (?(<digits>) and (?(R<digits>).
  • Fixed an issue when \p{Any} inside an xclass did not read the current character.
  • The JIT compiler did not restore the control verb head in case of *THEN control verbs.
  • The way recursive references such as (?3) are compiled has been re-written because the old way was the cause of many issues. Now, conversion of the group number into a pattern offset does not happen until the pattern has been completely compiled. This does mean that detection of all infinitely looping recursions is postponed till match time. In the past, some easy ones were detected at compile time.
  • A test for a back reference to a non-existent group was missing for items such as \987. This caused incorrect code to be compiled.
  • Error messages for syntax errors following \g and \k were giving inaccurate offsets in the pattern.
  • Improve the performance of starting single character repetitions in JIT.
  • (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.
  • Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now give the right offset instead of zero.
  • The JIT compiler should not check repeats after a {0,1} repeat byte code.
  • The JIT compiler should restore the control chain for empty possessive repeats.
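
The UTF-8 fix above concerns a subject whose final multi-byte character is cut short. A bounds-safe way to detect that case is to walk back from the end to the last lead byte and compare the length it announces with the bytes actually present. The following is a minimal sketch of that idea with hypothetical function names; it is not the PCRE2 validity check, which also verifies many other properties.

```python
def expected_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte, or 0 if not a lead."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte

def has_truncated_tail(data: bytes) -> bool:
    """True if the last character's lead byte promises more bytes than exist."""
    i = len(data) - 1
    steps = 0
    # Step back over at most three continuation bytes (0x80..0xBF).
    while i >= 0 and steps < 3 and 0x80 <= data[i] <= 0xBF:
        i -= 1
        steps += 1
    if i < 0:
        return False  # empty input, or nothing but continuation bytes
    return expected_length(data[i]) > len(data) - i

complete = has_truncated_tail("caf\u00e9".encode("utf-8"))
cut_short = has_truncated_tail(b"abc\xe2\x82")  # euro sign missing its last byte
```

The comparison against the remaining byte count is the step that prevents reading past the end of the buffer.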

YuPcre2 1.0.0 – 22 Jul 2015

  • Initial release.
products/pcre2/history.txt · Last modified: 2019/10/08 16:55 (external edit)