Delphi Inspiration

Components and Applications

YuPcre2: Version History

YuPcre2 is a new regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.

YuPcre2 1.4.0 – 31 Jul 2016

New features:

  • Implemented pcre2_code_copy to make a copy of a compiled pattern.
  • Implemented the PCRE2_NO_JIT option for pcre2_match and the moNoJit option for TDIRegEx2Base.MatchOptions.
  • Calls to pcre2_get_error_message with error numbers that are never returned by PCRE2 functions were returning empty strings. Now the error code PCRE2_ERROR_BADDATA is returned.
  • Allow \C in lookbehinds and DFA matching in UTF-32 mode.

Fixes:

  • Detect unmatched closing parentheses and report the error during the pre-scan instead of later. Previously the pre-scan carried on and could give a misleading, incorrect error message. For example, (?J)(?'a'))(?'a') gave a message about invalid duplicate group names.
  • A pattern that included (*ACCEPT) in the middle of a sufficiently deeply nested set of parentheses of sufficient size caused an overflow of the compiling workspace (which was diagnosed, but of course is not desirable).
  • Detect missing closing parentheses during the pre-pass for group identification.
  • Fix a race condition in JIT.
  • Fix a register overwrite in JIT when SSE2 acceleration is enabled.

YuPcre2 1.3.0 – 7 May 2016

  • Support Delphi 10.1 Berlin Win32 and Win64.

YuPcre2 1.2.0 – 4 Mar 2016

New features:

  • New option to limit the length of a pattern: TDIRegEx2Base.MaxPatternLength and pcre2_set_max_pattern_length.
  • New option to limit the offset of unanchored matches: TDIRegEx2Base.OffsetLimit and pcre2_set_offset_limit.
  • New pcre2_substitute options PCRE2_SUBSTITUTE_EXTENDED, PCRE2_SUBSTITUTE_UNSET_EMPTY, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

Bug fixes:

  • In a character class such as [\W\p{Any}] where both a negative-type escape (“not a word character”) and a property escape were present, the property escape was being ignored.
  • Fixed integer overflow for patterns whose minimum matching length is very, very large.
  • The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compile errors or other strange effects if compiled in UCP mode.
  • Group information is now cached, which speeds up compilation when checking whether a group has a fixed length and/or could match an empty string, especially when recursion or subroutine calls are involved.
  • If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all characters with code points greater than 255 are in the class. When a Unicode property was also in the class (if PCRE2_UCP is set, escapes such as \w are turned into Unicode properties), wide characters were not correctly handled, and could fail to match. Negated classes such as [^[:^ascii:]\d] were also not working correctly in UCP mode.
  • If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between an item and its quantifier (for example, A(?#comment)?B), pcre2_compile misbehaved.
  • Similarly, if an isolated \E was present between an item and its quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile misbehaved.
  • The error for an invalid UTF pattern string always gave the code unit offset as zero instead of the offset where the invalidity was found.
  • An empty \Q\E sequence between an item and its quantifier caused pcre2_compile to misbehave when auto callouts were enabled.
  • If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or other verb “name” ended with whitespace immediately before the closing parenthesis, pcre2_compile misbehaved. Example: (*:abc ), but only when both those options were set.
  • In a number of places pcre2_compile was not handling NUL (binary zero) characters correctly.
  • If a pattern that was compiled with PCRE2_EXTENDED started with white space or a #-type comment that was followed by (?-x), which turns off PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again, pcre2_compile assumed that (?-x) applied to the whole pattern and consequently mis-compiled it. The fix for this bug means that a setting of any of the (?imsxU) options at the start of a pattern is no longer transferred to the options that are returned by PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have changed when the effects of those options were all moved to compile time.
  • An escaped closing parenthesis in the “name” part of a (*verb) when PCRE2_ALT_VERBNAMES was set caused pcre2_compile to malfunction.

YuPcre2 1.1.0 – 15 Sep 2015

  • Support Delphi 10 Seattle Win32 and Win64.
  • Match limit check added to recursion.
  • Arrange for the UTF check in pcre2_match and pcre2_dfa_match to look only at the part of the subject that is relevant when the starting offset is non-zero.
  • Improve first character match in JIT with SSE2 on x86.
  • Fixed two assertion failures in JIT.
  • Fixed a corner case of range optimization in JIT.
  • Add the ${*MARK} facility to pcre2_substitute.
  • Implemented PCRE2_ALT_VERBNAMES and coAltVerbnames.
  • Fixed two issues in JIT.

YuPcre2 1.0.1 – 8 Aug 2015

  • Pathological patterns containing many nested occurrences of [: caused pcre2_compile to run for a very long time.
  • A missing closing parenthesis for a callout with a string argument was not being diagnosed, possibly leading to a buffer overflow.
  • A conditional group with only one branch has an implicit empty alternative branch and must therefore be treated as potentially matching an empty string.
  • If (?R was followed by - or +, incorrect behaviour occurred instead of a diagnostic being given.
  • Conditional groups whose condition was an assertion preceded by an explicit callout with a string argument might be incorrectly processed, especially if the string contained \Q.
  • Fix buffer overflow while checking a UTF-8 string if the final multi-byte UTF-8 character was truncated.
  • Finding the minimum matching length of complex patterns with back references and/or recursions can take a long time. There is now a cut-off that gives up trying to find a minimum length when things get too complex.
  • An optimization has been added that speeds up finding the minimum matching length for patterns containing repeated capturing groups or recursions.
  • If a pattern contained a back reference to a group whose number was duplicated as a result of appearing in a (?|…) group, the computation of the minimum matching length gave a wrong result, which could cause incorrect “no match” errors. For such patterns, a minimum matching length cannot at present be computed.
  • Added a check for integer overflow in conditions (?(<digits>) and (?(R<digits>).
  • Fixed an issue where \p{Any} inside an extended character class (xclass) did not read the current character.
  • The JIT compiler did not restore the control verb head in case of *THEN control verbs.
  • The way recursive references such as (?3) are compiled has been re-written because the old way was the cause of many issues. Now, conversion of the group number into a pattern offset does not happen until the pattern has been completely compiled. This does mean that detection of all infinitely looping recursions is postponed till match time. In the past, some easy ones were detected at compile time.
  • A test for a back reference to a non-existent group was missing for items such as \987. This caused incorrect code to be compiled.
  • Error messages for syntax errors following \g and \k were giving inaccurate offsets in the pattern.
  • Improve the performance of starting single character repetitions in JIT.
  • (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.
  • Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now give the right offset instead of zero.
  • The JIT compiler should not check repeats after a {0,1} repeat byte code.
  • The JIT compiler should restore the control chain for empty possessive repeats.

YuPcre2 1.0.0 – 22 Jul 2015

  • Initial release.
Last modified: 2016/07/31 22:34