Delphi Inspiration

DIRegEx: Wiki

DIRegEx is a library of components and procedures for Delphi (Embarcadero / CodeGear / Borland) that implements regular expression pattern matching with the same syntax and semantics as Perl.

Stack Overflow?

Question: Apparently, specific patterns with specific subjects can drive DIRegEx into stack overflow errors. How can I avoid these?

Answer: DIRegEx uses a recursive matching algorithm, which can run out of stack space with extremely demanding patterns. Recursion was nevertheless chosen over an iterative implementation for performance reasons: extensive testing showed that DIRegEx runs several times faster with recursion than without.

Even though stack overflows are a real problem, they happen so rarely with common regex patterns and subjects that most DIRegEx users will never notice them. If you ever do, the following steps can help to avoid them. Combining all three options yields the best results:

  1. Increase your application's stack size. Adding {$MAXSTACKSIZE $00400000} (4 MB), preferably to your *.dpr file, should stop overflows even for extremely demanding patterns. {$MAXSTACKSIZE $00200000} (2 MB) lets most normally demanding patterns run well and is a reasonable precautionary setting.
  2. Lower the TDIRegEx.MatchLimit and TDIRegEx.MatchLimitRecursion option properties. They cause matching to abort with PCRE_ERROR_MATCHLIMIT or PCRE_ERROR_RECURSIONLIMIT when the respective threshold is reached. The values can be set via TDIRegEx or, when using the native API, via the Extra field.
  3. Optimize your regular expression to avoid nested subpatterns with unlimited repeats. The section “Atomic Grouping and Possessive Quantifiers” in the DIRegEx “Syntax Details” help page contains details and examples. The techniques described there will also help your patterns to run faster!
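
The three options can be combined roughly as in the following sketch of a console *.dpr file. It is only an illustration under assumptions: the unit name DIRegEx in the uses clause, the limit values and the sample pattern are not taken from the DIRegEx documentation, so check them against your installation.

```pascal
program StackOptionsDemo;

{$APPTYPE CONSOLE}
{$MAXSTACKSIZE $00400000} // Option 1: allow up to 4 MB of stack

uses
  SysUtils,
  DIRegEx; // assumption: unit that declares TDIPerlRegEx

var
  RegEx: TDIPerlRegEx;
begin
  RegEx := TDIPerlRegEx.Create(nil);
  try
    // Option 2: illustrative thresholds only, not recommended values.
    RegEx.MatchLimit := 1000000;
    RegEx.MatchLimitRecursion := 10000;

    // Option 3: the atomic group (?>a+) replaces the nested
    // unlimited repeat (a+)*, which could backtrack heavily.
    RegEx.MatchPattern := '(?>a+)*b';
    RegEx.SetSubjectStr(StringOfChar('a', 1000) + 'c');

    // A negative result means no match or an aborted match.
    if RegEx.Match(0) >= 0 then
      WriteLn('match')
    else
      WriteLn('no match, or matching was aborted by a limit');
  finally
    RegEx.Free;
  end;
end.
```

The atomic group keeps the regex engine from re-trying shorter runs of "a" after the trailing "b" fails, which is the kind of backtracking that deep recursion comes from.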

In Windows, the stack size is defined per thread when the thread is created. The calling thread's stack size therefore applies to DIRegEx even if DIRegEx is compiled into an external *.bpl or *.dll library. When DIRegEx is called from the main thread, this is the application's stack size.

If the calling application's stack size is too small and you cannot change it because you are writing a plugin or extension for a larger application, you might want to run DIRegEx in a newly created thread. The dwStackSize parameter of the Windows CreateThread() function lets you set the initially committed stack space, which you should choose according to your needs.
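
A minimal sketch of the thread approach, meant for a unit's implementation section. It uses the Delphi RTL function BeginThread, which forwards its StackSize argument to CreateThread's dwStackSize, and it calls the RegExMatch sample function from this page. TMatchJob, MatchThreadFunc and RegExMatchWithStack are hypothetical names introduced here for illustration.

```pascal
uses
  Windows, SysUtils;

type
  // Hypothetical record carrying the arguments and the result.
  PMatchJob = ^TMatchJob;
  TMatchJob = record
    Subject, Pattern: string;
    CaseSens: Boolean;
    Found: Boolean;
  end;

// Runs on the new thread, so the recursion uses that thread's stack.
function MatchThreadFunc(Parameter: Pointer): Integer;
var
  Job: PMatchJob;
begin
  Job := PMatchJob(Parameter);
  try
    Job^.Found := RegExMatch(Job^.Subject, Job^.Pattern, Job^.CaseSens);
    Result := 0;
  except
    // Do not let an exception escape the thread function.
    Job^.Found := False;
    Result := 1;
  end;
end;

function RegExMatchWithStack(const Str, Re: string; ACaseSens: Boolean;
  AStackSize: LongWord): Boolean;
var
  Job: TMatchJob;
  ThreadId: TThreadID; // use LongWord on Delphi versions without TThreadID
  ThreadHandle: THandle;
begin
  Job.Subject := Str;
  Job.Pattern := Re;
  Job.CaseSens := ACaseSens;
  Job.Found := False;

  // AStackSize ends up as dwStackSize of CreateThread.
  ThreadHandle := BeginThread(nil, AStackSize, MatchThreadFunc, @Job, 0,
    ThreadId);
  if ThreadHandle = 0 then
    RaiseLastOSError;
  try
    // Block until the worker has finished; Job stays valid meanwhile.
    WaitForSingleObject(ThreadHandle, INFINITE);
  finally
    CloseHandle(ThreadHandle);
  end;
  Result := Job.Found;
end;
```

A call such as RegExMatchWithStack(Text, Pattern, True, $00400000) would request 4 MB for the worker thread. Passing the Windows flag STACK_SIZE_PARAM_IS_A_RESERVATION as creation flag makes the value the reserved rather than the committed size.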

Unfortunately, the required stack size cannot be predicted in advance. It depends heavily on the number of potential matches in the subject text. A pattern that works well with a long text can still fail with a shorter one if it runs into many failed matches that must be backtracked.

TDIRegExSearchStream explained

This page contains an interesting e-mail conversation about the internals of TDIRegExSearchStream and its descendant classes (German / Deutsch).

Sample RegEx search function

This sample function reports whether a string contains a match for a given regex.

function RegExMatch(const Str, Re: string; ACaseSens: Boolean): Boolean;
var
  RegEx: TDIRegEx;
begin
  Result := False;
  // Nothing to do for an empty subject or an empty pattern.
  if (Str = '') or (Re = '') then Exit;

  RegEx := TDIPerlRegEx.Create(nil);
  try
    // Optional locale setup, disabled in this sample.
    //DIRegEx_Api.set_locale(LANG_RUSSIAN);
    //RegEx.Options := RegEx.Options + [poUserLocale];

    // Switch case-insensitive matching on or off.
    if ACaseSens then
      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
    else
      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
    RegEx.SetSubjectStr(Str);
    RegEx.MatchPattern := Re;
    // A non-negative result from Match means the pattern was found.
    Result := RegEx.Match(0) >= 0;
  finally
    RegEx.Free;
  end;
end;
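
A short usage sketch, assuming the RegExMatch function above is in scope. DemoRegExMatch, the subject and the pattern are made up for illustration.

```pascal
procedure DemoRegExMatch;
const
  Subject = 'Hello World';
begin
  // ACaseSens = False adds coCaseLess, so case is ignored.
  if RegExMatch(Subject, 'hello', False) then
    WriteLn('found (ignoring case)');

  // ACaseSens = True removes coCaseLess for a case-sensitive search.
  if not RegExMatch(Subject, 'hello', True) then
    WriteLn('not found (case-sensitive)');
end;
```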

Sample RegEx replace function

This sample function replaces matches of the regex ASearch in the string AValue with the format pattern AReplace.

function RegExReplace(
  const AValue: AnsiString;
  const ASearch: AnsiString;
  const AReplace: AnsiString;
  const AOptions: TDIRegexCompileOptions = [coCaseLess]): AnsiString;
var
  RE: TDIPerlRegEx;
begin
  RE := TDIPerlRegEx.Create(nil);
  try
    RE.SetSubjectStr(AValue);
    // Set the options before the pattern is compiled.
    RE.CompileOptions := AOptions;
    RE.CompileMatchPatternStr(ASearch);
    RE.FormatPattern := AReplace;
    // Return the unchanged input when Replace2 returns 0.
    if RE.Replace2(Result) = 0 then
      Result := AValue;
  finally
    RE.Free;
  end;
end;

Sample RegEx filling with a char

This function returns the original string with every occurrence of the regex filled with a given char (e.g. “\w{3,7}” can match “wwwww”, which is replaced with “.....”).

function RegExReplaceToChar(
  const Str, Re: string;
  ch: Char; ACaseSens: Boolean): string;
var
  RegEx: TDIRegEx;
  N_prev, N, i: Integer;
begin
  Result := Str;
  if (Str = '') or (Re = '') then Exit;

  RegEx := TDIPerlRegEx.Create(nil);
  try
    if ACaseSens then
      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
    else
      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
    RegEx.MatchPattern := Re;
    N_prev := -1;
    repeat
      // Search the partly filled string again from its start.
      RegEx.SetSubjectStr(Result);
      if RegEx.Match(0) < 0 then Break;
      // Convert the match position to a 1-based string index.
      N := RegEx.MatchedStrFirstCharPos + 1;
      // Same position again (empty match, or ch itself matches Re):
      // stop to avoid an endless loop.
      if N = N_prev then Break;
      N_prev := N;
      // Overwrite the matched characters with the fill char.
      for i := N to (N + RegEx.MatchedStrLength - 1) do
        Result[i] := ch;
    until False;
  finally
    RegEx.Free;
  end;
end;
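
A short usage sketch, assuming the RegExReplaceToChar function above is in scope. DemoRegExReplaceToChar and the sample text are made up for illustration.

```pascal
procedure DemoRegExReplaceToChar;
begin
  // Fill every run of 3 to 7 word characters with dots. The dot
  // is not a word character, so filled text is never matched again.
  WriteLn(RegExReplaceToChar('one two three', '\w{3,7}', '.', True));
end;
```

Choose a fill char that the pattern itself cannot match. The function searches the partly filled string from its start each time, so a fill char such as 'x' with \w{3,7} would be matched again at the same position and the loop would stop after the first replacement.
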
wiki/regex/index.txt · Last modified: 2016/01/22 15:09 (external edit)