Yunqa • The Delphi Inspiration

Delphi Components and Applications

wiki:regex:index [2016/01/22 15:09] (current) – created - external edit 127.0.0.1
 +====== DIRegEx: Wiki ======
  
 +{{page>products:regex:header}}
 +{{page>:wiki-header}}
 +
 +===== Stack Overflow? =====
 +
 +**Question: Apparently, specific patterns with specific subjects can drive DIRegEx into stack overflow errors. How can I avoid these?**
 +
 +Answer: DIRegEx uses a recursive matching algorithm which can run out of stack space with //extremely// demanding patterns. The recursive algorithm was still chosen over an iterative implementation for performance reasons: Extensive testing revealed that DIRegEx runs multiple times faster with recursion than without.
 +
 +Even though stack overflows are a real problem, they happen so rarely with common regex patterns and subjects that most DIRegEx users will never notice. If you ever do, the following steps help to avoid them. Combining all three yields the best results:
 +
 +  - Increase your application's stack size. Adding ''{$MAXSTACKSIZE $00400000}'' (4 MB), preferably in your *.dpr file, should stop overflows even for extremely demanding patterns. ''{$MAXSTACKSIZE $00200000}'' (2 MB) lets most normally demanding patterns run well and is a reasonable precautionary setting.
 +  - Lower the TDIRegEx.MatchLimit and TDIRegEx.MatchLimitRecursion properties. They cause matching to abort with PCRE_ERROR_MATCHLIMIT or PCRE_ERROR_RECURSIONLIMIT once the respective threshold is reached. The values can be set via TDIRegEx or, when using the native API, via the Extra field.
 +  - Optimize your regular expression to avoid nested subpatterns with unlimited repeats. The section "Atomic Grouping and Possessive Quantifiers" in the DIRegEx "Syntax Details" help page contains details and examples. The techniques described there will also help your patterns to run faster!
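 +
 +The three options above can be combined roughly as in the following sketch. It is not taken from the DIRegEx documentation: the unit name ''DIRegEx'', the helper ''LimitedMatch'', the program name and the concrete limit values are assumptions chosen for illustration, and it assumes that an aborted match yields a negative result from ''Match'' just like a failed one.
 +
 +<code delphi>
 +program StackSafeMatch; // hypothetical demo program
 +{$APPTYPE CONSOLE}
 +{$MAXSTACKSIZE $00400000} // option 1: allow the stack to grow to 4 MB
 +
 +uses
 +  SysUtils,
 +  DIRegEx; // assumed unit name for TDIPerlRegEx
 +
 +// Hypothetical helper. The limit values are illustrative, not recommendations.
 +function LimitedMatch(const Str, Re: string): Boolean;
 +var
 +  RegEx: TDIPerlRegEx;
 +begin
 +  RegEx := TDIPerlRegEx.Create(nil);
 +  try
 +    // Option 2: abort runaway matches before they exhaust the stack.
 +    RegEx.MatchLimit := 100000;
 +    RegEx.MatchLimitRecursion := 2000;
 +    RegEx.SetSubjectStr(Str);
 +    RegEx.MatchPattern := Re;
 +    // Assumption: an aborted match returns a negative value, like "no match".
 +    Result := RegEx.Match(0) >= 0;
 +  finally
 +    RegEx.Free;
 +  end;
 +end;
 +
 +const
 +  // Nested unlimited repeats: backtracks heavily when no 'b' follows.
 +  RiskyPattern = '(a+)+b';
 +  // Option 3: the atomic group (?>...) is never re-entered by backtracking.
 +  AtomicPattern = '(?>a+)+b';
 +begin
 +  if LimitedMatch(StringOfChar('a', 5000) + 'c', AtomicPattern) then
 +    Writeln('match')
 +  else
 +    Writeln('no match or aborted');
 +end.
 +</code>
 +
 +''RiskyPattern'' is declared only for comparison. Swapping it in for ''AtomicPattern'' should show how much earlier the limits are hit with nested unlimited repeats.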
 +
 +In Windows, the stack size is defined on a per-thread basis when the thread is created. This means that the calling thread's stack size applies to DIRegEx even if it is compiled into an external *.bpl or *.dll link library. It defaults to the application's stack size when called from the main thread.
 +
 +If the calling application's stack size is too small and you are unable to change it because you are writing a plugin or extension for a larger application, you can run DIRegEx in a newly created thread. The dwStackSize parameter of the Windows CreateThread() function sets the initially committed stack space, which you should choose according to your needs.
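 +
 +A minimal sketch of the worker thread approach might look as follows. It uses the Delphi RTL function ''BeginThread'', which wraps CreateThread(), and passes the Windows flag STACK_SIZE_PARAM_IS_A_RESERVATION so that the size is treated as the reserved rather than the committed stack space. The unit names ''RegExBigStack'' and ''RegExSamples'', the record ''TMatchJob'' and the 4 MB value are assumptions for illustration. ''RegExMatch'' is the sample search function from this page.
 +
 +<code delphi>
 +unit RegExBigStack; // hypothetical unit name
 +
 +interface
 +
 +function RegExMatchBigStack(const Str, Re: string; ACaseSens: Boolean): Boolean;
 +
 +implementation
 +
 +uses
 +  Windows, SysUtils,
 +  RegExSamples; // hypothetical unit containing RegExMatch from this page
 +
 +type
 +  PMatchJob = ^TMatchJob;
 +  TMatchJob = record
 +    Subject, Pattern: string;
 +    CaseSens: Boolean;
 +    Found: Boolean;
 +  end;
 +
 +// Runs on the worker thread and therefore on the worker's larger stack.
 +function MatchThreadFunc(Parameter: Pointer): Integer;
 +var
 +  Job: PMatchJob;
 +begin
 +  Result := 0;
 +  Job := PMatchJob(Parameter);
 +  try
 +    Job^.Found := RegExMatch(Job^.Subject, Job^.Pattern, Job^.CaseSens);
 +  except
 +    // Exceptions must not escape a thread function.
 +    Job^.Found := False;
 +    Result := 1;
 +  end;
 +end;
 +
 +function RegExMatchBigStack(const Str, Re: string; ACaseSens: Boolean): Boolean;
 +const
 +  StackBytes = $00400000; // 4 MB, an illustrative value
 +  STACK_SIZE_PARAM_IS_A_RESERVATION = $00010000; // from the Windows SDK
 +var
 +  Job: TMatchJob;
 +  ThreadHandle: THandle;
 +  ThreadId: TThreadID; // LongWord in older Delphi versions
 +begin
 +  Job.Subject := Str;
 +  Job.Pattern := Re;
 +  Job.CaseSens := ACaseSens;
 +  Job.Found := False;
 +  ThreadHandle := BeginThread(nil, StackBytes, MatchThreadFunc, @Job,
 +    STACK_SIZE_PARAM_IS_A_RESERVATION, ThreadId);
 +  if ThreadHandle = 0 then
 +    RaiseLastOSError;
 +  try
 +    // Job lives on the caller's stack, so block until the worker is done.
 +    WaitForSingleObject(ThreadHandle, INFINITE);
 +  finally
 +    CloseHandle(ThreadHandle);
 +  end;
 +  Result := Job.Found;
 +end;
 +
 +end.
 +</code>
 +
 +Creating a thread per call is expensive. If many matches are needed, a single long-lived worker thread is probably the better design.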
 +
 +Unfortunately, it is not possible to predict the required stack size in advance. It depends heavily on the number of potential matches in the subject text. A pattern which works well with a longer text can still fail with a shorter one if it encounters many failed matches which must be backtracked.
 +
 +===== TDIRegExSearchStream explained =====
 +
 +[[tdiregexstreamsearch_de|This page]] contains an interesting e-mail conversation about the internals of TDIRegExSearchStream and its descendant classes (in German).
 +
 +===== Sample RegEx search function =====
 +
 +This sample function reports whether a string contains a match for a given regex.
 +
 +<code delphi>
 +function RegExMatch(const Str, Re: string; ACaseSens: Boolean): Boolean;
 +var
 +  RegEx: TDIRegEx;
 +begin
 +  Result := False;
 +  if (Str = '') or (Re = '') then Exit;
 +  
 +  RegEx := TDIPerlRegEx.Create(nil);
 +  try
 +    //DIRegEx_Api.set_locale(LANG_RUSSIAN);
 +    //RegEx.Options := RegEx.Options + [poUserLocale];
 +  
 +    if ACaseSens then
 +      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
 +    else
 +      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
 +    RegEx.SetSubjectStr(Str);
 +    RegEx.MatchPattern := Re;
 +    Result := RegEx.Match(0) >= 0;
 +  finally
 +    RegEx.Free;
 +  end;
 +end;
 +</code>
 +
 +
 +===== Sample RegEx replace function =====
 +
 +This sample function replaces matches of the regex ASearch in the string AValue with the format pattern AReplace.
 +
 +<code delphi>
 +function RegExReplace(
 +  const AValue: AnsiString;
 +  const ASearch: AnsiString;
 +  const AReplace: AnsiString;
 +  const AOptions: TDIRegexCompileOptions = [coCaseLess]): AnsiString;
 +var
 +  RE: TDIPerlRegEx;
 +begin
 +  RE := TDIPerlRegEx.Create(nil);
 +  try
 +    RE.SetSubjectStr(AValue);
 +    RE.CompileOptions := AOptions;
 +    RE.CompileMatchPatternStr(ASearch);
 +    RE.FormatPattern := AReplace;
 +    if RE.Replace2(Result) = 0 then
 +      Result := AValue;
 +  finally
 +    RE.Free;
 +  end;
 +end;
 +</code>
 +
 +===== Sample RegEx filling with a char =====
 +
 +This function returns the original string with every occurrence of the regex filled with a given char.
 +(E.g. "\w{3,7}" can match "wwwww", which is replaced with ".....".)
 +
 +<code delphi>
 +function RegExReplaceToChar(
 +  const Str, Re: string; 
 +  ch: Char; ACaseSens: Boolean): string;
 +var
 +  RegEx: TDIRegEx;
 +  N_prev, N, i: Integer;
 +begin
 +  Result := Str;
 +  if (Str = '') or (Re = '') then Exit;
 +      
 +  RegEx := TDIPerlRegEx.Create(nil);
 +  try
 +    if ACaseSens then
 +      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
 +    else
 +      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
 +    RegEx.MatchPattern := Re;
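 +    N_prev := -1;
 +    repeat
 +      // Search the partially filled result again from the start.
 +      RegEx.SetSubjectStr(Result);
 +      if RegEx.Match(0) < 0 then Break;
 +      // Convert the zero-based match position to a one-based string index.
 +      N := RegEx.MatchedStrFirstCharPos + 1;
 +      // Stop if the match did not advance (e.g. an empty match, or ch itself
 +      // matches Re); this guards against an endless loop.
 +      if N = N_prev then Break;
 +      N_prev := N;
 +      // Overwrite the matched characters with the fill char.
 +      for i := N to (N + RegEx.MatchedStrLength - 1) do
 +        Result[i] := ch;
 +    until False;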
 +  finally
 +    RegEx.Free;
 +  end;
 +end;
 +</code>
wiki/regex/index.txt · Last modified: 2016/01/22 15:09 by 127.0.0.1