YuStemmer

YuStemmer is a natural language stemming library for Delphi (Embarcadero / CodeGear / Borland). Its purpose is to reduce an inflected word to a common stem or root form. The English stemmer, for example, returns "write" for "write", "writes", "writing", and "writings".

Stemmers are available for these languages:
Stemmers are typically used in query and search systems. Stemming lets such systems return related results that have a similar meaning but a slightly different spelling. YuStemmer was initially developed for the DISQLite3 Full Text Search (FTS) engine, but fits other purposes as well.

YuStemmer is fully algorithmic and needs no extensive lookup dictionaries. This keeps the memory footprint small and leads to excellent performance.

YuStemmer is organized into different classes, each of them optimized for a particular string type and text encoding:
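To make the search use case concrete, here is a minimal, self-contained sketch of stem-based matching: the query and each candidate word are reduced to a stem, and the stems are compared instead of the raw spellings. ToyStem, TStemFunc and FindByStem are hypothetical names invented for this illustration. ToyStem is a crude suffix stripper, not YuStemmer's algorithm, and its stems (for example "writ") differ from those a real stemmer returns.

```pascal
program StemSearchSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  { Hypothetical callback type: any function mapping a word to its stem. }
  TStemFunc = function(const AWord: string): string;

{ Hypothetical toy stemmer: lowercases the word and strips one of a few
  English suffixes. A stand-in for illustration only, NOT YuStemmer. }
function ToyStem(const AWord: string): string;
const
  Suffixes: array[0..4] of string = ('ings', 'ing', 'es', 's', 'e');
var
  i, n: Integer;
begin
  Result := LowerCase(AWord);
  for i := Low(Suffixes) to High(Suffixes) do
  begin
    n := Length(Suffixes[i]);
    { Only strip if at least three characters remain. }
    if (Length(Result) >= n + 3) and
      (Copy(Result, Length(Result) - n + 1, n) = Suffixes[i]) then
    begin
      SetLength(Result, Length(Result) - n);
      Exit;
    end;
  end;
end;

{ Adds to AHits every entry of ADocs whose stem equals the stem of AQuery. }
procedure FindByStem(const AQuery: string; ADocs, AHits: TStrings;
  AStem: TStemFunc);
var
  i: Integer;
  QueryStem: string;
begin
  QueryStem := AStem(AQuery); { stem the query once, outside the loop }
  for i := 0 to ADocs.Count - 1 do
    if AStem(ADocs[i]) = QueryStem then
      AHits.Add(ADocs[i]);
end;

var
  Docs, Hits: TStringList;
begin
  Docs := TStringList.Create;
  Hits := TStringList.Create;
  try
    Docs.Add('writes');
    Docs.Add('reading');
    Docs.Add('writings');
    { 'writing' matches 'writes' and 'writings', but not 'reading'. }
    FindByStem('writing', Docs, Hits, ToyStem);
    WriteLn(Hits.CommaText);
  finally
    Hits.Free;
    Docs.Free;
  end;
end.
```

The stemmer is passed in as a plain function so that the matching logic stays independent of any particular stemming algorithm. Plugging in a real stemmer object would presumably require a method-pointer callback type (declared "of object") instead; that adaptation is an assumption and is not shown here.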
Make sure to choose the stemmer class that matches your string type and character set. Otherwise you pay a performance penalty for avoidable string conversions. In Delphi, such conversions usually happen implicitly and go unnoticed by most developers, so pay close attention here to make the most of YuStemmer!

Example – stem a single word

The Stem() method does the work for all of the above classes. It expects a single word and returns its stem. If there is no stem, the original word is returned unchanged.

    function StemFrench(const AWord: AnsiString): AnsiString;
    var
      Stemmer: TYuStemmer;
    begin
      Stemmer := TYuStemmer_French.Create;
      try
        Result := Stemmer.Stem(AWord);
      finally
        { Release the stemmer even if Stem() raises an exception. }
        Stemmer.Free;
      end;
    end;

Example – stem multiple words in a TStringList

To improve performance when stemming a large number of words, it is safe to reuse the same stemmer instance multiple times.

    procedure StemItalian(const AWords: TStringList);
    var
      i: Integer;
      { TStringList is UTF-16 in Unicode Delphis. }
      Stemmer: {$IFDEF Unicode}TYuStemmer_16{$ELSE}TYuStemmer{$ENDIF};
    begin
      {$IFDEF Unicode}
      Stemmer := TYuStemmer_Italian_16.Create;
      {$ELSE Unicode}
      Stemmer := TYuStemmer_Italian.Create;
      {$ENDIF Unicode}
      try
        { One stemmer instance serves every word in the list. }
        for i := 0 to AWords.Count - 1 do
          AWords[i] := Stemmer.Stem(AWords[i]);
      finally
        Stemmer.Free;
      end;
    end;

products/stemmer/index.txt · Last modified: 2012/07/23 11:39 (external edit)
Copyright (c) 2000-2011 Ralf Junker – http://www.yunqa.de/delphi/