Yunqa • The Delphi Inspiration

Delphi Components and Applications

products:unicode:index [2016/01/22 15:08] (current) – created - external edit 127.0.0.1
 +====== DIUnicode ======
  
 +{{page>header}}
 +
 +===== Overview =====
 +
 +DIUnicode's Pascal implementation supports more than 70 encodings, including UTF-7, UTF-8, UTF-16, the ISO-8859 family, various Windows and Macintosh codepages, the KOI8 character sets, Chinese GB18030, and more. Adding a new character encoding is as simple as writing a single conversion procedure. When linked against [[products:converters:|DIConverters]], DIUnicode supports 144 character sets and encodings.
 +
 +===== Key Benefits =====
 +
 +DIUnicode is for you if your application needs to handle text in multiple character encodings with high performance and little development effort.
 +
 +Both the Unicode Reader and the Unicode Writer work with strings, buffers, and streams. You can, for example, read directly from or write directly to database BLOB streams, avoiding all temporary storage of your data.
 +
 +An efficient buffering system guarantees excellent performance, even when processing huge files.
 +
 +===== Simple Usage Examples =====
 +
 +DIUnicode makes reading and writing Unicode as simple as ASCII text, regardless of the character set or encoding you are processing. The code snippets below show some of the techniques commonly used with TDIUnicodeReader, the reader class of DIUnicode. Remember that you can use the same parsing routine unchanged with any of the available encodings.
 +
 +**Read entire lines from a Unicode text file:**
 +
 +<code pascal>
 +{ Setup and initialize. }
 +Reader := TDIUnicodeReader.Create(nil);
 +{ Let's say we want to read UTF-8.
 +  This could well be any other
 +  character encoding. }
 +Reader.ReadMethods := Read_Utf_8;
 +Reader.SourceStream :=
 +  TFileStream.Create('MyFile.txt', fmOpenRead);
 +{ Now the actual reading: }
 +while Reader.ReadLine do
 +  begin
 +    TheLine := Reader.DataAsStrW;
 +    { Your code to process the line
 +      goes here. }
 +  end;
 +</code>
 +
 +**Read individual characters only:**
 +
 +<code pascal>
 +while Reader.ReadChar do
 +  begin
 +    TheChar := Reader.Char;
 +    case TheChar of
 +      'A'..'Z':
 +        ; // Process Alphas
 +      '0'..'9':
 +        ; // Process Digits
 +    end;
 +  end;
 +</code>
 +
 +**Use overloaded methods to read up to a particular character or a set of characters:**
 +
 +<code pascal>
 +{ Read all characters up to the Dollar sign. }
 +Reader.ReadCharsTill('$');
 +{ Read all characters up to either '(' or ')'. }
 +Reader.ReadCharsTill('(', ')');
 +{ Skip rest of line and advance to next one. }
 +Reader.SkipLine;
 +</code>
 +
 +**Advanced parsing:**
 +
 +  * An RFC-compliant CSV parser is part of DIUnicode. Source code is available as a feature demonstration.
 +
 +  * The popular [[products:htmlparser:|DIHtmlParser]] is built on top of DIUnicode. It implements a full-featured HTML, XHTML and XML parser with Unicode support and a flexible plugin architecture.
 +
 +===== Peek Ahead / Look Ahead reading =====
 +
 +Unlike other text readers, the lookahead of TDIUnicodeReader is limited only by available memory, not by a fixed number of characters. The code below reads up to five Unicode characters into the internal buffer. TDIUnicodeReader could look ahead much further, but it is best to keep the number reasonably small.
 +
 +<code pascal>
 +var
 +  UR: TDIUnicodeReader;
 +  c: WideChar;
 +begin
 +  { ... TDIUnicodeReader creation
 +        and initialization should go here ... }
 +  UR.PeekAhead(5); // Read up to 5 characters into the internal buffer.
 +  if UR.PeekedCount >= 1 then // Test if the 1st peeked character could be read ...
 +    c := UR.PeekedChars[0]; // and examine it.
 +  if UR.PeekedCount >= 5 then // Same as above ...
 +    c := UR.PeekedChars[4]; // but with the 5th peeked character now.
 +  if UR.ReadChar then // Continue reading with the next char.
 +    c := UR.Char;
 +end;
 +</code>
 +
 +===== Performance =====
 +
 +DIUnicode is extremely fast, even when processing very large files. Both the reader and the writer classes use internal buffers, which allow them to read and write files one small chunk at a time. DIUnicode never requires you to fit the entire file into memory. This way it achieves conversion rates of well over 20 MB per second.
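 +
 +The idea of chunked reading can be illustrated with a minimal sketch that uses only the standard ''Classes'' and ''SysUtils'' units. It is not DIUnicode's internal code: the program name, the 64 KB chunk size, and the file name passed as the first command-line argument are assumptions chosen for illustration.
 +
 +<code pascal>
 +{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
 +program ChunkedRead; // Hypothetical program name.
 +{$APPTYPE CONSOLE}
 +
 +uses
 +  SysUtils, Classes;
 +
 +const
 +  ChunkSize = 64 * 1024; // Assumed chunk size; tune as needed.
 +
 +var
 +  Source: TFileStream;
 +  Buffer: array[0..ChunkSize - 1] of Byte;
 +  BytesRead: Integer;
 +  Total: Int64;
 +begin
 +  { ParamStr(1) is the file to read (assumed command-line argument). }
 +  Source := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
 +  try
 +    Total := 0;
 +    repeat
 +      { Read returns the number of bytes actually read; 0 means end of file. }
 +      BytesRead := Source.Read(Buffer, SizeOf(Buffer));
 +      { A real reader would decode Buffer[0..BytesRead - 1] here. }
 +      Inc(Total, BytesRead);
 +    until BytesRead = 0;
 +    WriteLn('Bytes processed: ', Total);
 +  finally
 +    Source.Free;
 +  end;
 +end.
 +</code>
 +
 +Because only one chunk is held at a time, memory use stays constant regardless of the file size.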
 +
 +{{tag>Converter Reader Unicode UTF Writer}}
products/unicode/index.txt · Last modified: 2016/01/22 15:08 by 127.0.0.1