Delphi Inspiration

Components and Applications

 +====== DIConverters ======
  
 +{{page>header}}
 +
 +===== Overview =====
 +
 +DIConverters supplies [[encodings|144 character set encodings]] with two complementary functions each, adding up to a total of more than 288 character conversion functions:
 +
 +  - to decode from encoding to Unicode.
 +  - to encode from Unicode to encoding.
 +
 +All conversions are fully native and require no DLLs or other system dependencies. Applications built with DIConverters therefore run on all Win32 platforms, starting from (and including!) Windows 95.
 +
 +The converter functions allow for smart-linking: only those functions used by the application are actually included in the executable. This keeps applications small when only one or a few character conversions are needed.
 +
 +Click [[encodings|here]] for a list of the character sets and encodings supported by DIConverters.
 +
 +===== Using DIConverters =====
 +
 +All conversions operate on single Unicode characters (code points). In multi-byte character encodings, a single Unicode character is represented by one or more bytes.
 +
 +<WRAP tip>
 +DIConverters can be used by [[:products:unicode:|DIUnicode]], which provides convenient classes with automatic character conversion for both reading and writing Unicode text. With [[:products:unicode:|DIUnicode]], all text operations take place on a WideChar / WideString basis regardless of the actual text encoding. This allows applications to use the very same import / export routine for all 144 character sets and encodings.
 +</WRAP>
 +
 +===== Conversion Preparations =====
 +
 +Functions for both direct decoding and encoding require a conversion state variable of type conv_t, which is a record structure defined in DIConverters.pas. Before the first conversion, this variable must be initialized with zeros. Applications can easily accomplish this with the following standard Pascal call:
 +
 +<code delphi>
 +var
 +  conv: conv_struct;
 +begin
 +  // Zero the conversion state before the first decoding or encoding call.
 +  FillChar(conv, SizeOf(conv), 0);
 +</code>
 +
 +You can then proceed using the decoding and encoding functions described below.
 +
 +===== Reading with Unicode Decoding =====
 +
 +The function prototype to decode multi-byte encodings to Unicode is:
 +
 +<code delphi>
 +xxx_mbtowc = function(
 +  const conv: conv_t;
 +  var pwc: ucs4_t;
 +  const s: Pointer;
 +  const n: Integer): Integer;
 +</code>
 +
 +The xxx stands for the name of the character encoding, as in utf8_mbtowc.
 +
 +The function converts the byte sequence starting at s to a single Unicode code point and stores it in pwc. Up to n bytes are available at s, and n must be >= 1.
 +
 +The function's return value indicates whether the conversion was successful:
 +
 +  * **number of bytes consumed:** Success, a wide character was read.
 +  * **-1:** The byte sequence at s is invalid.
 +  * **-2:** The number of bytes n is too small.
 +  * **-2-(number of bytes consumed):** Only a shift sequence was read.
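 +
 +The return codes above suggest a decoding loop along the following lines. This is a minimal sketch, not part of DIConverters: the program name, the TCodePoints type and the DecodeUtf8 helper are hypothetical. It assumes that the DIConverters unit exports utf8_mbtowc, conv_struct and ucs4_t, and that conv_t is a pointer to conv_struct, so the state is passed as @conv. If conv_t is the record itself, pass conv directly.
 +
 +<code delphi>
 +program DecodeSketch; // hypothetical program name
 +
 +{$APPTYPE CONSOLE}
 +
 +uses
 +  DIConverters; // assumed to export utf8_mbtowc, conv_struct and ucs4_t
 +
 +type
 +  TCodePoints = array of ucs4_t; // hypothetical helper type
 +
 +// Decodes Len bytes of UTF-8 at Buf into Chars.
 +// Returns False if the input is invalid or ends in mid-character.
 +function DecodeUtf8(Buf: PAnsiChar; Len: Integer;
 +  out Chars: TCodePoints): Boolean;
 +var
 +  conv: conv_struct;
 +  wc: ucs4_t;
 +  Used, Count: Integer;
 +begin
 +  FillChar(conv, SizeOf(conv), 0); // the state must start out zeroed
 +  SetLength(Chars, Len);           // at most one code point per input byte
 +  Count := 0;
 +  Result := True;
 +  while Len > 0 do
 +  begin
 +    // Assumption: conv_t is a pointer to conv_struct, hence @conv.
 +    Used := utf8_mbtowc(@conv, wc, Buf, Len);
 +    if Used > 0 then
 +    begin
 +      Chars[Count] := wc; // success: one code point was read
 +      Inc(Count);
 +    end
 +    else if Used >= -2 then
 +    begin
 +      Result := False; // -1: invalid sequence, -2: too few bytes (0 is rejected, too)
 +      Break;
 +    end
 +    else
 +      Used := -2 - Used; // shift sequence only: recover the bytes consumed
 +    Inc(Buf, Used);
 +    Dec(Len, Used);
 +  end;
 +  SetLength(Chars, Count);
 +end;
 +
 +const
 +  // The UTF-8 bytes of "Grüß"
 +  Sample: array[0..5] of Byte = ($47, $72, $C3, $BC, $C3, $9F);
 +var
 +  Chars: TCodePoints;
 +begin
 +  if DecodeUtf8(@Sample[0], SizeOf(Sample), Chars) then
 +    WriteLn('Decoded code points: ', Length(Chars))
 +  else
 +    WriteLn('Invalid or truncated input');
 +end.
 +</code>
 +
 +The sketch stops at the first error. An application reading from a stream would more likely respond to -2 by fetching more bytes and retrying from the same position.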
 +
 +A few encodings may require xxx_mbtowc to be combined with xxx_flushwc:
 +
 +<code delphi>
 +xxx_flushwc = function(
 +  const conv: conv_t;
 +  var pwc: ucs4_t): Integer;
 +</code>
 +
 +xxx_flushwc returns to the initial state and stores the pending wide character, if any, in pwc. The result is 1 if a wide character was read, or 0 if none was pending.
 +
 +Calling xxx_flushwc is not required for most encodings.
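 +
 +For encodings that do need it, the flush call fits naturally at the end of the input. The following is only a sketch under stated assumptions: the unit name, the TFlushFunc type and the FlushPending helper are hypothetical. TFlushFunc simply mirrors the xxx_flushwc prototype, and conv_t is assumed to be a pointer to conv_struct.
 +
 +<code delphi>
 +unit FlushSketch; // hypothetical unit name
 +
 +interface
 +
 +uses
 +  DIConverters; // assumed to declare conv_t, conv_struct and ucs4_t
 +
 +type
 +  // Hypothetical local type mirroring the xxx_flushwc prototype.
 +  TFlushFunc = function(const conv: conv_t; var pwc: ucs4_t): Integer;
 +
 +// Call once after the last input byte has been decoded.
 +// Returns True and sets wc if a wide character was still pending.
 +function FlushPending(FlushFn: TFlushFunc; var conv: conv_struct;
 +  out wc: ucs4_t): Boolean;
 +
 +implementation
 +
 +function FlushPending(FlushFn: TFlushFunc; var conv: conv_struct;
 +  out wc: ucs4_t): Boolean;
 +begin
 +  wc := 0;
 +  // Pass nil as FlushFn for encodings without a flush function.
 +  // Assumption: conv_t is a pointer to conv_struct, hence @conv.
 +  Result := Assigned(FlushFn) and (FlushFn(@conv, wc) = 1);
 +end;
 +
 +end.
 +</code>
 +
 +Passing the flush function as a parameter keeps the helper independent of any particular encoding.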
 +
 +
 +===== Writing with Unicode Encoding =====
 +
 +The function prototype to encode a Unicode code point to multi-byte is:
 +
 +<code delphi>
 +xxx_wctomb = function(
 +  const conv: conv_t;
 +  const r: Pointer;
 +  const wc: ucs4_t;
 +  const n: Integer): Integer;
 +</code>
 +
 +The xxx stands for the name of the character encoding, as in utf8_wctomb.
 +
 +The function converts the wide character wc to the character set xxx and stores the result beginning at r. Up to n bytes may be written at r, and n must be >= 1.
 +
 +The function's result is the number of bytes written, -1 if wc is invalid for the encoding, or -2 if n is too small.
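 +
 +An encoding loop built on these return values might look like the sketch below. The program name and the EncodeUtf8 helper are hypothetical. The sketch assumes that the DIConverters unit exports utf8_wctomb, conv_struct and ucs4_t, that ucs4_t is an integer type, and that conv_t is a pointer to conv_struct, so the state is passed as @conv.
 +
 +<code delphi>
 +program EncodeSketch; // hypothetical program name
 +
 +{$APPTYPE CONSOLE}
 +
 +uses
 +  DIConverters; // assumed to export utf8_wctomb, conv_struct and ucs4_t
 +
 +// Encodes the code points in Src as UTF-8 into Dest.
 +// Returns the number of bytes written, or -1 on any failure.
 +function EncodeUtf8(const Src: array of ucs4_t; Dest: PAnsiChar;
 +  DestSize: Integer): Integer;
 +var
 +  conv: conv_struct;
 +  i, Written: Integer;
 +begin
 +  FillChar(conv, SizeOf(conv), 0); // the state must start out zeroed
 +  Result := 0;
 +  for i := Low(Src) to High(Src) do
 +  begin
 +    Written := -2; // treat a full buffer like "n is too small"
 +    if Result < DestSize then // xxx_wctomb expects n >= 1
 +      // Assumption: conv_t is a pointer to conv_struct, hence @conv.
 +      Written := utf8_wctomb(@conv, Dest + Result, Src[i], DestSize - Result);
 +    if Written < 0 then
 +    begin
 +      Result := -1; // -1: wc is invalid, -2: buffer too small
 +      Exit;
 +    end;
 +    Inc(Result, Written);
 +  end;
 +end;
 +
 +var
 +  Buf: array[0..15] of AnsiChar;
 +  N: Integer;
 +begin
 +  // U+0048 (H), U+00E9 (e with acute), U+20AC (euro sign)
 +  N := EncodeUtf8([$48, $E9, $20AC], @Buf[0], SizeOf(Buf));
 +  WriteLn('Bytes written: ', N);
 +end.
 +</code>
 +
 +The helper folds both error codes into -1 for brevity. Code that needs to grow the buffer on -2 would keep the two cases apart.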
 +
 +To write any pending characters and return to the original state, a call to xxx_reset may be required for some encodings:
 +
 +<code delphi>
 +xxx_reset = function(
 +  const conv: conv_t;
 +  const r: Pointer;
 +  const n: Integer): Integer;
 +</code>
 +
 +It stores a shift sequence that returns to the initial state, beginning at r. Up to n bytes may be written at r, and n must be >= 0. The function returns the number of bytes written, or -2 if n is too small.
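 +
 +A reset call would typically follow the last encoded character. The following sketch illustrates this under stated assumptions: the unit name, the TResetFunc type and the FinishEncoding helper are hypothetical. TResetFunc mirrors the xxx_reset prototype, and conv_t is assumed to be a pointer to conv_struct.
 +
 +<code delphi>
 +unit ResetSketch; // hypothetical unit name
 +
 +interface
 +
 +uses
 +  DIConverters; // assumed to declare conv_t and conv_struct
 +
 +type
 +  // Hypothetical local type mirroring the xxx_reset prototype.
 +  TResetFunc = function(const conv: conv_t; const r: Pointer;
 +    const n: Integer): Integer;
 +
 +// Appends the closing shift sequence, if any, after the Used bytes in Dest.
 +// Returns the new total length, or -1 if Dest is too small.
 +function FinishEncoding(ResetFn: TResetFunc; var conv: conv_struct;
 +  Dest: PAnsiChar; Used, DestSize: Integer): Integer;
 +
 +implementation
 +
 +function FinishEncoding(ResetFn: TResetFunc; var conv: conv_struct;
 +  Dest: PAnsiChar; Used, DestSize: Integer): Integer;
 +var
 +  Written: Integer;
 +begin
 +  Result := Used;
 +  if not Assigned(ResetFn) then
 +    Exit; // pass nil for encodings without a reset function
 +  // Assumption: conv_t is a pointer to conv_struct, hence @conv.
 +  Written := ResetFn(@conv, Dest + Used, DestSize - Used);
 +  if Written < 0 then
 +    Result := -1 // -2 from xxx_reset: not enough room left at Dest
 +  else
 +    Inc(Result, Written);
 +end;
 +
 +end.
 +</code>
 +
 +The caller is expected to pass the same conv variable that was used for the preceding xxx_wctomb calls, since it carries the shift state.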
 +
 +{{tag>Character "Character Sets" "Character Encodings" Converter Unicode UTF}}
products/converters/index.txt · Last modified: 2016/01/22 15:07 (external edit)