Yunqa • The Delphi Inspiration

Delphi Components and Applications

products:converters:index [2022/02/04 16:57] (current) – created - external edit 127.0.0.1
 +====== DIConverters ======
  
 +{{page>header}}
 +
 +===== Overview =====
 +
 +DIConverters supplies [[encodings|144 character set encodings]] with two complementary functions each, adding up to a total of more than 288 character conversion functions:
 +
 +  - to decode from encoding to Unicode.
 +  - to encode from Unicode to encoding.
 +
 +All conversions are fully native and require no DLLs or other system dependencies. Applications built with DIConverters therefore run on all Win32 platforms starting from (and including!) Windows 95.
 +
 +The converter functions allow for smart-linking: only those functions actually used by the application are linked into the executable. This keeps applications small when only one or a few character conversions are needed.
 +
 +Click [[encodings|here]] for a list of the character sets and encodings supported by DIConverters.
 +
 +===== Using DIConverters =====
 +
 +All conversions take place on a Unicode character base. In multi-byte character encodings, a single Unicode character is represented by one or more bytes.
 +
 +<WRAP tip>
 +DIConverters can be used by [[:products:unicode:|DIUnicode]], which contains convenient classes with automatic character conversion for both reading and writing Unicode text. With [[:products:unicode:|DIUnicode]], all text operations take place on a WideChar / WideString basis regardless of the actual text encoding. This allows applications to use the very same import / export routine for all 144 character sets and encodings.
 +</WRAP>
 +
 +===== Conversion Preparations =====
 +
 +Functions for both direct decoding and encoding require a conversion state variable of type conv_t, which is a record structure defined in DIConverters.pas. Before starting a direct character conversion, this variable must be initialized with zeros. Applications can easily accomplish this with the following standard Pascal call:
 +
 +<code delphi>
 +var
 +  conv: conv_struct;
 +begin
 +  FillChar(conv, SizeOf(conv), 0);
 +</code>
 +
 +You can then proceed using the decoding and encoding functions described below.
 +
 +===== Reading with Unicode Decoding =====
 +
 +The function prototype to decode multi-byte encodings to Unicode is:
 +
 +<code delphi>
 +xxx_mbtowc = function(
 +  const conv: conv_t;
 +  var pwc: ucs4_t;
 +  const s: Pointer;
 +  const n: Integer): Integer;
 +</code>
 +
 +The xxx stands for the actual character encoding, like utf8_mbtowc.
 +
 +It converts the byte sequence starting at s to a single Unicode code point and stores it in pwc. At most n bytes are read from s, so n must be the number of bytes available there, with n >= 1.
 +
 +The function's return value indicates if the conversion was successful:
 +
 +  * **number of bytes consumed:** Success, a wide character was read.
 +  * **-1:** The byte sequence at s is invalid.
 +  * **-2:** The number of bytes n is too small.
 +  * **-2-(number of bytes consumed):** Only a shift sequence was read.
 +
 +A few encodings may require xxx_mbtowc to be combined with xxx_flushwc:
 +
 +<code delphi>
 +xxx_flushwc = function(
 +  const conv: conv_t;
 +  var pwc: ucs4_t): Integer;
 +</code>
 +
 +xxx_flushwc resets the conversion to its initial state and stores the pending wide character, if any, in pwc. The result is 1 if a wide character was stored, or 0 if none was pending.
 +
 +Calling xxx_flushwc is not required for most encodings.
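 +
 +The return codes above suggest a simple decoding loop. The following is a minimal sketch, not taken from the DIConverters distribution. It assumes that conv_t is a pointer to the conv_struct record (so the state is passed as @conv; if conv_t is the record itself, pass conv directly) and that ucs4_t is an unsigned integer type. The program name and the sample bytes are made up for illustration.
 +
 +<code delphi>
 +program DecodeDemo; // hypothetical example program
 +{$APPTYPE CONSOLE}
 +uses
 +  SysUtils, DIConverters;
 +
 +const
 +  // Sample input: 'G', then U+00FC as the two bytes C3 BC, then 'n'.
 +  Utf8Bytes: array[0..3] of Byte = ($47, $C3, $BC, $6E);
 +var
 +  conv: conv_struct;
 +  wc: ucs4_t;
 +  src: PAnsiChar;
 +  count, r: Integer;
 +begin
 +  // The conversion state must be zeroed before the first call.
 +  FillChar(conv, SizeOf(conv), 0);
 +  src := PAnsiChar(@Utf8Bytes[0]);
 +  count := SizeOf(Utf8Bytes);
 +  while count > 0 do
 +  begin
 +    // Assumption: conv_t is a pointer type, hence @conv.
 +    r := utf8_mbtowc(@conv, wc, src, count);
 +    if r > 0 then
 +      // Success: r bytes were consumed and wc holds one code point.
 +      Writeln('U+', IntToHex(Integer(wc), 4))
 +    else if r >= -2 then
 +    begin
 +      // -1: invalid sequence, -2: too few bytes.
 +      // 0 is not a documented result; stop rather than loop forever.
 +      Writeln('Decoding stopped, result = ', r);
 +      Halt(1);
 +    end
 +    else
 +      // Only a shift sequence was read: recover the bytes consumed.
 +      r := -2 - r;
 +    Inc(src, r);
 +    Dec(count, r);
 +  end;
 +  // For an encoding that provides xxx_flushwc, call it here to
 +  // fetch a character that may still be pending.
 +end.
 +</code>
 +
 +When the input arrives in chunks, a result of -2 would normally mean "append more bytes and retry" instead of aborting.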
 +
 +
 +===== Writing with Unicode Encoding =====
 +
 +The function prototype to encode a Unicode code point to multi-byte is:
 +
 +<code delphi>
 +xxx_wctomb = function(
 +  const conv: conv_t;
 +  const r: Pointer;
 +  const wc: ucs4_t;
 +  const n: Integer): Integer;
 +</code>
 +
 +The xxx stands for the actual character encoding, like utf8_wctomb.
 +
 +The function converts the wide character wc to the character set xxx, and stores the result beginning at r. Up to n bytes may be written at r. n is >= 1.
 +
 +The function's result is the number of bytes written, -1 if wc is invalid for this encoding (it cannot be converted), or -2 if n is too small to hold the result.
 +
 +To write any pending characters and return to the original state, a call to xxx_reset may be required for some encodings:
 +
 +<code delphi>
 +xxx_reset = function(
 +  const conv: conv_t;
 +  const r: Pointer;
 +  const n: Integer): Integer;
 +</code>
 +
 +It stores a shift sequence returning to the initial state, beginning at r. Up to n bytes may be written at r. n is >= 0. It returns the number of bytes written, or -2 if n is too small.
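 +
 +Encoding follows the same pattern in the opposite direction. The following is a minimal sketch under stated assumptions: utf8_wctomb is assumed to exist by analogy with utf8_mbtowc, conv_t is assumed to be a pointer to conv_struct (hence @conv), and ucs4_t is assumed to be an unsigned integer type. The program name, buffer size and sample code points are illustrative only.
 +
 +<code delphi>
 +program EncodeDemo; // hypothetical example program
 +{$APPTYPE CONSOLE}
 +uses
 +  SysUtils, DIConverters;
 +
 +const
 +  // Sample code points: U+0047, U+00FC and U+20AC.
 +  CodePoints: array[0..2] of ucs4_t = ($47, $FC, $20AC);
 +var
 +  conv: conv_struct;
 +  buf: array[0..15] of Byte; // large enough for this sample
 +  used, i, r: Integer;
 +begin
 +  // The conversion state must be zeroed before the first call.
 +  FillChar(conv, SizeOf(conv), 0);
 +  used := 0;
 +  for i := Low(CodePoints) to High(CodePoints) do
 +  begin
 +    // Write at the current end of buf; n is the space still free.
 +    r := utf8_wctomb(@conv, @buf[used], CodePoints[i], SizeOf(buf) - used);
 +    if r = -1 then
 +    begin
 +      Writeln('Cannot encode U+', IntToHex(Integer(CodePoints[i]), 4));
 +      Halt(1);
 +    end;
 +    if r = -2 then
 +    begin
 +      Writeln('Output buffer too small');
 +      Halt(1);
 +    end;
 +    Inc(used, r);
 +  end;
 +  // For an encoding that provides xxx_reset, call it here with the
 +  // remaining space and add its result to used.
 +  for i := 0 to used - 1 do
 +    Write(IntToHex(buf[i], 2), ' ');
 +  Writeln;
 +end.
 +</code>
 +
 +Checking for -2 before advancing keeps the write position inside buf. A real application would typically flush or grow the buffer at that point and retry the same code point.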
 +
 +{{tag>Character "Character Sets" "Character Encodings" Converter Freeware Unicode UTF}}
products/converters/index.txt · Last modified: 2022/02/04 16:57 by 127.0.0.1