Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+====== DIConverters ======
+{{page>header}}
+===== Overview =====
+DIConverters supplies [[encodings|144 character set encodings]] with two complementary functions each, adding up to a total of more than 288 character conversion functions:
+  - to decode from encoding to Unicode.
+  - to encode from Unicode to encoding.
+All conversions are fully native and require no DLLs or other system dependencies. Applications built with DIConverters therefore run on all Win32 platforms starting from (and including!) Windows 95.
+The converter functions allow for smart-linking: only those functions used by the application are actually linked into the executable. This keeps applications small when only one or a few character conversions are needed.
+Click [[encodings|here]] for a list of the character sets and encodings supported by DIConverters.
+===== Using DIConverters =====
+All conversions take place on a Unicode character base. In multi-byte character encodings, a single Unicode character is represented by one or more bytes.
+<WRAP tip>
+DIConverters can be used by [[:products:unicode:|DIUnicode]], which contains convenient classes with automatic character conversion for both reading and writing Unicode text. With [[:products:unicode:|DIUnicode]], all text operations take place on a WideChar / WideString basis regardless of the actual text encoding. This allows applications to use the very same import / export routine for all 144 character sets and encodings.
+</WRAP>
+===== Conversion Preparations =====
+Functions for both direct decoding and encoding require a conversion state variable of type conv_t, which is a record structure defined in DIConverters.pas. Before starting a direct character conversion, this variable must be initialized with zeros. Applications can easily accomplish this with the following standard Pascal call:
+<code delphi>
+var
+  conv: conv_struct;
+begin
+  FillChar(conv, SizeOf(conv), 0);
+</code>
+You can then proceed using the decoding and encoding functions described below.
+===== Reading with Unicode Decoding =====
+The function prototype to decode multi-byte encodings to Unicode is:
+<code delphi>
+xxx_mbtowc = function(
+  const conv: conv_t;
+  var pwc: ucs4_t;
+  const s: Pointer;
+  const n: Integer): Integer;
+</code>
+The xxx stands for the actual character encoding, like utf8_mbtowc.
+It converts the byte sequence starting at s to a Unicode code point. n is the number of bytes available at s and must be >= 1; the function reads at most n bytes. The resulting Unicode code point is stored in pwc.
+The function's return value indicates if the conversion was successful:
+  * **number of bytes consumed:** Success, a wide character was read.
+  * **-1:** The byte sequence at s is invalid.
+  * **-2:** The number of bytes n is too small.
+  * **-2-(number of bytes consumed):** Only a shift sequence was read.
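+The return codes listed above can be turned into a small status value before a caller decides how far to advance in its input buffer. The following is only a minimal sketch: the type TMbtowcStatus and the function ClassifyMbtowc are hypothetical names introduced for illustration and are not part of DIConverters. A result of 0 is not covered by the list above, so the sketch reports it separately instead of guessing its meaning.
+<code delphi>
+type
+  // Hypothetical names for this sketch; not part of DIConverters.
+  TMbtowcStatus = (msCharRead, msInvalid, msTooFewBytes, msShiftOnly,
+    msUnspecified);
+// Maps a return value of xxx_mbtowc to a status and reports how many
+// input bytes were consumed according to the return code list.
+function ClassifyMbtowc(const Res: Integer;
+  out Consumed: Integer): TMbtowcStatus;
+begin
+  Consumed := 0;
+  if Res > 0 then
+  begin
+    Result := msCharRead;     // Res bytes consumed, pwc holds a code point
+    Consumed := Res;
+  end
+  else if Res = -1 then
+    Result := msInvalid       // the byte sequence at s is invalid
+  else if Res = -2 then
+    Result := msTooFewBytes   // n is too small, more input is needed
+  else if Res < -2 then
+  begin
+    Result := msShiftOnly;    // Res = -2-(bytes consumed), no character yet
+    Consumed := -2 - Res;
+  end
+  else
+    Result := msUnspecified;  // Res = 0 is not described in the list
+end;
+</code>
+A caller would typically skip Consumed bytes for both msCharRead and msShiftOnly, and store pwc only in the msCharRead case.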
+A few encodings may require xxx_mbtowc to be combined with xxx_flushwc:
+<code delphi>
+xxx_flushwc = function(
+  const conv: conv_t;
+  var pwc: ucs4_t): Integer;
+</code>
+xxx_flushwc returns to the initial state and stores the pending wide character, if any, in pwc. The result is 1 if a wide character was stored, or 0 if none was pending.
+Calling xxx_flushwc is not required for most encodings.
+===== Writing with Unicode Encoding =====
+The function prototype to encode a Unicode code point to a multi-byte encoding is:
+<code delphi>
+xxx_wctomb = function(
+  const conv: conv_t;
+  const r: Pointer;
+  const wc: ucs4_t;
+  const n: Integer): Integer;
+</code>
+The xxx stands for the actual character encoding, like utf8_wctomb.
+The function converts the wide character wc to the character set xxx, and stores the result beginning at r. Up to n bytes may be written at r. n is >= 1.
+The function's result is the number of bytes written, or -1 if invalid, or -2 if n is too small.
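+When several characters are written into one buffer, the result of each call determines where the next character goes. The following is a minimal sketch of that bookkeeping, assuming the return values described above: AdvanceAfterWctomb is a hypothetical helper introduced for illustration, not a DIConverters function, and it expects the caller to have passed Remaining as the n argument of the preceding xxx_wctomb call.
+<code delphi>
+// Hypothetical helper for this sketch; not part of DIConverters.
+// Applies a return value of xxx_wctomb to an output cursor. Dest points
+// at the next free byte, Remaining is the free space left in the buffer.
+function AdvanceAfterWctomb(const Res: Integer; var Dest: PAnsiChar;
+  var Remaining: Integer): Boolean;
+begin
+  Result := (Res >= 0) and (Res <= Remaining);
+  if Result then
+  begin
+    Inc(Dest, Res);       // skip the bytes that were just written
+    Dec(Remaining, Res);  // less room is left for the next character
+  end;
+  // Res = -1 (invalid) or Res = -2 (n too small): the cursor is left
+  // unchanged and the caller decides how to continue, for example by
+  // writing a replacement character or by enlarging the buffer.
+end;
+</code>
+Returning a Boolean keeps the error policy with the caller, which seems appropriate because the two negative results usually call for different reactions.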
+To write any pending characters and return to the original state, a call to xxx_reset may be required for some encodings:
+<code delphi>
+xxx_reset = function(
+  const conv: conv_t;
+  const r: Pointer;
+  const n: Integer): Integer;
+</code>
+It stores a shift sequence that returns to the initial state, beginning at r. Up to n bytes may be written at r; n is >= 0. It returns the number of bytes written, or -2 if n is too small.
+{{tag>Character "Character Sets" "Character Encodings" Converter Freeware Unicode UTF}}