— | products:converters:index [2022/02/04 16:57] (current) – created - external edit 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== DIConverters ====== | ||
+ | {{page> | ||
+ | |||
+ | ===== Overview ===== | ||
+ | |||
+ | DIConverters supplies [[encodings|144 character set encodings]] with two complementary functions each, adding up to a total of more than 288 character conversion functions: | ||
+ | |||
+ | - to decode from encoding to Unicode. | ||
+ | - to encode from Unicode to encoding. | ||
+ | |||
+ | All conversions are fully native and require no DLL or system dependencies. Applications built with DIConverters therefore run on all Win32 platforms starting from (and including!) Windows 95. | ||
+ | |||
+ | The converter functions allow for smart-linking: | ||
+ | |||
+ | Click [[encodings|here]] for a list of the character sets and encodings supported by DIConverters. | ||
+ | |||
+ | ===== Using DIConverters ===== | ||
+ | |||
+ | All conversions take place on a Unicode character base. In multi-byte character encodings, a single Unicode character is represented by one or more bytes. | ||
+ | |||
+ | <WRAP tip> | ||
+ | DIConverters can be used by [[: | ||
+ | </WRAP> | ||
+ | |||
+ | ===== Conversion Preparations ===== | ||
+ | |||
+ | Functions for both direct decoding and encoding require a conversion state variable of type conv_t, which is a record structure defined in DIConverters.pas. Before any decoding or encoding starts, this variable must be initialized with zeros. Applications can easily accomplish this with the following standard Pascal call: | ||
+ | |||
+ | <code delphi> | ||
+ | var | ||
+ | conv: conv_struct; | ||
+ | begin | ||
+ | FillChar(conv, | ||
+ | </code> | ||
+ | |||
+ | You can then proceed using the decoding and encoding functions described below. | ||
+ | |||
+ | ===== Reading with Unicode Decoding ===== | ||
+ | |||
+ | The function prototype to decode multi-byte encodings to Unicode is: | ||
+ | |||
+ | <code delphi> | ||
+ | xxx_mbtowc = function( | ||
+ | const conv: conv_t; | ||
+ | var pwc: ucs4_t; | ||
+ | const s: Pointer; | ||
+ | const n: Integer): Integer; | ||
+ | </code> | ||
+ | |||
+ | The xxx stands for the actual character encoding, like utf8_mbtowc. | ||
+ | |||
+ | It converts the byte sequence starting at s to a Unicode code point. Up to n bytes must be available at s, and n >= 1. The Unicode representation is stored in pwc. | ||
+ | |||
+ | The function' | ||
+ | |||
+ | * **number of bytes consumed:** Success, a wide character was read. | ||
+ | * **-1:** The byte sequence at s is invalid. | ||
+ | * **-2:** The number of bytes n is too small. | ||
+ | * **-2-(number of bytes consumed): | ||
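
The return values listed above suggest a simple decode loop. The following is only a minimal sketch: it does not call DIConverters. The types ''demo_conv_t'' and ''demo_ucs4_t'' and the function ''demo_mbtowc'' are hypothetical stand-ins that copy the documented parameter list, so the program compiles on its own. In a real application the types from DIConverters.pas and a function such as utf8_mbtowc would take their place.

<code delphi>
program DecodeLoopSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical stand-ins; the real types are declared in DIConverters.pas.
  demo_ucs4_t = Cardinal;
  demo_conv_t = record
    State: Integer;
  end;

// Hypothetical decoder with the documented parameter list. To keep the
// sketch short it only understands 1- and 2-byte UTF-8 sequences.
function demo_mbtowc(const conv: demo_conv_t; var pwc: demo_ucs4_t;
  const s: Pointer; const n: Integer): Integer;
var
  p: PByte;
  b0, b1: Byte;
begin
  p := s;
  b0 := p^;
  if b0 < $80 then
  begin
    pwc := b0;
    Result := 1; // one byte consumed
  end
  else if (b0 >= $C2) and (b0 <= $DF) then
  begin
    if n < 2 then
    begin
      Result := -2; // n is too small
      Exit;
    end;
    Inc(p);
    b1 := p^;
    if (b1 and $C0) <> $80 then
      Result := -1 // invalid byte sequence
    else
    begin
      pwc := (demo_ucs4_t(b0 and $1F) shl 6) or (b1 and $3F);
      Result := 2; // two bytes consumed
    end;
  end
  else
    Result := -1; // not handled by this sketch, reported as invalid
end;

const
  Input: array[0..3] of Byte = ($41, $C3, $A9, $42); // 'A', U+00E9, 'B'
var
  conv: demo_conv_t;
  wc: demo_ucs4_t;
  p: PByte;
  remaining, res: Integer;
begin
  FillChar(conv, SizeOf(conv), 0); // zero the conversion state first
  p := @Input[0];
  remaining := Length(Input);
  while remaining > 0 do
  begin
    res := demo_mbtowc(conv, wc, p, remaining);
    if res > 0 then
    begin
      // Success: res is the number of bytes consumed.
      WriteLn('code point: ', wc);
      Inc(p, res);
      Dec(remaining, res);
    end
    else if res = -1 then
    begin
      WriteLn('invalid byte sequence');
      Break;
    end
    else
    begin
      // -2: more input is needed. Values below -2 are not produced by
      // this stand-in; the sketch stops on them as well.
      WriteLn('stopped, result ', res);
      Break;
    end;
  end;
end.
</code>

The loop advances the input pointer by the returned byte count and stops at the first negative result. How an application recovers from -1 or -2 (skipping a byte, reading more input) is a design choice left open here.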
+ | |||
+ | A few encodings may require xxx_mbtowc to be combined with xxx_flushwc: | ||
+ | |||
+ | <code delphi> | ||
+ | xxx_flushwc = function( | ||
+ | const conv: conv_t; | ||
+ | var pwc: ucs4_t): Integer; | ||
+ | </code> | ||
+ | |||
+ | xxx_flushwc returns to the initial state and stores the pending wide character, if any, in pwc. The result is 1 if a wide character was stored, or 0 if none was pending. | ||
+ | |||
+ | Calling xxx_flushwc is not required for most encodings. | ||
+ | |||
+ | |||
+ | ===== Writing with Unicode Encoding ===== | ||
+ | |||
+ | The function prototype to encode a Unicode code point to multi-byte is: | ||
+ | |||
+ | <code delphi> | ||
+ | xxx_wctomb = function( | ||
+ | const conv: conv_t; | ||
+ | const r: Pointer; | ||
+ | const wc: ucs4_t; | ||
+ | const n: Integer): Integer; | ||
+ | </code> | ||
+ | |||
+ | The xxx stands for the actual character encoding, like utf8_wctomb. | ||
+ | |||
+ | The function converts the wide character wc to the character set xxx, and stores the result beginning at r. Up to n bytes may be written at r. n is >= 1. | ||
+ | |||
+ | The function' | ||
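
An encoding loop might look like the sketch below. It does not call DIConverters: ''demo_conv_t'', ''demo_ucs4_t'' and ''demo_wctomb'' are hypothetical stand-ins that copy the documented parameter list so the program compiles on its own. The return convention used here (number of bytes written on success, -2 if n is too small, -1 if the character cannot be encoded) is an assumption made for illustration only.

<code delphi>
program EncodeSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical stand-ins; the real types are declared in DIConverters.pas.
  demo_ucs4_t = Cardinal;
  demo_conv_t = record
    State: Integer;
  end;

// Hypothetical encoder with the documented parameter list. It writes
// 1- or 2-byte UTF-8 only. The return values are assumptions:
// bytes written on success, -2 if n is too small, -1 otherwise.
function demo_wctomb(const conv: demo_conv_t; const r: Pointer;
  const wc: demo_ucs4_t; const n: Integer): Integer;
var
  p: PByte;
begin
  p := r;
  if wc < $80 then
  begin
    p^ := Byte(wc);
    Result := 1;
  end
  else if wc < $800 then
  begin
    if n < 2 then
    begin
      Result := -2; // output buffer too small (assumed convention)
      Exit;
    end;
    p^ := Byte($C0 or (wc shr 6));
    Inc(p);
    p^ := Byte($80 or (wc and $3F));
    Result := 2;
  end
  else
    Result := -1; // outside the range this sketch handles
end;

const
  CodePoints: array[0..2] of demo_ucs4_t = ($41, $E9, $42); // 'A', U+00E9, 'B'
var
  conv: demo_conv_t;
  buf: array[0..7] of Byte;
  used, res, i: Integer;
begin
  FillChar(conv, SizeOf(conv), 0); // zero the conversion state first
  used := 0;
  for i := Low(CodePoints) to High(CodePoints) do
  begin
    // Pass the free part of the buffer and its remaining size as n.
    res := demo_wctomb(conv, @buf[used], CodePoints[i], SizeOf(buf) - used);
    if res <= 0 then
    begin
      WriteLn('stopped, result ', res);
      Break;
    end;
    Inc(used, res);
  end;
  // Print the encoded bytes as decimal values.
  for i := 0 to used - 1 do
    Write(buf[i], ' ');
  WriteLn;
end.
</code>

The caller tracks how much of the output buffer is used and passes only the remaining space as n, which keeps the n >= 1 requirement easy to check before each call.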
+ | |||
+ | To write any pending characters and return to the original state, a call to xxx_reset may be required for some encodings: | ||
+ | |||
+ | <code delphi> | ||
+ | xxx_reset = function( | ||
+ | const conv: conv_t; | ||
+ | const r: Pointer; | ||
+ | const n: Integer): Integer; | ||
+ | </code> | ||
+ | |||
+ | It stores a shift sequence returning to the initial state beginning at r. Up to n bytes may be written at r. n is >= 0. It returns the number of bytes written, or -2 if n is too small. | ||
+ | |||
+ | {{tag> |
products/converters/index.txt · Last modified: 2022/02/04 16:57 by 127.0.0.1