====== DIHtmlParser:
{{page>

===== Overview =====

[[.:

Plugins are a flexible way to extend the functionality of TDIHtmlParser. They can act on HTML data in completely new ways that were unknown to TDIHtmlParser when it was written. This lets the parser stay small and concentrate on what it does best: fast and reliable HTML parsing. Each plugin, in turn, can add its own specialized functionality to the core HTML parser as required.

===== A Plugin Scenario =====

Think of how to extract the title text of an HTML document. You would probably first want to locate the ''<

The TDIHtmlTablesPlugin plugin locates and extracts an HTML document'

===== Ready-Made Plugins =====

DIHtmlParser ships with a number of ready-to-use plugins.

==== Case Plugin ====

{{tdihtmlcaseplugin.gif |TDIHtmlCasePlugin}} The TDIHtmlCasePlugin changes tag and attribute names to upper case or lower case. It was requested by a user to create uniformly formatted HTML and has since proven useful to many others.

==== Character Set Plugin ====

{{tdihtmlcharsetplugin.gif |TDIHtmlCharSetPlugin}} The TDIHtmlCharSetPlugin watches for character set information in HTML documents and automatically updates the character decoding of the HTML parser. This is useful if the character set is unknown before parsing or changes in the middle of a document.

==== E-Mails Plugin ====

{{tdihtmlemailsplugin.gif |TDIHtmlEmailsPlugin}} The TDIHtmlEmailsPlugin scans an HTML document for links to e-mail addresses. For each hit it can trigger an application event and/or add the address to an internal list for later retrieval. This plugin should not be abused as an e-mail harvester.

==== Events Plugin ====

{{tdihtmleventsplugin.gif |TDIHtmlEventsPlugin}} The TDIHtmlEventsPlugin triggers events for each HTML piece. This turns DIHtmlParser into something like an HTML SAX parser. Unlike SAX parsers, TDIHtmlEventsPlugin supports tag filtering (as all plugins do).

==== Links Plugin ====

{{tdihtmllinksplugin.gif |TDIHtmlLinksPlugin}} The Links plugin collects all links contained in an HTML document. It is fully customizable and can also trigger an event for each new link.

==== Table Plugin ====

{{tdihtmltablesplugin.gif |TDIHtmlTablesPlugin}} The Table plugin keeps track of HTML tables encountered during parsing. Other parsing processes can query the Table plugin about the table cell and column and the table nesting.

==== Writer Plugin ====

{{tdihtmlwriterplugin.gif |TDIHtmlWriterPlugin}} The Writer plugin automates the writing of HTML data to another HTML document. It writes over 70 different character sets and encodings (144 with [[products:

{{tag>
products/htmlparser/plugins.txt · Last modified: 2016/01/22 15:08 by 127.0.0.1