Abstract |
<p>A device for rule-based classification of data items. The device is operatively connected to a database. It extracts data blocks of one or more information elements from a received data item, determines a functional purpose of a data block, records information on the determined functional purpose, and sends the record to the database. For the data block, the device finds a selector path comprising the sequence of markup begin tags that are detected before the data block and whose corresponding end tags are not detected before the data block. A rule repository comprises a rule that maps the selector path to a functional purpose of the data block. This makes the extraction more robust and thus reduces the number of required computer operations, because repeated re-runs after manual corrections are avoided.</p> |
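
The selector-path idea in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: it assumes HTML-like markup, uses the standard-library `html.parser`, and every name in it (`RULES`, `SelectorPathExtractor`, `classify`, the purpose labels) is hypothetical. The selector path of a data block is taken to be the begin tags seen so far whose end tags have not yet been seen; a dictionary stands in for the rule repository, and the returned records stand in for what would be sent to the database.

```python
from html.parser import HTMLParser

# Hypothetical rule repository: selector path -> functional purpose.
RULES = {
    ("html", "body", "div", "h1"): "title",
    ("html", "body", "div", "p"): "body_text",
    ("html", "body", "footer", "p"): "legal_notice",
}

# Elements that never have an end tag; they must not stay on the open-tag stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SelectorPathExtractor(HTMLParser):
    """Collects text blocks together with their selector path and purpose."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # begin tags whose end tag has not been detected yet
        self.records = []    # one record per non-empty data block

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching begin tag and anything opened after it.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        path = tuple(self.open_tags)
        self.records.append({
            "text": text,
            "selector_path": path,
            # Assumption: blocks with no matching rule are labelled "unknown".
            "purpose": RULES.get(path, "unknown"),
        })


def classify(markup):
    """Return one record per data block found in the markup string."""
    parser = SelectorPathExtractor()
    parser.feed(markup)
    parser.close()
    return parser.records


if __name__ == "__main__":
    doc = ("<html><body><div><h1>Quarterly report</h1><p>Revenue grew.</p></div>"
           "<footer><p>All rights reserved.</p></footer></body></html>")
    for record in classify(doc):
        print(record["purpose"], record["selector_path"], record["text"])
```

Because the path depends only on which tags are still open, a block is matched to a rule even when end tags are missing later in the document, which is one plausible reading of the robustness the abstract refers to.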