Abstract |
A digital-content extractor comprises a data-acquisition device configured to generate a digital representation of a source and a data-extraction engine communicatively coupled to the data-acquisition device. The data-extraction engine is configured to apply a combination of a plurality of digital-content extraction algorithms to the source and to automatically accommodate new data-extraction algorithms. A method for improving the accuracy of extracted digital content comprises reading a digital source; identifying the digital source by type; generating an acceptance level for each of a plurality of digital-content extraction algorithms based on a confidence value and a credibility rating associated with the accuracy of that algorithm; and applying a combination of at least two of the plurality of digital-content extraction algorithms based on the acceptance level to generate extracted digital content of the digital source.
|
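
The method summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the confidence value and the credibility rating are combined, or how the outputs of several algorithms are merged, so the code below makes two labeled assumptions: the acceptance level is taken as the product of the two values, and the combination step is an acceptance-weighted vote over the outputs of the top-ranked algorithms. All names (`Extractor`, `acceptance_level`, `extract`, `top_k`) are hypothetical and do not come from the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Extractor:
    """One digital-content extraction algorithm (hypothetical structure)."""
    name: str
    run: Callable[[str], str]  # maps a digital source to extracted content
    confidence: float          # confidence value in [0, 1]
    credibility: float         # credibility rating in [0, 1] for this source type


def acceptance_level(extractor: Extractor) -> float:
    # ASSUMPTION: the abstract only says the acceptance level is "based on"
    # confidence and credibility; a simple product is used here for illustration.
    return extractor.confidence * extractor.credibility


def extract(source: str, extractors: List[Extractor], top_k: int = 2) -> str:
    """Apply the top_k extractors by acceptance level and merge their outputs."""
    if top_k < 2 or len(extractors) < 2:
        raise ValueError("the method combines at least two algorithms")

    # Rank the algorithms by acceptance level and keep the best top_k.
    ranked = sorted(extractors, key=acceptance_level, reverse=True)[:top_k]

    # ASSUMPTION: outputs are merged by an acceptance-weighted vote.
    votes = defaultdict(float)
    for extractor in ranked:
        votes[extractor.run(source)] += acceptance_level(extractor)

    # Return the candidate output with the highest accumulated weight.
    return max(votes, key=votes.get)
```

Because `extract` accepts any list of `Extractor` objects, a new algorithm can be registered by appending one more entry to that list, which loosely mirrors the engine's ability to accommodate new data-extraction algorithms. Identifying the source type is left out of the sketch; in practice it would select which credibility rating applies to each algorithm.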