摘要 |
An improved document management system with high-speed retrieval by example retrieves a document attaching a target document, in whole or part, by comparing descriptors of documents. A descriptor is derived from a pattern of labels, where each label is associated with a character, or more precisely, a character bounding box. A bounding box is found by examining contiguous pixels in an image. The particular label associated with a bounding box depends on the value of a metric measured from that bounding box. In one system, the metric is the spacing between the bounding box and an adjacent bounding box, in which the labels approximately reflect a pattern of word lengths. In other systems, where words lengths are not present, the metric might be pixel density and the pattern of labels approximately reflect a pattern of denser characters and sparser characters. The document management system, or just the query portion of the document management system could be part of a copier, where a sample page is input to the copier and the copier retrieves the matching document and prints it.
|