Abstract |
PROBLEM TO BE SOLVED: To improve detection precision, and to make detection independent of the reading order of document blocks, by applying word-based concept retrieval when retrieving an original computerized document from a scanned paper document and detecting a similar document. SOLUTION: The system comprises a document storage means for storing the original computerized document; a document reading means for reading, as image data, a document printed from the original computerized document; a document recognition means for recognizing the read image data as character codes; a document analysis means for analyzing the recognized document data to extract the layout information of the document; a document detection means for detecting a similar document by performing concept retrieval, based on word vectorization, over the stored original computerized documents using the analyzed document data; and a detection result output means for outputting the detection result. COPYRIGHT: (C)2005,JPO&NCIPI
|
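
The abstract does not say how the word vectorization or the concept retrieval is implemented. As a rough illustration only, the sketch below substitutes a plain term-frequency vector and cosine similarity for the patent's concept retrieval, which may well differ. The function names (`word_vector`, `cosine_similarity`, `find_most_similar`), the tokenizer, and the sample documents are all hypothetical. The point it shows is that summing word counts over all recognized blocks makes the comparison independent of the order in which the blocks were read.

```python
import math
import re
from collections import Counter


def word_vector(blocks):
    """Build a term-frequency vector from a list of text blocks.

    Counts are summed over all blocks, so the order of the blocks
    has no effect on the resulting vector.
    """
    counts = Counter()
    for block in blocks:
        # Assumption: a simple lowercase alphanumeric tokenizer.
        counts.update(re.findall(r"[a-z0-9]+", block.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_most_similar(scanned_blocks, stored_documents):
    """Return (doc_id, score) of the stored document closest to the scan.

    stored_documents maps a document id to its list of text blocks.
    """
    query = word_vector(scanned_blocks)
    best_id, best_score = None, -1.0
    for doc_id, blocks in stored_documents.items():
        score = cosine_similarity(query, word_vector(blocks))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score
```

With this sketch, a scan whose blocks were recognized in a different order from the stored original still yields the same vector as the original, so the match does not depend on layout analysis recovering the correct reading order.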