Abstract
PROBLEM TO BE SOLVED: To extract digests in which excerpts are concatenated one after another, or separated by only a few characters, and in which the excerpts account for at least a predetermined portion of the whole; and to minimize detection errors while limiting the size of the database (DB) that stores the digests.

SOLUTION: The present invention comprises: cutting a character string segment out of one sentence of an input document; determining a start point of the character string segment; generating a digest for each character string segment of a predetermined number of characters taken from the start point, sliding the window by a predetermined number of characters each time, and storing the document ID and the resulting digest group in a digest DB; reading the digests out of the digest DB; and determining that a document is composed of excerpts when the number of its digests that are identical to digests included in other documents, or the ratio of its digests that match the other documents, is larger than a predetermined value.
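The sliding-window digest scheme in the SOLUTION can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. Every specific here is hypothetical: sentence splitting on terminal punctuation, a start point at the first character of each sentence, a window width of 8 characters and a step of 2, truncated SHA-1 as the digest function, an in-memory `DigestDB` class standing in for the digest DB, and the thresholds `min_count` and `min_ratio`.

```python
import hashlib
import re
from collections import defaultdict


def split_sentences(text: str) -> list[str]:
    # Assumption: a sentence ends at Japanese or Western terminal punctuation.
    return [s for s in re.split(r"(?<=[。．.!?！？])\s*", text) if s]


def make_digests(sentence: str, width: int = 8, step: int = 2, start: int = 0) -> list[str]:
    # Hypothetical parameters: from the start point, take a `width`-character
    # window, hash it, then slide the window forward by `step` characters.
    digests = []
    for i in range(start, len(sentence) - width + 1, step):
        window = sentence[i:i + width]
        digests.append(hashlib.sha1(window.encode("utf-8")).hexdigest()[:16])
    return digests


class DigestDB:
    """Hypothetical in-memory stand-in for the digest DB."""

    def __init__(self) -> None:
        self.by_doc: dict[str, list[str]] = {}  # document ID -> digest group
        self.by_digest: dict[str, set[str]] = defaultdict(set)  # digest -> document IDs

    def add_document(self, doc_id: str, text: str, width: int = 8, step: int = 2) -> None:
        group: list[str] = []
        for sentence in split_sentences(text):
            # Assumption: the start point is the first character of the stripped sentence.
            group.extend(make_digests(sentence.strip(), width, step, start=0))
        self.by_doc[doc_id] = group
        for d in group:
            self.by_digest[d].add(doc_id)

    def is_excerpt_document(self, doc_id: str, min_count: int = 5, min_ratio: float = 0.5) -> bool:
        group = self.by_doc[doc_id]
        if not group:
            return False
        # Count this document's digests that also occur in at least one other document.
        shared = sum(1 for d in group if self.by_digest[d] - {doc_id})
        # Flag the document if either the count or the ratio exceeds its threshold.
        return shared > min_count or shared / len(group) > min_ratio
```

For example, a document that reuses the sentences of another document in a different order is flagged, because each sentence is digested from its own start point and so yields the same digests wherever it appears. Sliding by more than one character stores fewer digests per sentence, which is one plausible way to keep the DB small at the cost of some sensitivity.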