Abstract |
PROBLEM TO BE SOLVED: To detect that an input document excerpts a plurality of documents in succession. SOLUTION: A character string segment is cut out of one sentence of an input document, and a start point of the character string segment is determined. Digests are generated by applying a hash function to the character string of a predetermined number of characters taken from the start point, with the position slid by a predetermined number of characters each time, and a document ID and the resulting digest group are stored in a digest DB. The digests are read out from the digest DB, and when character string segments separated by a predetermined window size w of digests are excerpted from documents having different digests, it is determined that a plurality of documents are excerpted in succession. |
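
The steps in the abstract can be illustrated with a minimal Python sketch. It is one possible reading, not the claimed implementation: the function names, the use of SHA-1, the defaults `k=8`, `step=1` and `w=8`, and the in-memory dictionary standing in for the digest DB are all assumptions made for illustration. Start-point determination is simplified to the first character of the text, and the decision rule (two windows `w` digests apart matching disjoint sets of source documents) is an interpretation of the final clause of the abstract.

```python
import hashlib
from collections import defaultdict


def make_digests(text, k=8, step=1):
    """Hash each k-character window of text, sliding by `step` characters.

    SHA-1 is an assumed choice; the abstract only says "a hash function".
    """
    last_start = max(len(text) - k + 1, 0)
    return [
        hashlib.sha1(text[i:i + k].encode("utf-8")).hexdigest()
        for i in range(0, last_start, step)
    ]


def build_digest_db(documents, k=8, step=1):
    """Hypothetical digest DB: maps each digest to the IDs of documents containing it."""
    db = defaultdict(set)
    for doc_id, text in documents.items():
        for digest in make_digests(text, k, step):
            db[digest].add(doc_id)
    return db


def matched_sources(input_text, db, k=8, step=1):
    """For each window of the input, the set of stored document IDs sharing its digest."""
    return [db.get(d, set()) for d in make_digests(input_text, k, step)]


def is_multi_document_excerpt(input_text, db, k=8, step=1, w=8):
    """Return True if two windows w digests apart match disjoint sets of documents.

    This rule is an interpretation of the abstract, not a confirmed specification.
    """
    sources = matched_sources(input_text, db, k, step)
    for i in range(len(sources) - w):
        first, second = sources[i], sources[i + w]
        if first and second and first.isdisjoint(second):
            return True
    return False


if __name__ == "__main__":
    docs = {
        "A": "the quick brown fox jumps over the lazy dog",
        "B": "pack my box with five dozen liquor jugs",
    }
    db = build_digest_db(docs)
    print(is_multi_document_excerpt("the quick brown fox pack my box with five", db))
    print(is_multi_document_excerpt("the quick brown fox jumps", db))
```

With `step=1` and `w` equal to `k`, the two compared windows do not overlap, so a match against different documents reflects two adjacent passages rather than one shared passage. A real system would likely tune `k`, `step` and `w` to the language and the expected excerpt length.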