Title of Invention: Device and Method for Determining Automatically Composed Documents (自動構成文書判定装置及び方法)
Abstract: PROBLEM TO BE SOLVED: To extract digest-style documents in which excerpts are concatenated, either directly or with a few characters between them, and account for at least a predetermined proportion of the whole document, while keeping the database (DB) that stores the digests small and minimizing detection errors. SOLUTION: A character string segment is cut out of a sentence of an input document and its start point is determined. Starting from that point, a digest is computed for each character string segment of a predetermined number of characters, sliding the window by a predetermined number of characters each time, and the document ID and the resulting group of digests are stored in a digest DB. The digests are then read out of the digest DB, and a document is determined to be composed of excerpts when the number of its digests that also appear in other documents, or the proportion of its digests that match other documents, exceeds a predetermined value.
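The solution above amounts to sliding-window hashing of character segments followed by a shared-digest count. A minimal Python sketch follows, under assumptions that the abstract does not specify: sentences are split on terminal punctuation, the start point is taken to be the first character of each sentence, a truncated SHA-1 serves as the digest function, and the window length, step, and thresholds (`WINDOW`, `STEP`, `min_count`, `min_ratio`) are arbitrary illustrative values. `DigestDB`, `digests_for`, and `is_composed` are hypothetical names, not taken from the patent.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical parameters: the abstract only says "a predetermined number of characters".
WINDOW = 8  # characters per segment
STEP = 2    # characters the window slides each time


def split_sentences(text):
    # Assumption: sentences end with Japanese or ASCII terminal punctuation.
    return [s.strip() for s in re.split(r"(?<=[。．.!?！？])", text) if s.strip()]


def digests_for(text, window=WINDOW, step=STEP):
    """Hash every fixed-length window of each sentence, sliding by `step` characters."""
    result = []
    for sentence in split_sentences(text):
        # Assumption: the start point is the first character of the sentence.
        # Sentences shorter than one window yield no digests.
        for start in range(0, len(sentence) - window + 1, step):
            segment = sentence[start:start + window]
            result.append(hashlib.sha1(segment.encode("utf-8")).hexdigest()[:16])
    return result


class DigestDB:
    """Hypothetical in-memory stand-in for the digest DB."""

    def __init__(self):
        self.by_doc = {}                   # document ID -> list of its digests
        self.by_digest = defaultdict(set)  # digest -> IDs of documents containing it

    def add(self, doc_id, text):
        digests = digests_for(text)
        self.by_doc[doc_id] = digests
        for d in digests:
            self.by_digest[d].add(doc_id)

    def is_composed(self, doc_id, min_count=3, min_ratio=0.5):
        """Flag a document whose shared-digest count or ratio exceeds a threshold."""
        digests = self.by_doc[doc_id]
        if not digests:
            return False
        # A digest counts as shared if any *other* document also contains it.
        shared = sum(1 for d in digests if self.by_digest[d] - {doc_id})
        return shared > min_count or shared / len(digests) > min_ratio


if __name__ == "__main__":
    db = DigestDB()
    db.add("a", "東京は日本の首都であり人口が多い。")
    db.add("b", "富士山は日本で最も高い山として知られる。")
    db.add("c", "今日は雨が降っているので傘を持って出かけた。")
    db.add("mix", "東京は日本の首都であり人口が多い。富士山は日本で最も高い山として知られる。")
    print(db.is_composed("mix"))  # True: every digest also occurs in "a" or "b"
    print(db.is_composed("c"))    # False: no digest is shared
```

In this sketch a source document that is quoted in full (here "a" or "b") would also exceed the ratio threshold, because its digests appear in the composed document; the abstract does not say how the patented method distinguishes sources from compilations.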
Publication No.: JP5906228(B2)  Publication Date: 2016.04.20
Application No.: JP20130229438  Filing Date: 2013.11.05
Applicant: 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)  Inventors: 船越 要 (Kaname Funakoshi); 鷲崎 誠司 (Seiji Washizaki)
Classification: G06F17/30  Main Classification: G06F17/30