Title of invention: DOCUMENT DATA COMPARISON METHOD, DOCUMENT DATA COMPARISON APPARATUS, AND DOCUMENT DATA COMPARISON PROGRAM
Abstract:
PROBLEM TO BE SOLVED: To provide a document data comparison method, a document data comparison apparatus, and a document data comparison program in which the conversion into character codes and the calculation of metric values are designed so that attribute information other than character codes can also be used.
SOLUTION: A document data comparison apparatus 1 comprises: a character string/attribute extraction unit 2, which divides document data composed of sets of character string fragments into individual character string fragments and extracts attribute information representing the status of each fragment within the document; a metric calculation unit 3, which converts the characters contained in each character string fragment into the corresponding character codes to extract character code sequences, and calculates a similarity metric for each fragment; and a comparison unit 3, which searches the document data for sets of character string fragments whose similarity-metric ratio lies within a numerical range A and whose attribute information matches, and which, when the similarity-metric ratio also lies within a numerical range B, determines that the contents of those fragments match and outputs this determination as the comparison result. This allows the contents of document data to be compared in the form of character codes.
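The two-stage test in the abstract (a coarse filter on the metric ratio within range A plus matching attributes, then a match decision when the ratio also falls within range B) can be illustrated with a minimal sketch. The abstract does not say how the similarity metric is computed, what the attributes look like, or what values ranges A and B take, so everything below is a hypothetical stand-in: `Fragment`, `metric`, `metric_ratio`, `compare`, the sum-of-code-points metric, and the default ranges are illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A character string fragment plus its (hypothetical) attribute label."""
    text: str
    attribute: str  # assumption: e.g. "heading", "body", "table-cell"


def char_codes(text: str) -> list[int]:
    # Convert each character to its Unicode code point (the "character code sequence").
    return [ord(c) for c in text]


def metric(fragment: Fragment) -> int:
    # Hypothetical scalar metric: sum of code points. The patent's actual metric is unspecified.
    return sum(char_codes(fragment.text))


def metric_ratio(a: Fragment, b: Fragment) -> float:
    # Ratio of the smaller metric to the larger, so the result lies in [0, 1].
    ma, mb = metric(a), metric(b)
    if ma == 0 and mb == 0:
        return 1.0
    return min(ma, mb) / max(ma, mb)


def compare(doc_a, doc_b, range_a=(0.9, 1.0), range_b=(1.0, 1.0)):
    """Return (frag_a, frag_b, ratio, is_match) for every candidate pair.

    Candidates: same attribute and ratio within range_a (hypothetical defaults).
    Match: ratio additionally within range_b.
    """
    results = []
    for fa in doc_a:
        for fb in doc_b:
            if fa.attribute != fb.attribute:
                continue  # attribute information must match
            r = metric_ratio(fa, fb)
            if not (range_a[0] <= r <= range_a[1]):
                continue  # outside the coarse range A
            is_match = range_b[0] <= r <= range_b[1]
            results.append((fa, fb, r, is_match))
    return results


if __name__ == "__main__":
    doc_a = [Fragment("Introduction", "heading"), Fragment("The quick brown fox.", "body")]
    doc_b = [Fragment("Introduction", "heading"), Fragment("The quick brown cat.", "body")]
    for fa, fb, r, m in compare(doc_a, doc_b):
        print(f"{fa.text!r} vs {fb.text!r}: ratio={r:.4f}, match={m}")
```

In this example the identical headings reach a ratio of exactly 1.0 and are reported as a match. The two body sentences pass the range-A filter but fail range B, and no heading is ever paired with a body fragment because their attributes differ. A single code-point sum cannot distinguish anagrams, so a practical metric would likely need to be richer. The sketch only shows how the two numerical ranges and the attribute check interact.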
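Application publication number: JP2015069393(A)  Publication date: 2015.04.13
Application number: JP20130202798  Filing date: 2013.09.27
Applicant: TOSHIBA CORP  Inventors: KURATANI MIO; TODA ATSUKO; ANDY ANTONIUS; WATANABE NORIO; MOCHIJI SHIGERU
Classification: G06F17/30; G06F17/21; G06F17/24  Main classification: G06F17/30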