Title of invention |
DOCUMENT DATA COMPARISON METHOD, DOCUMENT DATA COMPARISON APPARATUS, AND DOCUMENT DATA COMPARISON PROGRAM |
Abstract |
PROBLEM TO BE SOLVED: To provide a document data comparison method, a document data comparison apparatus, and a document data comparison program in which the conversion into character codes and the calculation of metric values are devised so that attribute information other than character codes can also be utilized. SOLUTION: A document data comparison apparatus 1 comprises: a character string/attribute extraction unit 2 which divides document data composed of sets of character string fragments into the individual character string fragments and extracts attribute information representing the status of each character string fragment in the document; a metric calculation unit 3 which converts the characters contained in each character string fragment into the corresponding character codes to extract a character code sequence, and calculates a similarity metric for the character string fragment; and a comparison unit 3 which searches the document data for sets of character string fragments whose similarity metric ratio is within a numerical range A and whose attribute information matches, and which, when the similarity metric ratio is within a numerical range B, determines that the contents of the sets of character string fragments match and outputs the determination as a comparison result. This allows the contents of the document data to be compared in the form of character codes. |
Application publication number |
JP2015069393(A) |
Application publication date |
2015.04.13 |
Application number |
JP20130202798 |
Filing date |
2013.09.27 |
Applicant |
TOSHIBA CORP |
Inventors |
KURATANI MIO;TODA ATSUKO;ANDY ANTONIUS;WATANABE NORIO;MOCHIJI SHIGERU |
Classification number |
G06F17/30;G06F17/21;G06F17/24 |
Main classification number |
G06F17/30 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|
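The comparison flow described in the abstract (fragment and attribute extraction, conversion to character codes, metric calculation, then a two-stage ratio check) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the patent: the abstract does not define the similarity metric or the bounds of ranges A and B, so the sketch uses the sum of Unicode code points as a stand-in metric and hypothetical values for `RANGE_A` and `RANGE_B`. The `Fragment` class and its `attributes` tuple are likewise hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bounds: the abstract names ranges A and B but gives no values.
RANGE_A = (0.8, 1.25)     # ratio range for candidate pairs
RANGE_B = (0.999, 1.001)  # narrower ratio range for pairs judged to match


@dataclass(frozen=True)
class Fragment:
    """A character string fragment plus its attribute information."""
    text: str
    attributes: tuple  # assumed form, e.g. (font name, font size)


def code_sequence(text):
    """Convert each character to its character code (Unicode code point)."""
    return [ord(ch) for ch in text]


def metric(fragment):
    """Stand-in metric (an assumption): the sum of the character codes."""
    return sum(code_sequence(fragment.text))


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def compare(doc_a, doc_b):
    """Return (text_a, text_b, matched) for every candidate fragment pair.

    A pair is a candidate when its metric ratio lies in RANGE_A and the
    attribute information is equal; it is reported as matched when the
    ratio also lies in RANGE_B.
    """
    results = []
    for a in doc_a:
        for b in doc_b:
            m_a, m_b = metric(a), metric(b)
            if m_b == 0:
                continue  # an empty fragment has no usable ratio
            ratio = m_a / m_b
            if in_range(ratio, RANGE_A) and a.attributes == b.attributes:
                results.append((a.text, b.text, in_range(ratio, RANGE_B)))
    return results


old_doc = [Fragment("Hello world", ("Mincho", 10)),
           Fragment("Total: 100", ("Gothic", 12))]
new_doc = [Fragment("Hello world", ("Mincho", 10)),
           Fragment("Total: 120", ("Gothic", 12))]

for text_a, text_b, matched in compare(old_doc, new_doc):
    print(f"{text_a!r} vs {text_b!r}: {'match' if matched else 'differs'}")
```

With these assumed values, the unchanged fragment is reported as a match, while the edited total stays a candidate pair (same attributes, ratio inside range A) but falls outside range B. A plain sum of codes is a deliberately weak stand-in: it cannot tell reordered characters apart, and a small edit in a long fragment may still fall inside range B.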