Title of invention METHOD FOR AUTOMATICALLY GENERATING DOCUMENT IDENTIFICATION DICTIONARY AND DOCUMENT PROCESSING SYSTEM
Abstract PROBLEM TO BE SOLVED: To automatically generate, from document image samples, a knowledge dictionary that records the character string arrangement patterns used in document identification processing. SOLUTION: A character string extraction part 102 automatically extracts character strings from a document image using the character string recognition method of the document identification system. A stability calculation part 103 calculates the stability of each character string by checking the appearance frequency of the extracted character string. An inherency calculation part 104 calculates the inherency of each character string by counting the number of document types in which it appears in character string data B, the output of the stability calculation part 103. A character string priority calculation part 105 calculates a registration priority from the inherency value calculated by the inherency calculation part 104 and other features of the character string. A document identification dictionary output part 106 generates the document identification dictionary in accordance with the registration priority of each character string.
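The flow through parts 103 to 106 can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract gives no formulas, so everything concrete here is an assumption. Stability is taken as the fraction of a document type's samples that contain the string, "character string data B" is assumed to be the strings whose stability passes a threshold, inherency is taken as the reciprocal of the number of document types containing the string, and string length stands in for the unspecified "other features". The names `build_dictionary`, `min_stability`, and `top_k` are hypothetical.

```python
from collections import defaultdict


def build_dictionary(samples, min_stability=0.6, top_k=2):
    """Sketch of dictionary generation (all formulas are assumptions).

    samples: dict mapping a document type to a list of sample documents,
             each sample being the list of strings extracted from one image
             (the role of extraction part 102, not modeled here).
    Returns a dict mapping each document type to its top_k strings.
    """
    # Part 103 (assumed formula): stability is the fraction of a type's
    # samples in which the string appears.
    stability = {}
    for doc_type, docs in samples.items():
        counts = defaultdict(int)
        for strings in docs:
            for s in set(strings):  # count each string once per sample
                counts[s] += 1
        for s, c in counts.items():
            stability[(doc_type, s)] = c / len(docs)

    # Assumed "character string data B": strings that are stable enough.
    stable = {k: v for k, v in stability.items() if v >= min_stability}

    # Part 104 (assumed formula): inherency is 1 / number of document
    # types in which the string appears.
    types_per_string = defaultdict(set)
    for doc_type, s in stable:
        types_per_string[s].add(doc_type)

    # Parts 105 and 106: rank by inherency, then stability, then length
    # (length is an assumed stand-in for "other features"), and keep the
    # top_k strings per document type.
    dictionary = {}
    for doc_type in samples:
        candidates = [s for (t, s) in stable if t == doc_type]
        ranked = sorted(
            candidates,
            key=lambda s: (
                -1.0 / len(types_per_string[s]),
                -stable[(doc_type, s)],
                -len(s),
                s,
            ),
        )
        dictionary[doc_type] = ranked[:top_k]
    return dictionary


samples = {
    "invoice": [["INVOICE", "Date", "Total"], ["INVOICE", "Date", "Amount due"]],
    "receipt": [["RECEIPT", "Date", "Total"], ["RECEIPT", "Date"]],
}
print(build_dictionary(samples, top_k=1))
```

In this toy input, "Date" is stable in both document types and therefore has low inherency, so the type-specific headings rank first.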
Publication No. JP2003115028(A) Publication date 2003.04.18
Application No. JP20010307050 Filing date 2001.10.03
Applicant HITACHI LTD Inventors FUJIO MASAKAZU;FURUKAWA NAOHIRO;SAKO YUTAKA
Classification G06K9/20;G06F19/00;G06Q10/10;G06Q40/00;G06Q40/02 Main classification G06K9/20
Agency Agent
Main claim
Address