Title of invention NOISE REMOVING SYSTEM FOR DOCUMENT DATA
Abstract PROBLEM TO BE SOLVED: To provide a technique for automatically deleting unnecessary character strings from various kinds of document data as preprocessing for automatic keyword extraction. SOLUTION: A noise removing system 40 for document data includes an alphanumeric character noise removing part 48, which reads the document data from a first noise-removed document DB 46, calculates for each row of each item of document data the alphanumeric character density (the proportion of alphanumeric characters in that row), compares the density with a threshold D, determines the row to be a noise row when the density is equal to or greater than the threshold D, and deletes the noise rows from the document data. COPYRIGHT: (C)2010,JPO&INPIT
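The row-level rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `alnum_density` and `remove_noise_rows` are hypothetical, and several details the abstract leaves open are assumed here, namely that "alphanumeric" means ASCII letters and digits, that the proportion is taken over all characters in the row (spaces and punctuation included), that an empty row has a proportion of 0, and that the threshold D is given as a fraction between 0 and 1.

```python
def alnum_density(row: str) -> float:
    """Fraction of characters in `row` that are ASCII letters or digits.

    Assumption: only ASCII alphanumerics count, and an empty row scores 0.0.
    """
    if not row:
        return 0.0
    hits = sum(1 for ch in row if ch.isascii() and ch.isalnum())
    return hits / len(row)


def remove_noise_rows(text: str, threshold: float) -> str:
    """Drop every row whose alphanumeric fraction is at or above `threshold`.

    `threshold` plays the role of the threshold D in the abstract
    (assumed here to be a fraction in the range 0..1).
    """
    kept = [row for row in text.splitlines() if alnum_density(row) < threshold]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "本日の会議は十時から\nhttp://example.com/a1b2c3\n売上は前年比で増加"
    # With D = 0.5 the URL row (20 of 25 characters alphanumeric) is removed.
    print(remove_noise_rows(sample, threshold=0.5))
```

A row whose proportion exactly equals the threshold is treated as noise, matching the "threshold D or more" wording. A suitable value of D depends on the corpus and is not stated in the abstract.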
Publication number JP2009271797(A) Publication date 2009.11.19
Application number JP20080122782 Filing date 2008.05.08
Applicant NOMURA RESEARCH INSTITUTE LTD Inventor TAKEHARA GASUAKI
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address