Title: TEXT EXTRACTION METHOD, TEXT EXTRACTION DEVICE AND TEXT EXTRACTION PROGRAM
Abstract:
PROBLEM TO BE SOLVED: To extract the text part of a document expressed as a tree structure with high accuracy.
SOLUTION: A text part determination function section 8 classifies the feature information extracted for each node of an input document according to the data stored in a database 4, determines whether each node is a text part, and stores the determination result in a storage section 9. A boundary acquisition function section 10 acquires the determination results by referring to the storage section 9, successively retrieves the boundaries of the text part by tracing from lower nodes to upper nodes in the tree structure of the input document according to the determination results for the lower nodes, and stores the retrieval result in a storage section 11. A text extraction function section 12 refers to the storage section 11 and extracts the character strings under the boundaries as the text part.
COPYRIGHT: (C)2012,JPO&INPIT
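The bottom-up boundary search in the abstract can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions. The per-node determination (the classification of feature information against database 4) is taken as a precomputed boolean `is_text`. Only leaf determinations drive the upward trace. The rule that moves a boundary up to a parent, namely that at least a fraction `ratio` of its children are judged text, is a hypothetical stand-in, because the abstract does not state the criterion. The names `Node`, `find_boundaries`, `subtree_text`, `extract_text` and `ratio` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str
    text: str = ""                  # character string held directly by this node
    is_text: bool = False           # hypothetical precomputed determination result
    children: List["Node"] = field(default_factory=list)


def find_boundaries(root: Node, ratio: float = 0.5) -> List[Node]:
    """Trace from lower to upper nodes; return boundary nodes in document order."""

    def visit(node: Node) -> Tuple[bool, List[Node]]:
        # Returns (is this subtree a text region?, boundaries found strictly inside it).
        if not node.children:
            return node.is_text, []
        results = [visit(child) for child in node.children]  # lower nodes first
        flags = [flag for flag, _ in results]
        if sum(flags) / len(flags) >= ratio:
            # Hypothetical merge rule: enough children are text, so the boundary
            # moves up to this node and absorbs any deeper boundaries.
            return True, []
        # The upward trace stops here; text-region children become boundaries.
        inner: List[Node] = []
        for child, (flag, sub) in zip(node.children, results):
            if flag:
                inner.append(child)
            else:
                inner.extend(sub)
        return False, inner

    is_region, inner = visit(root)
    return [root] if is_region else inner


def subtree_text(node: Node) -> str:
    """Join every character string under a node, in document order."""
    parts = [node.text] if node.text else []
    for child in node.children:
        piece = subtree_text(child)
        if piece:
            parts.append(piece)
    return " ".join(parts)


def extract_text(root: Node, ratio: float = 0.5) -> List[str]:
    """Extract the strings under each boundary as one text part."""
    return [subtree_text(b) for b in find_boundaries(root, ratio)]


# Usage: a navigation bar, an article body and a footer.
doc = Node("body", children=[
    Node("div", children=[Node("a", "Home"), Node("a", "About")]),
    Node("div", children=[
        Node("p", "First paragraph.", is_text=True),
        Node("p", "Second paragraph.", is_text=True),
        Node("span", "Share"),
    ]),
    Node("div", children=[Node("p", "Copyright")]),
])
print(extract_text(doc))  # ['First paragraph. Second paragraph. Share']
```

Because extraction works on whole subtrees under a boundary, a few non-text leaves inside a mostly-text region (the "Share" span above) are kept. Leaf-by-leaf filtering would drop them. How strongly this happens depends on the assumed `ratio` threshold.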
Publication number: JP2012027852(A)   Publication date: 2012.02.09
Application number: JP20100168546   Application date: 2010.07.27
Applicant: NIPPON TELEGR & TELEPH CORP <NTT>   Inventors: FUJITA NAOKI; YASUDA YOSHIHITO; HIROSHIMA NOBUAKI; KATAOKA RYOJI
Classification: G06F17/21; G06F17/28   Main classification: G06F17/21