Abstract
PROBLEM TO BE SOLVED: To extract the text part of a document expressed in a tree structure with high accuracy.

SOLUTION: A text part determination function section 8 classifies the feature information extracted for each node of an input document according to the data stored in a database 4, determines whether each node is a text part, and stores the determination result in a storage section 9. A boundary acquisition function section 10 acquires the determination results by referring to the storage section 9. It then successively retrieves the boundaries of the text part by tracing from lower nodes to upper nodes in the tree structure of the input document, according to the determination results for the lower nodes, and stores the retrieval result in a storage section 11. A text extraction function section 12 refers to the storage section 11 and extracts the character strings under the boundaries as the text part.

COPYRIGHT: (C)2012,JPO&INPIT
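The three stages of the solution (per-node determination, bottom-up boundary search, extraction under the boundaries) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the `Node` structure, the features (tag name and text length), the fixed rule in `classify` standing in for classification against the database 4, and the rule that a boundary moves up to a parent only when the parent and all of its children are judged to be text are all hypothetical, since the abstract does not specify them. The `results` dictionary plays the role of storage section 9, and the returned boundary list plays the role of storage section 11.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical: tags that the stand-in classifier never treats as body text.
NON_TEXT_TAGS = {"nav", "a", "script", "footer"}


@dataclass
class Node:
    """Hypothetical document-tree node (e.g. one element of parsed HTML)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def subtree_text(node: Node) -> str:
    """All character strings under a node, joined in document order."""
    parts = [node.text.strip()] + [subtree_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def features(node: Node) -> Tuple[str, int]:
    # Assumed features; the abstract does not say which ones are extracted.
    return node.tag, len(subtree_text(node))


def classify(feat: Tuple[str, int], min_length: int = 20) -> bool:
    # Stand-in for classification against the stored database: a fixed rule.
    tag, length = feat
    return tag not in NON_TEXT_TAGS and length >= min_length


def determine(node: Node, results: Dict[int, bool]) -> None:
    """Stage 1 (cf. section 8): judge every node and store the result by id."""
    results[id(node)] = classify(features(node))
    for child in node.children:
        determine(child, results)


def acquire(node: Node, results: Dict[int, bool]) -> Tuple[bool, List[Node]]:
    """Stage 2 (cf. section 10): trace from lower to upper nodes (post-order).

    Returns (whole subtree is text, boundary nodes found so far). Boundaries
    found in the children are replaced by this node only when this node and
    every child subtree are judged to be text (an assumed lifting rule).
    """
    children_ok, bounds = True, []
    for child in node.children:
        ok, found = acquire(child, results)
        children_ok = children_ok and ok
        bounds.extend(found)
    if results[id(node)] and children_ok:
        return True, [node]
    return False, bounds


def extract(root: Node) -> List[str]:
    """Stage 3 (cf. section 12): character strings under each boundary."""
    results: Dict[int, bool] = {}
    determine(root, results)
    _, boundaries = acquire(root, results)
    return [subtree_text(b) for b in boundaries]


if __name__ == "__main__":
    doc = Node("body", children=[
        Node("nav", children=[Node("a", "Home"), Node("a", "About")]),
        Node("article", children=[
            Node("p", "Tree-structured documents mix body text with menus."),
            Node("p", "Only the body paragraphs should be extracted here."),
        ]),
        Node("footer", "Copyright notice"),
    ])
    print(extract(doc))
    # ['Tree-structured documents mix body text with menus. Only the body paragraphs should be extracted here.']
```

In this example both paragraphs are judged to be text, so the boundary is lifted from them to `article`; it stops there because the `nav` and `footer` siblings make `body` fail the lifting rule. Keeping the per-node decisions in a separate dictionary mirrors the abstract's split between the determination step and the boundary search; a single combined pass would also work.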