Title of Invention DOCUMENT COLLECTION DEVICE, DOCUMENT COLLECTION METHOD, PROGRAM AND RECORDING MEDIUM
Abstract PROBLEM TO BE SOLVED: To collect the full body text of documents containing a retrieval keyword from a large volume of structured document data on a network, without building the large-scale equipment that a retrieval engine uses. SOLUTION: A retrieval engine on the network performs a search based on the retrieval keyword, and the resulting document list information is acquired. The document list information is analyzed to obtain, for each document, link information for accessing it and an excerpted sentence from it. Using the link information, the document data of each link-destination page is acquired from the network. The structure of the acquired document data is analyzed, and part or all of the character strings in its character information are acquired as one or more blocks. Finally, either the excerpted sentence or the character string of the block that contains the most character strings or characters from the excerpted sentence, among the acquired blocks, is determined to be the body. COPYRIGHT: (C)2008,JPO&INPIT
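The block-selection step of the SOLUTION can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the patented implementation. The search-engine query and page fetching are omitted, and the page is assumed to be already-fetched HTML. Block boundaries are assumed to fall at the block-level tags in the hypothetical `BLOCK_TAGS` set, and the excerpted sentence (snippet) is assumed to separate its fragments with "...". A block's score is assumed to be the number of snippet fragments it contains verbatim, with the count of shared characters as a tie-breaker. Falling back to the snippet itself when no block matches is also an assumption. The names `BlockExtractor`, `score` and `extract_body` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit text blocks; script/style text is ignored.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section", "pre",
              "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class BlockExtractor(HTMLParser):
    """Split an HTML page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0  # depth inside script/style elements

    def _flush(self):
        # Collapse whitespace and keep the block only if it has text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text as a final block


def score(block, snippet):
    """Return (fragments found verbatim, shared non-space characters)."""
    fragments = [" ".join(f.split()) for f in re.split(r"\.\.\.|…", snippet)]
    fragments = [f for f in fragments if f]
    hits = sum(1 for f in fragments if f in block)
    snippet_chars = Counter(c for f in fragments for c in f if not c.isspace())
    overlap = sum((Counter(block) & snippet_chars).values())
    return hits, overlap


def extract_body(html, snippet):
    """Pick the block that best matches the snippet; else return the snippet."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return snippet
    best = max(parser.blocks, key=lambda b: score(b, snippet))
    return best if score(best, snippet)[0] > 0 else snippet


if __name__ == "__main__":
    page = """
<html><head><script>var x = 1;</script></head>
<body>
  <div>Home | News | Contact</div>
  <p>This article explains how to collect the <b>body</b> of documents
  matched by a search keyword.</p>
  <div>Copyright 2007</div>
</body></html>
"""
    snippet = "collect the body of documents ... search keyword"
    # prints: This article explains how to collect the body of documents matched by a search keyword.
    print(extract_body(page, snippet))
```

Scoring on whole snippet fragments first reflects the abstract's "more character strings" criterion, and the character count breaks ties between blocks in the spirit of its "more characters" criterion. Navigation menus and footers rarely contain the search-result excerpt, so they lose to the article text.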
Publication No. JP2008176685(A) Publication Date 2008.07.31
Application No. JP20070011181 Application Date 2007.01.22
Applicant NIPPON TELEGR & TELEPH CORP <NTT> Inventors SATO YOSHIHIDE;KAWASHIMA HARUMI;SEKIGUCHI YUICHIRO
Classification G06F17/30 Main Classification G06F17/30