Abstract |
PROBLEM TO BE SOLVED: To collect the full body text of documents containing a retrieval keyword from a large quantity of structured document data on a network, without building the large-scale equipment that a retrieval engine uses. SOLUTION: A retrieval engine on the network performs a search based on the retrieval keyword, and the resulting document list information is acquired. The document list information is analyzed to obtain, for each document, link information for accessing it and an excerpted sentence. The document data of each link destination page are acquired from the network based on the link information, and the structure of the acquired document data is analyzed so that some or all of the character strings in the document's character information are acquired as one or more blocks. Among the acquired blocks, the block that contains the excerpted sentence, or the block whose character string contains more of the character strings or characters of the excerpted sentence, is determined to be the body. COPYRIGHT: (C)2008,JPO&INPIT
|
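
The final step of the abstract (splitting a fetched page into text blocks and choosing the block that best matches the search engine's excerpt) can be illustrated with a minimal sketch. This is not the patented implementation: the names `BlockExtractor`, `extract_blocks`, and `pick_body`, the set of tags treated as block boundaries, and the whitespace-token overlap score are all assumptions chosen for illustration. Text without spaces between words (for example Japanese) would need a character-based score instead.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit candidate text blocks; the abstract does
# not say how the document structure is divided into blocks.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}
SKIP_TAGS = {"script", "style"}  # contents are not visible page text


class BlockExtractor(HTMLParser):
    """Collects the visible text of a page as a list of blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip = 0

    def _flush(self):
        # Normalize whitespace and store the buffered text as one block.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text as a final block


def extract_blocks(html):
    """Return the text blocks found in an HTML string."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def pick_body(blocks, excerpt):
    """Return the block sharing the most tokens with the excerpt.

    Assumption: overlap is counted on lowercase whitespace-separated
    tokens; ties go to the earliest block. Returns None if no blocks.
    """
    words = set(excerpt.lower().split())

    def score(block):
        return len(words & set(block.lower().split()))

    return max(blocks, key=score) if blocks else None
```

For a page that contains a navigation bar, one article paragraph, and a footer, `pick_body(extract_blocks(page), excerpt)` would be expected to return the article paragraph, since it shares the most tokens with the excerpt taken from the search result list.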