Abstract |
<P>PROBLEM TO BE SOLVED: To provide a technology for automatically extracting unnecessary character strings from Web documents. <P>SOLUTION: An anchor text extraction means 11 extracts the character string corresponding to each anchor text from the Web documents stored in a Web document storage part 21, totals, for each reference destination URL, the number of reference origin documents or reference origin sites per anchor text, and stores the results in an anchor storage part 22. An unnecessary character string extraction means 12 retrieves the anchor texts stored in the anchor storage part 22 for each identical reference destination URL. Among the n anchor texts sharing a reference destination URL, it takes the anchor text (a1) with the largest number of reference origin documents or reference origin sites and compares it with the other anchor texts (a2) to (an). For each of the anchor texts (a2) to (an) that contains the character string of the anchor text (a1), it extracts the portion other than that character string as an unnecessary character string. <P>COPYRIGHT: (C)2007,JPO&INPIT |
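
The two means described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `count_anchor_texts` and `extract_unnecessary_strings`, the `(source_doc, target_url, anchor_text)` tuple layout, and the choice to count reference origin documents rather than sites are all assumptions made for illustration. The sketch also assumes that "having the same character string" means the other anchor text contains (a1) as a substring.

```python
from collections import defaultdict


def count_anchor_texts(links):
    """Total reference origin documents per (destination URL, anchor text).

    `links` is an iterable of (source_doc, target_url, anchor_text) tuples,
    a hypothetical stand-in for the Web document storage part.
    """
    sources = defaultdict(lambda: defaultdict(set))
    for source_doc, target_url, anchor_text in links:
        text = anchor_text.strip()
        if not text:
            continue  # ignore empty anchors (assumption, not in the abstract)
        sources[target_url][text].add(source_doc)
    return {
        url: {text: len(docs) for text, docs in by_text.items()}
        for url, by_text in sources.items()
    }


def extract_unnecessary_strings(anchor_counts):
    """Collect the parts of a2..an that surround the dominant anchor text a1."""
    unnecessary = set()
    for counts in anchor_counts.values():
        if len(counts) < 2:
            continue  # nothing to compare against
        # a1: the anchor text with the most reference origin documents
        a1 = max(counts, key=counts.get)
        for text in counts:
            if text == a1 or a1 not in text:
                continue
            # Split around the first occurrence of a1 and keep the remainder
            before, _, after = text.partition(a1)
            for fragment in (before.strip(), after.strip()):
                if fragment:
                    unnecessary.add(fragment)
    return unnecessary


links = [
    ("d1", "http://example.com/", "Example Shop"),
    ("d2", "http://example.com/", "Example Shop"),
    ("d3", "http://example.com/", "Click here for Example Shop"),
    ("d4", "http://example.com/", "Example Shop homepage"),
    ("d5", "http://example.com/", "Great store"),
]
counts = count_anchor_texts(links)
print(extract_unnecessary_strings(counts))
```

With this made-up input, "Example Shop" is (a1) because two documents use it, and the extracted set contains "Click here for" and "homepage". "Great store" contributes nothing because it does not contain (a1).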