Title of invention: UNNECESSARY CHARACTER STRING EXTRACTION DEVICE, ITS METHOD AND PROGRAM AND DEVICE USING THE SAME
Abstract: PROBLEM TO BE SOLVED: To provide a technology for automatically extracting unnecessary character strings from Web documents. SOLUTION: An anchor text extraction means 11 extracts the character strings corresponding to anchor texts from each Web document stored in a Web document storage part 21, tallies the anchor texts for each reference destination URL by the number of reference origin documents or the number of reference origin sites, and stores the results in an anchor storage part 22. An unnecessary character string extraction means 12 retrieves the anchor texts stored in the anchor storage part 22 for each identical reference destination URL. Among the n anchor texts sharing a reference destination URL, it compares the anchor text a1, which has the largest number of reference origin documents or reference origin sites, with the other anchor texts a2 to an; for each of a2 to an that contains the character string of a1, it extracts the portion other than the character string of a1 as an unnecessary character string. COPYRIGHT: (C)2007, JPO&INPIT
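Application publication number: JP2007219580(A) Application publication date: 2007.08.30
Application number: JP20060036059 Filing date: 2006.02.14
Applicant: NEC CORP Inventors: TATEISHI KENJI;KUSUI MASARU
Classification number: G06F17/21 Main classification number: G06F17/21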
Agency: Agent:
Principal claim:
Address:
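
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `aggregate_anchors` and `extract_unnecessary_strings`, the `(source_doc, target_url, anchor_text)` input shape, and the example URLs are all hypothetical. The sketch assumes that anchor texts are tallied by the number of distinct reference origin documents (not sites) and that an anchor text "having the same character string" as a1 means one that contains a1 as a substring.

```python
from collections import defaultdict


def aggregate_anchors(links):
    """Tally anchor texts per reference destination URL.

    links: iterable of (source_doc, target_url, anchor_text) tuples
    (a hypothetical input shape). Returns
    {target_url: {anchor_text: number of distinct source documents}}.
    """
    sources = defaultdict(lambda: defaultdict(set))
    for source_doc, target_url, anchor_text in links:
        text = anchor_text.strip()
        if text:  # skip empty anchors, e.g. image-only links
            sources[target_url][text].add(source_doc)
    return {
        url: {text: len(docs) for text, docs in by_text.items()}
        for url, by_text in sources.items()
    }


def extract_unnecessary_strings(anchor_counts):
    """Collect the parts of anchor texts that surround the dominant anchor a1."""
    unnecessary = set()
    for counts in anchor_counts.values():
        # a1: the anchor text with the most reference origin documents
        a1 = max(counts, key=counts.get)
        for text in counts:
            if text == a1 or a1 not in text:
                continue
            # Assumption: split on the first occurrence of a1 only
            prefix, _, suffix = text.partition(a1)
            for piece in (prefix.strip(), suffix.strip()):
                if piece:
                    unnecessary.add(piece)
    return unnecessary


# Hypothetical example data
links = [
    ("http://a.example/1", "http://nec.example/", "NEC"),
    ("http://b.example/1", "http://nec.example/", "NEC"),
    ("http://c.example/1", "http://nec.example/", "Click here: NEC"),
    ("http://d.example/1", "http://nec.example/", "NEC homepage"),
]
result = extract_unnecessary_strings(aggregate_anchors(links))
print(sorted(result))  # ['Click here:', 'homepage']
```

Here "NEC" is a1 because two documents use it, so the surrounding fragments "Click here:" and "homepage" are reported as unnecessary character strings. Tallying by reference origin site instead would only require replacing `source_doc` with the site portion of its URL before aggregation.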