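Title of invention: Similar page detection device, similar page detection method, and similar page detection program (類似ページ検出装置、類似ページ検出方法、類似ページ検出プログラム)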
Abstract: PROBLEM TO BE SOLVED: To provide a device that can efficiently organize and consolidate a large amount of information. SOLUTION: A similar page detection device includes: a web page information database 110 that stores web page information; a hash calculation device 120 that retrieves the web page information from the database 110, performs morphological analysis on each sentence in each page, extracts selecting words (the words that characterize the sentence), and performs hash calculation according to the number of selecting words extracted; a page unit hash temporary recording device 150 and a hash recording device 160 that record, as one set, the calculated hash value, the URL of the page, a sentence number indicating how many sentences precede the sentence in the page, and information on whether the sentence contains an important word; and a hash aggregation device 170 that, based on the information recorded in the hash recording device 160, groups pages having the same hash value and outputs and records each group in a similar page group recording device 180. COPYRIGHT: (C)2013,JPO&INPIT
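The flow described in the abstract (per-sentence word extraction, hashing, recording, and grouping by identical hash) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that the abstract does not confirm: a regex tokenizer with a small stopword list stands in for morphological analysis, SHA-1 stands in for the unspecified hash function, "according to the number of the extracted selecting words" is read as a minimum-count threshold, and the names `STOPWORDS`, `IMPORTANT_WORDS`, `MIN_SELECTING_WORDS`, `hash_page`, and `group_similar_pages` are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: a tiny stopword filter stands in for morphological analysis.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "for"}
# Assumption: hypothetical list of "important" words.
IMPORTANT_WORDS = {"hash"}
# Assumption: sentences with fewer selecting words than this are not hashed.
MIN_SELECTING_WORDS = 2


def split_sentences(text):
    """Split page text into sentences on common terminators."""
    return [s.strip() for s in re.split(r"[.!?。]+", text) if s.strip()]


def selecting_words(sentence):
    """Return the characteristic words of a sentence (stopwords removed)."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def hash_page(url, text):
    """Return one (hash, url, sentence_no, has_important_word) set per hashed sentence."""
    records = []
    # sentence_no is 0-based, so it equals the number of preceding sentences.
    for sentence_no, sentence in enumerate(split_sentences(text)):
        words = selecting_words(sentence)
        if len(words) < MIN_SELECTING_WORDS:
            continue
        digest = hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()
        has_important = any(w in IMPORTANT_WORDS for w in words)
        records.append((digest, url, sentence_no, has_important))
    return records


def group_similar_pages(pages):
    """Group URLs that share at least one identical sentence hash."""
    urls_by_hash = defaultdict(set)
    for url, text in pages.items():
        for digest, page_url, _sentence_no, _important in hash_page(url, text):
            urls_by_hash[digest].add(page_url)
    groups = {tuple(sorted(urls)) for urls in urls_by_hash.values() if len(urls) > 1}
    return sorted(groups)


if __name__ == "__main__":
    pages = {
        "http://a.example/1": "The hash value is stored. Cats sleep often.",
        "http://b.example/2": "Hash value is stored! Dogs bark loudly.",
        "http://c.example/3": "Nothing shared here.",
    }
    print(group_similar_pages(pages))
```

In this sketch two pages land in the same group as soon as they share one sentence hash; how the actual device uses the sentence number and the important-word information during aggregation is not stated in the abstract, so those fields are recorded but not used for grouping here.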
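Publication number: JP5618968(B2) Publication date: 2014.11.05
Application number: JP20110247978 Filing date: 2011.11.11
Applicant: Inventor:
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: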