Abstract |
<P>PROBLEM TO BE SOLVED: To provide an improved method for detecting similar documents. <P>SOLUTION: A method executed by a similar document detection device includes the steps of: extracting, from a plurality of web documents, entities for which importance-contributing elements are to be calculated; calculating a weight value for each entity based on the calculated importance-contributing elements; and determining whether the web documents are similar to each other on the basis of the calculated weight values. For documents that may be similar, a core portion and a non-core portion are distinguished in each document, and a different weight value is given to each portion, so that similar documents are determined in an improved manner and the accuracy of a search engine is improved. <P>COPYRIGHT: (C)2013,JPO&INPIT |
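The three steps in the abstract (extract entities, weight them, compare) can be illustrated with a minimal Python sketch. The abstract does not say what an entity or an importance-contributing element is, nor how the weights are compared, so everything concrete here is an assumption: entities are taken to be lowercase word tokens, the only importance-contributing element is whether a token occurs in the core or the non-core portion, the weights `CORE_WEIGHT` and `NON_CORE_WEIGHT` and the `threshold` are arbitrary, and the comparison is a weighted Jaccard ratio chosen for illustration rather than taken from the patent.

```python
import re
from collections import defaultdict

# Hypothetical weights: the abstract only says that the core and non-core
# portions receive different weight values, not what those values are.
CORE_WEIGHT = 1.0
NON_CORE_WEIGHT = 0.2


def extract_entities(text):
    """Assumption: an 'entity' is a lowercase alphanumeric token."""
    return re.findall(r"[a-z0-9]+", text.lower())


def entity_weights(core_text, non_core_text):
    """Accumulate a weight per entity, depending on the portion it occurs in."""
    weights = defaultdict(float)
    for entity in extract_entities(core_text):
        weights[entity] += CORE_WEIGHT
    for entity in extract_entities(non_core_text):
        weights[entity] += NON_CORE_WEIGHT
    return dict(weights)


def weighted_similarity(w1, w2):
    """Weighted Jaccard ratio: sum of per-entity minima over sum of maxima."""
    keys = set(w1) | set(w2)
    numerator = sum(min(w1.get(k, 0.0), w2.get(k, 0.0)) for k in keys)
    denominator = sum(max(w1.get(k, 0.0), w2.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0


def are_similar(doc_a, doc_b, threshold=0.7):
    """Each document is a (core_text, non_core_text) pair; threshold is assumed."""
    score = weighted_similarity(entity_weights(*doc_a), entity_weights(*doc_b))
    return score >= threshold


if __name__ == "__main__":
    # Same core text, different page chrome.
    doc_a = ("similar document detection method", "home login contact")
    doc_b = ("similar document detection method", "about sitemap help")
    # Different core text, same page chrome as doc_a.
    doc_c = ("weather forecast for tomorrow", "home login contact")
    print(are_similar(doc_a, doc_b), are_similar(doc_a, doc_c))
```

Because non-core tokens carry a small weight in this sketch, two pages with the same core text but different navigation text score high, while two pages that share only navigation text score low.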