Abstract |
PURPOSE: A method and a system for generating basic data for judging similar electronic documents are provided, so that a computer can detect similar electronic documents even when their contents differ slightly from one another. CONSTITUTION: A receiver (110) receives an electronic document. A token extractor (120) extracts tokens by dividing the contents of the received electronic document into predetermined units. A token frequency calculator (130) calculates the frequency of each token extracted from the electronic document. A basic data generator (140) generates the basic data by removing low-frequency tokens and then reducing the electronic document to a predetermined size.
|
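The pipeline described in the abstract (token extraction, frequency calculation, removal of low-frequency tokens, reduction to a fixed size) can be illustrated with a minimal Python sketch. The abstract does not specify the token unit, the frequency threshold, or the target size, so everything concrete here is an assumption made for illustration: tokens are taken to be lowercase whitespace-delimited words, and the names `extract_tokens`, `generate_basic_data`, `min_count`, and `max_tokens` are hypothetical.

```python
from collections import Counter


def extract_tokens(text: str) -> list[str]:
    # Assumption: the "predetermined unit" is a whitespace-delimited word.
    return text.lower().split()


def generate_basic_data(
    text: str, min_count: int = 2, max_tokens: int = 5
) -> list[tuple[str, int]]:
    # Token frequency calculation.
    counts = Counter(extract_tokens(text))
    # Remove low-frequency tokens (hypothetical threshold: min_count).
    frequent = [(tok, n) for tok, n in counts.items() if n >= min_count]
    # Sort by descending frequency, then alphabetically for a stable order.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    # Reduce to a predetermined size (hypothetical limit: max_tokens).
    return frequent[:max_tokens]


doc_a = "the cat sat on the mat and the cat slept"
doc_b = "the cat sat on the mat while the cat purred"

print(generate_basic_data(doc_a))
print(generate_basic_data(doc_b))
```

Under these assumptions, the two sample documents differ in a few words but reduce to the same basic data, because the differing words occur only once and are dropped. How the basic data is subsequently compared to judge similarity is not stated in the abstract and is not modeled here.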