Abstract |
A system and method for determining duplicate (copy) documents using frequent phrases, and a system and method for extracting frequent phrases, are provided. Phrases are searched by extracting, from index data that includes the identification (ID) data of documents, the ID data that exhibits a given document frequency. A document frequency determining unit (102) uses the index data to extract the ID data that exhibits a preset document frequency. An ID data set generator (103) produces ID data for each document in a newly collected document set, and then generates an ID data set that excludes the extracted ID data. An index data generator (104) uses the ID data set to produce index data corresponding to the ID data. A duplicate document determining unit (105) queries the generated index data for documents having overlapping ID data to determine whether duplication exists.
|
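The four units described in the abstract can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation. The abstract does not say how ID data is derived, so word trigrams stand in for it here. The function names, the `min_df` threshold, and the `min_overlap` ratio are all hypothetical.

```python
from collections import defaultdict


def shingles(text, n=3):
    """Hypothetical ID data: the set of word n-grams in a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_ids(index, min_df):
    """Unit 102 (sketch): ID data whose document frequency reaches min_df."""
    return {key for key, docs in index.items() if len(docs) >= min_df}


def build_index(id_sets):
    """Unit 104 (sketch): inverted index from ID data to document names."""
    index = defaultdict(set)
    for doc, ids in id_sets.items():
        for key in ids:
            index[key].add(doc)
    return index


def find_duplicates(docs, existing_index, min_df=3, min_overlap=0.5):
    """Return sorted pairs of new documents judged to be duplicates."""
    frequent = frequent_ids(existing_index, min_df)

    # Unit 103 (sketch): per-document ID data minus the frequent ID data.
    id_sets = {name: shingles(text) - frequent for name, text in docs.items()}
    index = build_index(id_sets)

    # Unit 105 (sketch): count the ID data shared by each document pair.
    shared = defaultdict(int)
    for names in index.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    # Assumed decision rule: overlap relative to the smaller ID data set.
    result = []
    for (a, b), count in shared.items():
        smaller = min(len(id_sets[a]), len(id_sets[b]))
        if smaller and count / smaller >= min_overlap:
            result.append((a, b))
    return sorted(result)
```

Excluding frequent ID data keeps boilerplate phrases that appear in many documents (for example, a copyright notice) from making unrelated documents look alike, and it shrinks the index that has to be queried.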