Abstract |
PURPOSE: A similar-document classifying apparatus using exposure analysis is provided to simplify the classification process by classifying similar documents based on the prevalence of nouns within an electronic document. CONSTITUTION: A noun prevalence calculating unit (110) extracts the nouns of an electronic document through morphological analysis and calculates the prevalence of each noun. A prevalence index calculating unit (120) applies a weight value according to whether the noun appears within the subject and calculates the prevalence index. An index file information processor (130) stores document number information.
|
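The three units described in the abstract can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the patented method: the abstract gives no formulas, so prevalence is taken here to be a noun's relative frequency among the extracted nouns, the subject weight `SUBJECT_WEIGHT` is a made-up constant, and the index simply maps each document's highest-scoring noun to document numbers. Morphological analysis is not implemented; the input is assumed to be lists of nouns already extracted. All function names are hypothetical.

```python
from collections import Counter, defaultdict

SUBJECT_WEIGHT = 2.0  # hypothetical value; the abstract does not specify a weight


def noun_prevalence(nouns):
    """Unit 110 (sketch): relative frequency of each already-extracted noun."""
    counts = Counter(nouns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {noun: count / total for noun, count in counts.items()}


def prevalence_index(prevalence, subject_nouns, weight=SUBJECT_WEIGHT):
    """Unit 120 (sketch): scale the prevalence of nouns that occur in the subject."""
    subject = set(subject_nouns)
    return {n: p * weight if n in subject else p for n, p in prevalence.items()}


def build_index(documents, top_k=1):
    """Unit 130 (sketch): store document numbers under each document's top nouns.

    `documents` maps a document number to (subject_nouns, body_nouns).
    Documents that end up under the same noun are treated as similar.
    """
    index = defaultdict(list)
    for doc_number, (subject_nouns, body_nouns) in documents.items():
        scores = prevalence_index(noun_prevalence(body_nouns), subject_nouns)
        # Highest score first; ties are broken alphabetically for determinism.
        top = sorted(scores, key=lambda n: (-scores[n], n))[:top_k]
        for noun in top:
            index[noun].append(doc_number)
    return dict(index)


if __name__ == "__main__":
    docs = {
        1: (["engine"], ["engine", "fuel", "engine", "valve"]),
        2: (["engine"], ["fuel", "engine", "fuel", "engine"]),
        3: (["battery"], ["battery", "cell", "battery"]),
    }
    print(build_index(docs))
```

In document 2 of the sample data, "engine" and "fuel" are equally frequent in the body, and the subject weight is what places the document under "engine" alongside document 1.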