Abstract |
PROBLEM TO BE SOLVED: To provide a document classification program capable of improving the accuracy of determining whether Web document information belongs to a specific category (e.g., illegal or harmful content). SOLUTION: Document information is described using sentence information and a markup language. The document classification program causes a computer to function as: document information separation means that separates object document information, which is the object of analysis, into sentence information and markup language information; feature amount generation means that counts the number of times each character string registered in advance appears in each of the sentence information and the markup language information, and generates a feature amount as a multidimensional vector whose elements are the appearance counts of the respective character strings; feature amount determination means that determines whether or not the object feature amount of the object document information falls within a specific range of the learning feature amounts obtained from a large amount of learning document information included in a specific category; and category classification means that classifies object document information for which the feature amount determination means returns true as information included in the specific category. COPYRIGHT: (C)2012,JPO&INPIT |
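
The four means described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the regex-based tag split, the separate term lists for text and markup, and above all the definition of the "specific range" as a sphere around the centroid of the learning vectors. The abstract does not say how that range is defined.

```python
import math
import re

# Assumption: anything between "<" and ">" counts as markup language information.
TAG_RE = re.compile(r"<[^>]*>")


def separate(document):
    """Separation step: split a document into (sentence text, markup text)."""
    markup = " ".join(TAG_RE.findall(document))
    text = TAG_RE.sub(" ", document)
    return text, markup


def feature_vector(document, text_terms, markup_terms):
    """Feature generation: count each registered string in text and in markup."""
    text, markup = separate(document)
    return ([text.count(term) for term in text_terms]
            + [markup.count(term) for term in markup_terms])


def fit_range(learning_docs, text_terms, markup_terms):
    """Derive a (centroid, radius) range from learning documents of one category.

    Assumption: the range is the smallest sphere around the centroid that
    contains every learning vector.
    """
    if not learning_docs:
        raise ValueError("at least one learning document is required")
    vectors = [feature_vector(d, text_terms, markup_terms) for d in learning_docs]
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    radius = max(math.dist(v, centroid) for v in vectors)
    return centroid, radius


def in_category(document, centroid, radius, text_terms, markup_terms):
    """Determination and classification: True if the vector lies within the range."""
    vector = feature_vector(document, text_terms, markup_terms)
    return math.dist(vector, centroid) <= radius


# Hypothetical registered strings and learning documents.
TEXT_TERMS = ["pills"]
MARKUP_TERMS = ["href"]
LEARNING_DOCS = [
    '<a href="x.example">buy pills</a>',
    '<a href="y.example">cheap pills</a> pills',
]
CENTROID, RADIUS = fit_range(LEARNING_DOCS, TEXT_TERMS, MARKUP_TERMS)
```

Counting the same kind of registered strings separately in the text and in the markup is what lets the vector capture signals that appear only in tags or attributes, which is the point the abstract makes about Web documents. A production system would likely replace the sphere with a trained one-class or two-class model.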