Title of invention
Abstract PROBLEM TO BE SOLVED: To provide a document classification program that improves the accuracy of determining whether Web document information belongs to a specific category (e.g., illegal or harmful content). SOLUTION: Document information is described with sentence information and a markup language. The document classification program causes a computer to function as: document information separation means that separates the object document information to be analyzed into sentence information and markup language information; feature amount generation means that counts, separately for the sentence information and the markup language information, the number of times each character string registered in advance appears, and generates a feature amount in the form of a multidimensional vector whose elements are the appearance counts of the individual character strings; feature amount determination means that determines whether or not the object feature amount of the object document information falls within a specific range of the learning feature amounts obtained from a large body of learning document information belonging to the specific category; and category classification means that classifies object document information for which the feature amount determination means returns true as information included in the specific category. COPYRIGHT: (C)2012,JPO&INPIT
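The four means named in the abstract (separation, feature generation, range determination, classification) can be illustrated with a short sketch. This is not the patented implementation: the abstract does not say how the "specific range" is derived, so the per-dimension minimum/maximum box below is an assumption, as are all function names, the use of Python's standard `html.parser` module for the separation step, and the example term lists.

```python
from html.parser import HTMLParser


class _Splitter(HTMLParser):
    """Collects sentence text and markup (tag names, attributes) separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.markup_parts = []

    def handle_starttag(self, tag, attrs):
        self.markup_parts.append(tag)
        for name, value in attrs:
            # Valueless attributes arrive with value None.
            self.markup_parts.append(f"{name}={value or ''}")

    def handle_data(self, data):
        self.text_parts.append(data)


def separate(document):
    """Separation step: return (sentence_text, markup_text)."""
    splitter = _Splitter()
    splitter.feed(document)
    splitter.close()
    return " ".join(splitter.text_parts), " ".join(splitter.markup_parts)


def feature_vector(document, text_terms, markup_terms):
    """Feature generation: one count per registered string, per part."""
    text, markup = separate(document)
    return ([text.count(term) for term in text_terms]
            + [markup.count(term) for term in markup_terms])


def learn_range(learning_documents, text_terms, markup_terms):
    """ASSUMPTION: the 'specific range' is modeled as a per-dimension
    [min, max] box over the learning feature vectors."""
    vectors = [feature_vector(d, text_terms, markup_terms)
               for d in learning_documents]
    lows = [min(column) for column in zip(*vectors)]
    highs = [max(column) for column in zip(*vectors)]
    return lows, highs


def in_category(document, text_terms, markup_terms, learned_range):
    """Determination and classification: True if the vector is in the box."""
    lows, highs = learned_range
    vector = feature_vector(document, text_terms, markup_terms)
    return all(lo <= v <= hi for v, lo, hi in zip(vector, lows, highs))
```

A caller would first build the range once with `learn_range` from documents already known to belong to the category, then pass each new document to `in_category`. The box test is deliberately crude; a trained classifier could replace it without changing the separation and counting steps.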
Publication number JP5527845(B2) Publication date 2014.06.25
Application number JP20100185321 Filing date 2010.08.20
Applicant Inventor
Classification G06F17/30; G06N3/00 Main classification G06F17/30
Patent agency Agent
Principal claim
Address