Title of invention A text classification system and method for the analysis and management of text
Abstract Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors that indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and a term is determined to be significant to a cluster if the probability that its occurrence is due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules that are context-sensitive and applied selectively to improve the accuracy and precision of classification.
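The significance test described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: a term's count in a cluster is compared with the count expected from its corpus-wide rate, using a normal approximation to the binomial, and the term is kept when the upper-tail probability falls below a threshold. The function names (`term_z_scores`, `upper_tail_probability`, `significant_terms`) and the `alpha` parameter are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def term_z_scores(cluster_tokens, corpus_tokens):
    """Return a z-score per cluster term against its corpus-wide rate.

    Assumption: counts follow a binomial model approximated by a normal
    distribution. This is an illustrative choice, not the patent's formula.
    """
    corpus_counts = Counter(corpus_tokens)
    cluster_counts = Counter(cluster_tokens)
    n_corpus = len(corpus_tokens)
    n_cluster = len(cluster_tokens)
    scores = {}
    for term, observed in cluster_counts.items():
        rate = corpus_counts[term] / n_corpus
        if rate <= 0.0 or rate >= 1.0:
            continue  # variance is zero, so the z-score is undefined
        expected = n_cluster * rate
        std = math.sqrt(n_cluster * rate * (1.0 - rate))
        scores[term] = (observed - expected) / std
    return scores


def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def significant_terms(cluster_tokens, corpus_tokens, alpha=0.01):
    """Keep terms whose over-representation is unlikely to be due to chance."""
    scores = term_z_scores(cluster_tokens, corpus_tokens)
    return {t: z for t, z in scores.items() if upper_tail_probability(z) < alpha}


# Toy data: "stock" is concentrated in the cluster, "the" is common everywhere.
cluster = ["stock"] * 10 + ["the"] * 10
corpus = cluster + ["the"] * 70 + ["goal"] * 10
selected = significant_terms(cluster, corpus)
```

In this toy corpus only "stock" passes the threshold, because a frequent word such as "the" appears in the cluster less often than its corpus-wide rate predicts. A real knowledge base would store such scores per cluster in the vectors the abstract mentions.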
Publication number NZ502332(A) Publication date 2002.10.25
Application number NZ19980502332 Filing date 1998.06.16
Applicant THE DIALOG CORPORATION Inventor ZHILYAEV, MAXIM
Classification G06F17/30;G06K9/62;(IPC1-7):G06K9/62;G06K9/68;G06K9/70;G06K9/74 Main classification G06F17/30
Agency Agent
Principal claim
Address