SYSTEM AND METHOD FOR TEXT CATEGORIZATION BASED ON ONTOLOGIES
Abstract
A system for text categorization based on ontologies comprises data collector software modules; a categorizer software module; and a database comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons. The data collector software modules receive documents to be classified and submit them to the categorizer software module. The categorizer performs the following steps to categorize each document: splitting the document into sentences; selecting words or phrases that are present in the ontologies stored in the database server; selecting a plurality of subtrees from the ontologies based on the presence of specific subcategories in the document; determining a weight for each subcategory; pruning subcategories having a weight below a threshold; and, for each of the resulting modified (pruned) subtrees, computing a conditionality coefficient.
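The abstract lists the categorizer's steps but does not define how subcategory weights or the conditionality coefficient are computed. The following is a minimal Python sketch of the step sequence, not the patented method. It rests on several assumptions: sentences are split with a naive regex; a subcategory is a direct child of an ontology root; a subcategory's weight is its share of all ontology-term matches found in that root's subtree; and the "conditionality coefficient" is a hypothetical placeholder equal to the share of weight retained after pruning. The names `Taxon`, `categorize`, and `threshold` are illustrative, not taken from the source.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Taxon:
    # A node in a hierarchical taxonomy; `terms` are the words/phrases that signal it.
    name: str
    terms: list[str]
    children: list["Taxon"] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_terms(sentences: list[str], taxon: Taxon) -> int:
    # Whole-word, case-insensitive occurrences of this taxon's own terms.
    total = 0
    for sentence in sentences:
        lowered = sentence.lower()
        for term in taxon.terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            total += len(re.findall(pattern, lowered))
    return total


def subtree_hits(sentences: list[str], taxon: Taxon) -> int:
    # Matches for this taxon plus all of its descendants.
    return count_terms(sentences, taxon) + sum(
        subtree_hits(sentences, child) for child in taxon.children
    )


def categorize(text: str, roots: list[Taxon], threshold: float = 0.2):
    sentences = split_sentences(text)
    results = {}
    for root in roots:
        total = subtree_hits(sentences, root)
        if total == 0:
            continue  # subtree not selected: none of its terms occur in the document
        # Hypothetical weight: each subcategory's share of the subtree's matches.
        weights = {c.name: subtree_hits(sentences, c) / total for c in root.children}
        # Prune subcategories whose weight falls below the threshold.
        kept = {name: w for name, w in weights.items() if w >= threshold}
        # Placeholder "conditionality coefficient": share of weight surviving pruning.
        coefficient = sum(kept.values())
        results[root.name] = (kept, coefficient)
    return results


sports = Taxon("Sports", ["sport"], [
    Taxon("Football", ["football", "goal"]),
    Taxon("Tennis", ["tennis", "racket"]),
    Taxon("Chess", ["chess"]),
])
finance = Taxon("Finance", ["finance"], [Taxon("Stocks", ["stock"])])
doc = "The football match ended with a late goal. Tennis fans watched too. Football is popular."

print(categorize(doc, [sports, finance], threshold=0.3))
# {'Sports': ({'Football': 0.75}, 0.75)}
```

In this toy run, the Finance subtree is never selected because none of its terms appear. Within Sports, Tennis (weight 0.25) and Chess (weight 0.0) are pruned at the 0.3 threshold. A real implementation would presumably use linguistic phrase matching against the ontology database rather than regex lookups.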
Publication Number
WO2014176600(A1)
Publication Date
2014.10.30
Application Number
WO2014US35735
Application Date
2014.04.28
Applicants
SOUTH EASTERN PUBLISHERS INC.; CHASHCHIN, KIRILL; ANSHUKOV, SERGEY; BARDIN, VALERY; KORDONSKY, SIMON
Inventors
CHASHCHIN, KIRILL; ANSHUKOV, SERGEY; BARDIN, VALERY; KORDONSKY, SIMON