Title of invention: METHOD AND APPARATUS TO BUILD A COMMON CLASSIFICATION SYSTEM ACROSS MULTIPLE CONTENT ENTITIES
Abstract: A content classification system classifies documents of a plurality of content entities into a hierarchical discipline structure. The content classification system receives a set of taxonomic labels collectively defining a hierarchical taxonomy and a plurality of documents. Each document is associated with one of the content entities. The content classification system extracts features from the received documents. A learned model is generated for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity. The content classification system assigns one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document. The documents of the plurality of content entities are classified based on the assigned taxonomic labels.
Publication number: US2015324459(A1) Publication date: 2015.11.12
Application number: US201414274189 Filing date: 2014.05.09
Applicant: Chegg, Inc. Inventors: Chhichhia Charmy; Sri Paul Chris; Le Chevalier Vincent
Classification: G06F17/30; G06N7/02; G06N99/00 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for classifying documents of a plurality of content entities into a hierarchical discipline structure in a content management system, the method comprising: accessing a set of taxonomic labels, the taxonomic labels collectively defining a hierarchical taxonomy; receiving a plurality of documents, each document associated with one of the content entities; extracting features of the received documents; generating by a content classification system, a learned model for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity; assigning, by the content classification system, one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document; and classifying the documents of the plurality of content entities based on the assigned taxonomic labels.
Address: Santa Clara CA US
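
Illustrative sketch (not part of the patent record): the steps recited in the main claim can be pictured roughly as follows. The record does not specify the feature type or the learning algorithm, so this sketch assumes bag-of-words features and a multinomial naive Bayes model with add-one smoothing. The taxonomy paths, entity names, document IDs, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical taxonomy: each label is a root-to-leaf path in the hierarchy.
TAXONOMY = ["Science/Physics", "Science/Biology", "Humanities/History"]

def extract_features(text):
    """Assumed feature extractor: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def train(labeled_docs, taxonomy):
    """Fit a naive Bayes model on (text, label) pairs from the
    representative content entity (an assumed choice of learner)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        if label not in taxonomy:
            raise ValueError(f"unknown taxonomic label: {label}")
        feats = extract_features(text)
        word_counts[label].update(feats)
        label_counts[label] += 1
        vocab.update(feats)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def assign_label(model, text):
    """Return the highest-scoring label for one document."""
    feats = extract_features(text)
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word, n in feats.items():
            if word in model["vocab"]:  # words never seen in training are ignored
                score += n * math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def classify(model, docs_by_entity):
    """File each document under every node on its assigned label path."""
    structure = defaultdict(list)
    for entity, docs in docs_by_entity.items():
        for doc_id, text in docs:
            parts = assign_label(model, text).split("/")
            for depth in range(1, len(parts) + 1):
                structure["/".join(parts[:depth])].append((entity, doc_id))
    return dict(structure)

# Labeled documents of the representative entity (toy data).
representative_docs = [
    ("force mass acceleration newton motion", "Science/Physics"),
    ("cell dna protein organism evolution", "Science/Biology"),
    ("empire war revolution treaty century", "Humanities/History"),
]
# Unlabeled documents of the other entities (toy data).
other_docs = {
    "entity_b": [("b1", "newton described motion and force"),
                 ("b2", "the treaty ended the war")],
    "entity_c": [("c1", "dna encodes each protein in the cell")],
}

model = train(representative_docs, TAXONOMY)
structure = classify(model, other_docs)
```

In this sketch a document assigned to a leaf label also appears under each ancestor node, which is one simple way to obtain a shared hierarchical structure across entities; the record itself does not say how the hierarchy is materialized.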