Title of Invention |
LARGE SCALE UNSUPERVISED HIERARCHICAL DOCUMENT CATEGORIZATION USING ONTOLOGICAL GUIDANCE |
Abstract |
A classification method includes constructing queries from category descriptors representing categories of a taxonomy of hierarchically organized categories. The query constructed for a category c includes a query component based on descriptors of the category c and at least one query component based on descriptors of an ancestor or descendant category of the category c. A documents database is queried using the constructed queries to retrieve pseudo-relevant documents. Language models for the categories of the taxonomy are extracted from the pseudo-relevant documents by inferring a hierarchical topic model representing the taxonomy. An input document is classified by optimizing mixture weights of a weighted combination of categories of the hierarchical topic model respective to the input document. |
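The query-construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy taxonomy, the descriptor lists, the names `PARENT`, `DESCRIPTORS`, `ancestors`, `descendants`, and `build_query`, and the `context_weight` down-weighting scheme are all assumptions made for illustration. The abstract states only that the query for a category c combines a component from c's own descriptors with at least one component from an ancestor or descendant category.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {"science": None, "physics": "science", "optics": "physics"}

# Hypothetical category descriptors (for example, words from category labels).
DESCRIPTORS = {
    "science": ["science"],
    "physics": ["physics", "mechanics"],
    "optics": ["optics", "lens"],
}


def ancestors(category):
    """Return the ancestors of a category, nearest first."""
    chain = []
    node = PARENT[category]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def descendants(category):
    """Return every descendant of a category, in traversal order."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        if parent is not None:
            children[parent].append(child)
    found, stack = [], list(children[category])
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children[node])
    return found


def build_query(category, context_weight=0.5):
    """Build a weighted term query for one category.

    The category's own descriptors get weight 1.0; descriptors of its
    ancestors and descendants get `context_weight` (an assumed value).
    """
    query = {term: 1.0 for term in DESCRIPTORS[category]}
    for relative in ancestors(category) + descendants(category):
        for term in DESCRIPTORS[relative]:
            # Keep the larger weight if a term occurs in several categories.
            query[term] = max(query.get(term, 0.0), context_weight)
    return query
```

In this sketch, `build_query("physics")` yields a term-to-weight mapping that covers the descriptors of "physics" itself plus those of its ancestor "science" and its descendant "optics". Such a mapping could then be submitted to a retrieval engine to collect the pseudo-relevant documents mentioned in the abstract.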
Application Publication Number |
US2012203752(A1) |
Application Publication Date |
2012.08.09 |
Application Number |
US201113022766 |
Filing Date |
2011.02.08 |
Applicant |
HA-THUC VIET;RENDERS JEAN-MICHEL;XEROX CORPORATION |
Inventor |
HA-THUC VIET;RENDERS JEAN-MICHEL |
Classification Number |
G06F17/30 |
Main Classification Number |
G06F17/30 |
Patent Agency |
|
Agent |
|
Principal Claim |
|
Address |
|