Title of invention: CREATING A PRELIMINARY TOPIC STRUCTURE OF A CORPUS WHILE GENERATING THE CORPUS
Abstract: Disclosed are systems, computer-readable media, and methods for creating a topic structure of a corpus while constructing the corpus. A first set of documents is received, and each document is converted into a text representation. The text representation of the first set of documents is clustered into original topics. Each document in the first set of documents is labeled based upon the clustering of the first set of documents. A classifier is built based on the labeling of each document in the first set of documents. A second set of documents is received, and each document in the second set of documents is classified, using the classifier, into one or more topics from the original topics.
Publication number: US2015169593(A1) Publication date: 2015.06.18
Application number: US201414508228 Filing date: 2014.10.07
Applicant: ABBYY InfoPoisk LLC Inventors: Bogdanova Daria Nikolaevna; Kopylov Nikolay Yurievich
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for creating a topic structure of a corpus while constructing the corpus, the method comprising: receiving a first set of documents; clustering the text representation of the first set of documents into original topics; labeling each document in the first set of documents based upon the clustering of the first set of documents; building, using a processor, a classifier based on the labeling of each document in the first set of documents; receiving a second set of documents; and classifying, using the classifier, each document in the second set of documents into one or more topics from the original topics.
Address: Moscow RU
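
The sequence of steps in the main claim (convert, cluster, label, build a classifier, classify new documents) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the bag-of-words representation, the toy cosine-based k-means, the Naive Bayes classifier, the sample documents, and all function and class names are hypothetical. For simplicity the sketch assigns each new document to its single best topic, whereas the claim allows one or more topics.

```python
import math
import re
from collections import Counter, defaultdict


def to_text_representation(doc):
    """Assumed representation: a lowercase bag-of-words Counter."""
    return Counter(re.findall(r"[a-z]+", doc.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, k, iterations=10):
    """Toy k-means-style clustering; returns one cluster id per vector."""
    # Deterministic seeding: start from the first vector, then repeatedly
    # add the vector that is least similar to the seeds chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the summed counts of its members
        # (cosine similarity is scale-invariant, so no averaging is needed).
        updated = []
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)
            updated.append(total if total else centroids[j])
        centroids = updated
    return labels


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed choice)."""

    def __init__(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        self.term_counts = defaultdict(Counter)
        for v, lab in zip(vectors, labels):
            self.term_counts[lab].update(v)

    def classify(self, vector):
        """Return the topic label with the highest log-probability."""
        scores = {}
        for lab, n_docs in self.doc_counts.items():
            total_terms = sum(self.term_counts[lab].values())
            score = math.log(n_docs / self.total_docs)
            for term, count in vector.items():
                if term in self.vocab:  # ignore terms unseen in training
                    p = (self.term_counts[lab][term] + 1) / (
                        total_terms + len(self.vocab))
                    score += count * math.log(p)
            scores[lab] = score
        return max(scores, key=scores.get)


# Hypothetical sample data standing in for the two document sets.
first_set = [
    "The striker scored a goal in the football match",
    "The goalkeeper saved a penalty in the football final",
    "The central bank raised interest rates to curb inflation",
    "Inflation slowed after the bank changed interest rates",
]
second_set = [
    "A late goal decided the football match",
    "Interest rates and inflation worry the bank",
]

# Steps 1-3: convert, cluster into original topics, label each document.
first_vectors = [to_text_representation(d) for d in first_set]
labels = cluster(first_vectors, k=2)

# Step 4: build a classifier from the cluster-derived labels.
classifier = NaiveBayesClassifier(first_vectors, labels)

# Steps 5-6: classify the second set into the original topics.
predictions = [classifier.classify(to_text_representation(d))
               for d in second_set]
```

In this sketch the cluster ids double as topic labels, so documents that arrive later are placed into the topic structure without re-clustering the whole corpus. A production system would more likely use an established library for the clustering and classification steps; the hand-written versions here only keep the example self-contained.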