Title Automatic labeling of unlabeled text data
Abstract A method of automatically labeling unlabeled text data can be practiced without human intervention, although manual intervention is not precluded. The method can also be used to extract relevant features of unlabeled text data for a keyword search. The method uses a document collection as a reference answer set. Members of the answer set are converted to vectors representing centroids of unknown groups of unlabeled text data. The unlabeled text data are clustered relative to the centroids by a nearest neighbor algorithm, and the ID of the relevant answer is assigned to all documents in the cluster. At this point in the process, a supervised machine learning algorithm is trained on the labeled data, and a classifier for assigning labels to new text data is output. Alternatively, a feature extraction algorithm may be run on the classes generated by the clustering step, and search features that index the unlabeled text data are output.
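The labeling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the whitespace tokenizer, the term-frequency vectors, the cosine similarity measure, and the names `vectorize`, `cosine`, and `label_documents` are all assumptions made here, since the abstract specifies only that answers become centroid vectors and that documents are grouped by a nearest neighbor algorithm. The sample answers and documents are invented for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Assumed representation: sparse term-frequency vector of lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_documents(answers, documents):
    """Assign each unlabeled document the ID of its nearest reference answer.

    answers: dict mapping answer ID -> answer text (the reference answer set)
    documents: list of unlabeled text strings
    Returns a list of (document, answer_id) pairs.
    """
    # Each member of the answer set becomes the centroid of one cluster.
    centroids = {aid: vectorize(text) for aid, text in answers.items()}
    labeled = []
    for doc in documents:
        vec = vectorize(doc)
        # Nearest neighbor: pick the centroid with the highest similarity.
        best_id = max(centroids, key=lambda aid: cosine(vec, centroids[aid]))
        labeled.append((doc, best_id))
    return labeled


# Hypothetical reference answers and unlabeled documents.
answers = {
    "A1": "reset your password from the login page",
    "A2": "install the printer driver before connecting the printer",
}
documents = [
    "how do I reset my password",
    "the printer driver will not install",
]
labeled = label_documents(answers, documents)
for doc, answer_id in labeled:
    print(answer_id, doc)
```

The resulting (document, answer ID) pairs would then serve as training data for whichever supervised learning algorithm is chosen, or as the classes passed to a feature extraction step; neither of those later stages is shown here.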
Publication number US6697998(B1) Publication date 2004.02.24
Application number US20000591497 Filing date 2000.06.12
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors DAMERAU FREDERICK J.;JOHNSON DAVID E.;BUSKIRK, JR. MARTIN C.
Classification G06F17/21;G06F17/30;(IPC1-7):G06F17/21 Main classification G06F17/21
Agency Agent
Principal claim
Address