Title: System and Method To Label Unlabeled Data
Abstract: An embodiment of the invention provides a technique that lets a machine discover the classes and topics contained in data and annotate data objects with the identified classes. The technique enables machines to group and annotate data objects in ways that are meaningful and intuitive to a user of those objects. An interactive method uses clustering, together with user feedback on the clustering output, to discover a set of classes. The user's feedback guides the later stages of the clustering process, so that class discovery and annotation improve as more human feedback is provided. The method can be used to produce labeled data by discovering classes and annotating a given dataset with the discovered class labels. This is advantageous for building classifiers with wide applications, such as call routing and intent discovery.
Publication number: US2014188881(A1)
Publication date: 2014.07.03
Application number: US201213731651
Filing date: 2012.12.31
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Joshi Sachindra; Godbole Shantanu Ravindra; Verma Ashish
Classification: G06F17/30
Main classification: G06F17/30
Agency:
Agent:
Main claim: 1. A device for labeling unlabeled data, the device comprising: a clustering processor configured to group the unlabeled data to produce at least one data group; a feedback processor configured to enable a user to provide feedback on the at least one data group, the feedback including at least one of the following: (i) feedback on membership of a data object in a data group, and (ii) feedback on a current labeling of a data group; the clustering processor further configured to regroup the at least one data group using at least one constraint based on the feedback provided by the user; and the feedback processor further configured to apply a label to a data group of the at least one data group to produce at least one labeled data group, the label based on the feedback provided by the user after the grouping or on feedback provided by the user after at least one regrouping by the clustering processor.
Address: Burlington MA US
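
The group, feedback, regroup, and label loop recited in the main claim can be illustrated with a minimal sketch. This is not the patented implementation: the record does not name a clustering algorithm, so the greedy token-overlap grouping, the threshold value, and the names jaccard, group, and apply_labels are all assumptions made for illustration. User feedback on group membership is modeled as hypothetical must-link and cannot-link index pairs.

```python
def jaccard(a, b):
    """Token-overlap similarity between two strings (assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def group(items, threshold=0.3, must_link=(), cannot_link=()):
    """Greedy grouping that honors pairwise constraints.

    must_link / cannot_link are iterables of (i, j) index pairs standing in
    for constraints derived from user feedback on group membership.
    Returns a list of groups, each a list of item indices.
    """
    cannot = {frozenset(p) for p in cannot_link}
    groups = []
    # Seed groups from must-link pairs so those items start out together.
    for i, j in must_link:
        hit = [g for g in groups if i in g or j in g]
        merged = sorted({i, j}.union(*hit))
        groups = [g for g in groups if g not in hit] + [merged]
    placed = {i for g in groups for i in g}
    for i in range(len(items)):
        if i in placed:
            continue
        best, best_sim = None, threshold
        for g in groups:
            if any(frozenset((i, j)) in cannot for j in g):
                continue  # feedback says these items must not share a group
            sim = max(jaccard(items[i], items[j]) for j in g)
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([i])  # nothing similar enough: start a new group
        else:
            best.append(i)
    return groups


def apply_labels(groups, labels):
    """Attach user-supplied labels (keyed by group position) to groups."""
    return {labels[k]: sorted(g) for k, g in enumerate(groups) if k in labels}


if __name__ == "__main__":
    utterances = [
        "reset my password",
        "forgot my password",
        "check my balance",
        "what is my account balance",
        "password for balance transfer",
    ]
    first = group(utterances)                       # initial grouping
    second = group(utterances, must_link=[(2, 4)])  # regroup after feedback
    print(first)
    print(apply_labels(second, {0: "balance", 1: "password"}))
```

In this sketch the regrouping step simply reruns the grouping with the accumulated constraints; a production system would more likely use a constrained clustering algorithm and carry feedback forward incrementally.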