Title: DISTRIBUTIONAL ALIGNMENT OF SETS
Abstract: Technology for classifying a data set includes extracting one or more features from items of the data set, computing a specificity measure for the extracted features, and measuring the similarity of the extracted features to a set of characteristic features associated with the property of one or more reference models.
Publication No.: US2016378847(A1)   Publication Date: 2016.12.29
Application No.: US201514862101   Filing Date: 2015.09.22
Applicant: SRI International   Inventors: Byrnes John; Freyman Christina
Classification: G06F17/30   Main Classification: G06F17/30
Agency:    Agent:
Principal Claim: 1. A method for semantically aligning a data set with a different data set, the data set comprising a plurality of documents, the data set represented by a plurality of document clusters and a plurality of term clusters, the document clusters and the term clusters algorithmically derived from the data set, the document clusters each comprising at least one document and the term clusters each comprising at least one term, the method comprising, by a computing system comprising one or more computing devices:
    creating a specificity-weighted semantic representation of the data set by, for each document cluster-term cluster pair of the data set: (i) computing a specificity measure indicative of a likelihood of occurrence of the term cluster in the document cluster in relation to a likelihood of occurrence of the term cluster in the other document clusters of the data set and (ii) associating each specificity measure with its respective document cluster-term cluster pair;
    accessing a semantic representation of the different data set, the different data set represented by a different plurality of document clusters and a different plurality of term clusters, the semantic representation of the different data set comprising, for each document cluster in the different data set, at least data indicative of a likelihood of occurrence of each of the different plurality of term clusters in the document cluster;
    with the specificity measures, algorithmically comparing the semantic representation of the data set to the semantic representation of the different data set;
    selecting a subset of the term clusters of the data set based on the comparison of the semantic representation of the data set to the semantic representation of the different data set; and
    associating the selected subset of term clusters with one or more documents of the data set.
Address: Menlo Park CA US
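
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the specificity measure is taken to be a log ratio of a term cluster's likelihood in one document cluster to its mean likelihood in the other document clusters; the comparison is a specificity-weighted dot product; and the term clusters of the two data sets are assumed to have already been mapped onto a shared index, which the claim does not specify. The names `specificity`, `weighted_overlap`, and `align` are hypothetical.

```python
import math


def specificity(likelihood, eps=1e-9):
    """likelihood[d][t] is the likelihood of term cluster t in document cluster d.

    Returns spec[d][t] = log(p(t|d) / mean of p(t|d') over the other clusters d').
    The log-ratio form is an assumption; the claim only requires a measure that
    relates the two likelihoods.
    """
    n_docs = len(likelihood)
    if n_docs < 2:
        raise ValueError("need at least two document clusters")
    spec = []
    for d in range(n_docs):
        row = []
        for t in range(len(likelihood[d])):
            others = [likelihood[o][t] for o in range(n_docs) if o != d]
            baseline = sum(others) / len(others)
            row.append(math.log((likelihood[d][t] + eps) / (baseline + eps)))
        spec.append(row)
    return spec


def weighted_overlap(row, ref_row, weights):
    """Dot product of two likelihood rows, scaled per term cluster by a weight."""
    return sum(w * a * b for w, a, b in zip(weights, row, ref_row))


def align(likelihood, ref_likelihood, k=1):
    """For each document cluster, return (best reference cluster, top-k term clusters).

    Assumes both data sets index their term clusters in the same order.
    """
    spec = specificity(likelihood)
    result = []
    for d, row in enumerate(likelihood):
        # Keep only positive specificity so non-distinctive term clusters drop out.
        weights = [max(s, 0.0) for s in spec[d]]
        best = max(
            range(len(ref_likelihood)),
            key=lambda r: weighted_overlap(row, ref_likelihood[r], weights),
        )
        # Select the term clusters that contribute most to the chosen match.
        contrib = [w * a * b for w, a, b in zip(weights, row, ref_likelihood[best])]
        top = sorted(range(len(contrib)), key=lambda t: contrib[t], reverse=True)[:k]
        result.append((best, top))
    return result


# Toy data: two document clusters and three term clusters per data set.
data = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
reference = [[0.2, 0.7, 0.1], [0.7, 0.2, 0.1]]
alignment = align(data, reference, k=1)
```

In this toy input, each document cluster is matched to the reference cluster that shares its most specific term cluster, and that term cluster is the one selected for association with the cluster's documents.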