Title of invention: Classifying Samples Using Clustering
Abstract: An unlabeled sample is classified using clustering. A set of samples containing labeled and unlabeled samples is established. Values of features are gathered from the samples contained in the datasets, and a subset of the features is selected. The labeled and unlabeled samples are clustered together based on similarity of the gathered values for the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples. The selecting and clustering steps are recursively iterated on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached. The iterations produce a cluster having a labeled sample and an unlabeled sample. A label is propagated from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
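Application publication number: US2014201208(A1)   Application publication date: 2014.07.17
Application number: US201313742218   Filing date: 2013.01.15
Applicant: Satish Sourabh;Salinas Govind;Cheong Vincent;Symantec Corporation   Inventor: Satish Sourabh;Salinas Govind;Cheong Vincent;Symantec Corporation
Classification number: G06F17/30   Main classification number: G06F17/30
Agency:    Agent: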
Principal claim: 1. A computer-implemented method of classifying a sample, comprising: establishing a set of samples containing labeled and unlabeled samples; gathering values of features from the labeled and unlabeled samples; selecting a subset of the features; clustering the labeled and unlabeled samples together based on similarity of the gathered values of the selected subset of features to produce a set of clusters, each cluster having a subset of samples from the set of samples; recursively iterating the selecting and clustering steps on the subset of samples in each cluster in the set of clusters until at least one stopping condition is reached, the iterations producing a cluster having a labeled sample and an unlabeled sample; and propagating a label from the labeled sample in the cluster to the unlabeled sample in the cluster to classify the unlabeled sample.
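The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not say how features are selected, which clustering algorithm is used, or what the stopping conditions are, so everything below is an assumption made for illustration: the `Sample` class, variance-based feature selection, a two-seed nearest-neighbor split, the stopping conditions (maximum depth, minimum cluster size, at most one distinct label in the cluster), and majority-vote label propagation.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Optional


@dataclass
class Sample:
    features: List[float]
    label: Optional[str] = None  # None marks an unlabeled sample


def select_features(samples, k):
    """Pick the k feature indices with the highest variance (assumed criterion)."""
    n = len(samples[0].features)
    by_variance = sorted(
        range(n),
        key=lambda i: pvariance([s.features[i] for s in samples]),
        reverse=True,
    )
    return by_variance[:k]


def distance(a, b, idx):
    """Squared Euclidean distance restricted to the selected feature indices."""
    return sum((a.features[i] - b.features[i]) ** 2 for i in idx)


def two_way_split(samples, idx):
    """Assumed clustering step: assign each sample to the nearer of two seeds."""
    seed_a = samples[0]
    seed_b = max(samples, key=lambda s: distance(s, seed_a, idx))
    left, right = [], []
    for s in samples:
        if distance(s, seed_a, idx) <= distance(s, seed_b, idx):
            left.append(s)
        else:
            right.append(s)
    return left, right


def classify(samples, k=1, max_depth=8, min_size=2, depth=0):
    """Recursively select features and cluster; propagate labels in final clusters."""
    labels = {s.label for s in samples if s.label is not None}
    # Assumed stopping conditions: depth limit, small cluster, or at most one label.
    stop = depth >= max_depth or len(samples) <= min_size or len(labels) <= 1
    if not stop:
        idx = select_features(samples, k)  # re-selected for each subset
        left, right = two_way_split(samples, idx)
        if left and right:  # recurse only if the split separated something
            classify(left, k, max_depth, min_size, depth + 1)
            classify(right, k, max_depth, min_size, depth + 1)
            return
    if labels:
        # Propagate the majority label of the cluster to its unlabeled samples.
        known = [s.label for s in samples if s.label is not None]
        majority = Counter(known).most_common(1)[0][0]
        for s in samples:
            if s.label is None:
                s.label = majority


# Hypothetical data: two labeled and two unlabeled samples.
samples = [
    Sample([0.1, 5.0], "A"),
    Sample([0.2, 5.1]),
    Sample([9.8, 5.0], "B"),
    Sample([9.9, 4.9]),
]
classify(samples)
print([s.label for s in samples])  # ['A', 'A', 'B', 'B']
```

In this sketch the feature subset is re-selected inside every recursive call, so each cluster is split on the features that vary most within that cluster rather than across the whole set. This mirrors the claim's recursive iteration of the selecting and clustering steps. A cluster that contains no labeled sample leaves its unlabeled samples unclassified.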
Address: Fremont CA US