Title of invention: Identifying information related to a particular entity from electronic sources, using dimensional reduction and quantum clustering
Abstract: Presented are systems and methods for identifying information about a particular entity. The methods include acquiring electronic documents having unstructured text, selected based on one or more search terms from a plurality of terms related to the particular entity. The acquired documents are tokenized to form a data matrix, and a plurality of eigenvectors is calculated using the data matrix and the transpose of the data matrix. A variance is then acquired for determining the amount of intra-clustering between the documents, and the acquired documents are clustered using some of the eigenvectors and the variance.
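The first steps named in the abstract (tokenize the documents, form a data matrix, derive eigenvectors from the matrix and its transpose) can be illustrated with a minimal Python sketch. It rests on several assumptions that the record does not state: a lowercase alphabetic tokenizer, raw term counts as matrix entries, the product AᵀA (term by term) as the way the matrix and its transpose are combined, and power iteration to find only the dominant eigenpair. All function names are hypothetical.

```python
import math
import re


def tokenize(text):
    # Assumed tokenizer: lowercase runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def build_data_matrix(docs):
    # Rows are documents, columns are vocabulary terms, entries are raw counts.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in tokenize(doc):
            matrix[i][column[tok]] += 1.0
    return vocab, matrix


def term_gram(matrix):
    # A^T A: a symmetric term-by-term matrix built from the data matrix
    # and its transpose (an assumed reading of the abstract).
    n_terms = len(matrix[0]) if matrix else 0
    return [
        [sum(row[i] * row[j] for row in matrix) for j in range(n_terms)]
        for i in range(n_terms)
    ]


def dominant_eigenpair(sym, iterations=200):
    # Power iteration on a symmetric matrix; returns (eigenvalue, unit eigenvector).
    n = len(sym)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T S v gives the eigenvalue estimate for unit v.
    sv = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, sv))
    return eigenvalue, v


docs = ["apple banana", "apple banana", "cherry"]
vocab, matrix = build_data_matrix(docs)
eigenvalue, eigenvector = dominant_eigenpair(term_gram(matrix))
```

Power iteration recovers only the leading eigenpair; a full decomposition would need a linear algebra library, which this sketch avoids to stay self-contained.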
Publication number: US8744197(B2) Publication date: 2014.06.03
Application number: US201213587520 Filing date: 2012.08.16
Applicant: Reputation.Com Inventors: Fertik Michael Benjamin Selkowe; Scott Tony; Dignan Thomas
Classification: G06K9/62 Main classification: G06K9/62
Agency: Agent:
Main claim: 1. A system comprising: one or more servers, each server having a processor and a memory, the one or more servers comprising: a collector module configured to acquire a plurality of eigenvectors, each having a corresponding eigenvalue, wherein the plurality of eigenvectors are based on a plurality of tokenized electronic documents having unstructured text, the plurality of tokenized electronic documents forming a data matrix, and the unstructured text includes background terms and nonbackground terms; and a dimensional reduction module configured to: classify the plurality of eigenvectors and their corresponding eigenvalues into one or more background eigenvectors and background eigenvalues, and one or more nonbackground eigenvectors and nonbackground eigenvalues, wherein the background eigenvectors correspond to the background terms and the nonbackground eigenvectors correspond to nonbackground terms, acquire a threshold, compare the nonbackground eigenvalues with the threshold, and provide the nonbackground eigenvectors whose corresponding nonbackground eigenvalues exceed the threshold, wherein the provided nonbackground eigenvectors are used for clustering the plurality of documents.
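The dimensional reduction steps recited in the claim (classify, acquire a threshold, compare, provide) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: eigenvectors are assumed to be indexed by vocabulary term, and an eigenvector is treated as "background" when its largest-magnitude component falls on a background term. The claim does not specify how the classification is made, and the function names `classify_eigenpairs` and `select_for_clustering` are hypothetical.

```python
def classify_eigenpairs(eigenpairs, vocab, background_terms):
    # Split (eigenvalue, eigenvector) pairs into background and nonbackground.
    # Assumed rule: the term carrying the largest absolute component decides.
    background, nonbackground = [], []
    for value, vector in eigenpairs:
        dominant = max(range(len(vector)), key=lambda j: abs(vector[j]))
        if vocab[dominant] in background_terms:
            background.append((value, vector))
        else:
            nonbackground.append((value, vector))
    return background, nonbackground


def select_for_clustering(nonbackground, threshold):
    # Keep only nonbackground eigenvectors whose eigenvalue exceeds the threshold.
    return [vector for value, vector in nonbackground if value > threshold]


# Illustrative values only; they are not taken from the patent.
vocab = ["the", "acme", "lawsuit"]
eigenpairs = [
    (9.0, [0.9, 0.1, 0.1]),
    (5.0, [0.1, 0.9, 0.2]),
    (0.5, [0.0, 0.2, 0.9]),
]
background, nonbackground = classify_eigenpairs(eigenpairs, vocab, {"the"})
selected = select_for_clustering(nonbackground, threshold=1.0)
```

In this toy input, the eigenvector dominated by the background term "the" is set aside, and of the two remaining eigenvectors only the one with eigenvalue 5.0 passes the threshold of 1.0 and would be handed to the clustering step.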
Address: Redwood City CA US