Title: Methods and Systems for Identifying a Level of Similarity Between a Filtering Criterion and a Data Item within a Set of Streamed Documents
Abstract: A method enables identification of a similarity level between a user-provided data item and a data item within a set of data documents. The method includes a representation generator determining occurrence information for each term in an enumeration of terms. The representation generator uses the occurrence information to generate a sparse distributed representation (SDR) for each term. The method includes receiving, by a filtering module, a filtering criterion. The method includes generating, by the representation generator, at least one SDR for the filtering criterion. The method includes generating, by the representation generator, a compound SDR for a first of a plurality of streamed documents received from a data source. The method includes determining, by a similarity engine executing on a second computing device, a distance between the filtering criterion SDR and the generated compound SDR. The method includes acting on the first streamed document based upon the determined distance.
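The filtering steps in the abstract can be illustrated with a small Python sketch. Everything concrete here is an assumption for illustration, not the patented implementation. SDRs are modeled as sets of active positions on a hypothetical grid. The compound SDR is formed by counting positions across a document's term SDRs and keeping the most frequent ones. The distance is the Jaccard distance, one of several possible measures (the abstract does not say which is used). "Acting" on a document means forwarding it when the distance is at or below a hypothetical `threshold`. The names `compound_sdr`, `jaccard_distance`, and `filter_stream` are likewise illustrative.

```python
from collections import Counter


def compound_sdr(terms, term_sdrs, max_active):
    """Combine the SDRs of a document's known terms into one sparse SDR.

    Assumption (illustrative only): every active position in each term SDR
    is counted, and the `max_active` most frequent positions are kept.
    Terms with no stored SDR are ignored.
    """
    counts = Counter()
    for term in terms:
        counts.update(term_sdrs.get(term, ()))
    # Highest count first; ties broken by position so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(pos for pos, _ in ranked[:max_active])


def jaccard_distance(a, b):
    """Distance in [0, 1]; 0.0 means both SDRs have the same active positions."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def filter_stream(criterion_sdr, documents, term_sdrs, threshold, max_active=4):
    """Yield (doc_id, distance) for each streamed document close to the criterion.

    `documents` is an iterable of (doc_id, text) pairs; tokenization is a
    naive lowercase whitespace split (an assumption for this sketch).
    """
    for doc_id, text in documents:
        doc_sdr = compound_sdr(text.lower().split(), term_sdrs, max_active)
        distance = jaccard_distance(criterion_sdr, doc_sdr)
        if distance <= threshold:  # hypothetical "act": forward matching documents
            yield doc_id, distance
```

With toy SDRs such as `{"cat": frozenset({1, 2, 3, 17}), "dog": frozenset({2, 3, 4, 18})}` and the SDR of "dog" as the filtering criterion, the document "the cat and the dog" gets the compound SDR `{1, 2, 3, 4}` and a distance of 0.4, so it passes a threshold of 0.5. A document made only of unrelated terms shares no positions with the criterion and scores 1.0. Words without a stored SDR (here "the" and "and") are ignored.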
Publication No.: US2017053025(A1)  Publication Date: 2017.02.23
Application No.: US201615219851  Filing Date: 2016.07.26
Applicant: cortical.io GmbH  Inventor: De Sousa Webber Francisco Eduardo
Classification: G06F17/30  Main Classification: G06F17/30
Principal Claim: 1. A computer-implemented method for identifying a level of similarity between a user-provided data item and a data item within a set of data documents, the method comprising:
clustering, by a reference map generator executing on a first computing device, in a two-dimensional metric space, a set of data documents selected according to at least one criterion, generating a semantic map;
associating, by the semantic map, a coordinate pair with each of the set of data documents;
generating, by a parser executing on the first computing device, an enumeration of terms occurring in the set of data documents;
determining, by a representation generator executing on the first computing device, for each term in the enumeration, occurrence information including: (i) a number of data documents in which the term occurs, (ii) a number of occurrences of the term in each data document, and (iii) the coordinate pair associated with each data document in which the term occurs;
generating, by the representation generator, for each term in the enumeration, a sparse distributed representation (SDR) using the occurrence information;
storing, in an SDR database, each of the generated SDRs;
receiving, by a filtering module executing on a second computing device, from a third computing device, a filtering criterion;
generating, by the representation generator, for the filtering criterion, at least one SDR;
receiving, by the filtering module, a plurality of streamed documents from a data source;
generating, by the representation generator, a compound SDR for a first of the plurality of streamed documents;
determining, by a similarity engine executing on the second computing device, a distance between the filtering criterion SDR and the generated compound SDR for the first of the plurality of streamed documents; and
acting, by the filtering module, on the first streamed document, based upon the determined distance.
Address: Vienna, AT
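The occurrence-information and SDR-generation steps of the claim can be sketched in Python. This sketch assumes the semantic map has already assigned each document a coordinate pair on a grid `width` cells wide. The weighting scheme is an assumption chosen for illustration: each document's term count is added at that document's grid cell, and the `max_active` heaviest cells are kept. The claim states only that the SDR is generated "using the occurrence information". The functions `occurrence_info`, `term_sdr`, and `build_sdr_database` are hypothetical, and a plain dict stands in for the SDR database.

```python
from collections import Counter, defaultdict


def occurrence_info(documents, coords):
    """Per term: (i) number of documents containing it, (ii) occurrences per
    document, (iii) coordinate pair of each document containing it.

    documents: {doc_id: list of tokens}
    coords:    {doc_id: (x, y)} taken from the semantic map (assumed given).
    """
    per_doc = defaultdict(dict)  # term -> {doc_id: occurrence count}
    for doc_id, tokens in documents.items():
        for term, n in Counter(tokens).items():
            per_doc[term][doc_id] = n
    return {
        term: (len(counts), counts, {d: coords[d] for d in counts})
        for term, counts in per_doc.items()
    }


def term_sdr(info, width, max_active):
    """Build a sparse SDR (set of flat grid indices) for one term.

    Assumption: each document's cell is weighted by the term's count in that
    document. The document count (i) is unpacked but not used in this sketch.
    """
    _doc_count, counts, doc_coords = info
    weights = Counter()
    for doc_id, n in counts.items():
        x, y = doc_coords[doc_id]
        weights[y * width + x] += n  # documents mapped to the same cell add up
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(pos for pos, _ in ranked[:max_active])


def build_sdr_database(documents, coords, width, max_active):
    """Generate and store an SDR for every enumerated term (dict as the database)."""
    info = occurrence_info(documents, coords)
    return {term: term_sdr(i, width, max_active) for term, i in info.items()}
```

For example, suppose the documents are `a: ["cat", "purrs"]`, `b: ["cat", "dog", "cat"]`, and `c: ["dog", "barks"]`, at coordinates (0, 0), (1, 0), and (1, 1) on a grid 2 cells wide. Then "cat" occurs in 2 documents and its SDR is `{0, 1}`. Cell 1 carries the higher weight because "cat" appears twice in `b`.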