Title: System and method for data quality analysis between untrusted parties
Abstract: A system and method for data quality analysis between untrusted parties is provided. A dataset having attributes, each associated with one or more elements, is maintained. An encrypted request regarding data quality for one of the attributes is received from a client. The encrypted request includes an interest vector of separately encrypted values identifying the elements of interest for the attribute. A condensed data vector representing the elements of the attribute is generated; it is the same length as the interest vector. An aggregate of the elements of interest is determined by calculating, for each element in the condensed data vector, an encrypted product of that element and the corresponding element of the interest vector, and then determining the total product of all the encrypted products. A data quality value is assigned to the elements of the attribute in the dataset based on the aggregate.
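The abstract does not name an encryption scheme. However, its wording ("encrypted product" of each element, then a "total product" of all the encrypted products) matches an additively homomorphic cryptosystem. The sketch below assumes Paillier. In Paillier, raising a ciphertext E(v) to a plaintext power d yields E(d·v), and multiplying ciphertexts adds their plaintexts. The aggregate therefore encrypts the sum of d_i·v_i. The primes are deliberately tiny, so this is NOT secure. The 0/1 indicator form of the interest vector is also an assumption. The function names `keygen`, `encrypt`, `decrypt`, `interest_vector` and `aggregate` are hypothetical.

```python
import math
import random

# Toy Paillier cryptosystem (assumed scheme; the source does not name one).
# The primes are tiny so the example runs instantly: NOT secure.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return n, (n, lam, mu)  # public key n, private key (n, lam, mu)


def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # E(m) = g^m * r^n mod n^2 with g = n + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(sk, c):
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n  # L(u) * mu mod n, where L(u) = (u - 1) / n


def interest_vector(n, length, positions):
    """Client side: separately encrypt a 0/1 indicator for each position (assumed encoding)."""
    return [encrypt(n, 1 if i in positions else 0) for i in range(length)]


def aggregate(n, condensed, enc_interest):
    """Server side: multiply E(v_i)^(d_i) over all i; the result encrypts sum(d_i * v_i)."""
    if len(condensed) != len(enc_interest):
        raise ValueError("condensed and interest vectors must have the same length")
    n2 = n * n
    total = 1  # trivial encryption of 0 (r = 1), the identity for ciphertext products
    for d, c in zip(condensed, enc_interest):
        total = total * pow(c, d, n2) % n2  # encrypted product of d and v, folded into the total
    return total


if __name__ == "__main__":
    pk, sk = keygen()
    enc = interest_vector(pk, 4, {0, 2})
    agg = aggregate(pk, [3, 0, 5, 2], enc)
    print(decrypt(sk, agg))  # prints 8
```

In this sketch, the server only ever sees ciphertexts of the client's interest vector. The client, holding the private key, learns only the sum over the positions it selected (here 3 + 5).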
Publication number: US9413760 (B2); Publication date: 2016.08.09
Application number: US201414479242; Filing date: 2014.09.05
Applicant: PALO ALTO RESEARCH CENTER INCORPORATED; Inventors: Freudiger Julien; Rane Shantanu; Brito Alejandro E.; Uzun Ersin
Classification: G06F17/30; G06F21/60; H04L29/06; Main classification: G06F17/30
Agency: (none listed); Agents: Inouye Patrick J. S.; Wittman Krista A.
Main claim: 1. A system for data quality analysis, comprising: memory storing a dataset comprising attributes each associated with one or more elements; a client comprising: a first vector generating module to generate an interest vector of separately encrypted values identifying elements of interest for at least one attribute; a request module to send an encrypted request to a server, wherein the encrypted request comprises the interest vector; and a first determination module to send an acquisition determination to the server, wherein the acquisition determination is based on a data quality value; and the server, comprising: a receipt module to receive the encrypted request from the client regarding data quality for the at least one attribute; a second vector generating module to generate a condensed data vector representing the elements for the at least one attribute, wherein the condensed data vector is the same length as the interest vector; a condensed data vector module to determine the condensed data vector as one of a counting hashmap when the data quality comprises data completeness and a histogram when the data quality comprises data validity, comprising at least one of: a hashmap module to determine the condensed data vector as the counting hashmap, comprising: a calculation module to calculate a hash value for each of the elements for the at least one attribute; an occupancy determination module to determine a number of times each hash value occurs in the dataset as an occurrence value; and a placement module to place the occurrence values in an element of the vector indexed by the hash values; and a histogram module to determine the condensed data vector as the histogram, comprising: a determination module to set a maximum and minimum value for the elements of the at least one attribute; a graph module to generate the histogram based on the set maximum and minimum values for the elements along an x-axis and the frequencies of occurrence of the elements along the y-axis; and a placement module to place the frequencies of occurrence along the condensed data vector; an aggregator module to determine an aggregate of the elements of interest by determining, for each of the elements in the condensed data vector, an encrypted product of that element and a corresponding element of the interest vector, and by calculating the aggregate as an encrypted value by determining a total product of all the encrypted products, wherein the aggregate is used to assign the data quality value to the elements of the at least one attribute in the dataset; and a provider module to provide the dataset based on the acquisition determination.
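The claim describes two ways to build the condensed data vector: a counting hashmap for data completeness and a histogram for data validity. The sketch below is minimal and rests on these assumptions:

- The hash is SHA-256 reduced modulo a vector length `m`.
- The histogram uses equal-width bins between the set minimum and maximum.
- Values outside that range are not counted.

None of these choices comes from the claim. The names `bucket`, `counting_hashmap`, `histogram` and `plaintext_aggregate` are illustrative. `plaintext_aggregate` computes, in the clear, the value the encrypted aggregate would decrypt to when the interest vector is a 0/1 indicator.

```python
import hashlib


def bucket(value, m):
    """Hypothetical hash choice: SHA-256 of str(value), reduced modulo the vector length m."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % m


def counting_hashmap(column, m):
    """Completeness: vector[h] = number of elements in the column whose hash value is h."""
    vector = [0] * m
    for value in column:
        vector[bucket(value, m)] += 1
    return vector


def histogram(column, lo, hi, bins):
    """Validity: frequency of elements in equal-width bins spanning [lo, hi].

    Assumption: values outside [lo, hi] are treated as invalid and not counted.
    """
    if hi <= lo or bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    width = (hi - lo) / bins
    vector = [0] * bins
    for value in column:
        if lo <= value <= hi:
            # the maximum value itself falls into the last bin
            vector[min(int((value - lo) / width), bins - 1)] += 1
    return vector


def plaintext_aggregate(condensed, interest):
    """Plaintext equivalent of the encrypted aggregate for a 0/1 interest vector."""
    return sum(d * v for d, v in zip(condensed, interest))


if __name__ == "__main__":
    ages = [23, 35, 35, 47, 150, 61]
    print(histogram(ages, 0, 120, 4))  # prints [1, 3, 1, 0]
    ids = ["a17", "b42", "a17"]
    counts = counting_hashmap(ids, 16)
    interest = [0] * 16
    interest[bucket("a17", 16)] = 1  # the client's (hypothetical) record of interest
    print(plaintext_aggregate(counts, interest))
```

Because distinct values can hash to the same index, the completeness count is an upper bound on exact matches. A larger `m` reduces this effect but does not eliminate it.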
Address: Palo Alto CA US