发明名称 COMMUNICATION AND MESSAGE-EFFICIENT PROTOCOL FOR COMPUTING THE INTERSECTION BETWEEN DIFFERENT SETS OF DATA
摘要 Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
申请公布号 US2016173660(A1) 申请公布日期 2016.06.16
申请号 US201615064187 申请日期 2016.03.08
申请人 International Business Machines Corporation 发明人 Woodruff David P.;Yaroslavtsev Grigory
分类号 H04L29/06;H04L29/08 主分类号 H04L29/06
代理机构 代理人
主权项 1. A method comprising: analyzing a plurality of data items in a relational database, wherein different portions of the data items are stored in a plurality of servers; determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers; calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, wherein the logarithmic function includes an upper bound and a lower bound in its functional calculation; calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than 1, wherein the highest number of sequences is defined as O(k) bits where k is the maximum size of the subset of the data items in each of the two servers, wherein the maximum size of the subset of the data items is defined as n and the logarithmic function is defined as O(k log n) bits of communication, wherein the value of n is alterable when at least one of a new data item is added and when an existing data item is removed from the relational database such that O(k log n) is further calculated by the logarithmic function O(log*k) for the two servers, wherein the two servers deterministically exchange their inputs using only O(k log(n/k)) bits of communication, wherein the two servers hash their data items in elements of O(log k)-bit strings, and wherein the two servers exchange the hashed values in order to determine which elements are in the intersection of their data items; and generating a protocol between the two servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the two servers through the communications between the two servers such that the number of times the communications are exchanged does not exceed the highest number of sequences calculated, and wherein the intersection operation is used to compute a number of distinct elements in a union of the data items between the two servers.
地址 Armonk NY US