Title of invention Communication and message-efficient protocol for computing the intersection between different sets of data
Abstract Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
Publication number US9438705(B2) Publication date 2016.09.06
Application number US201314107097 Filing date 2013.12.16
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Woodruff David P.;Yaroslavtsev Grigory
Classification G06F7/04;G06F17/30;H04L29/06 Main classification G06F7/04
Agency Cantor Colburn LLP Agent Cantor Colburn LLP
Principal claim 1. A method comprising: analyzing a plurality of data items in a relational database, wherein different portions of the data items are stored in a plurality of servers; determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers; calculating a logarithm function based on the maximum size of the subset of the data items in each of the at least two servers; calculating a highest number of sequences of communications between the at least two servers such that when the logarithmic function is iteratively calculated, a result of the logarithmic function remains smaller than 1; and generating a protocol between the at least two servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the at least two servers through communications between the at least two servers such that the number of times the communications are exchanged does not exceed the highest number of sequences calculated.
Address Armonk NY US
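
Illustrative note: the claim bounds the number of communication rounds by iteratively applying a logarithm to the maximum subset size held by each server. The record does not give the logarithm base or the exact stopping rule, so the sketch below is one plausible reading rather than the patented method: it computes the standard iterated logarithm (log*) in base 2, counting how many times the logarithm can be applied before the value drops to 1 or below. The function name `max_rounds` and the parameter `k` are hypothetical.

```python
import math


def max_rounds(k: int) -> int:
    """Count how many times log2 can be applied to k before the value is <= 1.

    Assumption (not stated in the record): base-2 logarithm and a
    stopping threshold of 1, i.e. the usual iterated logarithm log*(k).
    """
    if k < 1:
        raise ValueError("k must be a positive set size")
    rounds = 0
    value = float(k)
    while value > 1.0:
        value = math.log2(value)  # one more application of the logarithm
        rounds += 1
    return rounds


if __name__ == "__main__":
    # k is the maximum number of items held by either server.
    for k in (1, 2, 16, 65536):
        print(k, max_rounds(k))
```

Under this reading the round budget grows extremely slowly with set size: a set of 65536 items yields a budget of only 4 rounds.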