Title: Network analysis
Abstract: Methods and a device for providing a compressed index of binary records. A method includes: sorting the records by content of a predetermined field of the record, indexing the field from one of the records in a line of a bitmap index, and compressing bits in a column of the bitmap index by replacing a group of successive bits with a code, where the sorting includes the step of assigning, for each record, a hash bucket of a hash table on a basis of a locality sensitive hash function on the contents of the predetermined field, so that the probability for two of the records to be assigned to the same hash bucket increases with the similarity of the contents of the predetermined field between the records, and where at least one step of the computer implemented method is executed on a computer device.
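The effect of sorting on column compression can be illustrated with a minimal sketch. It assumes plain run-length encoding as the "code" that replaces a group of successive bits; the record does not say which code the patent uses, and the function names `build_bitmap_index` and `rle_column` are hypothetical.

```python
from itertools import groupby

def build_bitmap_index(values, domain):
    """Return one row per record and one column per possible field value."""
    return [[1 if v == d else 0 for d in domain] for v in values]

def rle_column(bitmap, col):
    """Replace each run of identical successive bits in a column
    with a (bit, run_length) pair. Run-length encoding is an assumption."""
    bits = [row[col] for row in bitmap]
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

domain = ["a", "b"]
values_unsorted = ["a", "b", "a", "b", "a", "b"]
values_sorted = sorted(values_unsorted)

# Column 0 marks the records whose field equals "a".
unsorted_codes = rle_column(build_bitmap_index(values_unsorted, domain), 0)
sorted_codes = rle_column(build_bitmap_index(values_sorted, domain), 0)

print(len(unsorted_codes), len(sorted_codes))
```

With the alternating input the column needs six codes, one per bit; after sorting it needs two, which is why the method sorts records before indexing.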
Publication number: US8782012(B2) Publication date: 2014.07.15
Application number: US201113218566 Filing date: 2011.08.26
Applicant: International Business Machines Corporation Inventors: Fusco Francesco; Stoecklin Marc P; Vlachos Michail
Classification: G06F7/00; G06F17/00; G06F17/30 Main classification: G06F7/00
Agency: Cantor Colburn LLP Agent: Cantor Colburn LLP
Main claim: 1. A computer implemented method for providing a compressed index for a stream of binary records, the method comprising the steps of: sorting the stream of binary records by content of a predetermined field of each of the binary records; transforming the predetermined field from each of the binary records into a bitmap index, wherein the bitmap index is a matrix having a separate column corresponding to each possible value in the predetermined field; compressing bits in a column of the bitmap index by replacing a group of successive bits with a code; wherein the sorting comprises the step of assigning, for each binary record, a hash bucket of a hash table on a basis of a locality sensitive hash function on the contents of the predetermined field, so that the probability for two of the binary records to be assigned to the same hash bucket increases with the similarity of the contents of the predetermined field between the binary records; wherein if a total number of binary records held in the hash table exceeds a first predetermined number, the hash bucket that is assigned the greatest number of binary records is output to an output stream until the total number of binary records held in the hash table falls below a second predetermined number; wherein if a number of binary records held in one of the hash buckets exceeds a third predetermined number, the binary records in that hash bucket are output to the output stream; and wherein at least one step of the computer implemented method is executed on a computer device.
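The bucket-and-flush sorting in the claim can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the field is taken to be an integer, the locality sensitive hash is a simple quantization `(value + offset) // width` (drawing `offset` uniformly from `[0, width)` gives two values at distance d < width a collision probability of 1 - d/width), and the end-of-stream drain is added for completeness. The names `lsh_reorder`, `max_total`, `low_total` and `max_bucket` are hypothetical stand-ins for the first, second and third predetermined numbers.

```python
from collections import defaultdict

def lsh_reorder(records, field, max_total, low_total, max_bucket,
                width=16, offset=0):
    """Yield records so that similar field values tend to be adjacent."""
    buckets = defaultdict(list)
    total = 0  # number of records currently held in the hash table
    for rec in records:
        key = (rec[field] + offset) // width  # toy LSH (assumption)
        buckets[key].append(rec)
        total += 1
        # Third threshold: a single bucket holds too many records.
        if len(buckets[key]) > max_bucket:
            flushed = buckets.pop(key)
            total -= len(flushed)
            yield from flushed
        # First threshold: the table is too full. Output the largest
        # bucket repeatedly until the total falls below the second.
        if total > max_total:
            while total >= low_total and buckets:
                largest = max(buckets, key=lambda k: len(buckets[k]))
                flushed = buckets.pop(largest)
                total -= len(flushed)
                yield from flushed
    # End of stream: drain what is left (not part of the claim).
    for key in sorted(buckets):
        yield from buckets[key]

stream = [{"port": p} for p in [1, 100, 2, 101, 3, 102, 4, 103]]
ordered = [r["port"] for r in lsh_reorder(
    stream, "port", max_total=6, low_total=4, max_bucket=10)]
print(ordered)  # [1, 2, 3, 4, 100, 101, 102, 103]
```

The interleaved stream leaves the function grouped by bucket, so each bitmap column sees longer runs of identical bits while the table stays bounded by the thresholds.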
Address: Armonk NY US