Title: Anonymization for data having a relational part and sequential part
Abstract: A system, method and computer program product for anonymizing data. Datasets anonymized according to the method have a relational part having multiple tables of relational data, and a sequential part having tables of time-ordered data. The sequential part may include data representing a “sequence-of-sequences”. A “sequence-of-sequences” is a sequence whose elements are themselves sequences. Both kinds of data may be anonymized using k-anonymization techniques, offering privacy protection to individuals or entities against attackers whose background knowledge spans the two (or more) kinds of attribute data.
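The abstract does not fix a concrete data layout. The following is a minimal Python sketch, assuming a record consists of quasi-identifying relational values plus one sequence-of-sequences (for example, a patient's visits, each an ordered list of codes). `Record`, `make_record`, and `is_k_anonymous` are hypothetical names, not part of the patent. The sketch shows why protection has to consider both parts together: the k-anonymity check is made on the combined (relational, sequential) value.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical layout: quasi-identifying relational values, sorted so that
    # attribute order does not matter when records are compared.
    relational: Tuple[Tuple[str, str], ...]
    # Sequence-of-sequences: an outer time-ordered sequence (e.g. visits)
    # whose elements are themselves ordered sequences (e.g. codes per visit).
    sequential: Tuple[Tuple[str, ...], ...]


def make_record(attrs: Dict[str, str], visits: List[List[str]]) -> Record:
    """Build a hashable record from a plain dict and nested lists."""
    return Record(tuple(sorted(attrs.items())), tuple(tuple(v) for v in visits))


def is_k_anonymous(records: List[Record], k: int) -> bool:
    """True if every combined (relational, sequential) value occurs at least k times."""
    counts = Counter((r.relational, r.sequential) for r in records)
    return all(c >= k for c in counts.values())


r1 = make_record({"age": "30-39", "zip": "105**"}, [["flu"], ["xray"]])
r2 = make_record({"zip": "105**", "age": "30-39"}, [["flu"], ["xray"]])
r3 = make_record({"age": "30-39", "zip": "105**"}, [["flu"]])
print(is_k_anonymous([r1, r2], 2))      # True
print(is_k_anonymous([r1, r2, r3], 2))  # False: r3's sequential part is unique
```

Here `r3` matches `r1` and `r2` on its relational part alone, so a check on the relational tables by themselves would pass; its sequence-of-sequences, however, is unique, so an attacker who knows both parts could single it out.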
Publication number: US9230132 (B2)  Publication date: 2016.01.05
Application number: US201314132945  Filing date: 2013.12.18
Applicant: International Business Machines Corporation  Inventors: Gkoulalas-Divanis Aris; Sauter Guenter A.
Classification: G06F21/00; G06F21/62  Main classification: G06F21/00
Agency: Scully, Scott, Murphy & Presser, P.C.  Agents: Scully, Scott, Murphy & Presser, P.C.; Tang, Esq. Jeff
Main claim: 1. A method of anonymizing data comprising: receiving, at a hardware processor, input comprising a dataset having both a relational data part and a sequential data part, wherein the sequential part is data representing a sequence-of-sequences in which a sequence comprises elements that are sequences; identifying from said dataset direct identifier attributes corresponding to entities; masking or suppressing attribute values corresponding to said identified direct identifier attributes; ranking records based on a similarity with respect to a defined cost function F; selecting and iteratively anonymizing each set of at least k first records as ranked using the defined cost function F, each set of at least k records comprising a group, said anonymizing attribute values along both the relational part and the sequential part, wherein k is a specified k-anonymization parameter; and repeating said selecting and iteratively anonymizing for each successive set of at least k records of successive groups, said anonymizing attribute values along both the relational part and the sequential part of records therein, to generate anonymized table representations of said dataset resulting from said anonymization, and outputting said anonymized table representations to an output device, said anonymized table representations guaranteeing no attacker can re-identify the direct identifier attributes of any entity in the dataset with a certain probability.
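The claimed steps (suppress direct identifiers, rank records with a cost function F, peel off groups of at least k records, and generalize each group along both parts) can be sketched as a greedy grouping loop. This is an illustrative sketch under stated assumptions, not IBM's implementation: the claim leaves F and the generalization operations unspecified, so `cost_f` (relational mismatches plus Jaccard distance over sequence items), ranking against a seed record, range/`*` generalization of relational values, and item suppression on the sequence-of-sequences are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]   # relational attributes of one entity
Seq = List[List[str]]     # sequence-of-sequences, e.g. visits of codes


def cost_f(a: Tuple[Row, Seq], b: Tuple[Row, Seq]) -> float:
    """Hypothetical cost F: relational mismatches + Jaccard distance of items."""
    rel = sum(a[0][key] != b[0].get(key) for key in a[0])
    items_a = {x for inner in a[1] for x in inner}
    items_b = {x for inner in b[1] for x in inner}
    union = items_a | items_b
    jac = 1 - len(items_a & items_b) / len(union) if union else 0.0
    return rel + jac


def generalize_rel(rows: List[Row]) -> Row:
    """Keep equal values; numeric values become a range; others become '*'."""
    out: Row = {}
    for key in rows[0]:
        vals = [r[key] for r in rows]
        if all(v == vals[0] for v in vals):
            out[key] = vals[0]
        elif all(isinstance(v, (int, float)) for v in vals):
            out[key] = f"{min(vals)}-{max(vals)}"
        else:
            out[key] = "*"
    return out


def generalize_seq(seqs: List[Seq]) -> Seq:
    """Suppress items so every group member shares one sequence-of-sequences."""
    length = min(len(s) for s in seqs)  # truncate to the shortest outer sequence
    out: Seq = []
    for i in range(length):
        common = set(seqs[0][i]).intersection(*(set(s[i]) for s in seqs[1:]))
        out.append([x for x in seqs[0][i] if x in common])
    return out


def anonymize(records: List[Tuple[Row, Seq]], k: int,
              direct_ids: Sequence[str]) -> List[Tuple[Row, Seq]]:
    if len(records) < k:
        raise ValueError("need at least k records")
    # Suppress direct identifier attributes.
    pending = [({a: v for a, v in row.items() if a not in direct_ids}, seq)
               for row, seq in records]
    groups: List[List[Tuple[Row, Seq]]] = []
    # Rank remaining records by F against a seed; take the k closest as a group.
    while len(pending) >= k:
        seed = pending[0]
        ranked = sorted(pending, key=lambda r: cost_f(seed, r))
        groups.append(ranked[:k])
        pending = ranked[k:]
    if pending:  # fewer than k left over: merge them into the last group
        groups[-1].extend(pending)
    out: List[Tuple[Row, Seq]] = []
    for g in groups:
        rel = generalize_rel([r for r, _ in g])
        seq = generalize_seq([s for _, s in g])
        out.extend((dict(rel), [list(x) for x in seq]) for _ in g)
    return out


records = [
    ({"name": "Ann", "age": 34, "zip": "10504"}, [["flu"], ["xray", "flu"]]),
    ({"name": "Bob", "age": 36, "zip": "10504"}, [["flu"], ["flu"]]),
    ({"name": "Cy", "age": 61, "zip": "10601"}, [["asthma"]]),
    ({"name": "Di", "age": 65, "zip": "10601"}, [["asthma"], ["inhaler"]]),
]
for row, seq in anonymize(records, k=2, direct_ids=["name"]):
    print(row, seq)
```

Because every released record is identical to at least k-1 others on both parts, an attacker linking on those attributes can pick out a given entity with probability at most 1/k, which is the usual reading of the k-anonymity guarantee. Output records are emitted group by group, so their order can differ from the input order.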
Address: Armonk, NY, US