Title of invention DETECTING QUASI-IDENTIFIERS IN DATASETS
Abstract Quasi-identifiers (QIDs) are detected in a dataset using a set of computing tasks. The dataset has a plurality of records and a set of attributes. An index is generated for the dataset. The index has an indicator for each attribute value of each record in the dataset. Each indicator specifies all the records in the dataset having the same value for the attribute. Each task is assigned an attribute combination and a subset of the plurality of records in the dataset and is passed to a thread for execution on computing resources. The executing task inspects the set of records specified by the index indicator for each attribute value in the attribute combination to produce a result. The result of at least one task identifies a unique record for the associated attribute combination. The attribute combination producing the unique record is a QID.
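The index described in the abstract can be pictured as a per-attribute map from each value to the set of records that hold it. The following is a minimal sketch under that reading, not the applicant's implementation; the function name `build_index`, the use of integer positions as record identifiers, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(records, attributes):
    """Map each attribute to {value: set of record ids holding that value}.

    Assumption: a record id is simply the record's position in the list.
    """
    index = {attr: defaultdict(set) for attr in attributes}
    for rid, record in enumerate(records):
        for attr in attributes:
            index[attr][record[attr]].add(rid)
    return index


# Hypothetical sample data, for illustration only.
records = [
    {"zip": "10001", "age": 34, "sex": "F"},
    {"zip": "10001", "age": 34, "sex": "M"},
    {"zip": "10002", "age": 34, "sex": "F"},
]
index = build_index(records, ["zip", "age", "sex"])

# The indicator for record 0's "zip" value is the set of all records sharing it.
indicator = index["zip"][records[0]["zip"]]
```

In this sketch the indicator for a record's attribute value is looked up through the value itself, so records with the same value share one set rather than each storing a copy.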
Application publication number US2016342637(A1) Application publication date 2016.11.24
Application number US201615193536 Filing date 2016.06.27
Applicant International Business Machines Corporation Inventor Braghin Stefano;Gkoulalas-Divanis Aris;Wurst Michael
Classification G06F17/30;G06F9/50 Main classification G06F17/30
Agency Agent
Main claim 1. A method for detecting quasi-identifiers in a dataset using a set of computing tasks, the dataset having a plurality of records and further having a set of attributes, each record having an attribute value for each attribute in the set of attributes, the method comprising: generating a first index for the dataset, the first index having an index indicator for each attribute value of each record, each index indicator specifying a set of records, the specified set of records including each record in the plurality of records having the same attribute value for the associated attribute as the associated record; assigning an attribute combination to each task in the set of computing tasks, the attribute combination for each task including one or more attributes of the set of attributes; assigning a subset of the plurality of records to each task in the set of computing tasks; detecting at least one quasi-identifier by passing each task to at least one thread for execution on computing resources, the execution of each task comprising inspecting the index indicator for each attribute value in the assigned attribute combination of at least a portion of the assigned subset of the plurality of records to produce a result, the result of at least one task identifying a unique record for the associated attribute combination, the attribute values in the attribute combination for the unique record different from the attribute values in the attribute combination for all other records in the plurality of records, the at least one quasi-identifier being the attribute combination assigned to the at least one task identifying a unique record.
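One way to read the claimed steps (index, task assignment, threaded execution, uniqueness check) is sketched below. This is an illustrative interpretation under stated assumptions, not the claimed method itself: the names `run_task` and `detect_qids`, the parameters `max_size`, `chunk`, and `workers`, the use of set intersection to inspect the index indicators, and the sample records are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def build_index(records, attributes):
    """Per attribute, map each value to the set of record ids holding it."""
    index = {attr: {} for attr in attributes}
    for rid, rec in enumerate(records):
        for attr in attributes:
            index[attr].setdefault(rec[attr], set()).add(rid)
    return index


def run_task(records, index, combo, record_ids):
    """Return the assigned record ids that are unique on the attribute combination.

    A record is unique when the intersection of its index indicators,
    one per attribute in the combination, contains only the record itself.
    """
    unique = []
    for rid in record_ids:
        matches = set.intersection(*(index[a][records[rid][a]] for a in combo))
        if matches == {rid}:
            unique.append(rid)
    return combo, unique


def detect_qids(records, attributes, max_size=2, chunk=2, workers=4):
    """Assign (combination, record subset) pairs to tasks and run them on threads."""
    index = build_index(records, attributes)
    ids = list(range(len(records)))
    subsets = [ids[i:i + chunk] for i in range(0, len(ids), chunk)]
    tasks = [
        (combo, subset)
        for size in range(1, max_size + 1)
        for combo in combinations(attributes, size)
        for subset in subsets
    ]
    qids = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_task, records, index, c, s) for c, s in tasks]
        for future in futures:
            combo, unique = future.result()
            if unique:  # at least one unique record makes the combination a QID
                qids.add(combo)
    return qids


# Hypothetical sample data: no single attribute isolates a record.
records = [
    {"zip": "10001", "age": 34, "sex": "F"},
    {"zip": "10001", "age": 34, "sex": "M"},
    {"zip": "10002", "age": 34, "sex": "F"},
    {"zip": "10002", "age": 34, "sex": "M"},
]
qids = detect_qids(records, ["zip", "age", "sex"])
```

With this sample data, only the pair of `zip` and `sex` isolates a record, so it is the one combination reported. The index is built once and only read by the tasks, which is why the sketch shares it across threads without locking.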
Address Armonk NY US