Title of invention: Determining Journalist Risk of a Dataset Using Population Equivalence Class Distribution Estimation
Abstract: A system, method and computer readable memory for determining journalist risk of a dataset using population equivalence class distribution estimation. The dataset may be a cross-sectional dataset or a longitudinal dataset. The determined risk of identification can be used in a de-identification process for the dataset.
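Publication number: US2016155061(A1); Publication date: 2016.06.02
Application number: US201514953195; Filing date: 2015.11.27
Applicant: Privacy Analytics Inc.; Inventors: Korte Stephen;Arbuckle Luk;Baker Andrew;El Emam Khaled;Rose Sean
Classification: G06N7/00; Main classification: G06N7/00
Agency: (not listed); Agent: (not listed)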
Principal claim: 1. A computer implemented method of determining journalist risk associated with a dataset, the method comprising: retrieving the dataset containing a plurality of records containing personal data, the dataset representing a sample of individuals and data associated with a larger population; determining sample equivalence class (EC) distribution of the dataset; equating a population EC distribution to the determined sample EC distribution; calculating probability that an EC in the dataset of size x came from population of size y for all x and y; and calculating the journalist risk measurement using calculated probability; wherein the equivalence classes define a collection of all records in the dataset containing identical values for all quasi-identifiers in the data.
Address: Ottawa CA
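
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not say how the probability is calculated or how the final measurement is aggregated, so the sketch makes two labeled assumptions: each population record enters the sample independently with a known `sampling_fraction` (a binomial model combined with Bayes' rule), and the risk is reported as the average over sample equivalence classes of the expected value of 1/y. All function and parameter names are hypothetical.

```python
from collections import Counter
from math import comb


def sample_ec_sizes(records, quasi_identifiers):
    """Count records per equivalence class (identical quasi-identifier values)."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def size_distribution(ec_sizes):
    """Fraction of equivalence classes that have each size."""
    counts = Counter(ec_sizes.values())
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}


def prob_population_size(x, pop_dist, sampling_fraction):
    """P(population EC size = y | sample EC size = x) for every y >= x.

    ASSUMPTION (not stated in the record): binomial sampling plus Bayes' rule.
    """
    p = sampling_fraction
    weights = {
        y: prior * comb(y, x) * p**x * (1 - p) ** (y - x)
        for y, prior in pop_dist.items()
        if y >= x
    }
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def journalist_risk(records, quasi_identifiers, sampling_fraction):
    """Average over sample ECs of the expected 1/y (an assumed aggregate)."""
    ec_sizes = sample_ec_sizes(records, quasi_identifiers)
    # Claim step: equate the population EC distribution to the sample one.
    pop_dist = size_distribution(ec_sizes)
    risks = []
    for x in ec_sizes.values():
        posterior = prob_population_size(x, pop_dist, sampling_fraction)
        risks.append(sum(prob / y for y, prob in posterior.items()))
    return sum(risks) / len(risks)


if __name__ == "__main__":
    # Hypothetical toy data: two quasi-identifiers, two equivalence classes.
    data = [
        {"age": "30-39", "zip": "K1A"},
        {"age": "30-39", "zip": "K1A"},
        {"age": "30-39", "zip": "K1A"},
        {"age": "40-49", "zip": "K2B"},
    ]
    print(journalist_risk(data, ["age", "zip"], sampling_fraction=0.5))
```

Under these assumptions, a lower sampling fraction shifts probability toward larger population equivalence classes and lowers the estimated risk. With a sampling fraction of 1.0 the sample is the population and the estimate reduces to the mean of 1/x over the sample equivalence classes.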