Title of Invention |
Determining Journalist Risk of a Dataset Using Population Equivalence Class Distribution Estimation |
Abstract |
A system, method and computer readable memory for determining journalist risk of a dataset using population equivalence class distribution estimation. The dataset may be a cross-sectional dataset or a longitudinal dataset. The determined risk of identification can be used in a de-identification process of the dataset. |
Publication Number |
US2016155061(A1) |
Publication Date |
2016.06.02 |
Application Number |
US201514953195 |
Filing Date |
2015.11.27 |
Applicant |
Privacy Analytics Inc. |
Inventors |
Korte Stephen;Arbuckle Luk;Baker Andrew;El Emam Khaled;Rose Sean |
Classification Number |
G06N7/00 |
Main Classification Number |
G06N7/00 |
Agency |
|
Agent |
|
Independent Claim |
1. A computer implemented method of determining journalist risk associated with a dataset, the method comprising:
retrieving the dataset containing a plurality of records containing personal data, the dataset representing a sample of individuals and data associated with a larger population;
determining sample equivalence class (EC) distribution of the dataset;
equating a population EC distribution to the determined sample EC distribution;
calculating probability that an EC in the dataset of size x came from population of size y for all x and y; and
calculating the journalist risk measurement using calculated probability;
wherein the equivalence classes define a collection of all records in the dataset containing identical values for all quasi-identifiers in the data. |
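The steps of the claim can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the claim gives no formulas, so the binomial sampling model, the `sampling_fraction` parameter, the use of Bayes' rule, the choice of 1/y as the per-record risk, and all function names are assumptions introduced here.

```python
from collections import Counter
from math import comb


def ec_sizes(records, quasi_identifiers):
    """Group records sharing identical quasi-identifier values; return each EC's size."""
    keys = (tuple(r[q] for q in quasi_identifiers) for r in records)
    return list(Counter(keys).values())


def size_distribution(sizes):
    """Fraction of equivalence classes that have each size."""
    counts = Counter(sizes)
    return {size: n / len(sizes) for size, n in counts.items()}


def binom_pmf(x, y, p):
    """ASSUMPTION: each member of a population EC of size y is sampled independently
    with probability p, so the sample EC size x is Binomial(y, p)."""
    return comb(y, x) * p**x * (1 - p) ** (y - x)


def prob_population_size(x, prior, p):
    """P(population EC size = y | sample EC size = x) for every y in the prior,
    obtained with Bayes' rule (an assumed reading of the claim's probability step)."""
    weights = {y: binom_pmf(x, y, p) * w for y, w in prior.items() if y >= x}
    norm = sum(weights.values())
    return {y: w / norm for y, w in weights.items()}


def journalist_risk(records, quasi_identifiers, sampling_fraction):
    """Return (maximum, average) journalist risk under the assumptions above."""
    sizes = ec_sizes(records, quasi_identifiers)
    # Claim step: equate the population EC distribution to the sample EC distribution.
    prior = size_distribution(sizes)
    risk_by_size = {}
    for x in set(sizes):
        posterior = prob_population_size(x, prior, sampling_fraction)
        # ASSUMPTION: a record in a population EC of size y has risk 1/y.
        risk_by_size[x] = sum(pr / y for y, pr in posterior.items())
    # Every record in a sample EC of size x carries risk_by_size[x].
    average = sum(x * risk_by_size[x] for x in sizes) / len(records)
    return max(risk_by_size.values()), average
```

Here `records` is taken to be a list of dictionaries and `quasi_identifiers` a list of their keys, for example `journalist_risk(rows, ["age", "zip"], 0.5)`. With a sampling fraction of 1.0 the sample is the whole population, and the estimate reduces to 1/x for a sample equivalence class of size x.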
Address |
Ottawa CA |