Title of invention: METHOD TO MANAGE RAW GENOMIC DATA IN A PRIVACY PRESERVING MANNER IN A BIOBANK
Abstract: A method to manage raw genomic data (SAM/BAM files) in a privacy preserving manner in a biobank. By using order preserving encryption of the reads' positions, the method provides a requested range of nucleotides to a medical unit without revealing to the biobank the locations of the short reads that include the requested nucleotides. The method prevents the leakage of extra information in the short reads to the medical unit by masking the encrypted short reads at the biobank. That is, specific parts of the genomic data for which the medical unit is not authorized, or which the patient prefers to keep secret, are masked at the biobank without revealing any information to the biobank.
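The range lookup described in the abstract can be illustrated with a minimal sketch. It assumes a toy order-preserving encoding (a running sum of keyed positive gaps), which is not secure and is not the scheme used in the application; `toy_ope`, `biobank_range_query`, the key, and the sample positions are all hypothetical names and values chosen for illustration. The point is only that the biobank can select reads by comparing encrypted values, without seeing plaintext positions.

```python
import bisect
import hashlib
import hmac


def toy_ope(key: bytes, position: int) -> int:
    """Toy order-preserving encoding: sum of keyed positive gaps (NOT secure)."""
    total = 0
    for i in range(position + 1):
        digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        # Each gap is at least 1, so a larger position always maps to a larger value.
        total += 1 + int.from_bytes(digest[:2], "big")
    return total


def biobank_range_query(encrypted_positions, enc_low, enc_high):
    """Return indices of stored reads whose encrypted position is in [enc_low, enc_high]."""
    lo = bisect.bisect_left(encrypted_positions, enc_low)
    hi = bisect.bisect_right(encrypted_positions, enc_high)
    return list(range(lo, hi))


key = b"hypothetical-shared-key"               # assumed to be unknown to the biobank
read_positions = [5, 40, 95, 130, 210]         # plaintext read positions (illustrative)
stored = [toy_ope(key, p) for p in read_positions]  # what the biobank would hold

# The medical unit encrypts its bounds (30 and 140 here) and sends only ciphertexts.
hits = biobank_range_query(stored, toy_ope(key, 30), toy_ope(key, 140))
```

Because the encoding preserves order, the biobank's comparison on ciphertexts selects the same reads that a comparison on plaintext positions would select.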
Application publication number: US2016275308(A1) Publication date: 2016.09.22
Application number: US201414899999 Filing date: 2014.06.17
Applicant: SOPHIA GENETICS S.A. Inventors: HUBAUX Jean-Pierre;AYDAY Erman;RAISARO Jean-Louis;HENGARTNER Urs;MOLYNEAUX Adam;Xu Zhenyu;CAMBLONG Jurgi;HUTTER Pierre
Classification: G06F21/62;H04L9/06;G06F19/28 Main classification: G06F21/62
Agency: Agent:
Principal claim: 1. Method to manage raw genomic data in a privacy preserving manner in a biobank, said raw genomic data comprising a plurality of aligned short reads of a reference DNA sequence, each short read having a position in the reference DNA sequence and comprising at least a plurality of nucleotides, the position and a cigar string, said method comprising an encryption and storage stage, carried out by a certified institution (CI), comprising the steps of: encrypting, for each short read, the position with an order preserving encryption algorithm, encrypting, for each short read, the cigar string with a symmetric encryption algorithm, encrypting the nucleotides with a stream cipher algorithm, storing all the encrypted data in the biobank together with a patient identification, the management of the raw genomic data comprising an access stage to the raw genomic data comprising the steps of: receiving a request by the biobank, from a medical unit (MU), comprising a patient identification and at least one specific range of nucleotides, each range being defined by a lower and an upper bound and comprising at least one short read having a maximum length, each range comprising a first and a second value allowing to determine the range, the first value being either the encrypted lower bound of the specific range or an encrypted adjusted lower bound defining an adjusted range in which the lower bound is included based on the maximum length of a short read, and the second value being the encrypted upper bound of the specific range, said first and second values having been encrypted by the medical unit (MU) with the order preserving encryption algorithm, in case that the first value is the encrypted lower bound, determining the encrypted adjusted lower bound in which the encrypted lower bound is included based on the maximum length of a short read, retrieving by the biobank at least one short read having an encrypted position within the encrypted adjusted lower bound and the encrypted upper bound, transmitting the at least one short read to a key manager (MK), decrypting the first and second values by the key manager (MK), in case that the first value is the encrypted adjusted lower bound, determining by the key manager (MK), the lower bound with the adjusted lower bound and the maximum length of a short read, transmitting the lower and upper bound to the biobank by the key manager (MK), masking, by the biobank, the nucleotides of the retrieved short read outside the range defined by the lower and upper bound, providing the selectively masked short read for further analysis to the medical unit (MU).
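The bound adjustment and masking steps of the claim can be sketched as follows. This is an illustration on plaintext only: in the claim the biobank works on encrypted reads, and the cigar string affects how nucleotides map to reference coordinates, both of which are ignored here. The offset `lower - max_read_len + 1`, the `*` mask character, and the names `adjusted_lower_bound` and `mask_read` are assumptions made for the example, not details taken from the application.

```python
def adjusted_lower_bound(lower: int, max_read_len: int) -> int:
    """Earliest start position of a read that can still cover `lower` (assumed offset)."""
    return max(0, lower - max_read_len + 1)


def mask_read(position: int, nucleotides: str, lower: int, upper: int,
              mask_char: str = "*") -> str:
    """Keep nucleotides whose reference coordinate is in [lower, upper]; mask the rest."""
    return "".join(
        base if lower <= position + offset <= upper else mask_char
        for offset, base in enumerate(nucleotides)
    )


MAX_READ_LEN = 8
# Hypothetical reads keyed by start position; every base is assumed to align one-to-one.
reads = {96: "ACGTACGT", 100: "TTGACCAA", 107: "GGCATTAC", 120: "CCCCGGGG"}
lower, upper = 102, 109

adj = adjusted_lower_bound(lower, MAX_READ_LEN)
# Retrieval uses the adjusted range so reads starting before `lower` are not missed.
retrieved = {p: s for p, s in reads.items() if adj <= p <= upper}
# Masking uses the true range so nothing outside [lower, upper] reaches the medical unit.
masked = {p: mask_read(p, s, lower, upper) for p, s in retrieved.items()}
```

Retrieval widens the range on the lower side so that a read starting before the requested lower bound but overlapping it is still returned, while masking then trims each returned read back to the requested range.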
Address: Ecublens CH