Title of invention: INFORMATION REDACTION FROM DOCUMENT DATA
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users, which the users have explicitly approved for use in the processing of the document collection.
Publication number: US2016110352(A1) Publication date: 2016.04.21
Application number: US201414520018 Filing date: 2014.10.21
Applicant: Google Inc. Inventors: Bendersky Mike; Josifovski Vanja; Saikia Amitabh; Cartright Marc-Allen; Yang Jie; Pueyo Luis Garcia; Yang MyLinh
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A computer-implemented method performed by data processing apparatus, the method comprising:
    receiving, by a data processing apparatus, an electronic document data collection generated from a first set of documents, the document data collection including a first set of fixed phrases extracted from the first set of documents, wherein each fixed phrase is a phrase of one or more terms that is determined to not present a personal information exposure risk, and wherein access to the document data collection for examination by a human reviewer is precluded;
    receiving, by the data processing apparatus, a second set of documents, the second set of documents including documents that are each a personal document of a user that has personal information of the user and for which the user has provided permission to use the document for processing of the fixed phrases extracted from the first set of documents;
    extracting, by the data processing apparatus, candidate phrases from the second set of documents, each candidate phrase being a phrase of one or more terms;
    identifying, by the data processing apparatus, fixed phrases extracted from the first set of documents that match candidate phrases extracted from the second set of documents;
    generating, from the document data collection, a redacted document data collection in which each fixed phrase that does not match a candidate phrase is redacted, and each fixed phrase that does match a candidate phrase is not redacted; and
    providing, by the data processing apparatus, access to the redacted document data collection for examination by a human reviewer.
Address: Mountain View CA US
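The extract, match, and redact steps of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (`extract_candidate_phrases`, `redact_collection`), the word n-gram extraction, the case-insensitive exact matching, the `max_terms` limit, and the `[REDACTED]` placeholder are all assumptions chosen for illustration. The claim itself does not specify how phrases are extracted or how a match is determined.

```python
import re
from typing import Iterable, List, Set

# Hypothetical placeholder; the claim does not say how redaction is rendered.
REDACTED = "[REDACTED]"


def _normalize(phrase: str) -> str:
    """Lowercase a phrase and join its word tokens with single spaces."""
    return " ".join(re.findall(r"\w+", phrase.lower()))


def extract_candidate_phrases(documents: Iterable[str], max_terms: int = 4) -> Set[str]:
    """Collect every run of 1..max_terms consecutive terms from the permitted documents.

    Assumption: candidate phrases are plain word n-grams.
    """
    candidates: Set[str] = set()
    for text in documents:
        terms = re.findall(r"\w+", text.lower())
        for n in range(1, max_terms + 1):
            for i in range(len(terms) - n + 1):
                candidates.add(" ".join(terms[i:i + n]))
    return candidates


def redact_collection(fixed_phrases: List[str], candidates: Set[str]) -> List[str]:
    """Keep fixed phrases that match a candidate phrase; redact all others."""
    return [
        phrase if _normalize(phrase) in candidates else REDACTED
        for phrase in fixed_phrases
    ]


# Fixed phrases from the first (non-reviewable) document data collection.
fixed_phrases = ["Your order has shipped", "order number", "tracking number 12345"]
# Second set: personal documents whose owners gave permission for this use.
permitted_documents = ["Your order has shipped. Order number: 998."]

candidates = extract_candidate_phrases(permitted_documents)
redacted = redact_collection(fixed_phrases, candidates)
print(redacted)
```

In this sketch only the phrases that also occur in a permitted document survive, so the output handed to a human reviewer contains nothing that was not already visible in documents approved for that purpose.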