Independent claim
1. A computer-implemented method performed by data processing apparatus, the method comprising:
receiving, by a data processing apparatus, an electronic document data collection generated from a first set of documents, the document data collection including a first set of fixed phrases extracted from the first set of documents, wherein each fixed phrase is a phrase of one or more terms that is determined to not present a personal information exposure risk, and wherein access to the document data collection for examination by a human reviewer is precluded;
receiving, by the data processing apparatus, a second set of documents, the second set of documents including documents that are each a personal document of a user that has personal information of the user and for which the user has provided permission to use the document for processing of the fixed phrases extracted from the first set of documents;
extracting, by the data processing apparatus, candidate phrases from the second set of documents, each candidate phrase being a phrase of one or more terms;
identifying, by the data processing apparatus, fixed phrases extracted from the first set of documents that match candidate phrases extracted from the second set of documents;
generating, from the document data collection, a redacted document data collection in which each fixed phrase that does not match a candidate phrase is redacted, and each fixed phrase that does match a candidate phrase is not redacted; and
providing, by the data processing apparatus, access to the redacted document data collection for examination by a human reviewer.
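The extraction, matching, and redaction steps of the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions the claim leaves open: terms are taken to be lowercase word tokens, candidate phrases are taken to be every contiguous run of 1 to `max_terms` terms (n-grams), the document data collection is modeled as a plain list of fixed-phrase strings, and a redacted phrase is replaced by a hypothetical `[REDACTED]` marker. The names `tokenize`, `extract_candidate_phrases`, and `redact_collection` are hypothetical.

```python
import re
from typing import Iterable

# Hypothetical placeholder that stands in for a redacted fixed phrase.
REDACTION_MARK = "[REDACTED]"


def tokenize(text: str) -> list[str]:
    # Assumption: a "term" is a lowercase run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_candidate_phrases(documents: Iterable[str], max_terms: int = 3) -> set[str]:
    # Assumption: a candidate phrase is any contiguous n-gram of 1..max_terms terms
    # taken from a user-permitted personal document (the "second set of documents").
    candidates: set[str] = set()
    for doc in documents:
        terms = tokenize(doc)
        for n in range(1, max_terms + 1):
            for i in range(len(terms) - n + 1):
                candidates.add(" ".join(terms[i:i + n]))
    return candidates


def redact_collection(fixed_phrases: list[str],
                      personal_docs: Iterable[str],
                      max_terms: int = 3) -> list[str]:
    # Keep a fixed phrase only if it matches some candidate phrase; otherwise redact it.
    candidates = extract_candidate_phrases(personal_docs, max_terms)
    redacted: list[str] = []
    for phrase in fixed_phrases:
        normalized = " ".join(tokenize(phrase))
        redacted.append(phrase if normalized in candidates else REDACTION_MARK)
    return redacted


if __name__ == "__main__":
    fixed = ["meeting agenda", "quarterly report", "flight confirmation"]
    docs = [
        "Please review the quarterly report before Friday.",
        "Your flight confirmation number is ABC123.",
    ]
    print(redact_collection(fixed, docs))
    # → ['[REDACTED]', 'quarterly report', 'flight confirmation']
```

Under these assumptions, a fixed phrase longer than `max_terms` terms can never match and is always redacted. Choosing `max_terms` at least as large as the longest fixed phrase avoids that. Only the redacted list, never the raw personal documents, would be exposed to a human reviewer.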