Title: Identifying confidential data in a data item by comparing the data item to similar data items from alternative sources
Abstract: A method, apparatus and computer program product to identify confidential information in a document. To examine a document for inclusion of confidential information, the document is compared against documents having similar structure and content from one or more other sources. When comparing documents (of similar structure and content) from different sources, confidential information is then made to stand out by searching for terms (from the sources) that are not shared between or among them. In contrast, common words or terms that are shared across the sources are ignored as likely being non-confidential information; what remains as not shared may then be classified as confidential information and protected accordingly (e.g., by omission, redaction, substitution or the like). Using this technique, non-confidential information may be safely segmented from confidential information in a dynamic, automated manner.
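The comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes that a "term" is a lowercased word token and that protection is done by substitution; the function names `tokenize`, `unshared_terms` and `redact`, and the `[REDACTED]` mask, are hypothetical choices for illustration and do not come from the patent.

```python
import re

# Assumption: a "term" is a run of letters, digits or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Return the set of lowercased terms found in the text."""
    return {match.lower() for match in WORD.findall(text)}


def unshared_terms(document, other_documents):
    """Terms of `document` that appear in none of the other sources' documents."""
    shared = set()
    for other in other_documents:
        shared |= tokenize(other)
    return tokenize(document) - shared


def redact(document, terms, mask="[REDACTED]"):
    """Substitute every occurrence of a flagged term with the mask."""
    def replace(match):
        word = match.group(0)
        return mask if word.lower() in terms else word
    return WORD.sub(replace, document)


document = "Invoice for Acme Corp total 4200"
others = [
    "Invoice for Globex Inc total 1300",
    "Invoice for Initech LLC total 990",
]
flagged = unshared_terms(document, others)
print(redact(document, flagged))
```

In this sketch the words shared across all three invoices ("invoice", "for", "total") are left alone, while the terms unique to the examined document are treated as potentially confidential and masked.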
Publication number: US9489376(B2) | Publication date: 2016.11.08
Application number: US201313732501 | Filing date: 2013.01.02
Applicant: International Business Machines Corporation | Inventors: Thomason Michael Scott;Arun Jai S.;Myers Benjamin L.;Rotermund Chad C.;Jariwala Ajit J.
Classification: G06F17/27;G06F21/62 | Main classification: G06F17/27
Agency: | Agents: Wilhelm Richard A.;Judson David H.
Principal claim: 1. A method of identifying potential confidential information in a data item, the data item associated with a source, comprising: obtaining, from each of a set of alternative sources, a data item of a same type and format as the data item; comparing, using a hardware element, the data item to the data item(s) obtained from the set of alternative sources to identify occurrences of particular pieces of information in the data item, wherein multiple occurrences of a particular piece of information within a data item from each alternative source are treated as a single occurrence; and based on the occurrences of particular pieces of information in the data item and a given sensitivity criteria, and without knowledge that the particular pieces of information are considered by the source to be confidential, segmenting one or more pieces of information in the data item as representing the potential confidential information.
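One possible reading of the claimed steps is sketched below in Python. It is an illustration under stated assumptions, not the patented implementation: a "piece of information" is taken to be a lowercased word token, and the sensitivity criterion is modeled as a hypothetical threshold `max_sources` (flag a term if it appears in at most that many alternative sources). The names `terms` and `segment_confidential` are likewise invented for this example.

```python
import re
from collections import Counter


def terms(text):
    """Lowercased word tokens, in order of appearance (assumed term definition)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def segment_confidential(data_item, alternative_items, max_sources=0):
    """Return terms of `data_item` found in at most `max_sources` alternative sources.

    `max_sources` is a hypothetical stand-in for the sensitivity criterion.
    """
    source_counts = Counter()
    for item in alternative_items:
        # set() collapses repeats, so each source contributes at most
        # one occurrence per term, as the claim requires.
        source_counts.update(set(terms(item)))

    flagged, seen = [], set()
    for term in terms(data_item):
        if term not in seen and source_counts[term] <= max_sources:
            flagged.append(term)
        seen.add(term)
    return flagged


data_item = "patient John Smith diagnosis flu patient"
alternatives = [
    "patient Mary Jones diagnosis flu flu flu",
    "patient Ann Lee diagnosis asthma",
    "patient Bob Smith diagnosis cold",
]
strict = segment_confidential(data_item, alternatives, max_sources=0)
loose = segment_confidential(data_item, alternatives, max_sources=1)
print(strict, loose)
```

Counting each term once per source keeps a word that one source repeats many times ("flu" above) from looking widely shared. Raising `max_sources` makes the segmentation more aggressive, flagging terms that only a few alternative sources contain.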
Address: Armonk NY US