Title of invention: Data leak prevention enforcement based on learned document classification
Abstract: The present disclosure relates generally to the field of automatically learning and automatically adapting to perform classification of protected data. In various examples, learning and adapting to perform classification of protected data may be implemented in the form of systems, methods and/or algorithms.
Publication number: US9626528(B2) Publication date: 2017.04.18
Application number: US201414201107 Filing date: 2014.03.07
Applicant: International Business Machines Corporation Inventor: Butler Anthony M.
Classification: G06F15/18;G06F21/62;G06N99/00;G06N5/02 Main classification: G06F15/18
Agency: Scully, Scott, Murphy & Presser, P.C. Agent: Scully, Scott, Murphy & Presser, P.C.; Kaschak Ronald
Main claim: 1. An automated method for data leak prevention, the method comprising: obtaining, by a processor, a plurality of training documents and corresponding metadata associated with each training document from a document management system associated with a party, each of the training documents comprising at least one respective content, the corresponding metadata associated with each training document comprising a security classification set by the party in the document management system, the security classification classifying the training document associated with the corresponding metadata into one of at least two security categories; in response to obtaining the plurality of training documents from the document management system, converting each training document into a feature set comprising at least one pairing of a feature of the respective content of the respective training document with the security classification of the respective training document found in the corresponding metadata associated with the respective training document; generating, by the processor, a classification model based at least in part upon the pairings found in the feature sets of each of the training documents, wherein the generated classification model comprises at least one correlation between the features found in the respective content of each training document and the security classification found in the corresponding metadata associated with each training document; obtaining, by the processor, at least one non-training document, wherein the at least one non-training document comprises at least one respective content; in response to obtaining the at least one non-training document, applying, by the processor, the generated classification model to the at least one non-training document, the application of the classification model to the at least one non-training document comprising: correlating the at least one respective content of the at least one non-training document to a security classification of the at least one non-training document based on the at least one correlation in the generated classification model; and classifying the at least one non-training document into one of the at least two security categories based on the correlation of the at least one respective content of the at least one non-training document to the security classification; monitoring the at least one non-training document, by the processor, for attempted access to the at least one non-training document; detecting, by the processor, based on the monitoring, an attempted access to the at least one non-training document; in response to detecting an attempted access to the at least one non-training document, taking, by the processor, a predetermined action; wherein the predetermined action that is taken is based upon the one of the at least two categories into which the at least one non-training document has been classified by the application of the generated classification model; and wherein the predetermined action that is taken comprises one of: (a) denying access to the at least one non-training document to which access is attempted; (b) logging the attempted access to the at least one non-training document to which access is attempted; and (c) a combination thereof.
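Illustrative sketch (not part of the patent record): the steps recited in the claim can be pictured with a small Python example. Everything in it is an assumption made for illustration, not the patentee's implementation: word tokens stand in for "features", a naive Bayes style word-count model stands in for the "classification model", and the names `extract_features`, `to_feature_set`, `train`, `classify`, `POLICY`, and `on_access_attempt`, along with the sample documents and the two categories, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(content):
    """Assumed feature extractor: lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", content.lower())


def to_feature_set(content, classification):
    """Pair each content feature with the classification taken from metadata."""
    return [(feature, classification) for feature in extract_features(content)]


def train(training_docs):
    """Build a count-based model from (content, security_classification) pairs."""
    pair_counts = defaultdict(Counter)  # classification -> feature -> count
    doc_counts = Counter()              # classification -> number of documents
    vocabulary = set()
    for content, classification in training_docs:
        doc_counts[classification] += 1
        for feature, label in to_feature_set(content, classification):
            pair_counts[label][feature] += 1
            vocabulary.add(feature)
    return {"pairs": pair_counts, "docs": doc_counts, "vocab": vocabulary}


def classify(model, content):
    """Return the category with the highest smoothed log-probability score."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        counts = model["pairs"][label]
        total = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for feature in extract_features(content):
            # Add-one smoothing so unseen features do not zero out the score.
            score += math.log((counts[feature] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical policy: the predetermined action depends on the learned category.
POLICY = {"confidential": ("deny", "log"), "public": ("log",)}


def on_access_attempt(model, doc_name, content, user, audit_log):
    """Apply the policy to one detected access attempt; True means allowed."""
    category = classify(model, content)
    # Unknown categories fall back to the strictest action (an assumption).
    actions = POLICY.get(category, ("deny", "log"))
    if "log" in actions:
        audit_log.append((user, doc_name, category))
    return "deny" not in actions


# Made-up training documents with classifications as a party might set them.
TRAINING = [
    ("quarterly merger plan acquisition target pricing", "confidential"),
    ("acquisition budget salary figures merger", "confidential"),
    ("cafeteria lunch menu for this week", "public"),
    ("office holiday party schedule and menu", "public"),
]
MODEL = train(TRAINING)
```

With this toy data, calling `on_access_attempt(MODEL, "memo.txt", "draft merger pricing memo", "alice", log)` denies and logs the attempt, while a document about the lunch menu is allowed and logged. A production system would presumably use richer features and a properly evaluated classifier; the sketch only mirrors the order of the claimed steps.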
Address: Armonk NY US