Abstract |
Detecting confidential information includes reading stored data and identifying strings within the stored data (210), where each string includes a sequence of consecutive bytes whose values all fall within a predetermined subset of possible values. For each of at least some of the strings, it is determined whether the string includes bytes representing one or more format matches (220 - 270), where a format match includes a set of values that match a predetermined format associated with confidential information. For each format match, the values that match the predetermined format are tested against a set of rules associated with the confidential information to determine whether the format match is an invalid format match that includes one or more invalid values. A score is calculated for the stored data (280, 300) based at least in part upon the ratio of a count of invalid format matches to a count of other format matches. |
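
The flow summarized above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the byte subset (ASCII digits, hyphen, space), the format (a US Social Security number-like pattern `ddd-dd-dddd`), the validity rules, the function names, and the mapping from the invalid-to-valid ratio to a score. Here "other format matches" is read as the matches that pass the rules.

```python
import re

# Assumption: the "predetermined subset" of byte values is ASCII digits,
# hyphen, and space. A string is a run of at least 11 such bytes.
STRING_RE = re.compile(rb"[0-9\- ]{11,}")

# Assumption: the predetermined format is ddd-dd-dddd, not embedded in a
# longer run of digits.
FORMAT_RE = re.compile(rb"(?<![0-9])(\d{3})-(\d{2})-(\d{4})(?![0-9])")


def find_strings(data: bytes) -> list[bytes]:
    """Return runs of consecutive bytes drawn from the allowed subset."""
    return STRING_RE.findall(data)


def is_valid(area: bytes, group: bytes, serial: bytes) -> bool:
    """Illustrative rules: reject values that a real number would not contain."""
    if area in (b"000", b"666") or area.startswith(b"9"):
        return False
    if group == b"00" or serial == b"0000":
        return False
    return True


def score_data(data: bytes) -> tuple[int, int, float]:
    """Count valid and invalid format matches and derive a score in [0, 1].

    Hypothetical scoring: score = 1 / (1 + invalid / valid), so a higher
    ratio of invalid to valid matches lowers the score. With no valid
    matches the score is 0.0.
    """
    valid = invalid = 0
    for s in find_strings(data):
        for m in FORMAT_RE.finditer(s):
            if is_valid(*m.groups()):
                valid += 1
            else:
                invalid += 1
    if valid == 0:
        return valid, invalid, 0.0
    ratio = invalid / valid
    return valid, invalid, 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    sample = b"ssn 123-45-6789 and 000-12-3456 xx 666-01-0001;078-05-1120"
    print(score_data(sample))
```

The point of the ratio is that data which merely resembles the format by chance (for example, random digit runs) tends to produce many matches that fail the rules, while genuinely confidential data produces mostly matches that pass them.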