Abstract |
The present invention provides methods and systems for fast, efficient, and scalable fingerprinting of textual information using word runs. The system receives textual information and applies algorithms that convert the information into representative fingerprints. In one embodiment, the fingerprints are recorded in a repository that serves as a database of an organization's secure data. In another embodiment, textual information entered by a user is checked against the repository of fingerprints to prevent unauthorized disclosure of secure data. The invention allows derivative works of the original information (e.g., different ordering of words, substitution of words with synonyms) to be detected at the sentence level or even at the paragraph level. The invention also improves storage and resource efficiency by optimizing the number of fingerprints generated for the textual information. |
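
The register-then-check flow described in the abstract can be illustrated with a minimal sketch. The abstract does not disclose the actual algorithm, so everything below is an assumption made for illustration: the stop-word list, the run length of four words, the sorting of words within a run (one simple way to tolerate reordering), the use of SHA-256, and all function names are hypothetical. Synonym substitution and fingerprint-count optimization are not attempted here.

```python
import hashlib
import re

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "for"}
RUN_LENGTH = 4  # assumed number of words per run


def word_runs(text, run_length=RUN_LENGTH):
    """Yield overlapping runs of `run_length` consecutive non-stop words."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    for i in range(len(words) - run_length + 1):
        yield words[i:i + run_length]


def fingerprint(run):
    """Hash a run after sorting its words, so word order does not matter."""
    canonical = " ".join(sorted(run))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprints(text):
    """Return the set of fingerprints for every word run in the text."""
    return {fingerprint(run) for run in word_runs(text)}


def register(repository, text):
    """Record the fingerprints of secure text in the repository (a set)."""
    repository.update(fingerprints(text))


def contains_secure_data(repository, text):
    """Report whether any word run of the text matches the repository."""
    return not repository.isdisjoint(fingerprints(text))


repository = set()
register(repository, "The quarterly revenue forecast exceeds projections")

# A reordered derivative still shares a word run with the registered text.
reordered = contains_secure_data(
    repository, "forecast revenue quarterly exceeds the limit")
unrelated = contains_secure_data(
    repository, "Lunch menu includes soup salad bread")
```

In this sketch, `reordered` evaluates to true because the four words of the first run hash identically once sorted, while `unrelated` evaluates to false. Storing only hashes in the repository means the secure text itself does not need to be kept alongside the fingerprints.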