Abstract |
A system and method modify n-gram statistics to allow their release by inhibiting reconstruction of a sequence from which they are derived. n-gram statistics for the sequence are obtained which include, for each of a set of n-grams, an associated measure of occurrence in the sequence. An initial directed graph is generated from the n-gram statistics. The graph includes nodes connected by edges, each of the edges corresponding to one of the n-grams in the set of n-grams. The edge is associated with a multiplicity which is based on the measure of occurrence. A modified directed graph is generated. This includes adding a plurality of edges to the initial directed graph. These added edges correspond to n-grams that are not present in the sequence of symbols and are each associated with a multiplicity. Modified n-gram statistics for the modified directed graph are generated. The modified n-gram statistics include, for n-grams represented in the modified directed graph, an associated measure of occurrence. |
Main claim |
1. A method for modifying n-gram statistics comprising:
obtaining n-gram statistics for a sequence of symbols, the n-gram statistics comprising, for each of a set of n-grams present in the sequence, an associated measure of occurrence in the sequence;
generating an initial directed graph from the n-gram statistics, the initial directed graph including nodes connected by edges, each of the edges corresponding to one of the n-grams in the set of n-grams and being associated with a multiplicity which is based on the measure of occurrence;
generating a modified directed graph comprising adding a plurality of edges to the initial directed graph, the plurality of added edges corresponding to n-grams that are not present in the sequence of symbols and being each associated with a multiplicity; and
generating modified n-gram statistics for the modified directed graph, the modified n-gram statistics comprising, for n-grams represented in the modified directed graph, an associated measure of occurrence,
wherein at least one of the generating an initial directed graph, generating a modified directed graph, and generating modified n-gram statistics from the modified graph is performed with a processor. |
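The four steps of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation and rests on several assumptions that the text above does not state: nodes are taken to be (n-1)-grams, so that each n-gram becomes an edge from its prefix to its suffix; the measure of occurrence is a raw count; the added edges are chosen uniformly at random from the absent n-grams; and each added edge receives a fixed multiplicity. The function names (`ngram_counts`, `to_graph`, `add_decoy_edges`, `to_counts`) are hypothetical, and the sketch requires n >= 2.

```python
import random
from collections import Counter
from itertools import product


def ngram_counts(seq, n):
    """Step 1: count every contiguous n-gram in seq (count = measure of occurrence)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def to_graph(counts):
    """Step 2: one edge per n-gram, from its (n-1)-gram prefix to its suffix.

    The node choice is an assumption; the count is stored as the edge multiplicity.
    """
    return {(g[:-1], g[1:]): c for g, c in counts.items()}


def add_decoy_edges(graph, alphabet, n, k, rng, multiplicity=1):
    """Step 3: return a copy of graph with up to k edges for n-grams absent from it.

    Random selection and a fixed multiplicity are assumptions of this sketch.
    """
    present = {u + v[-1:] for u, v in graph}
    absent = [g for g in product(alphabet, repeat=n) if g not in present]
    modified = dict(graph)
    for g in rng.sample(absent, min(k, len(absent))):
        modified[(g[:-1], g[1:])] = multiplicity
    return modified


def to_counts(graph):
    """Step 4: read n-gram statistics back off the edges of the (modified) graph."""
    return Counter({u + v[-1:]: m for (u, v), m in graph.items()})
```

For example, `to_counts(add_decoy_edges(to_graph(ngram_counts("abracadabra", 2)), "abcdr", 2, 3, random.Random(0)))` returns the original bigram counts together with three bigrams that never occur in the sequence. The sketch imposes no further constraint on which edges are added; any such constraint would have to come from the rest of the specification.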