Title: SYSTEM AND METHOD FOR ADDING NOISE TO n-GRAM STATISTICS
Abstract: A system and method modify n-gram statistics to allow their release by inhibiting reconstruction of the sequence from which they are derived. n-gram statistics are obtained for the sequence; they include, for each of a set of n-grams, an associated measure of occurrence in the sequence. An initial directed graph is generated from the n-gram statistics. The graph includes nodes connected by edges, each edge corresponding to one of the n-grams in the set and being associated with a multiplicity based on the measure of occurrence. A modified directed graph is generated, which includes adding a plurality of edges to the initial directed graph. The added edges correspond to n-grams that are not present in the sequence of symbols, and each is associated with a multiplicity. Modified n-gram statistics are generated for the modified directed graph; they include, for n-grams represented in the modified directed graph, an associated measure of occurrence.
Publication number: US2016342706(A1)
Publication date: 2016.11.24
Application number: US201514714567
Filing date: 2015.05.18
Applicant: Xerox Corporation
Inventor: Gallé Matthias
Classification: G06F17/30;G06F17/18
Main classification: G06F17/30
Agency:
Agent:
Principal claim: 1. A method for modifying n-gram statistics comprising: obtaining n-gram statistics for a sequence of symbols, the n-gram statistics comprising, for each of a set of n-grams present in the sequence, an associated measure of occurrence in the sequence; generating an initial directed graph from the n-gram statistics, the initial directed graph including nodes connected by edges, each of the edges corresponding to one of the n-grams in the set of n-grams and being associated with a multiplicity which is based on the measure of occurrence; generating a modified directed graph comprising adding a plurality of edges to the initial directed graph, the plurality of added edges corresponding to n-grams that are not present in the sequence of symbols and being each associated with a multiplicity; and generating modified n-gram statistics for the modified directed graph, the modified n-gram statistics comprising, for n-grams represented in the modified directed graph, an associated measure of occurrence, wherein at least one of the generating an initial directed graph, generating a modified directed graph, and generating modified n-gram statistics from the modified graph is performed with a processor.
Address: Norwalk, CT, US
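
Illustrative sketch (not part of the patent record): the four steps of the principal claim can be pictured with a short Python example. Everything specific here is an assumption made for illustration. The record does not say how nodes are defined, how the added edges are chosen, or how their multiplicities are set. The sketch assumes n >= 2, treats each node as an (n-1)-gram (a de Bruijn-style layout), picks absent n-grams uniformly at random, and draws each added multiplicity uniformly from 1 to max_mult. The function names ngram_counts, to_graph, add_noise_edges, and to_stats are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def ngram_counts(seq, n):
    """Step 1: count each n-gram (as a tuple of symbols) present in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def to_graph(counts):
    """Step 2: one edge per n-gram, keyed (prefix node, suffix node).

    Assumption: a node is the (n-1)-gram prefix or suffix of the n-gram,
    and the edge multiplicity equals the n-gram count.
    """
    return {(g[:-1], g[1:]): c for g, c in counts.items()}


def add_noise_edges(graph, alphabet, n, k, max_mult, rng):
    """Step 3: add up to k edges for n-grams absent from the sequence.

    Assumption: absent n-grams are sampled uniformly, with multiplicities
    drawn uniformly from 1..max_mult. The record specifies neither choice.
    """
    modified = dict(graph)
    absent = [
        (g[:-1], g[1:])
        for g in product(sorted(alphabet), repeat=n)
        if (g[:-1], g[1:]) not in graph
    ]
    for edge in rng.sample(absent, min(k, len(absent))):
        modified[edge] = rng.randint(1, max_mult)
    return modified


def to_stats(graph):
    """Step 4: read n-gram statistics back off the graph's edges."""
    return {u + v[-1:]: m for (u, v), m in graph.items()}


if __name__ == "__main__":
    seq = "abracadabra"
    counts = ngram_counts(seq, 2)
    noisy = add_noise_edges(to_graph(counts), set(seq), 2, 5, 3, random.Random(0))
    modified_stats = to_stats(noisy)
    added = {g: m for g, m in modified_stats.items() if g not in counts}
    print("original n-grams:", len(counts), "added n-grams:", len(added))
```

Enumerating every possible n-gram with product is only workable for small alphabets and small n. It is used here to keep the sketch short, not as a suggestion for real data.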