Invention Title: N-Gram Selection for Practical-Sized Language Models
Abstract: Described is a technology by which a statistical N-gram model (e.g., a language model) is trained using an N-gram selection technique that helps reduce the size of the final model. During training, a higher-order probability estimate for an N-gram is added to the model only when the training data justify adding it. To this end, if the backoff probability estimate lies within a maximum likelihood set determined by that N-gram and its associated context, or lies between the higher-order estimate and that maximum likelihood set, the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process, so that it is based on the final model rather than on any lower-order model. Also described is an additional pruning step referred to as modified weighted difference pruning.
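As an illustration only (not part of the patent record), the selection test in the abstract can be sketched in Python under one common reading: if an N-gram is observed k times in a context observed n times, its maximum likelihood set is taken to be the interval [k/(n+1), (k+1)/(n+1)] of probabilities for which k is a modal outcome of a Binomial(n, p) draw. The names mls_bounds and keep_higher_order, and the direct use of raw counts, are hypothetical choices for this sketch, not the patent's claimed implementation.

```python
def mls_bounds(k: int, n: int) -> tuple[float, float]:
    """Maximum likelihood set for an N-gram seen k times in a context
    seen n times: the interval of probabilities p for which k is a
    most-likely (modal) count under Binomial(n, p)."""
    return k / (n + 1), (k + 1) / (n + 1)


def keep_higher_order(p_hi: float, p_backoff: float, k: int, n: int) -> bool:
    """Return True if the explicit higher-order estimate p_hi should be
    stored, per the rule summarized in the abstract: it is dropped when
    the backoff estimate p_backoff already lies inside the maximum
    likelihood set, or between p_hi and that set."""
    lo, hi = mls_bounds(k, n)
    if lo <= p_backoff <= hi:
        return False  # backoff is already consistent with the training counts
    if p_hi <= p_backoff < lo or hi < p_backoff <= p_hi:
        return False  # backoff is at least as close to the data as p_hi
    return True


if __name__ == "__main__":
    # Hypothetical counts: the N-gram occurred once, its context 1000 times.
    print(mls_bounds(1, 1000))                          # ~(0.000999, 0.001998)
    print(keep_higher_order(0.0019, 0.0015, 1, 1000))   # False: backoff inside the MLS
    print(keep_higher_order(0.0019, 0.0050, 1, 1000))   # True: backoff too far from the data
```

Whenever this test returns False the explicit higher-order parameter is simply never added, which is what keeps the final model small: the backoff path already reproduces that probability to within what the training counts can justify.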
Publication Number: US2011224971(A1)    Publication Date: 2011.09.15
Application Number: US20100722522    Filing Date: 2010.03.11
Applicant: MICROSOFT CORPORATION    Inventor: MOORE ROBERT CARTER
Classification: G06F17/27    Main Classification: G06F17/27
Agency:    Agent:
Main Claim:
Address: