Title: Procedure for building a max-ARPA table in order to compute optimistic back-offs in a language model
Abstract: Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding, for each entry of the ARPA table in descending n-gram order, an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding, for each entry of the ARPA table in descending n-gram order, an associated max-backoff value $Az.w = \max_h p(z \mid hA)$, which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.
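The two descending-order passes can be illustrated with a short Python sketch. It rests on several assumptions and does not reproduce the patented implementation. N-grams are Python tuples and the table is a dict. Every prefix and suffix of a listed n-gram is assumed to be listed as well. The recurrences are derived here from the standard ARPA back-off rule p(z|A) = A.b × p(z|A′), where A′ is A with its first symbol dropped. For a listed entry C, the sketch uses C.m = max(1, max over listed xC of xC.b × xC.m). For a listed entry Az, it uses Az.w = max(Az.p, max over listed xA of w(xA, z)). The table contents and the names `ARPA`, `backoff_p`, `build_max_arpa` and `brute_max_backoff` are hypothetical.

```python
from itertools import product

# Hypothetical toy ARPA table (not a normalized model). Keys are n-gram tuples Az;
# "p" holds Az.p = p(z|A) and "b" holds the back-off weight of the n-gram when it
# serves as a context (a missing "b" means 1.0). Assumed: every prefix and every
# suffix of a listed n-gram is listed too, as in ordinary ARPA files.
ARPA = {
    ("a",): {"p": 0.5, "b": 0.8},
    ("b",): {"p": 0.3, "b": 1.5},
    ("c",): {"p": 0.2, "b": 0.6},
    ("a", "b"): {"p": 0.4, "b": 1.2},
    ("b", "a"): {"p": 0.6, "b": 0.5},
    ("c", "a"): {"p": 0.1, "b": 1.2},
    ("c", "a", "b"): {"p": 0.9},
}


def backoff_p(T, A, z):
    """p(A, z): Az.p if Az is listed, else A.b * p(A[1:], z), with A.b = 1 for unlisted A."""
    if A + (z,) in T:
        return T[A + (z,)]["p"]
    if not A:
        raise KeyError(f"unknown symbol {z!r}")
    weight = T[A].get("b", 1.0) if A in T else 1.0
    return weight * backoff_p(T, A[1:], z)


def build_max_arpa(T):
    """Return a copy of T with .m and .w columns; both passes run in descending n-gram order."""
    order = sorted(T, key=len, reverse=True)
    ext = {}  # ext[C] = listed entries xC, i.e. one-symbol left extensions of C
    for g in T:
        ext.setdefault(g[1:], []).append(g)
    M = {g: {"p": T[g]["p"], "b": T[g].get("b", 1.0)} for g in T}
    # Pass 1: C.m = max(1, max over listed xC of xC.b * xC.m); longer entries come first.
    for g in order:
        M[g]["m"] = max([1.0] + [M[e]["b"] * M[e]["m"] for e in ext.get(g, [])])
    # Pass 2 (after pass 1): Az.w = max(Az.p, max over listed xA of w(xA, z)).
    for g in order:
        A, z = g[:-1], g[-1]
        best = M[g]["p"]  # empty head: p(z|A) itself
        for xA in ext.get(A, []):
            xAz = xA + (z,)
            if xAz in M:  # longer entry, so its .w is already filled in
                best = max(best, M[xAz]["w"])
            else:  # xAz backs off to Az: w(xA, z) = xA.b * Az.p * xA.m
                best = max(best, M[xA]["b"] * M[g]["p"] * M[xA]["m"])
        M[g]["w"] = best
    return M


def brute_max_backoff(T, A, z, vocab=("a", "b", "c"), max_head=3):
    """Reference check: max of p(z | hA) over every head h of at most max_head symbols."""
    return max(backoff_p(T, h + A, z)
               for k in range(max_head + 1) for h in product(vocab, repeat=k))


M = build_max_arpa(ARPA)
```

The brute-force helper is included only so the stored .w values can be cross-checked on a toy vocabulary. Its cost grows exponentially with head length, which is the cost the precomputed columns are meant to avoid.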
Publication No.: US9400783 (B2)
Publication date: 2016.07.26
Application No.: US201314089935
Filing date: 2013.11.26
Applicant: XEROX CORPORATION
Inventor: Dymetman Marc
Classification: G06F17/27; G06F17/28
Main classification: G06F17/27
Agency: Fay Sharpe LLP
Agent: Fay Sharpe LLP
Main claim: 1. A non-transitory storage medium storing instructions readable and executable by an electronic data processing device to perform a method operating on an ARPA table for a modeled natural language, in which each entry of the ARPA table includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled natural language, and an associated backoff weight value Az.b for the context A, the method comprising:
computing, by said electronic data processing device, a max-ARPA table from the ARPA table by operations including:
computing and adding for each entry of the ARPA table an associated maximum backoff weight product value Az.m, wherein the computing and adding of the associated maximum backoff weight product values is performed on the entries of the ARPA table in descending n-gram order; and
after computing and adding the associated maximum backoff weight product values, computing and adding for each entry of the ARPA table an associated max-backoff value Az.w = w(A,z), where $w(A,z) = \max_h p(z \mid hA)$ is the maximum backoff value for any head h preceding the context A of the n-gram Az, and the computing and adding of the associated max-backoff values is performed on the entries of the ARPA table in descending n-gram order;
wherein each entry of the max-ARPA table includes an n-gram Az and its associated backoff value Az.p, backoff weight value Az.b, maximum backoff weight product value Az.m, and max-backoff value Az.w; and
computing, by said electronic data processing device, a max-backoff value w(A,z) for an n-gram Az of the modeled natural language that is not in the ARPA table by applying the recursive equation:
$$w(A,z) = \begin{cases} p(A,z) & \text{if } Az \notin T_{mA} \text{ and } A \notin T_{mA} \\ p(A,z) \times A.m & \text{if } Az \notin T_{mA} \text{ and } A \in T_{mA} \\ Az.w & \text{if } Az \in T_{mA} \end{cases}$$
where the values A.m and Az.w are obtained from the .m and .w columns of the max-ARPA table $T_{mA}$, respectively, and p(A,z) is computed from the .p and .b columns of the max-ARPA table.
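The three-case query equation can be sketched in Python under a few assumptions. The max-ARPA table is a dict keyed by n-gram tuples. p(A,z) is obtained from the .p and .b columns by the usual ARPA back-off rule, with A.b taken as 1 when A is not listed. Every prefix and suffix of a listed n-gram is assumed to be listed. The small table below and the names `TMA`, `backoff_p`, `max_backoff` and `brute_max_backoff` are hypothetical, and its .m and .w columns were worked out by hand from its .p and .b columns.

```python
from itertools import product

# Hypothetical toy max-ARPA table T_mA; numbers are illustrative, not a normalized model.
TMA = {
    ("a",): {"p": 0.5, "b": 0.8, "m": 1.2, "w": 0.72},
    ("b",): {"p": 0.3, "b": 1.5, "m": 1.2, "w": 0.9},
    ("c",): {"p": 0.2, "b": 0.6, "m": 1.0, "w": 0.36},
    ("a", "b"): {"p": 0.4, "b": 1.2, "m": 1.0, "w": 0.9},
    ("b", "a"): {"p": 0.6, "b": 0.5, "m": 1.0, "w": 0.72},
    ("c", "a"): {"p": 0.1, "b": 1.2, "m": 1.0, "w": 0.1},
    ("c", "a", "b"): {"p": 0.9, "b": 1.0, "m": 1.0, "w": 0.9},
}


def backoff_p(T, A, z):
    """p(A, z) from the .p and .b columns via the standard ARPA back-off rule."""
    if A + (z,) in T:
        return T[A + (z,)]["p"]
    if not A:
        raise KeyError(f"unknown symbol {z!r}")
    weight = T[A]["b"] if A in T else 1.0
    return weight * backoff_p(T, A[1:], z)


def max_backoff(T, A, z):
    """w(A, z) by the three cases of the claimed equation."""
    if A + (z,) in T:            # case Az in T_mA: read the .w column
        return T[A + (z,)]["w"]
    p = backoff_p(T, A, z)
    if A in T:                   # case Az not in T_mA, A in T_mA: scale by A.m
        return p * T[A]["m"]
    return p                     # case Az not in T_mA, A not in T_mA


def brute_max_backoff(T, A, z, vocab=("a", "b", "c"), max_head=3):
    """Cross-check only: max of p(z | hA) over every head h of at most max_head symbols."""
    return max(backoff_p(T, h + A, z)
               for k in range(max_head + 1) for h in product(vocab, repeat=k))
```

In this sketch `max_backoff` needs only the ordinary back-off computation plus at most one extra lookup and multiplication. `brute_max_backoff` instead enumerates every head up to `max_head` symbols, and it is meant only to confirm the stored columns on a toy vocabulary.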
Address: Norwalk, CT, US