Title of invention Clustering Classes in Language Modeling
Abstract This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models.
Application publication number US2016062985(A1) Application publication date 2016.03.03
Application number US201514656027 Filing date 2015.03.12
Applicant Google Inc. Inventors Epstein Mark Edward;Schogol Vladislav
Classification number G06F17/27;G06F17/30 Main classification number G06F17/27
Agency Agent
Main claim 1. A computer-implemented method, comprising: obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms; classifying the plurality of text samples into a plurality of groups of text samples, each group of text samples corresponding to a different sub-class of terms such that the one or more terms belonging to the first class of terms in the respective text samples for each group of text samples belong to the corresponding sub-class of terms for the respective group of text samples; for each of the groups of text samples, generating a sub-class context model based on the text samples in the respective group of text samples; merging particular ones of the sub-class context models that are determined to be similar to generate a hierarchical set of context models; selecting particular ones of the context models from among the hierarchical set of context models; and generating a class-based language model that includes, for each of the selected context models, a class that corresponds to the respective context model.
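The steps recited in the claim can be illustrated with a small sketch. It is not the patented implementation: the record does not say how context models are represented, how similarity is measured, or how models are selected. The sketch assumes, purely for illustration, that a context model is a count of the words immediately left and right of a class term, that similarity is cosine similarity, that merging is greedy pairwise agglomeration, and that selection means picking the hierarchy level with a requested number of classes. All function names, the sample sentences, and the `$CLASS` labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

def classify(samples, class_terms):
    """Group samples by the class term they contain (one sub-class per term)."""
    groups = {}
    for text in samples:
        for tok in text.lower().split():
            if tok in class_terms:
                groups.setdefault(tok, []).append(text)
    return groups

def context_model(samples, term):
    """Assumed model: counts of the words directly left/right of `term`."""
    counts = Counter()
    for text in samples:
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok == term:
                if i > 0:
                    counts["L:" + tokens[i - 1]] += 1
                if i + 1 < len(tokens):
                    counts["R:" + tokens[i + 1]] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_hierarchy(models):
    """Greedily merge the most similar pair until one model remains.
    Returns one dict per level; level 0 holds the unmerged sub-classes."""
    current = dict(models)
    levels = [dict(current)]
    while len(current) > 1:
        a, b = max(combinations(current, 2),
                   key=lambda p: cosine(current[p[0]], current[p[1]]))
        merged = current.pop(a) + current.pop(b)  # sum the context counts
        current[a | b] = merged                   # key is the union of terms
        levels.append(dict(current))
    return levels

def select(levels, n_classes):
    """Assumed selection rule: take the level with exactly n_classes models."""
    return levels[len(levels[0]) - n_classes]

def class_based_bigrams(samples, selected):
    """Replace each term by its class label and count bigrams over the result."""
    term_to_class = {t: f"$CLASS{i}"
                     for i, terms in enumerate(selected) for t in terms}
    bigrams = Counter()
    for text in samples:
        tokens = [term_to_class.get(t, t) for t in text.lower().split()]
        bigrams.update(zip(tokens, tokens[1:]))
    return term_to_class, bigrams

# Hypothetical data: the "first class" is date terms.
samples = [
    "meet me on monday morning",
    "call mom on tuesday morning",
    "flights in january please",
    "weather in march please",
]
class_terms = {"monday", "tuesday", "january", "march"}

groups = classify(samples, class_terms)
models = {frozenset([t]): context_model(g, t) for t, g in groups.items()}
levels = build_hierarchy(models)
selected = select(levels, n_classes=2)
term_to_class, bigrams = class_based_bigrams(samples, selected)
```

With this toy data the weekday terms share one context ("on ... morning") and the month terms share another ("in ... please"), so cutting the hierarchy at two classes groups them accordingly. A real system would likely use richer context statistics and a principled selection criterion; the fixed class count here is only a stand-in.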
Address Mountain View CA US