Title: MACHINE LEARNING DIALECT IDENTIFICATION
Abstract: Technology is disclosed for creating and tuning classifiers for language dialects and for generating dialect-specific language modules. A computing device can receive an initial training data set as a current training data set. The initial training data set can be selected by receiving one or more initial content items, establishing dialect parameters of each of the initial content items, and sorting each of the initial content items into one or more dialect groups based on the established dialect parameters. The computing device can generate, based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified. The computing device can augment the current training data set with additional training data by applying the dialect classifier to candidate content items. The computing device can then update the dialect classifier based on the augmented current training data set.
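The abstract describes a self-training (bootstrapping) loop: build a classifier from labeled dialect groups, use it to label new candidate items, fold those back into the training data, and retrain. The sketch below illustrates that loop under stated assumptions only. The source does not name a classifier type, so a multinomial naive Bayes model over lowercase whitespace tokens stands in for it; each initial content item is assumed to already carry its established dialect parameter under a hypothetical `dialect` key; and a candidate is added only when the top predicted probability reaches a hypothetical `threshold`. All names (`group_by_dialect`, `NaiveBayesDialectClassifier`, `self_train`) are illustrative and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def group_by_dialect(items):
    """Sort initial content items into dialect groups.

    Assumes each item's dialect parameter has already been established and is
    stored under a hypothetical "dialect" key.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item["dialect"]].append(item["text"])
    return dict(groups)


class NaiveBayesDialectClassifier:
    """Hypothetical stand-in classifier: multinomial naive Bayes over
    lowercase whitespace tokens with add-one (Laplace) smoothing."""

    def __init__(self, groups):
        self.doc_counts = {d: len(texts) for d, texts in groups.items()}
        self.word_counts = {d: Counter() for d in groups}
        for d, texts in groups.items():
            for text in texts:
                self.word_counts[d].update(text.lower().split())
        self.totals = {d: sum(c.values()) for d, c in self.word_counts.items()}
        self.vocab = set().union(*self.word_counts.values())
        self.n_docs = sum(self.doc_counts.values())

    def predict_proba(self, text):
        """Return a {dialect: probability} dict for one content item."""
        v = len(self.vocab)
        log_scores = {}
        for d, counts in self.word_counts.items():
            score = math.log(self.doc_counts[d] / self.n_docs)  # log prior
            for tok in text.lower().split():
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / (self.totals[d] + v))
            log_scores[d] = score
        # Convert log scores to probabilities (subtract the max for stability).
        top = max(log_scores.values())
        weights = {d: math.exp(s - top) for d, s in log_scores.items()}
        z = sum(weights.values())
        return {d: w / z for d, w in weights.items()}


def self_train(initial_items, candidates, threshold=0.9, max_rounds=3):
    """Build a classifier from the initial set, then repeatedly label
    candidates, add the confident ones to the training set, and retrain."""
    groups = group_by_dialect(initial_items)
    clf = NaiveBayesDialectClassifier(groups)
    remaining = list(candidates)
    for _ in range(max_rounds):
        added, unlabeled = [], []
        for text in remaining:
            probs = clf.predict_proba(text)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                groups[best].append(text)  # augment the current training data
                added.append(text)
            else:
                unlabeled.append(text)
        if not added:  # nothing confident enough this round: stop early
            break
        remaining = unlabeled
        clf = NaiveBayesDialectClassifier(groups)  # update the classifier
    return clf, groups


# Usage with toy British/American English examples (illustrative data only).
initial = [
    {"text": "the lorry is parked by the flat", "dialect": "en-GB"},
    {"text": "i love the colour of the lorry", "dialect": "en-GB"},
    {"text": "the truck is parked by the apartment", "dialect": "en-US"},
    {"text": "i love the color of the truck", "dialect": "en-US"},
]
candidates = [
    "the lorry near the flat with colour",
    "the truck near the apartment with color",
    "the weather is nice",
]
clf, groups = self_train(initial, candidates)
```

In this toy run the first two candidates are confidently assigned to `en-GB` and `en-US` and folded into the training data, while "the weather is nice" contains no dialect-specific words and stays unlabeled. Admitting only high-confidence predictions is one common way to limit the noise self-training feeds back into its own training set; the threshold and round limit here are arbitrary illustrative values.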
Publication number: US2017011739(A1); Publication date: 2017.01.12
Application number: US201615275235; Filing date: 2016.09.23
Applicant: Facebook, Inc.; Inventor: Huang Fei
Classification: G10L15/06; G06F17/27; Main classification: G10L15/06
Main claim: 1. A method, comprising: selecting, by a computing device, an initial training data set as a current training data set, wherein the initial training data set is selected by: receiving one or more initial content items; and establishing dialect parameters of one or more of the initial content items; generating, by the computing device and based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified; augmenting, by the computing device, the current training data set with additional training data by applying the dialect classifier to candidate content items; and updating the dialect classifier based on the augmented current training data set.
Address: Menlo Park, CA, US