Independent claim |
1. A method, comprising:
selecting, by a computing device, an initial training data set as a current training data set, wherein the initial training data set is selected by:
receiving one or more initial content items; and
establishing dialect parameters of one or more of the initial content items;
generating, by the computing device and based on the initial training data set, a dialect classifier configured to detect language dialects of content items to be classified;
augmenting, by the computing device, the current training data set with additional training data by applying the dialect classifier to candidate content items; and
updating the dialect classifier based on the augmented current training data set. |
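
The claimed steps resemble a self-training loop, which can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the claim does not specify a classifier type or an acceptance criterion, so the token-count scorer, the confidence `threshold`, the `rounds` limit, the reading of "dialect parameters" as simple dialect labels, and all function names (`train`, `classify`, `self_train`) are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build per-dialect token counts from (text, dialect) pairs."""
    counts = defaultdict(Counter)
    for text, dialect in examples:
        counts[dialect].update(text.lower().split())
    return counts


def classify(model, text):
    """Return (dialect, confidence) from add-one-smoothed token likelihoods."""
    vocab = {tok for c in model.values() for tok in c}
    scores = {}
    for dialect, c in model.items():
        # The extra +1 reserves probability mass for unseen tokens.
        total = sum(c.values()) + len(vocab) + 1
        scores[dialect] = sum(
            math.log((c[tok] + 1) / total) for tok in text.lower().split()
        )
    # Normalise the log scores into a confidence in [0, 1].
    top = max(scores.values())
    weights = {d: math.exp(s - top) for d, s in scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(initial, candidates, threshold=0.75, rounds=3):
    """Grow the training set with confidently classified candidates."""
    current = list(initial)      # initial set becomes the current set
    model = train(current)       # generate the classifier
    pool = list(candidates)
    for _ in range(rounds):
        accepted, remaining = [], []
        for text in pool:
            dialect, confidence = classify(model, text)
            if confidence >= threshold:
                accepted.append((text, dialect))
            else:
                remaining.append(text)
        if not accepted:
            break                # nothing new to learn from
        current.extend(accepted)  # augment the current training set
        model = train(current)    # update the classifier
        pool = remaining
    return model, current


# Hypothetical toy data: two labelled items and three unlabelled candidates.
initial = [
    ("colour favourite lorry flat", "en-GB"),
    ("color favorite truck apartment", "en-US"),
]
candidates = ["my favourite colour", "the truck is parked", "hello"]
model, augmented = self_train(initial, candidates)
```

In this sketch a candidate joins the training set only when the classifier's confidence reaches the threshold, so ambiguous items such as `"hello"` stay out of the augmented set. A production system would likely use a stronger classifier and guard against reinforcing its own early mistakes.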