Principal claim |
1. A computer-implemented method performed by a processor, the method comprising:
identifying, in a database of utterances, transcribed utterances and un-transcribed utterances; ordering, via the processor, transcription candidate utterances from the un-transcribed utterances based on confidence scores of the transcription candidate utterances, to yield a selectively sampled order; transcribing, via the processor, a top n utterances from the selectively sampled order, to yield additional transcribed utterances; receiving human-transcribed utterances, wherein the human-transcribed utterances are selected from the selectively sampled order for human transcription based on the confidence scores; adding the additional transcribed utterances and the human-transcribed utterances to the database of utterances; and training acoustic and language models using the additional transcribed utterances and the human-transcribed utterances. |
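The steps recited in claim 1 can be illustrated with a minimal Python sketch of one selection round. This is not the patented implementation. The `Utterance` record, the `selective_sampling_round` function, and the `asr_transcribe`, `human_transcribe`, and `train_models` callbacks are hypothetical names introduced here. The claim says only that ordering and human selection are "based on confidence scores", so the sketch assumes that the `n` highest-confidence candidates are transcribed automatically and the `m` lowest-confidence remaining candidates go to human transcribers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Utterance:
    audio_id: str
    confidence: float                 # recognizer confidence score (assumed precomputed)
    transcript: Optional[str] = None  # None means un-transcribed
    source: Optional[str] = None      # "asr" or "human" once transcribed


def selective_sampling_round(
    database: List[Utterance],
    asr_transcribe: Callable[[Utterance], str],    # hypothetical automatic transcriber
    human_transcribe: Callable[[Utterance], str],  # hypothetical human-transcription hook
    train_models: Callable[[List[Utterance]], None],  # hypothetical model trainer
    n: int,
    m: int,
) -> List[Utterance]:
    # Identify the un-transcribed utterances in the database.
    candidates = [u for u in database if u.transcript is None]

    # Order candidates by confidence score to obtain the selectively sampled order.
    # Assumption: highest confidence first.
    ordered = sorted(candidates, key=lambda u: u.confidence, reverse=True)

    # Automatically transcribe the top n utterances of that order.
    machine_batch = ordered[:n]
    for u in machine_batch:
        u.transcript = asr_transcribe(u)
        u.source = "asr"

    # Assumption: the m lowest-confidence remaining utterances go to humans.
    remainder = ordered[n:]
    human_batch = remainder[-m:] if m > 0 else []
    for u in human_batch:
        u.transcript = human_transcribe(u)
        u.source = "human"

    # The records already live in `database`, so filling in `transcript`
    # is what adds them to the transcribed portion of the database.
    new_data = machine_batch + human_batch

    # Train acoustic and language models on the newly transcribed data.
    train_models(new_data)
    return new_data
```

A caller would supply its own recognizer, transcription workflow, and trainer through the three callbacks. Utterances chosen for neither batch stay un-transcribed and remain candidates for a later round.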