Title: Crowd sourcing audio transcription via re-speaking
Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Publication number: US9418660(B2) Publication date: 2016.08.16
Application number: US201414156032 Filing date: 2014.01.15
Applicant: Cisco Technology, Inc. Inventors: Paulik Matthias;Halder Vivek;Sankar Ananth
Classification: G10L15/26;G06Q10/06;G10L15/04;G10L25/87;G10L15/32;G10L15/07 Main classification: G10L15/26
Agency: Parker Ibrahim & Berg LLC Agents: Parker Ibrahim & Berg LLC;Behmke James M.;LeBarron Stephen D.
Main claim: 1. A method comprising: receiving a speech audio intended for transcription to textual form at a job mapper on a data processing apparatus; dividing, by the job mapper, the received speech audio into first speech segments; identifying, by the job mapper, speakers for sending each first speech segment of the first speech segments; sending, by the job mapper, each first speech segment to the speakers determined for a particular first speech segment; receiving, at the job mapper, second speech segments from the speakers, wherein each second speech segment of the second speech segments is a re-spoken version of a first speech segment of the first speech segments that has been generated by one of the speakers by repeating in spoken form the first speech segment that the one of the speakers has listened to; and processing, by the job mapper, the second speech segments to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Address: San Jose, CA, US
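
Illustrative sketch (not part of the patent record): the flow recited in the abstract and main claim (divide the audio, pick a subset of speakers per segment, collect re-spoken segments, produce partial transcripts, combine them) might be outlined roughly as follows. Everything here is an assumption made for illustration: the function names, the fixed-length segmentation, the round-robin speaker selection, and the majority vote used to merge partial transcripts are hypothetical choices, and the `respeak` and `recognize` callables stand in for the human re-speaking step and a speech recognizer that the record does not specify.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def divide(audio: Sequence, segment_len: int) -> List[Sequence]:
    """Split the audio into consecutive fixed-length first speech segments.

    Fixed-length splitting is an assumption; the record does not say how
    segment boundaries are chosen.
    """
    return [audio[i:i + segment_len] for i in range(0, len(audio), segment_len)]


def assign_speakers(num_segments: int, speakers: List[str],
                    per_segment: int) -> Dict[int, List[str]]:
    """Choose a subset of speakers for each segment (round-robin, hypothetical)."""
    return {
        seg: [speakers[(seg + k) % len(speakers)] for k in range(per_segment)]
        for seg in range(num_segments)
    }


def combine(partials: Dict[int, List[str]]) -> str:
    """Majority-vote each segment's partial transcripts, then join in order."""
    best = [Counter(partials[seg]).most_common(1)[0][0] for seg in sorted(partials)]
    return " ".join(best)


def transcribe(audio: Sequence, speakers: List[str],
               respeak: Callable, recognize: Callable,
               segment_len: int = 4, per_segment: int = 3) -> str:
    """Run the whole pipeline: divide, assign, collect re-spoken audio, combine."""
    segments = divide(audio, segment_len)
    plan = assign_speakers(len(segments), speakers, per_segment)
    partials: Dict[int, List[str]] = {}
    for seg_id, chosen in plan.items():
        # Each chosen speaker listens to the first segment and re-speaks it,
        # yielding one second speech segment per speaker.
        second_segments = [respeak(name, segments[seg_id]) for name in chosen]
        # Each second speech segment is recognized into a partial transcript.
        partials[seg_id] = [recognize(s) for s in second_segments]
    return combine(partials)
```

Sending each segment to several speakers is what makes a vote possible in this sketch: a single mis-spoken or mis-recognized copy is outvoted by the others. A real system would more likely align and merge the competing hypotheses at the word level rather than vote on whole segments.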