Title of invention: Content-aware speaker recognition
Abstract: A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition.
Publication number: US9336781(B2)    Publication date: 2016.05.10
Application number: US201414264916    Filing date: 2014.04.29
Applicant: SRI INTERNATIONAL    Inventors: Scheffer Nicolas; Lei Yun
Classification: G10L15/00; G10L15/18; G10L15/20; G10L17/00; G10L17/14    Main classification: G10L15/00
Agency: Barnes & Thornburg LLP    Agents: Barnes & Thornburg LLP; McWilliams Thomas J.; Behm, Jr. Edward F.
Main claim: 1. A text-independent speaker recognition system comprising:
  a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to:
    process an audio signal comprising a current sample of natural language speech;
    identify a speech segment in the current sample of natural language speech; and
    create a phonetic representation of the speech segment of the current speech sample; and
  a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to:
    create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and
    compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples;
  wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content;
  wherein the front end module is to align the phonetic content of the speech segment with time; and
  wherein the front end module is to align the phonetic content of the speech segment in lexical units, and the back end module is to compute a distance between at least one of the lexical units of the phonetic content with a similar lexical unit of the stored speaker model.
Address: Menlo Park CA US
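
The front end / back end split described in the main claim can be pictured with a small sketch. This is an illustration only, not the patented implementation: the one-layer softmax stands in for the neural network-based acoustic model, individual phones stand in for the claim's lexical units, the per-phone mean vector stands in for the speaker model, and cosine distance is an arbitrary choice of distance. All function names and parameters (`phone_posteriors`, `speaker_model`, `model_distance`, `min_count`) are hypothetical.

```python
import numpy as np


def phone_posteriors(frames, weights):
    """Front end stand-in: per-frame phone posteriors from one linear layer
    plus softmax. A real system would use a trained neural acoustic model.
    frames: (T, D) feature vectors in time order; weights: (D, P)."""
    logits = frames @ weights
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def speaker_model(frames, posteriors):
    """Back end stand-in: posterior-weighted mean feature vector per phone.
    Row t of `posteriors` belongs to frame t, so the phonetic content is
    aligned with time by construction. Returns a (P, D) matrix and the
    soft frame count observed for each phone."""
    counts = posteriors.sum(axis=0)                       # (P,)
    means = (posteriors.T @ frames) / np.maximum(counts, 1e-8)[:, None]
    return means, counts


def model_distance(model_a, counts_a, model_b, counts_b, min_count=1.0):
    """Mean cosine distance between matching phone rows, restricted to
    phones that both samples actually cover."""
    shared = (counts_a >= min_count) & (counts_b >= min_count)
    if not shared.any():
        raise ValueError("the two samples share no phonetic content")
    a, b = model_a[shared], model_b[shared]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cosine = (a * b).sum(axis=1) / (norms + 1e-12)
    return float((1.0 - cosine).mean())
```

Comparing like with like (the same phone in both samples) is the point of the sketch: the distance is computed unit by unit rather than over the utterance as a whole, which is what lets the comparison work when the two samples contain different words.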