Content-aware speaker recognition, Application No. US201414264916

Title	Content-aware speaker recognition
Abstract	A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition.
Publication No.	US9336781(B2)	Publication date	2016.05.10
Application No.	US201414264916	Filing date	2014.04.29
Applicant	SRI INTERNATIONAL	Inventors	Scheffer Nicolas;Lei Yun
Classification	G10L15/00;G10L15/18;G10L15/20;G10L17/00;G10L17/14	Main classification	G10L15/00
Agency	Barnes & Thornburg LLP	Agents	Barnes & Thornburg LLP;McWilliams Thomas J.;Behm, Jr. Edward F.
Independent claim	1. A text-independent speaker recognition system comprising: a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: process an audio signal comprising a current sample of natural language speech; identify a speech segment in the current sample of natural language speech; and create a phonetic representation of the speech segment of the current speech sample; and a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to: create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples; wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content; wherein the front end module is to align the phonetic content of the speech segment with time; and wherein the front end module is to align the phonetic content of the speech segment in lexical units, and the back end module is to compute a distance between at least one of the lexical units of the phonetic content with a similar lexical unit of the stored speaker model.
Address	Menlo Park CA US
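
The front end / back end split described in claim 1 can be illustrated with a toy sketch. This is not the patented implementation: it assumes the neural network-based acoustic model has already produced one row of phone posteriors per time frame (the `posteriors` argument is a hypothetical stand-in for that output), it represents a speaker model as posterior-weighted mean feature vectors per phone, and it uses cosine distance as one possible per-unit distance. All function names, the `min_count` threshold, and the choice of distance are assumptions made for illustration.

```python
import math


def phone_stats(frames, posteriors):
    """Build a toy phonetically-aware speaker model.

    frames: list of T feature vectors (each a list of D floats).
    posteriors: list of T lists of K phone posteriors, time-aligned with
        `frames` (assumed to come from an acoustic model; hypothetical here).
    Returns (counts, means): soft frame counts per phone and the
    posterior-weighted mean feature vector per phone.
    """
    num_phones = len(posteriors[0])
    dim = len(frames[0])
    counts = [0.0] * num_phones
    sums = [[0.0] * dim for _ in range(num_phones)]
    for x, post in zip(frames, posteriors):
        for k, p in enumerate(post):
            counts[k] += p
            for d in range(dim):
                sums[k][d] += p * x[d]
    means = [
        [s / c if c > 0 else 0.0 for s in row]
        for row, c in zip(sums, counts)
    ]
    return counts, means


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def compare_models(current, stored, min_count=1.0):
    """Compare two models phone by phone.

    Only phones observed in both models with at least `min_count` soft
    frames are compared. Returns a dict mapping phone index to distance.
    """
    cur_counts, cur_means = current
    sto_counts, sto_means = stored
    distances = {}
    for k in range(len(cur_counts)):
        if cur_counts[k] >= min_count and sto_counts[k] >= min_count:
            distances[k] = cosine_distance(cur_means[k], sto_means[k])
    return distances
```

Comparing like with like (the same phone in the current and stored models) is the point of the sketch: a distance is computed only for units that both samples actually contain, so mismatched phonetic content between the two samples does not contribute to the score.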