Title of invention |
Content-aware speaker recognition |
Abstract |
A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition. |
Publication number |
US9336781(B2) |
Publication date |
2016.05.10 |
Application number |
US201414264916 |
Filing date |
2014.04.29 |
Applicant |
SRI INTERNATIONAL |
Inventors |
Scheffer Nicolas;Lei Yun |
Classification codes |
G10L15/00;G10L15/18;G10L15/20;G10L17/00;G10L17/14 |
Main classification code |
G10L15/00 |
Agency |
Barnes & Thornburg LLP |
Agents |
Barnes & Thornburg LLP;McWilliams Thomas J.;Behm, Jr. Edward F. |
Main claim |
1. A text-independent speaker recognition system comprising:
a front end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to:
  process an audio signal comprising a current sample of natural language speech;
  identify a speech segment in the current sample of natural language speech; and
  create a phonetic representation of the speech segment of the current speech sample; and
a back end module embodied in one or more non-transitory computer readable media and executable by at least one computer device to:
  create a current speaker model based on the phonetic representation of the speech segment of the current speech sample, the current speaker model mathematically representing at least one speaker-specific phonemic characteristic of the current speech sample; and
  compare the current speaker model to a stored speaker model, the stored speaker model mathematically associating phonetic content with one or more other speech samples;
wherein the front end module is to apply a neural network-based acoustic model to associate the speech segment with phonetic content;
wherein the front end module is to align the phonetic content of the speech segment with time; and
wherein the front end module is to align the phonetic content of the speech segment in lexical units, and the back end module is to compute a distance between at least one of the lexical units of the phonetic content with a similar lexical unit of the stored speaker model. |
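The data flow recited in the claim (phonetic content aligned with time, a speaker model built per lexical unit, and a distance computed between matching units) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the per-frame phone posteriors stand in for the output of a neural network-based acoustic model (the model itself is not implemented), the speaker model is simplified to a posterior-weighted mean feature vector per phone, and cosine distance is a swapped-in choice of distance. The helper names `align_phones`, `build_phone_model`, and `compare_models` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
PhoneModel = Dict[str, Vector]  # phone label -> mean feature vector (assumed form)


def align_phones(posteriors: Sequence[Dict[str, float]],
                 frame_shift: float = 0.01) -> List[Tuple[str, float, float]]:
    """Merge consecutive frames with the same most-likely phone into
    (phone, start_time, end_time) segments."""
    segments: List[Tuple[str, float, float]] = []
    for t, post in enumerate(posteriors):
        phone = max(post, key=post.get)  # hard decision per frame
        start = t * frame_shift
        if segments and segments[-1][0] == phone:
            segments[-1] = (phone, segments[-1][1], start + frame_shift)
        else:
            segments.append((phone, start, start + frame_shift))
    return segments


def build_phone_model(frames: Sequence[Vector],
                      posteriors: Sequence[Dict[str, float]]) -> PhoneModel:
    """Posterior-weighted mean of the acoustic features for each phone."""
    sums: Dict[str, Vector] = {}
    weights: Dict[str, float] = defaultdict(float)
    for feat, post in zip(frames, posteriors):
        for phone, p in post.items():
            acc = sums.setdefault(phone, [0.0] * len(feat))
            for i, x in enumerate(feat):
                acc[i] += p * x
            weights[phone] += p
    return {ph: [s / weights[ph] for s in acc]
            for ph, acc in sums.items() if weights[ph] > 0}


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def compare_models(current: PhoneModel, stored: PhoneModel) -> float:
    """Average per-phone distance over the phones both models contain."""
    shared = sorted(set(current) & set(stored))
    if not shared:
        raise ValueError("models share no phones; nothing to compare")
    return sum(cosine_distance(current[p], stored[p]) for p in shared) / len(shared)


if __name__ == "__main__":
    # Toy data: three 2-dimensional frames with made-up phone posteriors.
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    posteriors = [{"aa": 0.9, "s": 0.1}, {"aa": 0.8, "s": 0.2}, {"aa": 0.1, "s": 0.9}]
    current = build_phone_model(frames, posteriors)
    stored = {"aa": [1.0, 0.0], "s": [0.0, 1.0]}  # hypothetical enrolled speaker
    print(align_phones(posteriors))
    print(compare_models(current, stored))
```

Comparing phone by phone means the score depends on how a speaker pronounces each unit rather than on which words were spoken, which is one way to read the claim's text-independent, phonetically-aware comparison. A real system would likely use richer per-unit statistics than a single mean vector.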
地址 |
Menlo Park CA US |