Title: SPEAKER VERIFICATION METHODS AND APPARATUS
Abstract: Techniques for automatically identifying a speaker in a conversation as a known person based on processing of audio of the speaker's voice to extract characteristics of that voice and on an automated comparison of those characteristics to known characteristics of the known person's voice. A speaker segmentation process may be performed on audio of the conversation to produce, for each speaker in the conversation, a segment that includes the audio of that speaker. Audio of each of the segments may then be processed to extract characteristics of that speaker's voice. The characteristics derived from each segment (and thus for multiple speakers) may then be compared to characteristics of the known person's voice to determine whether the speaker for that segment is the known person. For each segment, a degree of match between the voice characteristics of the speaker and the voice characteristics of the known person may be calculated.
Publication number: US2017061968(A1) Publication date: 2017.03.02
Application number: US201514838010 Filing date: 2015.08.27
Applicant: Nuance Communications, Inc. Inventors: Dalmasso Emanuele;Colibro Daniele;Vair Claudio;Farrell Kevin R.
Classification: G10L17/06 Main classification: G10L17/06
Agency: (not listed) Agent: (not listed)
Principal claim: 1. A method of evaluating whether a first speaker in a conversation is a user whose identity has been asserted by analyzing audio of the conversation, wherein the conversation involves a second speaker whose identity is known, and wherein at least a portion of the audio of the conversation has been decomposed into a first segment and a second segment, each of the first segment and the second segment being composed substantially of audio of a single speaker speaking in the conversation, the method comprising: comparing the first segment to a first voiceprint of the user to determine a first likelihood that the first segment corresponds to the user; comparing the first segment to a second voiceprint of the second speaker to determine a second likelihood that the first segment corresponds to the second speaker; comparing the second segment to the first voiceprint of the user to determine a third likelihood that the second segment corresponds to the user; comparing the second segment to the second voiceprint of the second speaker to determine a fourth likelihood that the second segment corresponds to the second speaker; and determining whether the first speaker is the user based, at least in part, on the first, second, third and fourth likelihoods.
Address: Burlington, MA, US
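
Illustrative sketch (not part of the patent record): the four comparisons recited in claim 1 can be pictured with a small Python example. Everything specific here is an assumption made for illustration: segments and voiceprints are represented as plain feature vectors, cosine similarity stands in for the "likelihood" of a match, and the decision rule, the function names (`cosine`, `verify_user`) and the `threshold` value are hypothetical. The record does not say how the four likelihoods are computed or combined.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as a stand-in for a match likelihood."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_user(
    seg1: Sequence[float],
    seg2: Sequence[float],
    user_vp: Sequence[float],
    agent_vp: Sequence[float],
    threshold: float = 0.7,  # hypothetical acceptance threshold
) -> bool:
    """Decide whether the unknown speaker is the claimed user.

    Hypothetical rule: pick the segment-to-speaker assignment with the
    higher combined score, then accept only if the segment assigned to
    the user matches the user's voiceprint strongly enough and better
    than it matches the known second speaker.
    """
    s1_user = cosine(seg1, user_vp)    # first likelihood
    s1_agent = cosine(seg1, agent_vp)  # second likelihood
    s2_user = cosine(seg2, user_vp)    # third likelihood
    s2_agent = cosine(seg2, agent_vp)  # fourth likelihood

    # Either segment 1 is the user and segment 2 the known speaker,
    # or the other way round; keep the better-scoring assignment.
    if s1_user + s2_agent >= s2_user + s1_agent:
        user_score, rival_score = s1_user, s1_agent
    else:
        user_score, rival_score = s2_user, s2_agent

    return user_score >= threshold and user_score > rival_score
```

Scoring both segments against both voiceprints means the order in which the segmentation step emits the segments does not matter, and a segment that resembles the known second speaker more than the user is not accepted as the user.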