Title of invention Speaker identification using spatial information
Abstract Embodiments of the present invention relate to speaker identification using spatial information. A method of speaker identification for audio content being of a format based on multiple channels is disclosed. The method comprises extracting, from a first audio clip in the format, a plurality of spatial acoustic features across the multiple channels and location information, the first audio clip containing voices from a speaker, and constructing a first model for the speaker based on the spatial acoustic features and the location information, the first model indicating a characteristic of the voices from the speaker. The method further comprises identifying whether the audio content contains voices from the speaker based on the first model. A corresponding system and computer program product are also disclosed.
Publication number US9626970(B2) Publication date 2017.04.18
Application number US201514971401 Filing date 2015.12.16
Applicant Dolby Laboratories Licensing Corporation Inventors Huang Shen;Sun Xuejing
Classification G10L17/00;G10L15/30;G10L25/24;G10L25/78 Main classification G10L17/00
Patent agency Agent
Main claim 1. A method of speaker identification for audio content, the audio content being of a format based on multiple channels, the method comprising: extracting, from a first audio clip in the format, a plurality of spatial acoustic features across the multiple channels and location information, the first audio clip including a plurality of frames for each of a plurality of channels, the first audio clip including audio content corresponding to voices from a speaker, the spatial acoustic features including acoustic characteristics of the voices from the speaker; constructing a first model for the speaker based on the spatial acoustic features and the location information, the first model indicating a characteristic of the voices from the speaker; and identifying whether the audio content contains voices from the speaker based on the first model, wherein the spatial acoustic features include an intra-channel shifted delta cepstrum (SDC) feature and an inter-channel SDC feature, and wherein extracting the spatial acoustic features from the first audio clip comprises: for each of the multiple channels, extracting a cepstrum coefficient for each frame of the first audio clip in a frequency domain; determining an intra-channel SDC feature for each of the multiple channels based on difference between the cepstrum coefficients for the channel over a predetermined number of frames; and determining an inter-channel SDC feature for each two of the multiple channels based on difference between the cepstrum coefficients for the two channels.
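The feature extraction steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the real-cepstrum computation, the SDC parameters `d`, `p`, `k` (delta spread, block shift, number of blocks), and the frame-wise form of the inter-channel difference are all assumptions chosen for illustration; the claim itself only requires differences between cepstrum coefficients over frames (intra-channel) and between each two channels (inter-channel).

```python
from itertools import combinations

import numpy as np


def cepstrum(frames, n_coeffs=13):
    """Real cepstrum per frame (assumed variant): inverse FFT of the log magnitude spectrum.

    frames: array of shape (n_frames, frame_len). Returns (n_frames, n_coeffs).
    """
    spectrum = np.fft.rfft(frames, axis=-1)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    ceps = np.fft.irfft(log_mag, n=frames.shape[-1], axis=-1)
    return ceps[:, :n_coeffs]


def intra_channel_sdc(ceps, d=1, p=3, k=2):
    """Shifted delta cepstrum within one channel.

    For each usable frame t, stacks k blocks c[t + i*p + d] - c[t + i*p - d].
    ceps: (n_frames, n_coeffs). Returns (n_out, k * n_coeffs).
    """
    n_out = ceps.shape[0] - (k - 1) * p - 2 * d
    if n_out <= 0:
        raise ValueError("not enough frames for the chosen d, p, k")
    blocks = []
    for i in range(k):
        center = d + i * p  # first usable frame index for block i
        ahead = ceps[center + d:center + d + n_out]
        behind = ceps[center - d:center - d + n_out]
        blocks.append(ahead - behind)
    return np.concatenate(blocks, axis=1)


def inter_channel_sdc(ceps_a, ceps_b):
    """Assumed form: frame-wise difference of cepstra between two channels."""
    return ceps_a - ceps_b


def extract_spatial_features(clip, n_coeffs=13, d=1, p=3, k=2):
    """clip: (n_channels, n_frames, frame_len) array of framed samples."""
    ceps = [cepstrum(channel, n_coeffs) for channel in clip]
    intra = {c: intra_channel_sdc(ceps[c], d, p, k) for c in range(len(ceps))}
    inter = {
        (a, b): inter_channel_sdc(ceps[a], ceps[b])
        for a, b in combinations(range(len(ceps)), 2)
    }
    return intra, inter
```

For a clip with three channels, `extract_spatial_features` returns one intra-channel feature matrix per channel and one inter-channel matrix for each of the three channel pairs. The location information and the speaker model construction recited in the claim are outside the scope of this sketch.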
Address San Francisco CA US