Title of invention |
Systems and methods for manipulating electronic content based on speech recognition |
Abstract |
Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network; extracting an audio track from the electronic media content; and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability of the detected speaker segment involving the individual speaker based on the at least one speaker model. |
Publication number |
US9311395(B2) |
Publication date |
2016.04.12 |
Application number |
US201113156780 |
Filing date |
2011.06.09 |
Applicant |
AOL Inc. |
Inventors |
Kocks Peter F.; Hu Guoning; Wu Ping-Hao |
Classification codes |
G06F17/30;G10L17/00 |
Main classification code |
G06F17/30 |
Agency |
Finnegan, Henderson, Farabow, Garrett & Dunner, LLP |
Agent |
Finnegan, Henderson, Farabow, Garrett & Dunner, LLP |
Main claim |
1. A computer-implemented method for manipulating electronic multimedia content, the method comprising:
generating, using a processor, a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
receiving electronic media content over a network;
extracting an audio track from the electronic media content;
detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
adjusting the ranking or filtration of the electronic media content based on the second probability. |
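The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: each model is reduced to a hypothetical one-dimensional `Gaussian` over a single per-frame feature (real systems would more plausibly use multi-dimensional features and richer models), segments are found by comparing per-frame log-likelihoods of a target model against its competing model, the first probability is a logistic mapping of the mean log-likelihood ratio, and the ranking is adjusted by multiplying the speaker probability by the face probability. The names `detect_segments`, `segment_probability`, and `rank_media` are likewise hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical 1-D Gaussian standing in for a trained acoustic model."""
    mean: float
    var: float

    def log_pdf(self, x: float) -> float:
        # Log density of a univariate normal distribution.
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)


def detect_segments(features, target, background):
    """Return (start, end) index pairs (end exclusive) of contiguous frames
    that are more likely under `target` than under `background`."""
    segments, start = [], None
    for i, x in enumerate(features):
        is_target = target.log_pdf(x) > background.log_pdf(x)
        if is_target and start is None:
            start = i
        elif not is_target and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(features)))
    return segments


def segment_probability(features, target, background):
    """Map the mean per-frame log-likelihood ratio to (0, 1) with a logistic
    function (an assumed choice, not taken from the claim)."""
    if not features:
        return 0.0
    llr = sum(target.log_pdf(x) - background.log_pdf(x)
              for x in features) / len(features)
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)  # numerically stable branch for negative ratios
    return e / (1.0 + e)


def rank_media(items):
    """Sort (media_id, p_speaker, p_face) tuples by p_speaker * p_face,
    highest first. The product is an assumed way to adjust the ranking."""
    return sorted(items, key=lambda it: it[1] * it[2], reverse=True)


# Toy models: speech vs. non-speech, then target speaker vs. other speakers.
speech, non_speech = Gaussian(1.0, 1.0), Gaussian(-1.0, 1.0)
speaker, non_speaker = Gaussian(1.5, 0.25), Gaussian(0.5, 0.25)

audio = [-1.2, -0.8, 0.9, 1.4, 1.6, -1.0]
speech_segments = detect_segments(audio, speech, non_speech)
first_start, first_end = speech_segments[0]
speech_frames = audio[first_start:first_end]
speaker_segments = detect_segments(speech_frames, speaker, non_speaker)
```

In this sketch the same `detect_segments` helper serves both stages: first with the speech and non-speech models on the whole track, then with the speaker and non-speaker speech models restricted to the detected speech frames, mirroring the two-stage structure of the claim.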
Address |
Dulles, VA, US |