Title of invention: Systems and methods for manipulating electronic content based on speech recognition
Abstract: Systems and methods are disclosed for displaying electronic multimedia content to a user. One computer-implemented method for manipulating electronic multimedia content includes generating, using a processor, a speech model and at least one speaker model of an individual speaker. The method further includes receiving electronic media content over a network, extracting an audio track from the electronic media content, and detecting speech segments within the electronic media content based on the speech model. The method further includes detecting a speaker segment within the electronic media content and calculating a probability that the detected speaker segment involves the individual speaker, based on the at least one speaker model.
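The speech-detection step can be illustrated with a minimal sketch. It assumes that the audio track has already been converted into per-frame feature vectors. It also assumes that the speech and non-speech models (the non-speech model is named in claim 1) can be represented as single diagonal Gaussians. Practical systems more commonly use Gaussian mixture models over features such as MFCCs. Each frame is labeled as speech when the log-likelihood ratio between the two models exceeds a threshold, and runs of speech frames are then merged into segments. The names `DiagGaussian` and `detect_speech_segments` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DiagGaussian:
    """Hypothetical stand-in for a trained acoustic model (real systems often use GMMs)."""
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Sum of independent per-dimension Gaussian log-densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def detect_speech_segments(
    frames: Sequence[Sequence[float]],
    speech_model: DiagGaussian,
    non_speech_model: DiagGaussian,
    threshold: float = 0.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of runs classified as speech."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, x in enumerate(frames):
        # Frame-level log-likelihood ratio: speech model vs. non-speech model.
        llr = speech_model.log_likelihood(x) - non_speech_model.log_likelihood(x)
        if llr > threshold and start is None:
            start = i  # a speech run begins
        elif llr <= threshold and start is not None:
            segments.append((start, i))  # a speech run ends
            start = None
    if start is not None:
        segments.append((start, len(frames)))  # run extends to the last frame
    return segments
```

Consider toy one-dimensional features in which the speech model is `DiagGaussian([5.0], [1.0])` and the non-speech model is `DiagGaussian([0.0], [1.0])`. For the frames `[[0.1], [0.2], [5.0], [4.8], [5.2], [0.0], [5.1]]`, the function returns `[(2, 5), (6, 7)]`. The speaker-detection step can then be run only inside these segments.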
Publication number: US9311395 (B2); Publication date: 2016.04.12
Application number: US201113156780; Application date: 2011.06.09
Applicant: AOL Inc.; Inventors: Kocks Peter F.; Hu Guoning; Wu Ping-Hao
Classification: G06F17/30; G10L17/00; Main classification: G06F17/30
Agency: Finnegan, Henderson, Farabow, Garrett & Dunner, LLP; Agent: Finnegan, Henderson, Farabow, Garrett & Dunner, LLP
Main claim: 1. A computer-implemented method for manipulating electronic multimedia content, the method comprising:
generating, using a processor, a speech model, a non-speech model, at least one speaker model of an individual speaker, and a non-speaker speech model;
receiving electronic media content over a network;
extracting an audio track from the electronic media content;
detecting speech segments within the extracted audio track based on the speech model and the non-speech model, the speech segments containing speech from at least one of a plurality of speakers;
detecting a speaker segment within the detected speech segments based on the speaker model and the non-speaker speech model, the speaker segment containing speech from the individual speaker;
calculating a first probability of the detected speaker segment involving the individual speaker based on the at least one speaker speech model and the non-speaker speech model;
determining a ranking or filtration of the electronic media content relative to other electronic media content based on the first probability of the detected speaker segment;
detecting a face within a part of the electronic media content corresponding to the detected speaker segment and calculating a second probability of the detected face being a face of the individual speaker; and
adjusting the ranking or filtration of the electronic media content based on the second probability.
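The scoring and ranking steps of claim 1 can be illustrated with a minimal sketch. Several assumptions are made here, and the claim does not specify any of them:

* Frames within a speaker segment are treated as independent, so the segment log-likelihood ratio between the speaker model and the non-speaker speech model is a sum over frames.
* The "first probability" is a posterior obtained by adding prior log-odds to that ratio and applying the logistic function.
* The "second probability" from face detection adjusts the ranking through a simple linear blend with a hypothetical weight `face_weight`.

All names used below (`speaker_probability`, `MediaItem`, `rank_media`, `face_weight`, `min_score`) are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


def speaker_probability(
    frames: Sequence[Sequence[float]],
    speaker_ll: Callable[[Sequence[float]], float],
    non_speaker_ll: Callable[[Sequence[float]], float],
    prior: float = 0.5,
) -> float:
    """Posterior that a segment was spoken by the target speaker (assumed formulation)."""
    # Segment-level log-likelihood ratio, assuming independent frames.
    llr = sum(speaker_ll(x) - non_speaker_ll(x) for x in frames)
    log_odds = llr + math.log(prior / (1.0 - prior))
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


@dataclass
class MediaItem:
    media_id: str
    speaker_prob: float               # "first probability" (audio evidence)
    face_prob: Optional[float] = None  # "second probability" (face evidence), if a face was found


def rank_media(
    items: List[MediaItem], face_weight: float = 0.5, min_score: float = 0.0
) -> List[MediaItem]:
    """Filter and rank items by audio evidence, adjusted by face evidence when available."""

    def score(item: MediaItem) -> float:
        if item.face_prob is None:
            return item.speaker_prob  # ranking based on the first probability alone
        # Hypothetical adjustment: linear blend of audio and face probabilities.
        return (1 - face_weight) * item.speaker_prob + face_weight * item.face_prob

    kept = [it for it in items if score(it) >= min_score]  # filtration
    return sorted(kept, key=score, reverse=True)           # ranking
```

Take three items: `A` (speaker 0.9, face 0.1), `B` (speaker 0.6, face 0.9) and `C` (speaker 0.7, no face). With the default weight, `rank_media` orders them `B`, `C`, `A`. This shows how strong face evidence can promote an item that ranked lower on audio alone. Passing `min_score=0.6` filters out `A`. A product system might instead fuse the two probabilities with Bayes' rule or a learned model. The linear blend is used here only because it is easy to inspect.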
Address: Dulles, VA, US