Title of invention: Video concept classification using audio-visual grouplets
Abstract: A method for determining a semantic concept classification for a digital video clip, comprising: receiving an audio-visual dictionary including a plurality of audio-visual grouplets, the audio-visual grouplets including visual background and foreground codewords, audio background and foreground codewords, wherein the codewords in a particular audio-visual grouplet were determined to be correlated with each other; analyzing the digital video clip to determine a set of visual features and a set of audio features; determining similarity scores between the digital video clip and each of the audio-visual grouplets by comparing the set of visual features to any visual background and foreground codewords associated with a particular audio-visual grouplet, and comparing the set of audio features to any audio background and foreground codewords associated with the particular audio-visual grouplet; and determining one or more semantic concept classifications using trained semantic classifiers.
Publication number: US8867891(B2) Publication date: 2014.10.21
Application number: US201113269742 Filing date: 2011.10.10
Applicant: Intellectual Ventures Fund 83 LLC Inventors: Jiang Wei; Loui Alexander C.
Classification: H04N5/92; H04N21/233; H04N21/234; H04N5/93; G11B27/28 Main classification: H04N5/92
Agency: Agent:
Main claim: 1. A method for determining a semantic concept classification for a digital video clip including a temporal sequence of video frames and a corresponding audio soundtrack, the method comprising: analyzing, by a processing device, the temporal sequence of video frames to determine a set of visual features; analyzing, by the processing device, the audio soundtrack to determine a set of audio features; determining, by the processing device, similarity scores between the digital video clip and each of a plurality of audio-visual grouplets from an audio-visual dictionary, wherein the plurality of audio-visual grouplets includes distinct visual background codewords representing visual background content, distinct visual foreground codewords representing visual foreground content, distinct audio background codewords representing audio background content, and distinct audio foreground codewords representing audio foreground content, wherein the distinct visual background codewords and the distinct visual foreground codewords are separate and distinct from each other, wherein the distinct audio background codewords and the distinct audio foreground codewords are separate and distinct from each other, and wherein the distinct visual background codewords and the distinct visual foreground codewords are separate and distinct from the distinct audio background codewords and the distinct audio foreground codewords, and wherein the determining similarity scores comprises: comparing the set of visual features to distinct visual background codewords and distinct visual foreground codewords associated with a particular audio-visual grouplet; and comparing the set of audio features to distinct audio background codewords and distinct audio foreground codewords associated with the particular audio-visual grouplet; determining, by the processing device, one or more semantic concept classifications using trained semantic classifiers responsive to the determined similarity scores; and storing, by the processing device, indications of the one or more semantic concept classifications in a processor-accessible memory.
Address: Las Vegas NV US
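
Illustrative sketch (not part of the patent record): the claimed flow of comparing a clip's visual and audio features against each grouplet's codewords, then passing the resulting similarity scores to trained classifiers, might look roughly like the Python below. The record does not specify the similarity measure or the classifier type, so everything concrete here is an assumption made for illustration: the `Grouplet` class, the Gaussian kernel, the best-match-then-average scoring, and the linear decision rule standing in for "trained semantic classifiers" are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class Grouplet:
    """Hypothetical container: four separate codeword sets per grouplet."""
    visual_fg: List[Vector] = field(default_factory=list)
    visual_bg: List[Vector] = field(default_factory=list)
    audio_fg: List[Vector] = field(default_factory=list)
    audio_bg: List[Vector] = field(default_factory=list)


def kernel(x: Vector, c: Vector, sigma: float = 1.0) -> float:
    # Assumed similarity: Gaussian kernel on squared Euclidean distance.
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def best_match(features: List[Vector], codeword: Vector) -> float:
    # Score of the clip feature closest to this codeword (0.0 if no features).
    return max((kernel(f, codeword) for f in features), default=0.0)


def grouplet_similarity(visual: List[Vector], audio: List[Vector],
                        g: Grouplet) -> float:
    # Visual features are compared only to visual codewords,
    # audio features only to audio codewords.
    scores = [best_match(visual, c) for c in g.visual_fg + g.visual_bg]
    scores += [best_match(audio, c) for c in g.audio_fg + g.audio_bg]
    return sum(scores) / len(scores) if scores else 0.0


def classify(sims: List[float],
             classifiers: Dict[str, Tuple[List[float], float]]) -> List[str]:
    # Stand-in for trained classifiers: one linear rule (weights, bias)
    # per concept, applied to the vector of grouplet similarity scores.
    return [concept for concept, (w, b) in classifiers.items()
            if sum(wi * si for wi, si in zip(w, sims)) + b > 0.0]


# Toy data, invented purely for this example.
dictionary = [
    Grouplet(visual_fg=[[0.0, 0.0]], audio_bg=[[1.0]]),
    Grouplet(visual_bg=[[10.0, 10.0]], audio_fg=[[-9.0]]),
]
visual_features = [[0.0, 0.0]]
audio_features = [[1.0]]

sims = [grouplet_similarity(visual_features, audio_features, g)
        for g in dictionary]
classifiers = {"birthday": ([1.0, 0.0], -0.5), "parade": ([0.0, 1.0], -0.5)}
labels = classify(sims, classifiers)

# "Storing indications" is modeled here as a plain in-memory dict.
stored = {"clip_001": labels}
```

In this toy setup the clip matches the first grouplet closely and the second hardly at all, so only the concept whose weights favor the first grouplet is selected. A real system would derive the codewords and the classifiers from training data rather than hard-coding them.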