摘要 |
An apparatus and method for analyzing multimedia content to identify the presence of audio, visual and textual cues that together correspond to one or more high-level semantics are provided. The apparatus and method make use of one or more analysis models that are trained to analyze audio, visual and textual portions of multimedia content to generate scores associated with the audio, visual and textual portions with respect to various high-level semantic concepts. These scores are used to generate a vector of scores. The apparatus is trained with regard to relationships between audio, visual and textual scores to thereby take the vector of scores generated for the multimedia content and classify the multimedia content into one or more high-level semantic concepts. Based on the scores for the various audio, video and textual portions of the multimedia content, a level of certainty regarding the high-level semantic concepts may be generated. These high-level semantic concepts are then used to generate one or more labels for the multimedia content that may be used to retrieve the multimedia content using a conceptual search engine. These semantic concept labels and their associated certainty levels may be stored in a file, associated with the multimedia content, for use in retrieving the multimedia content using the conceptual search engine.
|