Abstract <p>The present invention is a method of providing video content information using automatic video content recognition. Here, video content means the entities shown in the video, or the audio content associated with the video that is audible to a viewer. For example, if a viewer is watching a football match, then the football, the players, the field, the advertisement boards and other entities that the viewer can see in the video, as well as audio elements associated with the video that are audible to the viewer, such as the commentary or the noise of the crowd, are all video content information. The method provides for automatic recognition of video content items by analyzing the video frame by frame and recognizing what the content items shown in the video are in a real-world context. The method also involves deriving the meaning (the context or abstraction) of two or more content items shown in a particular frame or a sequence of frames. For example, an apple in the hand of a person who is taking a bite of the apple yields the abstraction 'A person eating an apple'. The method also facilitates derivation of advanced abstractions based on analysis of a frame image or a sequence of frame images of a video: by further analysis of the person's face, the method would be able to refine the above example into the abstraction 'Chuck Norris eating an apple'. Upon said abstraction, information related to the subject 'Chuck Norris eating an apple' or 'A person eating an apple' (as the case may be) is retrieved from a database and presented to a viewer upon the viewer's request, in addition to information about Chuck Norris and the apple. Let us consider another example.
If, in a particular frame or a sequence of frames, there is a person holding a golf putter in his hand, the method would first recognize the two content items (person and golf putter) separately; then, according to the method, the abstraction of said frame or sequence of frames would be 'A person playing golf'. The method further comprises more analysis of the frame image or sequence of images. Using various image feature extraction, feature detection and matching algorithms, said person is recognized by comparing the person's face with the human faces in a predefined database, and if the face matches the face of 'Tiger Woods', then the abstraction of said frame or sequence of frames would be 'Tiger Woods playing golf'. Along similar lines, depending upon the information about the various content items in the database, the method is able to recognize and predict content items shown in a video, and to retrieve relevant quantitative as well as qualitative information about the content items. The method may also involve recognizing content items and meanings based on audio content associated with a video. Let us consider a football match where the commentary is by John Madden. Even if John Madden himself is not featured in the video as a visual content item, the sound of his voice associated with the video is. The method, in this case, involves recognizing the voice that is audible to a viewer as the voice of 'John Madden'. The method also involves making sense of what is actually being spoken by an entity in the audio clip associated with the video. For example, suppose John Madden, at a particular point in the video, says 'This is the third time Dan Marino has been sacked today. Today is just not this Quarter Back's day'. The method involves using a Natural Speech Processor to derive the meaning of natural human speech.
The method involves recognizing the verbs, adjectives and nouns mentioned in speech, converting these nouns, verbs and adjectives into text, and using a text-based search engine to retrieve information pertaining to them. Taking John Madden's statement in the above example, the Natural Speech Processor would first recognize words like 'Dan', 'Marino', 'Sacked', 'Quarter' and 'Back', and the method would then provide both the qualitative and the quantitative meaning of the words individually, as well as of the words in a sentence or phrase. Consequently, the information presented to the viewer would include who Dan Marino is and other information related to him, what the word 'sacked' means in the current context (football), what a quarterback is, etc.</p>
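The two derivation steps described in the abstract (combining recognized content items into an abstraction, and picking out searchable words from recognized speech) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not say how abstractions are stored or how words are tagged, so the rule table `ACTION_RULES`, the word list `LEXICON`, and the functions `derive_abstraction` and `extract_keywords` are hypothetical stand-ins. A real system would use trained image, face and speech recognizers in their place.

```python
import re

# Hypothetical rule table (not from the source): a set of recognized
# content-item labels mapped to the action they suggest when seen together.
ACTION_RULES = {
    frozenset({"person", "apple"}): "eating an apple",
    frozenset({"person", "golf putter"}): "playing golf",
}

# Hypothetical mini-lexicon standing in for a real part-of-speech tagger.
LEXICON = {
    "dan": "noun",
    "marino": "noun",
    "sacked": "verb",
    "quarter": "noun",
    "back": "noun",
}


def derive_abstraction(items, identity=None):
    """Combine recognized content items into an abstraction string.

    `items` are labels already recognized in a frame or frame sequence.
    `identity` is the name returned by a face match, if there was one.
    Returns None when no rule covers the recognized items.
    """
    labels = frozenset(items)
    for required, action in ACTION_RULES.items():
        if required <= labels:  # every item the rule needs is present
            subject = identity if identity else "A person"
            return f"{subject} {action}"
    return None


def extract_keywords(transcript):
    """Return the words of a transcript that appear in the lexicon, in order."""
    words = re.findall(r"[A-Za-z]+", transcript)
    return [word for word in words if word.lower() in LEXICON]


if __name__ == "__main__":
    print(derive_abstraction(["person", "golf putter"]))
    print(derive_abstraction(["person", "golf putter"], identity="Tiger Woods"))
    print(extract_keywords(
        "This is the third time Dan Marino has been sacked today. "
        "Today is just not this Quarter Back's day"
    ))
```

In this sketch the extracted words would then be passed as text queries to a search engine, which is the retrieval step the abstract describes and which is left out here.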
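Application publication number IN2900MU2013(A) Application publication date 2015.07.03
Application number IN2013MU02900 Filing date 2013.09.06
Classification number H04N5/93;H04N5/92 Main classification number H04N5/93
Agency Agent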