Title of invention: Video content-based retrieval
Abstract: A method and system for video-content based retrieval is described. A query video depicting an activity is processed using interest point selection to find locations in the video that are relevant to that activity. A set of spatio-temporal descriptors such as self-similarity and 3-D SIFT are calculated within a local neighborhood of the set of interest points. An indexed video database containing videos similar to the query video is searched using the set of descriptors to obtain a set of candidate videos. The videos in the video database are indexed hierarchically using a vocabulary tree or other hierarchical indexing mechanism.
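As an illustration of the hierarchical indexing and search the abstract describes, the following is a minimal sketch and not the patented implementation. It assumes a vocabulary tree whose cluster centers have already been built (for example by hierarchical k-means), that each leaf stands for one visual word, and that relevance is a cosine similarity over smoothed TF-IDF weights. The names `Node`, `closest_leaf`, `index_videos`, and `rank` are hypothetical, as is the choice of smoothing.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Node:
    """One node of a pre-built vocabulary tree (hypothetical structure)."""
    center: Vector
    children: List["Node"] = field(default_factory=list)
    word_id: int = -1  # meaningful only on leaf nodes


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_leaf(root: Node, descriptor: Vector) -> int:
    """Descend the tree, picking the nearest child at each level, and return the leaf's word id."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: sq_dist(child.center, descriptor))
    return node.word_id


def index_videos(root: Node, videos: Dict[str, List[Vector]]) -> Dict[str, Counter]:
    """Map every descriptor of every known video to its closest leaf and count word occurrences."""
    return {name: Counter(closest_leaf(root, d) for d in descs)
            for name, descs in videos.items()}


def rank(root: Node, index: Dict[str, Counter], query: List[Vector]) -> List[Tuple[str, float]]:
    """Score indexed videos against the query using smoothed TF-IDF cosine similarity (an assumption)."""
    n = len(index)
    doc_freq = Counter(word for hist in index.values() for word in hist)

    def idf(word: int) -> float:
        # Smoothed so that unseen or ubiquitous words never yield a zero or undefined weight.
        return math.log((1 + n) / (1 + doc_freq[word])) + 1.0

    def weights(hist: Counter) -> Dict[int, float]:
        total = sum(hist.values())
        return {w: (c / total) * idf(w) for w, c in hist.items()}

    def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    q = weights(Counter(closest_leaf(root, d) for d in query))
    scores = [(name, cosine(q, weights(hist))) for name, hist in index.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy two-level tree over 2-D "descriptors"; real descriptors would be high-dimensional.
tree = Node((0.0, 0.0), [
    Node((0.0, 0.0), [Node((-1.0, -1.0), word_id=0), Node((1.0, 1.0), word_id=1)]),
    Node((10.0, 10.0), [Node((9.0, 9.0), word_id=2), Node((11.0, 11.0), word_id=3)]),
])
database = {
    "walk": [(1.0, 1.0), (1.2, 0.8), (-1.0, -1.0)],
    "run": [(9.0, 9.0), (11.0, 11.0), (11.0, 10.0)],
}
video_index = index_videos(tree, database)
ranking = rank(tree, video_index, [(0.9, 1.1), (1.0, 1.0)])
```

Descending the tree costs one nearest-child comparison per level instead of a comparison against every visual word, which is the usual motivation for a hierarchical index over a flat codebook.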
Publication number: US9361523(B1) Publication date: 2016.06.07
Application number: US201012841078 Filing date: 2010.07.21
Applicant: HRL Laboratories, LLC Inventors: Chen Yang;Medasani Swarup;Jiang Qin;Allen David L.;Lu Tsai-Ching
Classification: G06K9/03;G06K9/00 Main classification: G06K9/03
Agency: Tope-McKay & Associates Agent: Tope-McKay & Associates
Principal claim: 1. A data processing system for content-based video retrieval, comprising one or more processors configured to perform operations of: receiving a query video clip comprising a sequence of video frames, where the sequence of video frames depicts an activity; performing an interest point selection on the query video to obtain a set of interest points describing locations in the video frames that are relevant to the activity; calculating a set of spatio-temporal descriptors within a local neighborhood of the set of interest points; searching an indexed video database containing video clips of known activities using the set of spatio-temporal descriptors as calculated from the query video clip to obtain a set of candidate videos which contain activities similar to the activity in the query video, whereby the activity in the query video can be identified as a known activity in the candidate videos; wherein the interest point selection comprises an operation of selecting points which have a high motion content, where the motion content is measured by a degree of difference between pixel values in a pair of consecutive image frames, and where high motion content exists when the measured motion content exceeds a predetermined threshold; where the set of spatio-temporal descriptors are of a type selected from the group consisting of a self-similarity descriptor, and a shift-invariant feature transform descriptor; where each candidate video is given a similarity score describing a degree of similarity between the candidate video and the query video, and the similarity score is evaluated based on relevance computed using visual word frequencies; further configured to perform an operation of indexing a video database containing videos of known activities using a hierarchical indexing mechanism; and where the hierarchical indexing mechanism is a vocabulary tree having leaf nodes, and wherein in indexing the video database, all descriptors for the video clips of known activities are computed, with a closest leaf node in the vocabulary tree for each descriptor being found.
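The motion-based interest point selection recited in the claim can be sketched as follows. This is a minimal illustration only: it assumes grayscale frames stored as nested lists of integer intensities and takes the "degree of difference" to be the absolute per-pixel difference between consecutive frames. The function name `select_interest_points` and the threshold value used in the example are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities


def select_interest_points(frames: List[Frame], threshold: int) -> List[Tuple[int, int, int]]:
    """Return (t, y, x) for each pixel whose absolute change from frame t-1 to frame t exceeds threshold."""
    points: List[Tuple[int, int, int]] = []
    for t in range(1, len(frames)):
        previous, current = frames[t - 1], frames[t]
        for y, (row_prev, row_curr) in enumerate(zip(previous, current)):
            for x, (p, c) in enumerate(zip(row_prev, row_curr)):
                # Motion content: difference between pixel values in consecutive frames.
                if abs(c - p) > threshold:
                    points.append((t, y, x))
    return points


# Two 2x2 frames in which only the top-right pixel changes noticeably.
clip = [
    [[0, 0], [0, 0]],
    [[0, 50], [5, 0]],
]
moving = select_interest_points(clip, threshold=20)
static = select_interest_points([clip[0], clip[0]], threshold=20)
```

Spatio-temporal descriptors would then be computed in a local neighborhood around each returned `(t, y, x)` location; that step is omitted here.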
Address: Malibu, CA, US