Independent claim |
1. A data processing system for content-based video retrieval, comprising one or more processors configured to perform operations of:
receiving a query video clip comprising a sequence of video frames, where the sequence of video frames depicts an activity;
performing an interest point selection on the query video to obtain a set of interest points describing locations in the video frames that are relevant to the activity;
calculating a set of spatio-temporal descriptors within a local neighborhood of the set of interest points;
searching an indexed video database containing video clips of known activities using the set of spatio-temporal descriptors as calculated from the query video clip to obtain a set of candidate videos which contain activities similar to the activity in the query video, whereby the activity in the query video can be identified as a known activity in the candidate videos;
wherein the interest point selection comprises an operation of selecting points which have a high motion content, where the motion content is measured by a degree of difference between pixel values in a pair of consecutive image frames, and where high motion content exists when the measured motion content exceeds a predetermined threshold;
where the set of spatio-temporal descriptors are of a type selected from the group consisting of a self-similarity descriptor, and a shift-invariant feature transform descriptor;
where each candidate video is given a similarity score describing a degree of similarity between the candidate video and the query video, and the similarity score is evaluated based on relevance computed using visual word frequencies;
further configured to perform an operation of indexing a video database containing videos of known activities using a hierarchical indexing mechanism; and
where the hierarchical indexing mechanism is a vocabulary tree having leaf nodes, and wherein in indexing the video database, all descriptors for the video clips of known activities are computed, with a closest leaf node in the vocabulary tree for each descriptor being found. |
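
The interest point selection recited in the claim (points whose motion content, measured between consecutive frames, exceeds a predetermined threshold) can be illustrated with a minimal sketch. It rests on assumptions not stated in the claim: frames are grayscale grids of integer pixel values, and "degree of difference" is taken to be the absolute per-pixel difference. The function name `select_interest_points`, the `Frame` alias, and the sample data are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = List[List[int]]


def select_interest_points(
    frames: List[Frame], threshold: int
) -> List[Tuple[int, int, int]]:
    """Return (t, y, x) points whose motion content exceeds the threshold.

    Motion content is taken here as the absolute difference between the
    pixel values at (y, x) in consecutive frames t-1 and t. This is one
    possible reading of "a degree of difference", not the only one.
    """
    points = []
    for t in range(1, len(frames)):
        prev, curr = frames[t - 1], frames[t]
        for y, (prev_row, curr_row) in enumerate(zip(prev, curr)):
            for x, (p, c) in enumerate(zip(prev_row, curr_row)):
                # Keep only locations with high motion content.
                if abs(c - p) > threshold:
                    points.append((t, y, x))
    return points


# Hypothetical data: three 2x3 frames in which a bright pixel moves
# one column to the right per frame.
frames = [
    [[200, 0, 0], [0, 0, 0]],
    [[0, 200, 0], [0, 0, 0]],
    [[0, 0, 200], [0, 0, 0]],
]
points = select_interest_points(frames, threshold=50)
```

In this sketch each selected point marks a location where a spatio-temporal descriptor would then be computed within a local neighborhood; raising the threshold reduces the number of selected points.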