Abstract |
Duplicate video search results are detected and removed. A digital signature is generated for each video content item in a video content corpus. For each of the top n previously received queries, duplicates are identified by determining the similarity of the video content items within that query's result set. The similarity between any two video documents in the result set is calculated by measuring the difference between their digital signatures. If the similarity between two videos is above a particular threshold, the two videos are considered duplicates of each other, and the search index is updated to retain the video document that is most relevant to the particular query. The less relevant video documents are flagged as duplicates with respect to that query. |
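
The per-query procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the signature format, the similarity measure, or the threshold, so everything concrete here is an assumption made for illustration: signatures are taken to be 64-bit integers, similarity is taken to be one minus the normalized Hamming distance, the threshold defaults to 0.9, and each result set is assumed to be ordered from most to least relevant. The names `similarity`, `flag_duplicates`, and `flag_duplicates_for_queries` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

SIGNATURE_BITS = 64  # assumed signature width; the abstract does not give one


def similarity(sig_a: int, sig_b: int, bits: int = SIGNATURE_BITS) -> float:
    """Assumed measure: 1 minus the fraction of signature bits that differ."""
    differing = bin(sig_a ^ sig_b).count("1")
    return 1.0 - differing / bits


def flag_duplicates(
    ranked_ids: List[str],
    signatures: Dict[str, int],
    threshold: float = 0.9,  # assumed value
) -> Set[str]:
    """Return the video IDs flagged as duplicates for one query.

    ranked_ids is assumed to be ordered from most to least relevant,
    so in every pair the second element is the less relevant one.
    """
    flagged: Set[str] = set()
    # combinations() preserves input order: `more` always ranks above `less`.
    for more, less in combinations(ranked_ids, 2):
        if similarity(signatures[more], signatures[less]) > threshold:
            flagged.add(less)
    return flagged


def flag_duplicates_for_queries(
    results_by_query: Dict[str, List[str]],
    signatures: Dict[str, int],
    threshold: float = 0.9,
) -> Dict[str, Set[str]]:
    """Apply flag_duplicates to each query's result set independently."""
    return {
        query: flag_duplicates(ranked_ids, signatures, threshold)
        for query, ranked_ids in results_by_query.items()
    }
```

Because flags are stored per query, the same video can be flagged as a duplicate for one query and retained for another, which matches the abstract's wording "with respect to the particular query".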