Title of invention: Video content alignment
Abstract: Various embodiments identify differences between frame sequences of a video. For example, to determine a difference between two versions of a video, a fingerprint of each frame of the two versions is generated. From the fingerprints, a run-length encoded representation of each version is generated. Fingerprints that appear only once in the entire video (i.e., unique fingerprints) are identified in each version and compared to identify matching unique fingerprints across versions. The matching unique fingerprints are sorted and filtered to determine split points, which are used to align the two versions of the video. Accordingly, each version is segmented into smaller frame sequences using the split points. Once segmented, the individual frames of each segment are aligned across versions using a dynamic programming algorithm. After the segments are aligned at the frame level, they are reassembled to generate a global alignment output.
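The split-point step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that fingerprints are hashable values, one per frame, that a "unique" fingerprint is one occurring in exactly one run of the run-length encoding, and that the filtering step keeps the longest chain of matches whose order agrees in both versions. Because each matched fingerprint occurs once per version, that chain is computed here as a longest increasing subsequence, used as a stand-in for the filtering the abstract describes. The names `run_length_encode`, `unique_runs`, and `split_points` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from itertools import groupby


def run_length_encode(fingerprints):
    """Collapse consecutive equal fingerprints into (value, start, length) runs."""
    runs, index = [], 0
    for value, group in groupby(fingerprints):
        length = sum(1 for _ in group)
        runs.append((value, index, length))
        index += length
    return runs


def unique_runs(runs):
    """Map each fingerprint that occurs in exactly one run to that run's start frame."""
    counts = Counter(value for value, _, _ in runs)
    return {value: start for value, start, _ in runs if counts[value] == 1}


def split_points(fps_a, fps_b):
    """Return (frame_in_a, frame_in_b) anchors whose order agrees in both versions."""
    unique_a = unique_runs(run_length_encode(fps_a))
    unique_b = unique_runs(run_length_encode(fps_b))
    # Match unique fingerprints across versions, then sort by time in version A.
    matches = sorted((unique_a[v], unique_b[v]) for v in unique_a.keys() & unique_b.keys())
    # Filter: keep the longest chain whose positions in version B also increase.
    tails, tail_index, previous = [], [], [None] * len(matches)
    for i, (_, pos_b) in enumerate(matches):
        k = bisect_left(tails, pos_b)
        if k == len(tails):
            tails.append(pos_b)
            tail_index.append(i)
        else:
            tails[k] = pos_b
            tail_index[k] = i
        previous[i] = tail_index[k - 1] if k > 0 else None
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        chain.append(matches[i])
        i = previous[i]
    return chain[::-1]


version_a = ["a", "a", "b", "c", "c", "d", "e"]
version_b = ["a", "x", "c", "c", "b", "d", "d", "e"]
anchors = split_points(version_a, version_b)
```

In this toy input, fingerprint `b` matches across versions but appears out of order in the second version, so the filter drops it and the remaining anchors can be used to cut both versions into corresponding segments.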
Publication number: US9275682(B1)  Publication date: 2016.03.01
Application number: US201414498818  Filing date: 2014.09.26
Applicant: A9.com, Inc.  Inventors: Yalniz Ismet Zeki;Carlson Adam;Gray Douglas Ryan;Taylor Colin Jon
Classification: G06K9/34;G11B27/30;G11B27/036  Main classification: G06K9/34
Agency:  Agent:
Main claim: 1. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause a computing device to: generate, for a first version of a video file, a first version fingerprint for each frame of the first version; generate, for a second version of the video file, a second version fingerprint for each frame of the second version of the video file, wherein generating a fingerprint from a frame of the first version and the second version of the video file includes: segmenting the frame into a plurality of cells; computing an average intensity value for each cell; and comparing, for at least a subset of the plurality of cells, the average intensity value for a first cell against each other cell of the subset to compute the fingerprint; generate a first run-length encoded representation of first version fingerprints for the first version of the video file; generate a second run-length encoded representation of second version fingerprints for the second version of the video file; identify a first set of unique fingerprints of the first version from the first run-length encoded representation; identify a second set of unique fingerprints of the second version from the second run-length encoded representation; determine a set of split points for the first version and the second version of the video file, wherein determining the split points includes: comparing the first set of unique fingerprints to the second set of unique fingerprints to identify a plurality of matching unique fingerprints; sorting the matching unique fingerprints by time; and filtering the matching unique fingerprints using a Longest Common Subsequences (LCS) algorithm to determine the set of split points; segment the first version of the video file into a plurality of first version segments using the set of split points; segment the second version of the video file into a plurality of second version segments using the set of split points; determine, using Gotoh's sequence alignment algorithm, individual frame correspondences between each first version segment of the plurality of first version segments and a corresponding second version segment of the plurality of second version segments; and concatenate the plurality of first version segments and the plurality of second version segments based at least in part on the individual frame correspondences to generate a global alignment comparison output.
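Two steps of the claim, the cell-comparison fingerprint and the frame-level alignment within one segment, can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a frame is a 2D list of grayscale intensities at least as large as the grid, that each pairwise comparison of cell averages contributes one bit, and that the alignment uses affine gap penalties in the manner of Gotoh's algorithm. The function names, the 2x2 grid, and all scoring values are hypothetical choices, not values taken from the patent.

```python
from itertools import combinations


def frame_fingerprint(frame, grid=(2, 2)):
    """Pack pairwise comparisons of cell-average intensities into an integer."""
    rows, cols = grid
    height, width = len(frame), len(frame[0])
    averages = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            averages.append(sum(cell) / len(cell))
    bits = 0
    for i, j in combinations(range(len(averages)), 2):
        bits = (bits << 1) | int(averages[i] > averages[j])
    return bits


def gotoh_align(a, b, match=1.0, mismatch=-1.0, gap_open=-2.0, gap_extend=-0.5):
    """Globally align two fingerprint sequences with affine gap penalties.

    Returns (index_in_a, index_in_b) pairs; None marks a frame with no partner.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] paired with b[j-1]; X: a[i-1] unpaired; Y: b[j-1] unpaired.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    # Trace back from the best final state, re-deriving each predecessor state.
    i, j = n, m
    state = max((M[i][j], "M"), (X[i][j], "X"), (Y[i][j], "Y"))[1]
    pairs = []
    while i > 0 or j > 0:
        if state == "M":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
            state = max((M[i][j], "M"), (X[i][j], "X"), (Y[i][j], "Y"))[1]
        elif state == "X":
            pairs.append((i - 1, None))
            i -= 1
            state = max((M[i][j] + gap_open, "M"), (X[i][j] + gap_extend, "X"),
                        (Y[i][j] + gap_open, "Y"))[1]
        else:
            pairs.append((None, j - 1))
            j -= 1
            state = max((M[i][j] + gap_open, "M"), (Y[i][j] + gap_extend, "Y"),
                        (X[i][j] + gap_open, "X"))[1]
    return pairs[::-1]


frame = [[10, 10, 20, 20], [10, 10, 20, 20], [30, 30, 5, 5], [30, 30, 5, 5]]
brighter = [[value + 3 for value in row] for row in frame]
correspondences = gotoh_align([1, 2, 3, 4, 5], [1, 2, 5])
```

Because the fingerprint depends only on the ordering of cell averages, the uniformly brightened frame yields the same value as the original. In the alignment example, the two frames missing from the shorter sequence are reported as one contiguous gap, which is the behavior affine gap penalties are meant to favor when a version has a block of frames inserted or removed.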
Address: Palo Alto CA US