Abstract |
PROBLEM TO BE SOLVED: To extract high-level video structure from a wide variety of videos. SOLUTION: A video and audio processing device 10 includes a scene detection part 16. The scene detection part 16 uses (a) feature quantities extracted from the video segments and/or audio segments into which an input video data stream has been divided, and (b) a metric, calculated for each feature quantity from those feature quantities, that measures the similarity between video segments and/or audio segments. Using these, it detects pairs of video segments and/or audio segments whose time difference is no greater than a prescribed time threshold and whose dissimilarity is no greater than a prescribed dissimilarity threshold, and integrates them into scenes. Each scene consists of temporally continuous video segments and/or audio segments and reflects the semantic structure of the content of the video data.
|
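The grouping rule described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Segment` class, the `detect_scenes` function, the use of Euclidean distance as the dissimilarity metric, and the definition of time difference as the gap between one segment's end and the next segment's start are all assumptions made for illustration. Segments are assumed to be sorted by start time and non-overlapping.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical video or audio segment with one feature vector."""
    start: float                 # start time in seconds
    end: float                   # end time in seconds
    features: Sequence[float]    # assumed: a single numeric feature vector


def detect_scenes(segments: List[Segment],
                  time_threshold: float,
                  dissimilarity_threshold: float) -> List[List[int]]:
    """Group segment indices into temporally continuous scenes.

    Two segments are linked when their time gap is <= time_threshold and
    their dissimilarity (assumed here: Euclidean distance) is
    <= dissimilarity_threshold. Every segment lying between two linked
    segments joins the same scene, which keeps each scene contiguous.
    """
    scenes: List[List[int]] = []
    n = len(segments)
    i = 0
    while i < n:
        last = i  # index of the last segment known to belong to this scene
        k = i
        while k <= last:
            for j in range(k + 1, n):
                gap = segments[j].start - segments[k].end
                if gap > time_threshold:
                    break  # later segments are even further away in time
                d = dist(segments[k].features, segments[j].features)
                if d <= dissimilarity_threshold:
                    last = max(last, j)
            k += 1
        scenes.append(list(range(i, last + 1)))
        i = last + 1
    return scenes


segments = [
    Segment(0, 2, (0.0, 0.0)),
    Segment(2, 4, (5.0, 5.0)),   # dissimilar shot between two similar ones
    Segment(4, 6, (0.1, 0.0)),
    Segment(6, 8, (9.0, 9.0)),
    Segment(8, 10, (9.0, 9.1)),
]
scenes = detect_scenes(segments, time_threshold=5.0, dissimilarity_threshold=0.5)
print(scenes)
```

In this example the first and third segments are similar and close in time, so the dissimilar second segment between them is absorbed into the same scene. That is one simple way to obtain scenes that are continuous in time; the source does not specify how this is actually done.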