Abstract |
An object of the present invention is to make image segments of an input video correspond to sound segments of the same video when the image segments and the sound segments include an identical object. Provided are: an image segment classification means 101 that analyzes an input video to generate a plurality of image segment groups, each including a plurality of image segments that include an identical object; a sound segment classification means 102 that analyzes the input video to generate a plurality of sound segment groups, each including a plurality of sound segments that include an identical object; an inter-segment group score calculation means 103 that calculates a similarity score between each image segment group and each sound segment group based on the time during which the image segment group and the sound segment group are both present, or the time during which only one of them is present; and a segment group correspondence decision means 104 that decides, using the scores, whether or not an object in the image segment groups and an object in the sound segment groups are the same (FIG. 1).
|
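
The score calculation (103) and correspondence decision (104) described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: each segment is modeled as a `(start, end)` interval in seconds, segments within one group are assumed not to overlap each other, the score is the co-occurrence time divided by the co-occurrence time plus the time during which only one group is present, and the decision is a greedy one-to-one match against a hypothetical `threshold`. The names `score` and `decide_correspondence` are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # (start_sec, end_sec); assumed representation
Group = List[Interval]           # segments believed to contain one object


def total_length(group: Group) -> float:
    """Total time covered by a group (segments assumed non-overlapping)."""
    return sum(end - start for start, end in group)


def overlap_length(a: Group, b: Group) -> float:
    """Time during which both groups are present at the same time."""
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )


def score(image_group: Group, sound_group: Group) -> float:
    """Assumed score: co-occurrence time over (co-occurrence + only-one time)."""
    both = overlap_length(image_group, sound_group)
    only_one = total_length(image_group) + total_length(sound_group) - 2 * both
    denom = both + only_one
    return both / denom if denom > 0 else 0.0


def decide_correspondence(
    image_groups: List[Group],
    sound_groups: List[Group],
    threshold: float = 0.5,      # hypothetical cut-off, not from the source
) -> Dict[int, int]:
    """Greedily pair image and sound groups, highest score first."""
    candidates = sorted(
        (
            (score(ig, sg), i, j)
            for i, ig in enumerate(image_groups)
            for j, sg in enumerate(sound_groups)
        ),
        reverse=True,
    )
    matched: Dict[int, int] = {}
    used_sound = set()
    for s, i, j in candidates:
        if s < threshold:
            break                # remaining candidates score even lower
        if i in matched or j in used_sound:
            continue             # keep the matching one-to-one
        matched[i] = j
        used_sound.add(j)
    return matched


# Toy data: two image groups and two sound groups (made-up timings).
image_groups = [[(0, 10), (20, 30)], [(10, 20)]]
sound_groups = [[(11, 19)], [(1, 9), (21, 30)]]
pairs = decide_correspondence(image_groups, sound_groups)
print(pairs)  # maps image group index to sound group index
```

With the toy timings, image group 0 is paired with sound group 1 and image group 1 with sound group 0, because those pairs are present together for most of their duration. A ratio was chosen here only so that the score stays between 0 and 1; a weighted difference of the two times would fit the same description.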