Abstract |
PROBLEM TO BE SOLVED: To provide an apparatus, method and program for processing video data with sound in which the photographer's voice can be used effectively during reproduction. SOLUTION: A voice signal analyzing section 52 applies voice recognition processing to photographer voice data 66 read from a photographer voice signal recording section 50, converts the human speech that can be recognized into text, and outputs the text as speech content information. The voice signal analyzing section 52 also acquires speech time information indicating when the transcribed speech was uttered. The speech time information includes, for example, the frame numbers identifying the frames of the video data (motion picture) at which a speech starts and ends, and the speech start time and end time. A meta-data generating section 54 stores the speech time information, the speech content information and the like in meta-data of a predetermined file format (e.g., xml format). This meta-data is associated with the photographer voice data 66 and recorded in the photographer voice signal recording section 50. COPYRIGHT: (C)2007,JPO&INPIT
|
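The meta-data step described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract specifies no schema beyond "xml format", so the element and attribute names, the `SpeechSegment` type, the `build_metadata` function, the 30 fps frame rate and the file name `voice_0001.wav` are all hypothetical. Voice recognition itself is not shown; the segments are assumed to be its output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SpeechSegment:
    """One recognized utterance (hypothetical structure, not from the source)."""
    start_time: float  # seconds from the start of the recording
    end_time: float    # seconds from the start of the recording
    text: str          # speech content produced by voice recognition


def build_metadata(segments, voice_file, fps=30.0):
    """Serialize speech time and speech content information as XML.

    The frame rate and all element/attribute names are assumptions made
    for this sketch; the abstract only says "a predetermined file format".
    """
    # The "source" attribute associates the meta-data with the voice data.
    root = ET.Element("photographer_voice_metadata", {"source": voice_file})
    for seg in segments:
        node = ET.SubElement(root, "speech", {
            # Frame numbers derived from the times at the assumed frame rate.
            "start_frame": str(round(seg.start_time * fps)),
            "end_frame": str(round(seg.end_time * fps)),
            "start_time": f"{seg.start_time:.3f}",
            "end_time": f"{seg.end_time:.3f}",
        })
        node.text = seg.text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = [SpeechSegment(1.0, 3.0, "Look at the sunset")]
    print(build_metadata(demo, "voice_0001.wav"))
```

Storing both frame numbers and times per utterance mirrors the abstract's description of the speech time information, so a player could look up the text by either index during reproduction.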