Abstract |
The invention involves capturing audio-visual (AV) content (503) and detecting ambient sound (504) associated with the AV content. The ambient sound is checked for sound features (such as speech, music, rhythm, loudness), and a correspondence between the sound features and keywords is searched for (507). The AV content is also searched for visual patterns (505), and a check is made for a correspondence between these visual patterns and the keywords associated with the ambient sound features. Movement information of the apparatus capturing the AV content is also gathered (506) using, for example, a compass, a visual direction finder or a magnetometer. A salient highlight of the AV content is then determined (509) based on at least one of: contextual data relating to the content-capturing situation, the movement information, and the keywords corresponding to the features found in the ambient sound and the visual patterns. When a salient highlight is found, it is labeled (510) based on the keywords connected to the recognized highlight event. The camera can then focus or zoom in on a particular point of interest based on the nature of the identified salient highlight. |
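The keyword-matching idea described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the keyword tables, the function names `keywords_for` and `find_highlight`, and the rule that a highlight is only reported while the device is held steady are all assumptions made for illustration. The source names no concrete keywords and does not say how the movement information is weighed.

```python
# Hypothetical keyword tables; the source lists no concrete entries.
SOUND_KEYWORDS = {
    "music": {"concert", "party"},
    "speech": {"speech", "interview"},
    "loud": {"concert", "sports"},
}
VISUAL_KEYWORDS = {
    "stage": {"concert", "speech"},
    "crowd": {"concert", "sports"},
    "face": {"interview", "speech"},
}


def keywords_for(features, table):
    """Collect every keyword associated with the detected features."""
    result = set()
    for feature in features:
        result |= table.get(feature, set())
    return result


def find_highlight(sound_features, visual_patterns, device_steady=True):
    """Return sorted label keywords for a salient highlight, or None.

    A highlight is reported when the sound-derived and visual-derived
    keyword sets overlap. Gating on device_steady is an assumed stand-in
    for the movement information gathered in step 506.
    """
    sound_kw = keywords_for(sound_features, SOUND_KEYWORDS)
    visual_kw = keywords_for(visual_patterns, VISUAL_KEYWORDS)
    shared = sound_kw & visual_kw
    if not shared or not device_steady:
        return None
    # The shared keywords double as the label for the highlight (step 510).
    return sorted(shared)
```

For example, under these assumed tables, `find_highlight(["music", "loud"], ["stage", "crowd"])` yields the label keywords `concert` and `sports`, while `find_highlight(["speech"], ["crowd"])` finds no overlap and returns `None`.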