Abstract |
Various embodiments facilitate multimedia synchronization based on video processing and audio processing. In one embodiment, a multimedia synchronization system is provided to synchronize video and audio content by performing video processing on the video content, audio processing on the audio content, and a synchronization process. The video processing and the audio processing generate recognized lip movements and recognized speech, respectively. The synchronization process determines a match between a lip movement of the recognized lip movements and speech of the recognized speech, and synchronizes the video content and the audio content based on the match. |
Principal claim |
1. A method, comprising:
obtaining, by a host, video content and audio content;
performing, by the host, video processing on the video content, the video processing including:
detecting a presence of a face in the video content by performing face detection,
detecting the face speaking by performing speaker detection, and
recognizing lip movements of the face speaking by performing lip recognition;
performing, by the host, audio processing on the audio content, the audio processing including:
recognizing speech in the audio content by performing speech recognition;
performing, by the host, a synchronization process, the synchronization process including:
determining a match between a lip movement of the recognized lip movements and speech of the recognized speech, and
synchronizing the video content and the audio content based on the match; and
providing, by the host, the synchronized video content and audio content to a user. |
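The synchronization process recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that lip recognition and speech recognition each emit a list of labeled, timestamped tokens. The `Event` structure, the function names `estimate_offset` and `synchronize`, the `max_offset` search window, and the use of a median are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Event:
    """A recognized token and its start time in seconds (hypothetical structure)."""
    label: str
    time: float


def estimate_offset(lip_events, speech_events, max_offset=2.0):
    """Estimate how far the audio must be shifted to line up with the video.

    Each lip event is paired with the nearest-in-time speech event that has
    the same label and lies within max_offset seconds. The result is the
    median of (lip time - speech time) over all pairs, or None if no lip
    event finds a match. The median is an assumption chosen here to limit
    the effect of occasional mismatches.
    """
    diffs = []
    for lip in lip_events:
        candidates = [
            s for s in speech_events
            if s.label == lip.label and abs(lip.time - s.time) <= max_offset
        ]
        if candidates:
            nearest = min(candidates, key=lambda s: abs(lip.time - s.time))
            diffs.append(lip.time - nearest.time)
    return median(diffs) if diffs else None


def synchronize(speech_events, offset):
    """Return speech events with timestamps shifted by offset seconds."""
    return [Event(e.label, e.time + offset) for e in speech_events]
```

Under these assumptions, a caller would pass the recognized lip movements and the recognized speech to `estimate_offset`, then apply the returned value with `synchronize` (or, in a real player, delay or advance the audio stream by that amount) before providing the content to the user.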