Abstract
A single-channel audio signal representing a multi-party conversation is generated by marking several audio signals, each representing a different participant, while that participant is speaking. To mark a participant's audio signal, the current energy in that signal is measured, a speaker-dependent signal (e.g., a speaker ID watermark) with a proportional energy is generated, and the speaker-dependent signal is added to the participant's audio signal. The marked audio signals are then summed, together with any unmarked signals, into the single-channel audio signal. The current speakers can subsequently be identified and diarized via correlation of the speaker ID signals, without the need for voiceprint training.
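The abstract does not specify the form of the speaker ID watermark or of the correlation detector. The Python sketch below is a minimal illustration under loudly stated assumptions: each speaker ID signal is a hypothetical ±1 pseudo-noise sequence seeded by the speaker's integer ID, audio is processed in non-overlapping frames that are aligned between marking and detection, the watermark RMS is `alpha` times the frame RMS (one way to realize "proportional energy"), and a speaker counts as active when the normalized correlation exceeds a hypothetical `threshold`. All function names, `alpha`, and `threshold` are illustrative, not taken from the source.

```python
import math
import random
from typing import Dict, List, Sequence


def speaker_id_sequence(speaker_id: int, length: int) -> List[float]:
    """Hypothetical speaker ID signal: a +/-1 pseudo-noise sequence seeded by the ID."""
    rng = random.Random(speaker_id)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def mark_frame(frame: Sequence[float], pn: Sequence[float], alpha: float) -> List[float]:
    """Add an ID signal whose RMS is alpha times the frame's RMS.

    The +/-1 sequence has unit RMS, so scaling it by alpha * rms(frame) makes the
    watermark energy proportional to the participant's current energy. A silent
    frame (zero energy) therefore receives no watermark.
    """
    scale = alpha * rms(frame)
    return [x + scale * p for x, p in zip(frame, pn)]


def mix_frame(participants: Dict[int, Sequence[float]],
              unmarked: Sequence[Sequence[float]] = (),
              alpha: float = 0.1) -> List[float]:
    """Mark each participant's frame, then sum marked and unmarked frames into one channel."""
    n = len(next(iter(participants.values())))
    out = [0.0] * n
    for sid, frame in participants.items():
        marked = mark_frame(frame, speaker_id_sequence(sid, n), alpha)
        for i, x in enumerate(marked):
            out[i] += x
    for frame in unmarked:
        for i, x in enumerate(frame):
            out[i] += x
    return out


def detect_speakers(mix: Sequence[float], candidate_ids: Sequence[int],
                    threshold: float = 0.05) -> List[int]:
    """Return the IDs whose normalized correlation with the mixed frame exceeds threshold."""
    n = len(mix)
    level = rms(mix)
    if level == 0.0:
        return []
    active = []
    for sid in candidate_ids:
        pn = speaker_id_sequence(sid, n)
        # Correlation divided by the mix level, so the score does not depend on loudness.
        score = sum(m * p for m, p in zip(mix, pn)) / (n * level)
        if score > threshold:
            active.append(sid)
    return active


def diarize(mix_signal: Sequence[float], candidate_ids: Sequence[int],
            frame_len: int, threshold: float = 0.05) -> List[List[int]]:
    """Label each non-overlapping frame of the mono mix with the detected speaker IDs."""
    return [
        detect_speakers(mix_signal[start:start + frame_len], candidate_ids, threshold)
        for start in range(0, len(mix_signal) - frame_len + 1, frame_len)
    ]


if __name__ == "__main__":
    n = 8192
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(n)]  # stand-in for speech
    silence = [0.0] * n
    # Speaker 1 talks in the first frame, speaker 2 in the second.
    mix = mix_frame({1: speech, 2: silence}) + mix_frame({1: silence, 2: speech})
    print(diarize(mix, [1, 2], n))
```

Long frames are used here because the correlation between the ID sequence and the speech itself behaves like noise whose standard deviation shrinks roughly as one over the square root of the frame length; a real system would have to trade frame length, watermark level (`alpha`), and audibility against each other, which this sketch does not attempt.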