Abstract
A method for language-independent, non-semantic speech analytics that analyzes spoken utterances regardless of the language spoken or the identity of the speakers, comprising the steps of receiving an audio input containing human speech, analyzing the audio input to identify its waveform pattern, and analyzing the waveform to identify periods of silence; additional methods for alternative non-semantic speech analysis; and a system for non-semantic speech analysis comprising a media server that receives audio input, an analytics server that processes the audio input, and a management server that configures operation of the analytics server.
Claims
1. A method for language-independent, non-semantic speech analytics, comprising the steps of:
receiving, at a media server stored and operating on a network-connected analytics server computer, an audio input from a plurality of network-connected devices;
analyzing, using the analytics server computer, the audio input to determine an audio waveform;
analyzing, using the analytics server computer, the waveform to determine a plurality of periods of silence, wherein the periods of silence are detected as a plurality of valleys in the amplitude of the waveform;
analyzing, using the analytics server computer, the waveform to identify a plurality of units of speech, wherein the units of speech are identified as a plurality of peaks in the amplitude of the waveform;
analyzing, using the analytics server computer, the units of speech within the waveform to determine speech characteristics, including at least a pace of speech during an interaction and a change in pace of speech during the interaction, wherein the change in pace is identified by successive stages of analysis utilizing results of previous stages;
analyzing, using the analytics server computer, the waveform to determine a plurality of periods of cross-talk during which two or more interaction participants are speaking simultaneously, wherein a talk ratio is calculated to determine at least a contribution of each of the two or more interaction participants and a quantity of cross-talk in the waveform, and wherein each contribution is computed by determining the relative speaking time of each of the two or more participants as a fraction of total interaction time;
analyzing, using the analytics server computer, the waveform to determine an emotional state of a speaker, wherein the emotional state of the speaker is determined from the quantity of cross-talk in the waveform;
analyzing, using the analytics server computer, a speech pattern using at least the pace of speech;
identifying an unknown speaker based on the speech pattern, wherein identifying the unknown speaker comprises comparing the speech pattern to a plurality of previously stored speech patterns;
storing the results of the waveform analysis for future reference in a database stored and operating on a network-attached computer; and
sending the results of the waveform analysis to a client computing device for viewing by a user.
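The amplitude-based steps recited in the claim (silence as valleys, speech units as peaks, pace of speech, and talk ratio with cross-talk) can be illustrated with a minimal sketch. This is not the claimed implementation: the fixed threshold, the per-frame amplitude envelope, and the per-speaker boolean speaking masks are all assumptions introduced for illustration.

```python
# Illustrative sketch of amplitude-based, non-semantic speech analysis.
# Assumes a mono amplitude envelope sampled at fixed-length frames.

SILENCE_THRESHOLD = 0.1  # assumed: amplitude below this is a "valley" (silence)

def segment_speech(envelope, threshold=SILENCE_THRESHOLD):
    """Split an amplitude envelope into speech units (amplitude peaks)
    separated by periods of silence (amplitude valleys).
    Returns (start_frame, end_frame) index pairs, inclusive."""
    units, current = [], []
    for i, amp in enumerate(envelope):
        if amp >= threshold:
            current.append(i)       # frame belongs to a speech unit
        elif current:
            units.append((current[0], current[-1]))  # close the unit at a valley
            current = []
    if current:                     # envelope ended mid-unit
        units.append((current[0], current[-1]))
    return units

def pace_of_speech(units, total_frames):
    """Speech units per frame of interaction; a later analysis stage can
    difference successive windows to detect a change in pace."""
    return len(units) / total_frames

def talk_ratio(speaking_masks):
    """Given one boolean speaking mask per participant (all the same length),
    return each participant's contribution as a fraction of total interaction
    time, plus the quantity of cross-talk (frames with 2+ simultaneous
    speakers)."""
    total = len(speaking_masks[0])
    contributions = [sum(mask) / total for mask in speaking_masks]
    cross_talk = sum(1 for frames in zip(*speaking_masks) if sum(frames) >= 2)
    return contributions, cross_talk

envelope = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0]
units = segment_speech(envelope)          # [(1, 2), (5, 5)]
pace = pace_of_speech(units, len(envelope))
ratios, overlap = talk_ratio([[1, 1, 0, 0, 1],
                              [0, 1, 1, 0, 0]])
```

In this toy run, the two amplitude peaks yield two speech units separated by valleys, each participant's contribution is a fraction of the five-frame interaction, and one frame of overlap is counted as cross-talk.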