Title: Sound identification systems
Abstract: A digital sound identification system for storing a Markov model is disclosed. A processor is coupled to a sound data input, working memory, and a stored program memory for executing processor control code to input sound data for a sound to be identified. The sample sound data defines a sample frequency domain data energy in a range of frequency. Mean and variance values for a Markov model of the sample sound are generated. The Markov model is stored in the non-volatile memory. Interference sound data defining interference frequency domain data is inputted. The mean and variance values of the Markov model using the interference frequency domain data are adjusted. Sound data defining other sound frequency domain data are inputted. A probability of the other sound frequency domain data fitting the Markov model is determined. Finally, sound identification data dependent on the probability is outputted.
Publication number: US9286911(B2) Publication date: 2016.03.15
Application number: US201414533837 Filing date: 2014.11.05
Applicant: Audio Analytic LTD Inventor: Mitchell Christopher James
Classification: G06F15/18;G10L25/51;G10L15/02;G06F17/18;G10L15/14;G10L25/63;G10L25/57;G08B13/00;G10L15/20 Main classification: G06F15/18
Agency: Tarolli, Sundheim, Covell & Tummino, LLP Agent: Tarolli, Sundheim, Covell & Tummino, LLP
Principal claim: 1. A camera system including at least one camera, at least one microphone to capture sound, and a digital sound recognition system to recognise one or more target sounds in said captured sound, wherein said digital sound recognition system comprises: a sound input coupled to said at least one microphone; program memory storing processor control code; working memory; non-volatile memory for storing one or more statistical models λ of one or more target sounds; and a processor coupled to said sound input, to said working memory, and to said program memory for executing said processor control code, wherein said processor control code comprises code to: input sound data for a sound, the sound data defining frequency domain data, said frequency domain data defining an energy of said sound in a plurality of frequency ranges; receive and store in said non-volatile memory pre-derived statistical model parameters representing a set of statistical models λ representing a set of said target sounds, wherein said parameters characterise said set of target sounds; perform a multiple component decomposition of said sound data by applying a succession of windows to said sound data to construct a time frequency representation of said sound data; calculate, for said set of statistical models λ representing said set of said target sounds, the probability of an observed frequency distribution sequence in said multiple component decomposition of said sound data up to a time T, given the model λ: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$, where $\alpha_t(i)$ represents the probability of observing said time frequency representation of said sound data up to time t, where i indexes a state of said model defined by mean and covariance values, and wherein $P(O \mid \lambda)$ characterises how a frequency distribution of said statistical model λ changes over time; apply a set of threshold values to said probability of an observed frequency distribution sequence in said multiple component decomposition to classify said sound data according to said set of statistical models λ; move a view of said camera in response to classification of said sound data according to said set of statistical models λ to direct said camera towards a recognised target sound.
Address: GB
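Illustrative note: the probability calculation and thresholding recited in the principal claim can be sketched with the standard forward recursion for a hidden Markov model. The sketch below is not taken from the patent. It assumes diagonal covariance (the claim says only "mean and covariance values"), an initial state distribution `log_pi` and a transition matrix `log_A` (neither is named in the claim), log-domain arithmetic for numerical stability, and a per-frame normalisation of the score before thresholding. All function and parameter names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def forward_log_likelihood(frames, log_pi, log_A, means, variances):
    """Return log P(O | lambda) = log sum_i alpha_T(i) for one model.

    frames    : list of per-window frequency-band energy vectors (assumed input)
    log_pi    : log initial state probabilities (assumption, not in the claim)
    log_A     : log transition matrix, log_A[i][j] = log P(j | i) (assumption)
    means     : per-state mean vectors
    variances : per-state diagonal variances (simplification of covariance)
    """
    n = len(log_pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    log_alpha = [
        log_pi[i] + log_gauss_diag(frames[0], means[i], variances[i])
        for i in range(n)
    ]
    # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    for frame in frames[1:]:
        log_alpha = [
            logsumexp([log_alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(frame, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return logsumexp(log_alpha)


def classify(frames, models, thresholds):
    """Return the names of models whose per-frame log-likelihood meets its threshold."""
    hits = []
    for name, (log_pi, log_A, means, variances) in models.items():
        score = forward_log_likelihood(frames, log_pi, log_A, means, variances)
        if score / len(frames) >= thresholds[name]:
            hits.append(name)
    return hits
```

In use, `models` would map each target sound name to its stored parameters, and a non-empty result from `classify` would be the event that triggers moving the camera view. How the thresholds are chosen is left open here, as the claim does not specify it.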