Principal claim |
1. A voice detection method for detecting the presence of speech signals in a noisy acoustic signal x(t) coming from a microphone, including the following successive steps:
a preliminary sampling step comprising a cutting of the acoustic signal x(t) into a discrete acoustic signal $\{x_i\}$ composed of a sequence of vectors associated with time frames i of length N, N corresponding to the number of sampling points, where each vector reflects the acoustic content of the associated frame i and is composed of the N samples $x_{(i-1)N+1}, x_{(i-1)N+2}, \ldots, x_{iN-1}, x_{iN}$, i being a positive integer;
a step of calculating a detection function FD(τ) based on the calculation of a difference function D(τ) varying in accordance with a shift τ on an integration window of length W starting at the time $t_0$, with:
$D(\tau) = \sum_{n=t_0}^{t_0+W-1} |x(n) - x(n+\tau)|$, where $0 \le \tau \le \max(\tau)$;
wherein this step of calculating a detection function FD(τ) consists in calculating a discrete detection function $FD_i(\tau)$ associated with the frames i;
a step of adapting a threshold in said current interval, in accordance with values calculated from the acoustic signal x(t) established in said current interval, wherein this step of adapting a threshold consists, for each frame i, in adapting a threshold $\Omega_i$ specific to the frame i depending on reference values calculated from the values of the samples of the discrete acoustic signal $\{x_i\}$ in said frame i;
a step of searching for a minimum of the detection function FD(τ) and comparing this minimum with a threshold, for τ varying in a determined interval of time called current interval, in order to detect the presence or absence of a fundamental frequency $F_0$ characteristic of a speech signal within said current interval, where this step of searching for a minimum of the detection function FD(τ) and comparing this minimum with a threshold is carried out by searching, on each frame i, for a minimum rr(i) of the discrete detection function $FD_i(\tau)$ and by comparing this minimum rr(i) with a threshold $\Omega_i$ specific to the frame i;
and wherein the step of adapting the thresholds $\Omega_i$ for each frame i includes the following steps:
a) subdividing the frame i comprising N sampling points into T sub-frames of length L, where N is a multiple of T so that the length L=N/T is an integer, and so that the samples of the discrete acoustic signal $\{x_i\}$ in a sub-frame of index j of the frame i comprise the following L samples:
$x_{(i-1)N+(j-1)L+1}, x_{(i-1)N+(j-1)L+2}, \ldots, x_{(i-1)N+jL}$, j being a positive integer comprised between 1 and T;
b) calculating maximum values $m_{i,j}$ of the discrete acoustic signal $\{x_i\}$ in each sub-frame of index j of the frame i, with:
$m_{i,j} = \max\{x_{(i-1)N+(j-1)L+1}, x_{(i-1)N+(j-1)L+2}, \ldots, x_{(i-1)N+jL}\}$;
c) calculating at least one reference value $\mathrm{Ref}_{i,j}$, $\mathrm{MRef}_{i,j}$ specific to the sub-frame j of the frame i, the or each reference value $\mathrm{Ref}_{i,j}$, $\mathrm{MRef}_{i,j}$ per sub-frame j being calculated from the maximum value $m_{i,j}$ in the sub-frame j of the frame i;
d) establishing the value of the threshold $\Omega_i$ specific to the frame i depending on all reference values $\mathrm{Ref}_{i,j}$, $\mathrm{MRef}_{i,j}$ calculated in the sub-frames j of the frame i. |
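The claimed steps can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it makes several assumptions that the claim leaves open:

- The detection function is taken to be the difference function D(τ) itself, with the integration window starting at the first sample of frame i.
- The single reference value per sub-frame is taken to be the maximum m_{i,j}.
- The threshold rule Ω_i = k · W · mean(Ref_{i,j}) and the factor `k` are hypothetical.
- The function names and the bounds `tau_min` and `tau_max` of the current interval are illustrative only.
- Indices are 0-based, whereas the claim counts samples from 1.

```python
def split_frames(x, N):
    """Cut x into consecutive frames of N samples; a trailing partial frame is dropped."""
    return [x[k:k + N] for k in range(0, len(x) - N + 1, N)]


def difference(x, t0, W, tau):
    """D(tau) = sum over n = t0 .. t0+W-1 of |x(n) - x(n+tau)| (0-based indices)."""
    return sum(abs(x[n] - x[n + tau]) for n in range(t0, t0 + W))


def subframe_maxima(frame, T):
    """Steps a) and b): split a frame into T sub-frames of length L = N/T, take each maximum."""
    N = len(frame)
    if N % T != 0:
        raise ValueError("N must be a multiple of T")
    L = N // T
    return [max(frame[j * L:(j + 1) * L]) for j in range(T)]


def frame_threshold(frame, T, W, k=0.2):
    """Steps c) and d) under an ASSUMED rule: Ref = m_{i,j}, Omega_i = k * W * mean(Ref)."""
    refs = subframe_maxima(frame, T)
    return k * W * sum(refs) / len(refs)


def detect(x, N, T, W, tau_min, tau_max, k=0.2):
    """Return one boolean per frame: True if min over tau of D(tau) is below Omega_i."""
    decisions = []
    for i, frame in enumerate(split_frames(x, N)):
        t0 = i * N  # ASSUMPTION: the window starts at the first sample of frame i
        if t0 + W + tau_max > len(x):
            break  # not enough samples left to evaluate the largest shift
        fd = [difference(x, t0, W, tau) for tau in range(tau_min, tau_max + 1)]
        rr = min(fd)  # minimum rr(i) of the discrete detection function
        decisions.append(rr < frame_threshold(frame, T, W, k))
    return decisions
```

The search interval starts at `tau_min` greater than 0 because D(0) is always zero, so including τ = 0 would make the minimum trivially fall below any positive threshold. Because the hypothetical threshold scales with the sub-frame maxima, the decision in this sketch does not depend on the overall positive gain of the signal.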