Independent claim |
1. A method comprising:
processing data for the estimation of mixing parameters of at least one spot audio signal captured by a sound recording device, called a spot microphone, arranged in the vicinity of a source among a plurality of acoustic sources constituting a sound scene, and a primary audio signal captured by a sound recording device arranged to capture said plurality of acoustic sources of the sound scene, said primary audio signal being encoded in a format called “ambisonic”, comprising at least one omnidirectional component (W) and three bidirectional components (X, Y, Z) projected along orthogonal axes of a reference frame of the primary microphone, wherein said processing comprises the following acts, implemented for a frame of the primary audio signal and a frame of said spot signal, a frame comprising at least one block of N samples:
estimating (E2) a delay between the omnidirectional component of the frame of the primary audio signal and the frame of said spot signal, from at least one block of N samples of one of the two frames, called the block of reference (BRefI), associated with a predetermined moment of acquisition (TI), and an observation area (ZObsi) of the other frame, including at least one block of N samples and formed in proximity to the moment of acquisition, by maximizing a measurement of similarity between the block of reference and a block of the observation area, called the block of observation (BObsi), temporally offset by the delay (τ) in relation to the block of reference; and
estimating (E3) at least one angular position of the source captured by said spot microphone in a reference frame of the primary microphone by calculating a ratio between a first scalar product of a block of the spot audio signal associated with the predetermined moment of acquisition and a first component of the block of the primary audio signal temporally offset by the estimated delay (τ), and a second scalar product of the same block of the spot audio signal and the block of a second component of the primary audio signal temporally offset by the estimated delay (τ). |
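The two estimation acts (E2) and (E3) can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions that the claim does not state: the block of reference is taken from the spot signal, the similarity measurement is a normalized cross-correlation, the observation area covers only non-negative delays up to a hypothetical `max_lag`, and the angular position is an azimuth obtained from the ratio of the scalar products with the Y and X components through `atan2`. The function names `estimate_delay` and `estimate_azimuth` are illustrative only.

```python
import math


def dot(a, b):
    """Scalar product of two equal-length sample blocks."""
    return sum(u * v for u, v in zip(a, b))


def estimate_delay(spot, w, t_ref, n, max_lag):
    """Sketch of act (E2): return the delay tau, in samples, that maximizes
    the similarity between the block of reference spot[t_ref:t_ref+n] and
    the block of observation w[t_ref+tau:t_ref+tau+n].

    Assumptions: normalized cross-correlation as the similarity measurement,
    and an observation area limited to delays 0..max_lag.
    """
    ref = spot[t_ref:t_ref + n]
    ref_energy = dot(ref, ref)
    best_tau, best_score = 0, -math.inf
    for tau in range(max_lag + 1):
        start = t_ref + tau
        if start + n > len(w):
            break  # the block of observation would leave the frame
        obs = w[start:start + n]
        norm = math.sqrt(ref_energy * dot(obs, obs))
        score = dot(ref, obs) / norm if norm > 0 else 0.0
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau


def estimate_azimuth(spot, x, y, t_ref, n, tau):
    """Sketch of act (E3): azimuth in radians from the ratio of two scalar
    products between the spot block and the delayed X and Y blocks.

    Assumption: X and Y are the first and second components, and atan2 is
    used to resolve the quadrant of the ratio.
    """
    ref = spot[t_ref:t_ref + n]
    sx = dot(ref, x[t_ref + tau:t_ref + tau + n])
    sy = dot(ref, y[t_ref + tau:t_ref + tau + n])
    return math.atan2(sy, sx)
```

Under these assumptions, `estimate_delay` would be called first on the spot signal and the W component, and the delay it returns would then be passed as `tau` to `estimate_azimuth` together with the X and Y components.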