Abstract |
A time-domain method for adaptively levelling the loudness of a digital audio signal is proposed. The method selects a frequency weighting curve that relates the measured volume level to the response of the human auditory system. The audio signal is segmented into frames of a duration suitable for content analysis. Each frame is classified into one of several predefined states, and events of perceptual interest are detected. Four quantities are updated every frame, according to the classified state and any detected event, to keep track of the signal. The first quantity measures the long-term loudness and is the main criterion for classifying the state of a frame. The second is the short-term loudness, which is mainly used to derive the target gain. The third measures the low-level loudness when the signal is deemed not to contain important content, giving a reasonable estimate of the noise floor. The fourth measures the peak loudness level, which is used to simulate the temporal masking effect. A volume leveller calculates the target gain needed to keep the audio signal at the desired loudness level; a gain controller that simulates the temporal masking effect regulates this gain to remove unnecessary fluctuations, ensuring a pleasant sound. |
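
The per-frame bookkeeping described in the abstract can be illustrated with a minimal sketch. This is not the claimed implementation: every name (`LevellerState`, `process_frame`), every constant (smoothing factors, the 10 dB margin, the 1 dB step limit, the 0.5 dB peak decay) and the two-state classification are assumptions chosen for illustration. The frequency weighting is replaced by a plain unweighted RMS level, and temporal masking is approximated by holding gain increases while the frame level is below a slowly decaying peak.

```python
import math
from dataclasses import dataclass


@dataclass
class LevellerState:
    """The four tracked quantities plus the applied gain, all in dB (hypothetical layout)."""
    long_term_db: float = -60.0
    short_term_db: float = -60.0
    noise_floor_db: float = -60.0
    peak_db: float = -60.0
    gain_db: float = 0.0


def frame_level_db(frame):
    """Unweighted RMS level in dB; a stand-in for a frequency-weighted loudness measure."""
    if not frame:
        return -120.0
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def smooth(prev, new, alpha):
    """One-pole smoother: alpha close to 1 gives a slow (long-term) average."""
    return alpha * prev + (1.0 - alpha) * new


def process_frame(state, frame, target_db=-20.0, margin_db=10.0,
                  max_step_db=1.0, peak_decay_db=0.5):
    """Update the tracked quantities for one frame and return the gain in dB."""
    level = frame_level_db(frame)

    # Peak tracker: jumps up instantly, decays slowly (assumed masking proxy).
    state.peak_db = max(level, state.peak_db - peak_decay_db)

    # State classification, driven mainly by the long-term loudness.
    is_content = level > state.long_term_db - margin_db

    if not is_content:
        # Low-level frame: refine the noise-floor estimate and hold the gain.
        state.noise_floor_db = smooth(state.noise_floor_db, level, 0.9)
        return state.gain_db

    state.long_term_db = smooth(state.long_term_db, level, 0.99)
    state.short_term_db = smooth(state.short_term_db, level, 0.7)

    # Volume leveller: target gain is derived from the short-term loudness.
    target_gain = target_db - state.short_term_db
    delta = target_gain - state.gain_db

    # Gain controller: suppress increases while the frame sits below the
    # decaying peak, and limit the change per frame to avoid fluctuations.
    if delta > 0.0 and level < state.peak_db:
        delta = 0.0
    delta = max(-max_step_db, min(max_step_db, delta))
    state.gain_db += delta
    return state.gain_db


def apply_gain(frame, gain_db):
    """Scale the samples of a frame by a gain given in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [x * scale for x in frame]
```

Calling `process_frame` once per frame and passing the returned value to `apply_gain` yields the levelled output. With a steady input, the gain in this sketch settles near the difference between `target_db` and the input level, while silent frames only update the noise-floor estimate.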