Title of Invention: Systems and methods for estimating pitch in audio signals based on symmetry characteristics independent of harmonic amplitudes
Abstract: Pitch in audio signals may be estimated based on symmetry characteristics independent of harmonic amplitudes. A magnitude spectrum of an audio signal may be provided. The magnitude spectrum may be partitioned by dividing a frequency axis into equal-sized cells. Individual cells may be centered on corresponding harmonic frequencies of a hypothesized pitch. The magnitude spectrum contained in individual cells may be normalized to have equal mean magnitudes and equal standard deviations. A likelihood that the hypothesized pitch is an actual pitch of the audio signal may be determined based on symmetries of magnitude spectra contained in individual cells.
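The partitioning and normalization steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `normalized_cells`, the linear resampling of each cell onto `n_points` samples, and the synthetic Gaussian-peak spectrum are assumptions made for the example. Each cell has the width of the hypothesized pitch `f0`, is centered on a harmonic `k * f0`, and is rescaled to zero mean and unit standard deviation.

```python
import numpy as np

def normalized_cells(freqs, mag, f0, n_harmonics, n_points=33):
    """Cut a magnitude spectrum into cells of width f0 centered on k*f0,
    then rescale every cell to zero mean and unit standard deviation.

    Hypothetical helper for illustration only.
    """
    # Sample offsets spanning one cell: from -f0/2 to +f0/2 around the center.
    offsets = np.linspace(-0.5, 0.5, n_points) * f0
    cells = []
    for k in range(1, n_harmonics + 1):
        # Resample the spectrum inside the cell centered on the k-th harmonic.
        cell = np.interp(k * f0 + offsets, freqs, mag)
        std = cell.std()
        if std == 0.0:
            # A perfectly flat cell cannot be rescaled; keep it at zero.
            cells.append(np.zeros_like(cell))
        else:
            cells.append((cell - cell.mean()) / std)
    return np.array(cells)

# Synthetic spectrum (assumed data): harmonics of 100 Hz with unequal amplitudes.
freqs = np.arange(0.0, 500.0, 1.0)
amplitudes = [1.0, 0.3, 2.5]
mag = sum(a * np.exp(-0.5 * ((freqs - 100.0 * (i + 1)) / 5.0) ** 2)
          for i, a in enumerate(amplitudes))

cells = normalized_cells(freqs, mag, f0=100.0, n_harmonics=3)
```

In this synthetic case the three normalized cells come out nearly identical even though the harmonic amplitudes differ by almost a factor of ten, which is the sense in which the later symmetry test does not depend on harmonic amplitudes.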
Publication Number: US9396740(B1)
Publication Date: 2016.07.19
Application Number: US201414502844
Filing Date: 2014.09.30
Applicant: KnuEdge Incorporated
Inventor: Bradley David C.
Classification: G10L11/04; G10L25/90; G10L25/12; G10L25/15; G10L21/0264; G10L25/00
Main Classification: G10L11/04
Agency: Edell, Shapiro & Finnan, LLC
Agent: Edell, Shapiro & Finnan, LLC
Independent Claim: 1. A processor-implemented method for estimating pitch in audio signals based on symmetry characteristics independent of harmonic amplitudes, the method being performed by one or more processors configured to execute computer program instructions, the method comprising:
providing a magnitude spectrum of an audio signal;
partitioning the magnitude spectrum by dividing a frequency axis into equal-sized cells, each cell having a width of a hypothesized pitch and being centered on corresponding harmonic frequencies of the hypothesized pitch;
normalizing the magnitude spectrum contained in individual cells to have equal mean magnitudes and equal standard deviations;
determining a likelihood that the hypothesized pitch is an actual pitch of the audio signal based on symmetries of magnitude spectra contained in individual cells, wherein the symmetries of magnitude spectra are determined based on whether the magnitude spectrum within an individual cell is symmetric about a corresponding center frequency;
repeating the partitioning, normalizing and determining operations for a plurality of hypothesized pitches in addition to the hypothesized pitch;
sampling determined likelihoods for the hypothesized pitch and the plurality of hypothesized pitches to generate a pitch likelihood distribution across the hypothesized pitch and the plurality of hypothesized pitches;
determining an estimated pitch based on a maximum of the sampling;
determining a harmonic amplitude of a voice in the audio signal based on the estimated pitch; and
performing speech or speaker recognition using the determined harmonic amplitude of the voice.
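The sequence of steps in claim 1 can be sketched in Python under several stated assumptions. The claim does not say how symmetry is converted into a likelihood, so the sketch uses the correlation between each normalized cell and its mirror image, averaged over cells; that measure, the clipping used to turn scores into a distribution, the names `symmetry_score` and `estimate_pitch`, and the synthetic test spectrum are all hypothetical choices for illustration. The final speech or speaker recognition step is outside the scope of the sketch.

```python
import numpy as np

def symmetry_score(freqs, mag, f0, n_harmonics=4, n_points=65):
    """Average mirror correlation of normalized cells for one hypothesized pitch.

    Assumed symmetry measure: equals 1.0 when every cell is symmetric
    about its center frequency.
    """
    offsets = np.linspace(-0.5, 0.5, n_points) * f0  # cell width equals f0
    scores = []
    for k in range(1, n_harmonics + 1):
        cell = np.interp(k * f0 + offsets, freqs, mag)  # cell centered on k*f0
        std = cell.std()
        if std == 0.0:
            scores.append(0.0)  # a flat cell gives no symmetry evidence
            continue
        z = (cell - cell.mean()) / std  # equal mean and std in every cell
        scores.append(float(np.mean(z * z[::-1])))  # compare with mirror image
    return float(np.mean(scores))

def estimate_pitch(freqs, mag, candidates, n_harmonics=4):
    """Score every hypothesized pitch and return (best pitch, distribution)."""
    scores = np.array([symmetry_score(freqs, mag, f0, n_harmonics)
                       for f0 in candidates])
    # Assumed way to form a distribution: drop negative scores, then normalize.
    weights = np.clip(scores, 0.0, None)
    total = weights.sum()
    if total > 0:
        dist = weights / total
    else:
        dist = np.full(len(candidates), 1.0 / len(candidates))
    return float(candidates[int(np.argmax(scores))]), dist

# Synthetic spectrum (assumed data): harmonics of 120 Hz, unequal amplitudes.
freqs = np.arange(0.0, 1000.0, 1.0)
amps = [1.0, 0.4, 0.7, 0.2, 0.9, 0.3, 0.6, 0.5]
mag = sum(a * np.exp(-0.5 * ((freqs - 120.0 * (i + 1)) / 4.0) ** 2)
          for i, a in enumerate(amps))

candidates = np.arange(90.0, 201.0, 1.0)  # hypothesized pitches in Hz
estimated, distribution = estimate_pitch(freqs, mag, candidates)

# Harmonic amplitudes read from the spectrum at multiples of the estimate.
harmonic_amps = np.interp(estimated * np.arange(1, 5), freqs, mag)
```

The candidate range here deliberately excludes 60 Hz and 240 Hz. A practical system would need some way to handle subharmonic and harmonic candidates, which this sketch does not attempt.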
Address: San Diego, CA, US