Title of invention: AUDIO MATCHING BASED ON HARMONOGRAM
Abstract: In an example context of identifying live audio, an audio processor machine accesses audio data that represents a query sound and creates a spectrogram from the audio data. Each segment of the spectrogram represents a different time slice in the query sound. For each time slice, the audio processor machine determines one or more dominant frequencies and, for each dominant frequency, an aggregate energy value that represents a combination of all the energy for that dominant frequency and its harmonics. The machine creates a harmonogram by representing these aggregate energy values at these dominant frequencies in each time slice. The harmonogram thus may represent the strongest harmonic components within the query sound. The machine can identify the query sound by comparing its harmonogram to other harmonograms of other sounds and may respond to a user's submission of the query sound by providing an identifier of the query sound to the user.
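The harmonogram construction described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method as implemented by the applicant: the spectrogram is assumed to be a list of segments (one per time slice), each a list of amplitudes per frequency bin; harmonics are assumed to fall on integer multiples of the candidate bin index; energy is taken as squared amplitude; and only one dominant frequency is kept per time slice. The names `aggregate_energy`, `harmonogram`, and `n_harmonics` are hypothetical.

```python
def aggregate_energy(segment, k, n_harmonics=4):
    """Sum the energy (squared amplitude) at bin k and its integer multiples.

    Assumption: the h-th harmonic of bin k sits at bin k * h.
    """
    return sum(
        segment[k * h] ** 2
        for h in range(1, n_harmonics + 1)
        if k * h < len(segment)
    )


def harmonogram(spectrogram, n_harmonics=4):
    """Build a harmonogram with one dominant frequency per time slice.

    Each output segment is zero everywhere except at the dominant bin,
    which holds that bin's aggregate energy value.
    """
    result = []
    for segment in spectrogram:
        # Bin 0 (DC) is skipped: every multiple of 0 is 0, so it has no harmonics.
        energies = [0.0] + [
            aggregate_energy(segment, k, n_harmonics)
            for k in range(1, len(segment))
        ]
        # The candidate with the largest aggregate energy is the dominant one.
        dominant = max(range(len(energies)), key=energies.__getitem__)
        row = [0.0] * len(segment)
        row[dominant] = energies[dominant]
        result.append(row)
    return result


# Toy segment: a harmonic series on bins 2, 4, 6 plus a louder lone peak at bin 5.
segment = [0, 0, 1.0, 0, 0.5, 1.1, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(harmonogram([segment])[0])
```

In this toy segment the loudest single bin is bin 5, but bin 2 is selected as dominant because its energy combined with that of its harmonics is larger, which is the distinction the abstract draws between a spectrogram and a harmonogram.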
Publication number: US2016196343(A1) Publication date: 2016.07.07
Application number: US201514980622 Filing date: 2015.12.28
Applicant: Gracenote, Inc. Inventor: Rafii Zafar
Classification: G06F17/30;G10L25/45;G10L25/72;G10L25/21;G10L25/18;G10L25/54 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method comprising: accessing, using one or more processors, audio data that represents query sound to be identified; creating, using the one or more processors, a spectrogram of the audio data, different segments of the spectrogram representing amplitudes at frequencies in different time slices of the query sound; determining, using the one or more processors, a dominant frequency in a time slice of the query sound based on a segment of the spectrogram, the determining including: calculating an aggregate energy value of a candidate frequency based on amplitudes of the candidate frequency and harmonics thereof represented in the segment of the spectrogram; and identifying the candidate frequency as the dominant frequency based on the aggregate energy value of the candidate frequency being a largest aggregate energy value among aggregate energy values of frequencies whose amplitudes are represented in the segment of the spectrogram; creating, using the one or more processors, a query harmonogram of the audio data, different segments of the query harmonogram representing aggregate energy values of dominant frequencies in different time slices of the query sound; and providing, using the one or more processors, an identifier of the query sound based on a comparison of the query harmonogram to a reference harmonogram mapped to the identifier by a database.
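The final step of the claim, providing an identifier based on a comparison of the query harmonogram to a reference harmonogram mapped to that identifier by a database, might look something like the sketch below. The claim does not say how the comparison is made; cosine similarity and the 0.8 threshold are assumptions chosen purely for illustration, the database is modeled as a plain dictionary, and the names `cosine_similarity`, `identify`, and `threshold` are hypothetical. Harmonograms are assumed to be lists of equal-shaped segments.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two harmonograms of the same shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(
        sum(y * y for y in flat_b)
    )
    return dot / norm if norm else 0.0


def identify(query, database, threshold=0.8):
    """Return the identifier whose reference harmonogram best matches the query.

    Returns None when no reference reaches the (assumed) similarity threshold.
    """
    best_id, best_score = None, threshold
    for identifier, reference in database.items():
        score = cosine_similarity(query, reference)
        if score >= best_score:
            best_id, best_score = identifier, score
    return best_id


# Hypothetical database mapping identifiers to reference harmonograms.
database = {
    "song-a": [[0.0, 1.3, 0.0], [0.0, 0.0, 1.3]],
    "song-b": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
# A louder rendition of song-a: same dominant bins, larger energies.
query = [[0.0, 2.6, 0.0], [0.0, 0.0, 2.6]]
print(identify(query, database))
```

Cosine similarity ignores overall loudness, so a query recorded at a different volume can still match its reference; a practical system would also need to handle time offsets between query and reference, which this sketch does not attempt.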
Address: Emeryville CA US