Title of invention: INDEXING BASED ON TIME-VARIANT TRANSFORMS OF AN AUDIO SIGNAL'S SPECTROGRAM
Abstract: An audio identification system generates audio fingerprints and indexes associated with the audio fingerprints based on discrete and overlapping frames within a sample of an audio signal. The system applies a time-to-frequency domain transform to a time-sequence of frames, which may be filtered. The audio identification system then applies a time-variant transformation (e.g., a Discrete Cosine Transform) to the transformed frames and generates an audio fingerprint and index by selecting sets of coefficients of the time-variant transformation. The system selects coefficients that are less sensitive to possible noise and/or distortions in the underlying signal, such as low-frequency coefficients. The time-variant transformation provides sufficient sampling among the indexes by incorporating the phase information of the frames into the indexes. The system stores the audio fingerprint and other identifying information by index for efficient retrieval and matching of the retrieved fingerprints.
Publication number: US2016148620(A1) Publication date: 2016.05.26
Application number: US201514704372 Filing date: 2015.05.05
Applicant: Facebook, Inc. Inventor: Bilobrov Sergiy
Classification: G10L19/018 Main classification: G10L19/018
Agency: Agent:
Main claim: 1. A computer-implemented method comprising: obtaining a sample of an audio signal; determining a plurality of frames within the sample, each frame representing a time-interval of the sample and overlapping with one or more adjacent frames of the plurality of frames; determining a frequency spectrum for each frame of the plurality of frames by applying a time domain to frequency domain transformation to each frame of the plurality of frames; generating a time-sequence of frequency spectrums from the frequency spectrums for each frame, the time-sequence of frequency spectrums comprising a two-dimensional array of the frequency spectrums over time; determining a plurality of frequency components by applying a time-variant transformation to the time-sequence of frequency spectrums; and generating an audio fingerprint and an index associated with the audio fingerprint, the audio fingerprint and index each comprising a set of the determined frequency components for the window, and the index comprising fewer frequency components than the audio fingerprint with the fewer frequency components being less sensitive to noise or distortions of the sample.
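The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation; it rests on several assumptions that the record does not state: the time-to-frequency transform is a naive DFT magnitude, the time-variant transformation is an unnormalized DCT-II taken along the time axis of each frequency bin, coefficients are reduced to sign bits, and the DC coefficient is skipped. All function names and parameters (`split_frames`, `fingerprint_and_index`, `fp_coeffs`, `idx_coeffs`, and so on) are hypothetical. Only the standard library is used; a practical system would likely rely on an FFT library instead.

```python
import cmath
import math


def split_frames(samples, frame_len, hop):
    """Cut the sample into frames; hop < frame_len makes adjacent frames overlap."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (time domain to frequency domain)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def dct2(values):
    """Unnormalized DCT-II of a sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (t + 0.5) * k / n)
                for t, v in enumerate(values))
            for k in range(n)]


def fingerprint_and_index(samples, frame_len=32, hop=16,
                          fp_coeffs=4, idx_coeffs=2):
    """Return (fingerprint, index) as tuples of sign bits.

    Assumption: the index keeps only the lowest idx_coeffs DCT coefficients
    per frequency bin, so it has fewer components than the fingerprint.
    """
    frames = split_frames(samples, frame_len, hop)
    # Two-dimensional array: rows are time steps, columns are frequency bins.
    spectrogram = [magnitude_spectrum(f) for f in frames]
    n_bins = len(spectrogram[0])

    fingerprint, index = [], []
    for b in range(n_bins):
        # DCT over the time trajectory of one frequency bin.
        coeffs = dct2([row[b] for row in spectrogram])
        # Skip k = 0 (DC) and keep low-order coefficients (assumed choice).
        bits = [1 if c > 0 else 0 for c in coeffs[1:1 + fp_coeffs]]
        fingerprint.extend(bits)
        index.extend(bits[:idx_coeffs])
    return tuple(fingerprint), tuple(index)


if __name__ == "__main__":
    # Synthetic test tone standing in for an audio sample.
    tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(256)]
    fp, idx = fingerprint_and_index(tone)
    store = {idx: [("hypothetical-track-id", fp)]}  # fingerprints stored by index
    print(len(fp), len(idx), idx in store)
```

Storing fingerprints in a dictionary keyed by the shorter index mirrors the abstract's description of retrieval by index followed by matching of the retrieved fingerprints; the dictionary itself is an illustrative choice.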
Address: Menlo Park CA US