Title of invention: Voice data playback speed conversion method and voice data playback speed conversion device
Abstract: The present invention addresses the problem of enabling a process of converting voice data playback speed even in a voice data playback device alone. The solution is a voice data playback speed conversion method and a voice data playback speed conversion device, comprising: a step of setting a reference zero cross point from any arbitrary zero cross point; a step of selecting a zero cross point temporally after the reference zero cross point within a first predetermined time range; a step of calculating a reference correlation function in a waveform from the reference zero cross point until a second predetermined time; and a step of calculating a correlation function in a waveform from a plurality of previously selected zero cross points until the second predetermined time, wherein a second reference zero cross point is the zero cross point of the waveform having a correlation function in which a concordance rate of the correlation value between the reference correlation function and the correlation function is the highest value, the difference between the reference zero cross point and the second reference zero cross point is calculated as a basic cycle, and the expansion and contraction of voice data is executed in basic cycle units so as to perform a process of converting the playback speed of the voice data.
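The last step of the abstract, expanding and contracting voice data in basic cycle units, can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `change_speed`, the `boundaries` list of cycle start indices, and the rule of simply dropping or repeating whole cycles (with no cross-fading at the joins) are all assumptions made for illustration.

```python
def change_speed(samples, boundaries, speed):
    """Drop or repeat whole basic cycles so playback is `speed` times faster.

    samples    -- list of audio samples
    boundaries -- hypothetical list of cycle start indices (zero cross points),
                  ending with the index just past the last cycle
    speed      -- playback speed factor (> 1 shortens, < 1 lengthens)
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    out = []
    consumed = 0  # number of input samples processed so far
    for start, end in zip(boundaries, boundaries[1:]):
        cycle = samples[start:end]
        if not cycle:
            continue  # skip empty blocks so the loop below always terminates
        consumed += len(cycle)
        target = consumed / speed  # ideal output length at this point
        # Emit 0, 1 or more copies of the cycle, whichever keeps the
        # output length closest to the ideal length.
        while len(out) + len(cycle) / 2 <= target:
            out.extend(cycle)
    return out


# Usage: ten cycles of 80 samples each, played at double speed.
data = list(range(800))
bounds = list(range(0, 801, 80))
print(len(change_speed(data, bounds, 2.0)))  # 400
```

Because only complete cycles are removed or duplicated, every join falls on a cycle boundary; in this sketch that is the only measure taken against discontinuities.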
Publication number: US9361905(B2) Publication date: 2016.06.07
Application number: US201414763303 Filing date: 2014.01.21
Applicant: SHINANO KENSHI KABUSHIKI KAISHA Inventors: Tsunoda Shoji; Nishizawa Tatsuo
Classification: G10L21/00; G10L21/047; G10L21/049; G10L25/09; G10L25/27; G10L19/00 Main classification: G10L21/00
Agency: Stites & Harbison, PLLC. Agent: Weyer, Esq. Stephen J.; Stites & Harbison, PLLC.
Main claim: 1. A voice data playback speed conversion method for converting voice data playback speed, comprising: a step of removing DC components, wherein DC components of original voice data being a playback object are removed; a step of extracting basic voice signals constituted by a basic frequency of the voice data, from which DC components have been removed, by setting a cutoff frequency at an intermediate value of the basic frequency and low-pass filtering so as to extract the basic frequency; a step of extracting rising zero cross points of the basic voice signals; a step of setting a reference zero cross point, which is an arbitrary reference zero cross point selected from the rising zero cross points; a step of selecting a plurality of the rising zero cross points temporally after the reference zero cross point within a first predetermined time range; a step of selecting a reference waveform temporally after the reference zero cross point until a second predetermined time; a step of selecting comparison object waveforms from each of the zero cross points, which has been selected in said step of selecting the rising zero cross points, until the second predetermined time; a step of calculating an autocorrelation value between the reference waveform and the reference waveform by using a correlation function; a step of calculating correlation values between the reference waveform and the comparison object waveforms by using a correlation function; a step of calculating voice blocks each of which is segmented by a start point of the voice data and an end point thereof, wherein the autocorrelation value is compared with the correlation values, the zero cross point of the comparison object waveform which is used for calculating the correlation value whose concordance rate with respect to the autocorrelation value is highest is defined as a second reference zero cross point, the start point of the voice data corresponds to the reference zero cross point, and the end point of the voice data corresponds to the second reference zero cross; and a step of expanding and contracting the voice data in basic cycle units so as to convert the playback speed of the voice data.
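The analysis steps of claim 1 (DC removal, low-pass filtering, rising zero cross extraction, and correlation comparison) can be sketched in Python as follows. This is a hedged illustration, not the claimed implementation. The claim does not name a filter type or define the "concordance rate", so the one-pole low-pass filter, the inner product used as the correlation function, and the ratio of correlation value to autocorrelation value used as the concordance rate are all assumptions, as are the function and parameter names.

```python
import math


def remove_dc(samples):
    """Subtract the mean so the waveform is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass filter (assumed; the claim names no filter type)."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for s in samples:
        prev += alpha * (s - prev)
        out.append(prev)
    return out


def rising_zero_crossings(samples):
    """Indices where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0.0 <= samples[i]]


def correlation(a, b):
    """Plain inner product, standing in for the claim's correlation function."""
    return sum(p * q for p, q in zip(a, b))


def estimate_basic_cycle(samples, crossings, ref, search_range, window):
    """Return (second_reference_point, basic_cycle), or None if not found.

    search_range -- the 'first predetermined time range', in samples
    window       -- the 'second predetermined time', in samples
    """
    ref_wave = samples[ref:ref + window]
    auto = correlation(ref_wave, ref_wave)  # autocorrelation value
    if len(ref_wave) < window or auto == 0.0:
        return None
    best, best_rate = None, float("-inf")
    for c in crossings:
        if not ref < c <= ref + search_range:
            continue  # outside the first predetermined time range
        cand = samples[c:c + window]
        if len(cand) < window:
            continue  # comparison waveform would run past the data
        rate = correlation(ref_wave, cand) / auto  # assumed concordance rate
        if rate > best_rate:
            best, best_rate = c, rate
    return None if best is None else (best, best - ref)
```

A possible usage, with hypothetical values: for a signal `raw` sampled at 8000 Hz, compute `voice = remove_dc(raw)`, `basic = low_pass(voice, 150.0, 8000)`, and `zc = rising_zero_crossings(basic)`, then call `estimate_basic_cycle(voice, zc, zc[1], 200, 120)`. The returned pair gives the end point of the voice block and its length, which is the basic cycle in samples.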
Address: Ueda-Shi, Nagano JP