Title of invention |
NON-SPEECH SECTION DETECTING METHOD AND NON-SPEECH SECTION DETECTING DEVICE |
Abstract |
<p>A frame generating section (20) of control means (2) generates frames of a predetermined time length from sound data. A spectrum bias/power/pitch deriving unit (21a) derives at least one of the following: the bias of the spectrum obtained by converting the sound data into components on the frequency axis, the power of the sound data, and the pitch of the sound data. A variation amount deriving unit (21b) derives the amount by which the value derived by the spectrum bias/power/pitch deriving unit (21a) varies from that of the previous frame. The ratio of the first-order autocorrelation function of the sound data to its zero-order autocorrelation function is used as the bias of the spectrum. When the amount of variation is judged to be at or below a predetermined threshold for at least a predetermined number of consecutive frames, a non-speech section detecting unit (22b) detects the section made up of those consecutive frames as a non-speech section. A section in which the amount of variation is large only in isolation is excluded from the non-speech section. If such an isolated section is sandwiched between two non-speech sections, it is detected as a non-speech section irrespective of the judgment.</p> |
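The detection flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: it uses only the spectrum bias (the first-order to zero-order autocorrelation ratio), and the function names, the non-overlapping framing, the zero variation assigned to the first frame, and the `max_gap` limit on how long a sandwiched section may be are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], frame_len: int) -> List[Sequence[float]]:
    """Cut the sound data into non-overlapping frames (assumption: no overlap)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def spectrum_bias(frame: Sequence[float]) -> float:
    """Ratio of the first-order to the zero-order autocorrelation of a frame."""
    r0 = sum(x * x for x in frame)
    if r0 == 0:
        return 0.0  # assumption: treat an all-zero frame as having no bias
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0


def variation_amounts(values: Sequence[float]) -> List[float]:
    """Absolute change from the previous frame (assumption: 0.0 for frame 0)."""
    return [0.0] + [abs(b - a) for a, b in zip(values, values[1:])]


def detect_non_speech(variations: Sequence[float], threshold: float,
                      min_frames: int, max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs of non-speech sections.

    A run of frames whose variation is at or below `threshold` counts as a
    non-speech section when it is at least `min_frames` long. A gap of at
    most `max_gap` frames (hypothetical parameter) sandwiched between two
    such sections is absorbed into one merged section.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    # The sentinel forces the final run to be closed.
    for i, v in enumerate(list(variations) + [float("inf")]):
        if v <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i - 1))
            start = None

    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], e)  # absorb the sandwiched gap
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    frames = split_frames([1, 1, 1, 1, 1, -1, 1, -1], 4)
    biases = [spectrum_bias(f) for f in frames]
    print(biases, variation_amounts(biases))
    print(detect_non_speech([0, 0, 0, 0.9, 0, 0, 0], threshold=0.1, min_frames=3))
```

In this sketch a single high-variation frame between two qualifying runs is absorbed into one non-speech section, while a longer high-variation stretch keeps the sections apart. Power or pitch could be substituted for `spectrum_bias` without changing the rest of the flow.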
Application publication number |
WO2009078093(A1) |
Application publication date |
2009.06.25 |
Application number |
WO2007JP74274 |
Filing date |
2007.12.18 |
Applicant |
FUJITSU LIMITED;WASHIO, NOBUYUKI;HAYAKAWA, SHOJI |
Inventor |
WASHIO, NOBUYUKI;HAYAKAWA, SHOJI |
Classification number |
G10L25/78 |
Main classification number |
G10L25/78 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|