Title: Method and apparatus for speech segmentation
Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment.
Publication No.: US8775182(B2) Publication date: 2014.07.08
Application No.: US201313861734 Filing date: 2013.04.12
Applicant: Intel Corporation Inventors: Du Robert; Tao Ye; Zu Daren
Classification: G10L15/00; G10L15/20 Main classification: G10L15/00
Agency: Blakely, Sokoloff, Taylor & Zafman LLP Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Principal claim: 1. A method comprising: performing operations, by a processing device, wherein the operations comprise: applying a fuzzy rule of a plurality of fuzzy rules to a plurality of media segments to determine whether a media segment is a speech segment or a non-speech segment and to discriminate the speech segment from the non-speech segment, wherein the discrimination is performed based on one or more of characteristics of media data, prior knowledge relating to speech data, and speech-likelihood of the media segment, wherein the applying of the fuzzy rule further determines whether the media segment takes one or more forms, wherein at least one of the one or more forms includes an antecedent or a consequent, wherein the antecedent includes one or more input variables indicating one or more characteristics of the media data, and wherein the consequent includes one or more output variables; training membership functions, wherein at least one of the membership functions includes at least one of an input variable membership function and an output variable membership function, wherein the input variable membership function is associated with the one or more input variables, and wherein the output variable membership function is associated with the one or more output variables; defuzzifying a fuzzy conclusion to provide a defuzzified output, wherein the defuzzifying includes finding a centroid of weighted aggregation associated with each output variable, wherein the centroid is used to identify a definite number of the one or more output variables, wherein the identifying is based on the defuzzified output, wherein the defuzzified output includes a speech likelihood of the definite number of the one or more output variables; and labeling the media segment as the speech segment or the non-speech segment based on the speech likelihood of the definite number of the one or more output variables.
Address: Santa Clara CA US
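
Illustrative note: the claimed sequence (apply fuzzy rules, aggregate the clipped consequents, take the centroid, label the segment) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the patented implementation. The record does not name the input features, the membership shapes, or the decision threshold, so everything here is hypothetical: the single input `feature` (a score normalized to [0, 1]), the linear ramp memberships, the breakpoints 0.2 and 0.8, the two rules, and the threshold of 0.5. The claim also calls for trained membership functions; the sketch uses hand-picked breakpoints in their place.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi, clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ramp_down(x, lo, hi):
    """Complement of ramp_up: 1 at lo, falling linearly to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def speech_likelihood(feature, steps=101):
    """Mamdani-style inference with centroid defuzzification.

    Hypothetical rule base (not taken from the patent):
      IF feature is HIGH THEN likelihood is HIGH
      IF feature is LOW  THEN likelihood is LOW
    """
    # Fuzzify the input. The breakpoints are placeholders; a trained
    # system would learn them from labeled data.
    w_high = ramp_up(feature, 0.2, 0.8)
    w_low = ramp_down(feature, 0.2, 0.8)

    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        y = i / (steps - 1)  # candidate likelihood value in [0, 1]
        # Clip each consequent by its rule's firing strength, then
        # aggregate the clipped consequents with max.
        mu = max(min(w_high, ramp_up(y, 0.0, 1.0)),
                 min(w_low, ramp_down(y, 0.0, 1.0)))
        numerator += y * mu
        denominator += mu
    # Centroid of the aggregated output membership.
    return numerator / denominator if denominator > 0 else 0.5


def label_segment(feature, threshold=0.5):
    """Label a segment from its defuzzified speech likelihood."""
    if speech_likelihood(feature) > threshold:
        return "speech"
    return "non-speech"
```

With these placeholder settings, `label_segment(0.9)` returns "speech" and `label_segment(0.1)` returns "non-speech". A real system would use several input variables and a larger rule base.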