摘要 |
Disclosed herein is a method and apparatus to recognize speech by measuring the confidence levels of respective frames. The method includes the operations of obtaining frequency features of a received speech signal for the respective frames having a predetermined length, calculating a keyword model-based likelihood and a filler model-based likelihood for each of the frame, calculating a confidence score based on the two types of likelihoods, and deciding whether the received speech signal corresponds to a keyword or a non-keyword based on the confidence scores. Also, the method includes the operation of transforming the confidence scores by applying transform functions of clusters, which include the confidence scores or are close to the confidence scores, to the confidence scores. |