Abstract |
A method and apparatus for generating features for use in speech recognition. The method comprises calculating the log frame energy value of each of a predetermined number n of frames of an input speech signal, and applying a matrix transform to the n log frame energy values to form a temporal matrix representing the input speech signal. The matrix transform may be a discrete cosine transform.
|
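The two steps named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: frames are non-overlapping and of fixed length, frame energy is the sum of squared samples, the logarithm is natural, a small floor guards against log(0), and the discrete cosine transform is the unnormalised DCT-II. The function names and parameters (`log_frame_energies`, `dct_ii`, `temporal_features`, `frame_len`, `floor`) are hypothetical and chosen only for illustration.

```python
import math


def log_frame_energies(signal, frame_len, n_frames, floor=1e-10):
    """Return the log energy of each of the first n_frames frames.

    Assumes non-overlapping frames of frame_len samples; energy is the
    sum of squared samples, floored to avoid taking log(0).
    """
    if len(signal) < frame_len * n_frames:
        raise ValueError("signal is too short for the requested frames")
    energies = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(max(energy, floor)))
    return energies


def dct_ii(values):
    """Unnormalised DCT-II, written out as an explicit matrix transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, v in enumerate(values))
        for k in range(n)
    ]


def temporal_features(signal, frame_len, n_frames):
    """Apply the transform to the n log frame energy values."""
    return dct_ii(log_frame_energies(signal, frame_len, n_frames))
```

For a signal whose frames all have the same energy, only the first (k = 0) coefficient is non-zero, which reflects that the higher coefficients capture how the log energy varies over the n frames.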