Title of invention |
System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification |
Abstract |
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result. |
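As an illustration of the PIU-SCU ensemble described in the abstract, the following minimal sketch pairs two pooling operations with two classifiers and sums their scores. Everything named here is an assumption made for illustration and does not come from the patent: the functions `max_pool`, `mean_pool`, and `classify`, the class `LinearSCU`, the use of linear scoring, the score-summing rule, and the toy weights and labels.

```python
from typing import Callable, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one time-dependent feature vector per frame


def max_pool(frames: Frames) -> List[float]:
    """Hypothetical pooling interface unit: per-dimension maximum over time."""
    return [max(col) for col in zip(*frames)]


def mean_pool(frames: Frames) -> List[float]:
    """Hypothetical pooling interface unit: per-dimension average over time."""
    return [sum(col) / len(col) for col in zip(*frames)]


class LinearSCU:
    """Hypothetical segmental classification unit: one linear score per class."""

    def __init__(self, weights: Sequence[Sequence[float]]) -> None:
        self.weights = weights  # weights[c] is the weight vector for class c

    def scores(self, vector: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, vector)) for row in self.weights]


Combo = Tuple[Callable[[Frames], List[float]], LinearSCU]


def classify(frames: Frames, combos: Sequence[Combo], labels: Sequence[str]) -> str:
    """Sum the scores of every PIU-SCU combination and return the best label."""
    totals = [0.0] * len(labels)
    for pool, scu in combos:
        for i, score in enumerate(scu.scores(pool(frames))):
            totals[i] += score
    best = max(range(len(labels)), key=lambda i: totals[i])
    return labels[best]


# Toy segment: two frames, two features per frame (made-up numbers).
frames = [[1.0, 0.0], [3.0, 2.0]]
combos = [
    (max_pool, LinearSCU([[1.0, 0.0], [0.0, 1.0]])),
    (mean_pool, LinearSCU([[1.0, -1.0], [0.0, 1.0]])),
]
label = classify(frames, combos, ["aa", "iy"])
```

The ensemble is diversified here only by swapping the pooling function in each pair, which mirrors the abstract's statement that diversity comes from varying the pooling operations.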
Publication number |
US9208778(B2) |
Publication date |
2015.12.08 |
Application number |
US201414537400 |
Filing date |
2014.11.10 |
Applicant |
AT&T Intellectual Property I, L.P. |
Inventors |
Chopra Sumit;Dimitriadis Dimitrios;Haffner Patrick |
Classification numbers |
G10L15/08;G10L15/02;G10L15/16 |
Main classification number |
G10L15/08 |
Patent agency |
|
Agent |
|
Principal claim |
1. A method comprising:
extracting time-dependent features from an input, to yield extracted time-dependent features;
selecting a plurality of time-dependent features from the extracted time-dependent features using a plurality of selection strategies, wherein a plurality of pooling interface units select the plurality of time-dependent features based on a weighted average score and on a rectified average score of the extracted time-dependent features;
generating a plurality of feature vectors by pooling the plurality of time-dependent features using a plurality of pooling interface units;
generating a plurality of scores associated with the plurality of feature vectors; and
returning, in response to the input, a class label selected based on the plurality of scores. |
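The claim names a weighted average score and a rectified average score but this record does not define either one. The sketch below therefore rests on assumed definitions: the weighted average is taken as a per-dimension sum of frame values weighted by hypothetical per-frame weights and normalized by the weight total, and the rectified average is taken as the per-dimension mean of the values after clipping negatives to zero. The function names and the sample numbers are likewise illustrative only.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # one time-dependent feature vector per frame


def weighted_average_pool(frames: Frames, weights: Sequence[float]) -> List[float]:
    """Assumed definition: sum_t w_t * x_t / sum_t w_t, per feature dimension."""
    if len(weights) != len(frames):
        raise ValueError("one weight per frame is required")
    total = sum(weights)
    return [
        sum(w * x for w, x in zip(weights, column)) / total
        for column in zip(*frames)
    ]


def rectified_average_pool(frames: Frames) -> List[float]:
    """Assumed definition: mean over time of max(x_t, 0), per feature dimension."""
    return [
        sum(max(x, 0.0) for x in column) / len(column)
        for column in zip(*frames)
    ]


# Toy segment: two frames, two features per frame (made-up numbers).
segment = [[2.0, -4.0], [4.0, 2.0]]
frame_weights = [1.0, 3.0]  # hypothetical per-frame weights

weighted = weighted_average_pool(segment, frame_weights)
rectified = rectified_average_pool(segment)

# One feature vector per pooling strategy; each would go to its own classifier.
feature_vectors = [weighted, rectified]
```

Both functions map a segment of any length to a vector of fixed size, which is what lets a segment-level classifier score frame-level features.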
Address |
Atlanta GA US |