Title of invention: System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations. Based on the scores, the plurality of segmental classification units selects a class label and returns a result.
Publication number: US9208778(B2)  Publication date: 2015.12.08
Application number: US201414537400  Filing date: 2014.11.10
Applicant: AT&T Intellectual Property I, L.P.  Inventors: Chopra Sumit; Dimitriadis Dimitrios; Haffner Patrick
Classification: G10L15/08; G10L15/02; G10L15/16  Main classification: G10L15/08
Agency:  Agent:
Principal claim: 1. A method comprising: extracting time-dependent features from an input, to yield extracted time-dependent features; selecting a plurality of time-dependent features from the extracted time-dependent features using a plurality of selection strategies, wherein a plurality of pooling interface units select the plurality of time-dependent features based on a weighted average score and on a rectified average score of the extracted time-dependent features; generating a plurality of feature vectors by pooling the plurality of time-dependent features using a plurality of pooling interface units; generating a plurality of scores associated with the plurality of feature vectors; and returning, in response to the input, a class label selected based on the plurality of scores.
Address: Atlanta, GA, US
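
Illustrative sketch (not part of the patent record): the flow described in the abstract and in claim 1 (frame-level feature extraction, temporal pooling by several pooling interface units, scoring by dedicated segmental classification units, and selection of a class label) might be approximated as follows. Everything here is an assumption made for illustration: the frame features (mean and energy), the triangular default weights, the linear scorers, the summing of scores across the ensemble, and all function and class names are hypothetical and are not taken from the patent. "Weighted average" and "rectified average" are read here as pooling operations over time, which is one possible interpretation of the claim language.

```python
# Minimal, hypothetical sketch of a PIU-SCU ensemble. Not the patented implementation.

def extract_frame_features(signal, frame_len=4):
    """Hypothetical frame processor: one [mean, energy] vector per frame."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    frames = [signal[i:i + frame_len] for i in starts]
    return [[sum(f) / frame_len, sum(x * x for x in f) / frame_len]
            for f in frames]

def weighted_average_pool(features, weights=None):
    """Pool over time with a weighted average per feature dimension."""
    n = len(features)
    if weights is None:
        # Assumed default: triangular weights that emphasise the segment centre.
        weights = [1 + min(t, n - 1 - t) for t in range(n)]
    total = sum(weights)
    dims = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dims)]

def rectified_average_pool(features):
    """Pool over time by averaging max(x, 0) per feature dimension."""
    n = len(features)
    dims = len(features[0])
    return [sum(max(f[d], 0.0) for f in features) / n for d in range(dims)]

class LinearSCU:
    """Hypothetical segmental classification unit: one linear score per label."""
    def __init__(self, weights):
        self.weights = weights  # dict: label -> list of coefficients

    def scores(self, vector):
        return {label: sum(w * v for w, v in zip(ws, vector))
                for label, ws in self.weights.items()}

def classify(signal, ensemble):
    """Run every (pooling unit, SCU) pair and pick the label with the top summed score."""
    features = extract_frame_features(signal)
    totals = {}
    for pool, scu in ensemble:
        for label, score in scu.scores(pool(features)).items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get), totals

# Two PIU-SCU combinations that differ in their pooling operation.
# The labels and coefficients are made-up placeholders.
ensemble = [
    (weighted_average_pool, LinearSCU({"aa": [1.0, 0.5], "s": [-1.0, 0.2]})),
    (rectified_average_pool, LinearSCU({"aa": [0.8, 0.1], "s": [-0.5, 0.6]})),
]

if __name__ == "__main__":
    label, totals = classify([1.0] * 8, ensemble)
    print(label, totals)
```

Pairing each pooling function with its own scorer mirrors the PIU-SCU combination described in the abstract; varying the pooling function across pairs is what diversifies the ensemble in this sketch. Summing scores across pairs is only one of several plausible ways to combine them.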