Title: SYSTEM AND METHOD FOR SELECTING TRAINING TEXT
Abstract: A system and method are described for determining, from a large corpus of data, a near-optimum subset of that data based on a selected model. Sets of feature vectors corresponding to natural or other preselected divisions of the data corpus are mapped into matrices representing those divisions. The invention finds a full-rank submatrix formed as the union of one or more of these division-based matrices. A greedy algorithm using Gram-Schmidt orthonormalization operates on the division matrices to find a near-optimum submatrix, within a time bound that substantially improves on prior-art methods. An important application of the invention is selecting a small number of sentences from a very large sentence corpus, such that the parameters of a duration model for speech synthesis can be estimated from the selected sentences.
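The abstract does not state the exact greedy selection criterion. The following minimal Python sketch assumes that each division (for example, one sentence) is given as a list of feature vectors of a fixed dimension `dim`. It also assumes that each greedy step picks the division that adds the most new linearly independent directions, measured by its Gram-Schmidt residuals against the rows already chosen. The function names `greedy_full_rank_cover` and `_new_directions`, the tolerance `tol`, and the "most new rank" scoring with lowest-index tie-breaking are all hypothetical illustrations, not the patented method.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _residual(v: Vector, basis: List[Vector]) -> Vector:
    """Remove v's components along each orthonormal basis vector,
    one vector at a time (modified Gram-Schmidt)."""
    r = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return r


def _norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def _new_directions(block: List[Vector], basis: List[Vector], tol: float) -> List[Vector]:
    """Orthonormal vectors that this division's rows would add to the basis."""
    added: List[Vector] = []
    for v in block:
        r = _residual(v, basis + added)
        n = _norm(r)
        if n > tol:  # row is not (numerically) in the current span
            added.append([a / n for a in r])
    return added


def greedy_full_rank_cover(
    blocks: List[List[Vector]], dim: int, tol: float = 1e-9
) -> Tuple[List[int], int]:
    """Greedily choose divisions until their stacked rows reach rank `dim`.

    Returns the chosen division indices and the rank achieved. If the
    whole corpus is rank-deficient, stops as soon as no division helps.
    """
    basis: List[Vector] = []
    chosen: List[int] = []
    remaining = set(range(len(blocks)))
    while len(basis) < dim and remaining:
        best, best_added = None, []
        for i in sorted(remaining):
            added = _new_directions(blocks[i], basis, tol)
            if len(added) > len(best_added):  # hypothetical score: new rank
                best, best_added = i, added
        if best is None:  # no remaining division increases the rank
            break
        chosen.append(best)
        basis.extend(best_added)
        remaining.remove(best)
    return chosen, len(basis)
```

For example, with four one- or two-row divisions in three dimensions, the sketch first takes division 1 (two new directions) and then division 2 (the third direction):

```python
blocks = [
    [[1.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0]],
]
print(greedy_full_rank_cover(blocks, 3))  # → ([1, 2], 3)
```

This sketch rescores every remaining division at each step, which keeps it simple. It makes no claim to reproduce the time bound described in the abstract.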
Publication number: CA2177863(A1)  Publication date: 1997.01.08
Application number: CA19962177863  Application date: 1996.05.31
Applicant: AT&T IPM CORP.  Inventors: BUCHSBAUM, ADAM LOUIS; VANSANTEN, JAN PIETER
Classification: G10L13/02; G10L13/08; (IPC1-7): G06F17/16; G10L9/00  Main classification: G10L13/02
Agency:  Agent:
Main claim:
Address: