Abstract |
Indexing, searching, and retrieving the content of speech documents (including, but not limited to, recorded books, audio broadcasts, and recorded conversations) is accomplished by finding and retrieving speech documents that are related to a query term at a conceptual level, even if those speech documents do not contain the spoken (or textual) query terms. Concept-based cross-media information retrieval is used. A term-phoneme/document matrix is constructed from a training set of documents. Further documents are then added to the matrix constructed from the training data. Singular Value Decomposition is used to compute a vector space from the term-phoneme/document matrix. The result is a lower-dimensional numerical space in which conceptually related term-phoneme and document vectors are nearest neighbors. A query engine computes the cosine between the query vector and every other vector in the space and returns a list of the term-phonemes and/or documents with the highest cosine values. |
Applicant |
TELCORDIA TECHNOLOGIES, INC.;BEHRENS, CLIFFORD, A.;EGAN, DENNIS;BASSU, DEVASIS |
Inventor |
BEHRENS, CLIFFORD, A.;EGAN, DENNIS;BASSU, DEVASIS |
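
The retrieval pipeline described in the abstract (matrix construction, Singular Value Decomposition, projection of documents and queries, cosine ranking) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the toy vocabulary, the raw-count weighting, and the choice of k = 2 are all assumptions made for illustration, and plain word tokens stand in for term-phoneme units.

```python
import numpy as np

def build_matrix(docs, vocab):
    """Count matrix: one row per term-phoneme unit, one column per document."""
    row = {unit: i for i, unit in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for unit in doc:
            if unit in row:
                matrix[row[unit], j] += 1.0
    return matrix

def fit_space(matrix, k):
    """Truncated SVD: keep the k leading left singular vectors (units x k)."""
    u, _s, _vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k]

def project(counts, u_k):
    """Map count columns (units x n) into the k-dimensional space (n x k)."""
    return counts.T @ u_k

def cosine_scores(query_vec, vectors):
    """Cosine between one query vector and each row of `vectors`."""
    denom = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    return vectors @ query_vec / np.maximum(denom, 1e-12)

# Hypothetical toy data: tokens stand in for term-phoneme units.
vocab = ["car", "automobile", "engine", "wheel",
         "banana", "fruit", "apple", "juice"]
train_docs = [
    ["car", "engine", "wheel"],            # document 0: no "automobile"
    ["automobile", "engine", "wheel"],     # document 1
    ["banana", "fruit", "apple"],          # document 2
    ["apple", "fruit", "juice", "juice"],  # document 3
]

matrix = build_matrix(train_docs, vocab)
u_k = fit_space(matrix, k=2)
doc_vecs = project(matrix, u_k)

# The query is treated as a one-document count column and projected the same way.
query_vec = project(build_matrix([["automobile"]], vocab), u_k)[0]
scores = cosine_scores(query_vec, doc_vecs)
ranking = np.argsort(-scores)  # document indices, best match first
```

In this toy space, document 0 scores highly for the query "automobile" even though it never contains that token, because it shares "engine" and "wheel" with document 1. A document added after training can be placed in the same space by calling `project` on its count column, without recomputing the decomposition.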