Title of Invention |
IMPLEMENTATION OF UNSUPERVISED TOPIC SEGMENTATION IN A DATA COMMUNICATIONS ENVIRONMENT |
Abstract |
A method is provided in one example embodiment and includes extracting a plurality of sentences from data, which comprises a speech transcript; tokenizing the plurality of sentences to develop for each of the plurality of sentences a sentence vector and at least one feature vector; and performing topic segmentation on the speech transcript using the sentence vectors and feature vectors, the topic segmentation resulting in a listing of segments corresponding to the speech transcript. In certain embodiments, the feature vector may be at least one of a cue word feature vector, a speaker change feature vector, and a scene change feature vector. |
Application Publication Number |
US2014214402(A1) |
Application Publication Date |
2014.07.31 |
Application Number |
US201313750049 |
Filing Date |
2013.01.25 |
Applicant |
Diao Qian;Gadde Venkata Ramana Rao |
Inventor |
Diao Qian;Gadde Venkata Ramana Rao |
Classification Number |
G06F17/21 |
Main Classification Number |
G06F17/21 |
Agency |
|
Agent |
|
Principal Claim |
1. A method, comprising:
extracting a plurality of sentences from data, which comprises a speech transcript; tokenizing the plurality of sentences to develop for each of the plurality of sentences a sentence vector and at least one feature vector; and performing topic segmentation on the speech transcript using the sentence vectors and feature vectors, wherein the topic segmentation is to result in a listing of segments corresponding to the speech transcript. |
Address |
San Jose CA US |
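
The steps recited in the abstract and in claim 1 (extract sentences, tokenize them into a sentence vector plus feature vectors, then segment) can be illustrated with a minimal Python sketch. This is not the patented implementation: the record does not disclose the scoring rule, so the sketch assumes a simple one in which the boundary score between adjacent sentences is their cosine dissimilarity plus fixed bonuses for a cue word and for a speaker change. The cue word list, the 0.5 weights, the 1.0 threshold, the "Speaker: text" transcript layout, and all function names are hypothetical. The scene change feature is omitted because it would need video data.

```python
import math
import re
from collections import Counter

# Hypothetical cue words that often open a new topic (not taken from the patent).
CUE_WORDS = {"next", "now", "okay", "moving", "finally"}


def extract_sentences(transcript):
    """Split 'Speaker: text' lines into (speaker, sentence) pairs."""
    pairs = []
    for line in transcript.strip().splitlines():
        speaker, _, text = line.partition(":")
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                pairs.append((speaker.strip(), sentence))
    return pairs


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_vector(tokens):
    """Bag-of-words sentence vector; cue words are tracked separately."""
    return Counter(t for t in tokens if t not in CUE_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(transcript, threshold=1.0):
    """Return a list of segments, each a list of sentences."""
    pairs = extract_sentences(transcript)
    if not pairs:
        return []
    tokens = [tokenize(sentence) for _, sentence in pairs]
    vectors = [sentence_vector(t) for t in tokens]

    segments, current = [], [pairs[0][1]]
    for i in range(1, len(pairs)):
        # Feature values for sentence i (assumed binary for simplicity).
        cue = 1.0 if tokens[i] and tokens[i][0] in CUE_WORDS else 0.0
        speaker_change = 1.0 if pairs[i][0] != pairs[i - 1][0] else 0.0
        dissimilarity = 1.0 - cosine(vectors[i - 1], vectors[i])

        # Assumed scoring rule: lexical dissimilarity plus weighted features.
        score = dissimilarity + 0.5 * cue + 0.5 * speaker_change
        if score > threshold:
            segments.append(current)
            current = []
        current.append(pairs[i][1])
    segments.append(current)
    return segments
```

Calling `segment()` on a transcript string returns the "listing of segments" as nested lists of sentences. Because the threshold and weights are arbitrary, they would need tuning on real transcripts before the output means much.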