Title of invention: Method for segmenting communication transcripts using unsupervised and semi-supervised techniques
Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications. The method comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters within sequences of the collection.
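The final step of the abstract (successively merging sentence clusters until a specified number of segment clusters remains) can be illustrated with a minimal Python sketch. It assumes the earlier steps are already done, so each transcript arrives as a sequence of sentence-type labels. The abstract does not define the proximity-based measure; the one used here (the average gap between the relative positions of two clusters within each transcript) is an assumption chosen for illustration, and the function names `mean_positions`, `distance`, and `merge_into_segments` are hypothetical.

```python
from itertools import combinations


def mean_positions(sequences, members):
    """Per transcript: mean relative position (0..1) of sentences whose type is in `members`."""
    positions = []
    for seq in sequences:
        idx = [i for i, t in enumerate(seq) if t in members]
        if idx and len(seq) > 1:
            positions.append(sum(idx) / len(idx) / (len(seq) - 1))
        else:
            positions.append(None)  # cluster absent, or transcript too short
    return positions


def distance(sequences, a, b):
    """Assumed proximity measure: average positional gap between clusters a and b."""
    pa = mean_positions(sequences, a)
    pb = mean_positions(sequences, b)
    gaps = [abs(x - y) for x, y in zip(pa, pb) if x is not None and y is not None]
    return sum(gaps) / len(gaps) if gaps else float("inf")


def merge_into_segments(sequences, num_segments):
    """Start with one cluster per sentence type; repeatedly merge the closest pair."""
    types = sorted({t for seq in sequences for t in seq})
    clusters = [frozenset([t]) for t in types]
    while len(clusters) > num_segments:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: distance(sequences, *pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Two toy transcripts, each a sequence of sentence-type labels (illustrative data).
transcripts = [[0, 1, 2, 3, 4, 5], [1, 0, 3, 2, 5, 4]]
segments = merge_into_segments(transcripts, 3)
```

In this toy input, types 0 and 1 always occur near the start of a call, 2 and 3 in the middle, and 4 and 5 at the end, so the merge groups them into three segment clusters. A different proximity measure, such as adjacency counts between sentence types, would slot into `distance` without changing the merge loop.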
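Publication number: US7912714(B2) Publication date: 2011.03.22
Application number: US20080060469 Filing date: 2008.04.01
Applicant: NUANCE COMMUNICATIONS, INC. Inventors: KUMMAMURU KRISHNA;PADMANABAN DEEPAK S.;ROY SHOURYA;SUBRAMANIAM L. VENKATA
Classification number: G10L15/06 Main classification number: G10L15/06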
Agency: Agent:
Main claim:
Address: