Bootstrapping language models for spoken dialog systems using the world wide web, Application No. US200611425243

Title of invention	Bootstrapping language models for spoken dialog systems using the world wide web
Abstract	A system, method and computer readable medium that generates a language model from data from a web domain is disclosed. The method may include filtering web data to remove unwanted data from the web domain data, extracting predicate/argument pairs from the filtered web data, generating conversational utterances by merging the extracted predicate/argument pairs into conversational templates, and generating a web data language model using the generated conversational utterances.
Publication number	US9299345(B1)	Publication date	2016.03.29
Application number	US200611425243	Filing date	2006.06.20
Applicant	AT&T Intellectual Property II, L.P.	Inventors	Gilbert Mazin; Hakkani-Tur Dilek Z.
Classification	G10L15/00;G10L15/14;G10L15/22;G10L15/30	Main classification	G10L15/00
Agency		Agent
Main claim	1. A method comprising: identifying, via a processor communicating with Internet resources, common task independent web-sentences based on frequently occurring phrases across multiple websites from a web domain stored in a data store; selectively removing the common task independent web-sentences from the web domain data, to yield filtered web domain data comprising domain-specific data; identifying, via the processor, predicate/argument pairs from the filtered web domain data; replacing, via the processor, the predicate/argument pairs with predicate/argument tokens; generating, via the processor, conversational utterances by merging the predicate/argument tokens with manually written conversational templates while preserving a relative frequency of the manually written conversational templates, to yield generated conversational utterances; and generating, via the processor, a web data language model using the generated conversational utterances, and providing it as an initial language model for deployment of an automated speech recognition system.
Address	Atlanta GA US
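
The steps named in the abstract and main claim (filter sentences common across websites, extract predicate/argument pairs, merge them into conversational templates while preserving template frequency, and build a language model) can be illustrated with a minimal Python sketch. This is not the patented implementation. All function names, the `<PRED>`/`<ARG>` placeholder tokens, the `min_sites` threshold, the keyword-based extractor, and the use of raw bigram counts as the "language model" are assumptions made for illustration only.

```python
from collections import Counter


def filter_common_sentences(sites, min_sites=2):
    """Drop sentences that appear on at least `min_sites` different sites.

    `sites` maps a (hypothetical) site name to its list of sentences.
    Sentences shared across sites are treated as task-independent.
    """
    site_counts = Counter()
    for sentences in sites.values():
        site_counts.update(set(sentences))  # count each sentence once per site
    return [s for sentences in sites.values() for s in sentences
            if site_counts[s] < min_sites]


def extract_pairs(sentences, predicates):
    """Toy stand-in for a real parser: the first known predicate word in a
    sentence is the predicate, and the remaining words are its argument."""
    pairs = []
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        for i, word in enumerate(words[:-1]):
            if word in predicates:
                pairs.append((word, " ".join(words[i + 1:])))
                break
    return pairs


def generate_utterances(pairs, templates):
    """Merge each pair into every template.

    `templates` maps a template string to its observed count; each filled
    template is repeated `count` times so the relative template frequency
    carries over to the generated utterances.
    """
    utterances = []
    for predicate, argument in pairs:
        for template, count in templates.items():
            filled = template.replace("<PRED>", predicate).replace("<ARG>", argument)
            utterances.extend([filled] * count)
    return utterances


def bigram_counts(utterances):
    """Collect bigram counts with sentence boundary markers.

    A real language model would add smoothing and normalization.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = ["<s>"] + utterance.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    sites = {
        "a.example": ["Contact us", "Pay a bill online."],
        "b.example": ["Contact us", "Check order status."],
    }
    templates = {"i want to <PRED> <ARG>": 2, "can you help me <PRED> <ARG>": 1}
    filtered = filter_common_sentences(sites)
    pairs = extract_pairs(filtered, {"pay", "check"})
    model = bigram_counts(generate_utterances(pairs, templates))
    print(model.most_common(5))
```

In this sketch the template counts act as weights, so a template seen twice as often in the hand-written set contributes twice as many generated utterances to the counts.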