Title of invention: System and method for providing speech recognition using personal vocabulary in a network environment
Abstract: A method is provided in one example and includes receiving a media file and generating a text file based on the media file. The method includes identifying selected words within the text file based on a whitelist, wherein the whitelist includes a plurality of designated words to be tagged. The selected words are compared to a group of words associated with an individual, and one or more of the selected words are removed because they are not found in that group of words. In more specific embodiments, the method includes generating a resultant after removing one or more of the selected words; the resultant can be separated into fields that identify a title and an author associated with it. At least one of the removed words is associated with a false positive arising from two words that sound phonetically similar.
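The abstract's filtering steps can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch, not the patented implementation: it assumes the transcript already exists as plain text, uses a simple regular-expression tokenizer, and the names `Resultant`, `select_whitelisted`, `remove_unfamiliar`, `build_resultant`, and the `keywords` field are hypothetical. The "router"/"rooter" pair is an illustrative stand-in for two phonetically similar words.

```python
import re
from dataclasses import dataclass


@dataclass
class Resultant:
    # "title" and "author" follow the abstract; "keywords" is a hypothetical name
    title: str
    author: str
    keywords: list


def select_whitelisted(text, whitelist):
    """Tag the transcript words that appear on the whitelist (first pass)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w in whitelist]


def remove_unfamiliar(selected, personal_words):
    """Drop selected words missing from the individual's group of words.

    A word the speaker never uses (e.g. "rooter" for a network engineer)
    is treated as a likely false positive from a similar-sounding word.
    """
    return [w for w in selected if w in personal_words]


def build_resultant(text, whitelist, personal_words, title, author):
    kept = remove_unfamiliar(select_whitelisted(text, whitelist), personal_words)
    return Resultant(title=title, author=author, keywords=kept)


r = build_resultant(
    "The router config needs a rooter fix",
    whitelist={"router", "rooter", "config"},
    personal_words={"router", "config", "vlan"},
    title="Weekly network review",
    author="alice",
)
print(r.keywords)  # ['router', 'config']
```

Filtering against the individual's vocabulary after whitelisting keeps the personal lookup small: only tagged candidates, not every transcript word, are checked.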
Publication number: US9201965 (B1)
Publication date: 2015.12.01
Application number: US200912571414
Filing date: 2009.09.30
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Gannu Satish K.; Jouret Guido; Malegaonkar Ashutosh A.
Classification: G10L15/26; G06F17/30
Main classification: G10L15/26
Agency: Patent Capital Group
Agent: Patent Capital Group
Main claim: 1. A method, comprising: receiving data propagating in a network environment; ignoring Joint Photographic Experts Group (JPEG) documents in the data; identifying an audio and video media file in the data, wherein the audio and video media file is associated with a plurality of individuals; generating a text file based on the audio and video media file; comparing the text file to a plurality of blacklisted words; dropping the text file if a blacklisted word is found in the text file; identifying, using a processor, selected words within the text file based on a whitelist to create a first word list, wherein the first word list includes fewer words than the text file; comparing the selected words in the first word list to a personal vocabulary database associated with an individual from the plurality of individuals, wherein the personal vocabulary database associated with the individual includes one or more words that the individual added to the personal vocabulary database, and wherein words in the personal vocabulary database associated with the individual may be marked as private; and removing from the first word list one or more of the selected words to create a second word list based on the selected words not being found in the personal vocabulary database associated with the individual, wherein the second word list includes fewer words than the first word list, wherein at least one of the selected words that is removed is associated with a false positive from two words that phonetically sound similar.
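Read as a data pipeline, the claim's steps can be sketched as follows. This is a hedged illustration under stated assumptions, not Cisco's implementation: speech-to-text is assumed to have already filled `transcript`, MIME types stand in for the claim's document identification, and `MediaItem`, `PersonalVocabulary`, and `process_traffic` are hypothetical names. The claim only says personal-vocabulary words "may be marked as private"; withholding them from a shareable list is an assumption made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    # Hypothetical record for one document observed in network traffic
    name: str
    mime_type: str
    transcript: str = ""  # assumed output of an upstream speech-to-text engine


@dataclass
class PersonalVocabulary:
    words: set = field(default_factory=set)    # words this individual added or uses
    private: set = field(default_factory=set)  # subset the individual marked private


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def process_traffic(items, blacklist, whitelist, vocab):
    """Return {item name: (second word list, shareable words)} per kept A/V item."""
    results = {}
    for item in items:
        if item.mime_type == "image/jpeg":
            continue  # claim step: ignore JPEG documents
        if not item.mime_type.startswith(("audio/", "video/")):
            continue  # only audio/video media files are transcribed
        words = tokenize(item.transcript)
        if any(w in blacklist for w in words):
            continue  # a blacklisted word drops the whole text file
        first = [w for w in words if w in whitelist]     # first word list
        second = [w for w in first if w in vocab.words]  # second word list
        # Assumption: private words are withheld from anything shared with others
        shareable = [w for w in second if w not in vocab.private]
        results[item.name] = (second, shareable)
    return results


items = [
    MediaItem("photo.jpg", "image/jpeg"),
    MediaItem("meeting.mp4", "video/mp4", "Our router and rooter config"),
    MediaItem("memo.mp4", "video/mp4", "Confidential router plan"),
]
vocab = PersonalVocabulary(words={"router", "config"}, private={"config"})
print(process_traffic(items, {"confidential"}, {"router", "rooter", "config"}, vocab))
# {'meeting.mp4': (['router', 'config'], ['router'])}
```

The blacklist check runs before whitelisting so that a dropped text file never reaches the per-individual comparison; "rooter" survives the whitelist but is removed at the personal-vocabulary step, mirroring the claim's phonetic false positive.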
Address: San Jose, CA, US