Title of invention: Method and apparatus for generation and augmentation of search terms from external and internal sources
Abstract: A method and apparatus to identify names, personalities, titles, and topics that are present in a repository, and to identify names, personalities, titles, and topics that are not present in the repository, use information from external data sources, notably the text used in non-speech, text-based searches, to expand the search terms. The expansion takes place in two forms: (1) finding plausible linguistic variants of existing search terms that are already comprehended in the repository, but that are present under slightly different names; and (2) expanding the existing search term list with items that should be there by virtue of their currency in popular culture, but which for whatever reason have not yet been reflected with content items in the repository.
Publication number: US9305549(B2) Publication date: 2016.04.05
Application number: US201414513093 Filing date: 2014.10.13
Applicant: PROMPTU SYSTEMS CORPORATION Inventors: Stampleman Joseph Bruce; Printz Harry
Classification: G10L15/00; G10L15/22; G06Q30/02; G10L17/26; G10L15/18; G06F17/30 Main classification: G10L15/00
Agency: Perkins Coie LLP Agent: Glenn Michael A.; Perkins Coie LLP
Principal claim: 1. An apparatus for identifying names, personalities, titles, and topics, whether or not said names, personalities, titles and topics are present in a given repository, comprising: a plurality of external data sources, comprising non-speech, unstructured published content, said sources at least including sources selected from among published lists of the text of frequent searches presented to popular text-based search engines, published lists of popular artists and song titles, published lists of most popular tags, published lists of most-emailed stories, and published news feeds; a processor configured for extracting search term candidates from said external sources, the step of extracting further comprising: an automatic extraction means selected from among: named entity extraction (NEE); topic detection and tracking (TDT); direct human intervention; and a combination of NEE, TDT, and direct human intervention; said processor configured for expanding search terms to be provided to any of an automatic speech recognition or natural language processing system, or any combination thereof, using information from said external data sources, said expanding search terms comprising matching candidate search terms against verified search terms by applying linguistic edit distance techniques to obtain plausible linguistic variants of verified search terms and further comprising: said processor configured for finding plausible linguistic variants of existing search terms that are already comprehended within any of said automatic speech recognition or natural language understanding systems; and said processor configured for using said external sources to identify search terms that should be in an existing search term list by virtue of their currency in popular culture, but which have not yet been included among content items in the repository; said processor configured for expanding said existing search term list with said identified items; said processor configured for using said linguistic variants to generate augmented verified search terms; said processor configured for establishing a set of null search terms comprising candidate search terms having a high incidence count in said historical database of candidate search terms and in said historical database of verified search terms; and said processor configured for adding said augmented verified search terms or said set of null search terms, or any combination thereof, to any of an automatic speech recognition or natural language processing system, or any combination thereof.
Address: Menlo Park CA US
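
Illustrative note: the claim's step of matching candidate search terms against verified search terms by edit distance can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The record does not say which "linguistic edit distance techniques" are used, so plain character-level Levenshtein distance stands in for them here. The function names `edit_distance` and `classify_candidates` and the `max_ratio` threshold of 0.25 are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def classify_candidates(
    candidates: List[str],
    verified: List[str],
    max_ratio: float = 0.25,  # assumed threshold, not taken from the patent
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split candidates into variants of verified terms and unmatched new terms.

    Returns (variants, new_terms): `variants` maps a verified term to the
    candidates that look like near-spellings of it; `new_terms` lists the
    candidates that match no verified term closely.
    """
    variants: Dict[str, List[str]] = {}
    new_terms: List[str] = []
    for cand in candidates:
        if not verified:
            new_terms.append(cand)
            continue
        c = cand.lower()
        # Nearest verified term under case-insensitive edit distance.
        best = min(verified, key=lambda v: edit_distance(c, v.lower()))
        dist = edit_distance(c, best.lower())
        if dist == 0:
            continue  # already present in the verified list
        if dist / max(len(c), len(best)) <= max_ratio:
            variants.setdefault(best, []).append(cand)
        else:
            new_terms.append(cand)
    return variants, new_terms
```

Under these assumptions, a candidate such as "Brittany Spears" would be attached to a verified term "Britney Spears" as a plausible variant, while a candidate with no close verified neighbor would be reported as a new term to consider adding to the search term list.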