Title of invention Stemming for searching
Abstract Embodiments are disclosed of a search stemming module at a computer, configured to receive a query, identify stems from the query, and search a secondary index for variants corresponding to the stems. The secondary index may comprise one or more lists of stems associated with variants from a primary index. Also disclosed are a search reconfiguration module configured to reformat the query to include variants found from the secondary index, and a search engine configured to implement a search of the primary index using the variants received from the search reconfiguration module.
Publication number US9600588(B1) Publication date 2017.03.21
Application number US201313788570 Filing date 2013.03.07
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors Kalter William; Scott Eric; Szalay Jozsef
Classification G06F17/30 Main classification G06F17/30
Agency Edell, Shapiro & Finnan, LLC Agent Tham Yeen; Edell, Shapiro & Finnan, LLC
Principal claim 1. A system for performing stem searches on a set of documents, comprising: a processor comprising: an indexing module configured to extract terms from the set of documents and create a primary index including each of the extracted terms for searching the set of documents wherein the primary index comprises an inverted index; a stemming module configured to generate a secondary index from the extracted terms for searching the primary index, wherein the secondary index includes each stem of the extracted terms and each corresponding variant of that stem within the extracted terms from the set of documents that is different than the stem, wherein the secondary index comprises an ordered lookup table of stems and variants, and wherein the stemming module is further configured to generate the secondary index by identifying stems from the extracted terms and, if a term is found whose stem is different than the term, adding the stem with the term to the table if the stem has not already been added and, if the stem has been added but the term has not, adding the term; a search stemming module configured to receive a query for the set of documents and identify stems of terms from the query and to search the secondary index for a plurality of variants corresponding to the stems of the terms from the query, wherein the plurality of variants found in the search is different than the stems from the terms in the query; a search reconfiguration module configured to reformat the query to include a plurality of additional query terms comprising the plurality of variants found from the secondary index, wherein, for each term of the query having a variant in the secondary index, the reformatted query includes the term, the identified stem of the term, and each corresponding variant for the identified stem of the term found from the secondary index; and a search engine configured to implement a search of the primary index using the reformatted query received from the search reconfiguration module to identify a resulting set of documents.
Address Armonk NY US
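
Illustrative sketch (not part of the patent record). The flow recited in claim 1 can be approximated in a few lines of Python: an inverted primary index, a secondary table mapping each stem to the variants that differ from it, query reformatting, and a search of the primary index. Everything named here is an assumption made for illustration: `toy_stem` is a hypothetical suffix stripper standing in for whatever stemmer an implementation would use, the function names are invented, and combining terms with OR inside a group and AND across query terms is one possible reading, since the claim does not specify how the reformatted terms are combined.

```python
import re
from collections import defaultdict


def toy_stem(term):
    """Hypothetical suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def tokenize(text):
    """Lowercase the text and extract alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_primary_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def build_secondary_index(terms, stem=toy_stem):
    """Ordered lookup table: stem -> sorted variants that differ from the stem."""
    table = {}
    for term in terms:
        s = stem(term)
        if s == term:
            continue  # only terms whose stem differs from the term are recorded
        variants = table.setdefault(s, [])  # add the stem if not yet present
        if term not in variants:
            variants.append(term)  # add the term if not yet present
    return {s: sorted(v) for s, v in sorted(table.items())}


def reformat_query(query, secondary, stem=toy_stem):
    """For each query term with variants, emit the term, its stem, and each variant."""
    groups = []
    for term in tokenize(query):
        s = stem(term)
        group = [term]
        variants = secondary.get(s, [])
        if variants:
            for extra in [s, *variants]:
                if extra not in group:
                    group.append(extra)
        groups.append(group)
    return groups


def search(groups, primary):
    """Assumed semantics: OR within a group, AND across the query's terms."""
    result = None
    for group in groups:
        hits = set()
        for term in group:
            hits |= primary.get(term, set())
        result = hits if result is None else result & hits
    return result or set()
```

A typical use would build both indexes once, for example `primary = build_primary_index(docs)` followed by `secondary = build_secondary_index(primary)`, and then call `search(reformat_query(q, secondary), primary)` for each query `q`. Because the variants are looked up in a table derived from the indexed terms, the query is only expanded with forms that actually occur in the document set.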