Independent claim |
1. A system for performing stem searches on a set of documents, comprising:
a processor comprising:
an indexing module configured to extract terms from the set of documents and create a primary index including each of the extracted terms for searching the set of documents, wherein the primary index comprises an inverted index;
a stemming module configured to generate a secondary index from the extracted terms for searching the primary index, wherein the secondary index includes each stem of the extracted terms and each corresponding variant of that stem within the extracted terms from the set of documents that is different than the stem, wherein the secondary index comprises an ordered lookup table of stems and variants, and wherein the stemming module is further configured to generate the secondary index by identifying stems from the extracted terms and, if a term is found whose stem is different than the term, adding the stem with the term to the table if the stem has not already been added and, if the stem has been added but the term has not, adding the term;
a search stemming module configured to receive a query for the set of documents and identify stems of terms from the query and to search the secondary index for a plurality of variants corresponding to the stems of the terms from the query, wherein the plurality of variants found in the search is different than the stems from the terms in the query;
a search reconfiguration module configured to reformat the query to include a plurality of additional query terms comprising the plurality of variants found from the secondary index, wherein, for each term of the query having a variant in the secondary index, the reformatted query includes the term, the identified stem of the term, and each corresponding variant for the identified stem of the term found from the secondary index; and
a search engine configured to implement a search of the primary index using the reformatted query received from the search reconfiguration module to identify a resulting set of documents. |
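
The data flow recited in the claim (primary inverted index, secondary stem/variant table, query expansion, search) can be illustrated with a minimal Python sketch. This is not the claimed implementation. The names `toy_stem`, `tokenize`, `build_primary_index`, `build_secondary_index`, `expand_query`, and `search` are hypothetical. The suffix-stripping stemmer is a toy stand-in for a real stemming algorithm. Combining the expanded terms with OR semantics is an assumption, since the claim does not specify how they are combined.

```python
import re
from collections import defaultdict


def toy_stem(term):
    """Hypothetical toy stemmer: strips a few English suffixes (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def tokenize(text):
    """Lowercase the text and extract alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_primary_index(docs):
    """Primary index: inverted index mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def build_secondary_index(terms):
    """Secondary index: ordered table mapping each stem to its variants that differ from the stem."""
    table = {}
    for term in terms:
        stem = toy_stem(term)
        if stem == term:
            continue  # only terms whose stem differs from the term are recorded
        variants = table.setdefault(stem, [])  # add the stem if not already present
        if term not in variants:
            variants.append(term)  # add the term if not already present
    return {stem: sorted(table[stem]) for stem in sorted(table)}


def expand_query(query, secondary):
    """Reformat the query: term, its stem, and every variant found in the secondary index."""
    expanded = []
    for term in tokenize(query):
        stem = toy_stem(term)
        variants = secondary.get(stem, [])
        candidates = [term, stem, *variants] if variants else [term]
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(query, primary, secondary):
    """Search the primary index with the expanded query (OR semantics, an assumption)."""
    hits = set()
    for term in expand_query(query, secondary):
        hits |= primary.get(term, set())
    return hits


docs = {
    1: "She walks to work",
    2: "He walked home",
    3: "A long walk",
    4: "Walking is healthy",
    5: "They run daily",
}
primary = build_primary_index(docs)
secondary = build_secondary_index(primary)
```

With these sample documents, `search("walked", primary, secondary)` expands the query to cover "walked", "walk", "walking", and "walks". Stemming is applied to the query term once and the variants are read from the small secondary table, so the primary index is searched only with literal terms and does not itself need to store stems.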