Abstract |
Systems and methods are disclosed for indexing a document, such as a webpage, that includes one or more misspelled words, based on an index classification of the document. Generally, a document is received and it is determined whether a word in the document is spelled incorrectly. If the word is spelled incorrectly, a first set of candidate words and a confidence score associated with each candidate word of the first set are generated based on whether the word is a common misspelling or a culture-based misspelling. Based on one or more index classifications of the document, a second set of one or more candidate words, which is a subset of the first set, and a confidence score associated with each candidate word of the second set are generated. The received document is then indexed with at least one word of the second set of candidate words. The document may also be indexed with the word as actually spelled in the document.
|