Title: System and method for indexing a document that includes a misspelled word
Abstract: Systems and methods are disclosed for indexing a document, such as a webpage, that includes one or more misspelled words, based on an index classification of the document. Generally, a document is received and it is determined whether a word in the document is spelled incorrectly. If the word is spelled incorrectly, a first set of candidate words, with a confidence score associated with each candidate, is generated based on whether the word is a common misspelling or a culture-based misspelling. Based on one or more index classifications of the document, a second set of one or more candidate words, which is a subset of the first set, is generated, again with a confidence score associated with each candidate. The received document is then indexed with at least one word of the second set of candidate words. The document may also be indexed with the actual spelling of the word as it appears in the document.
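The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the lookup tables, the confidence values, the classification vocabularies, the fixed `boost` used to rescore the second candidate set, and all function names are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only.
DICTIONARY = {"the", "show", "at", "play", "theater", "heater"}

# misspelling -> [(candidate word, confidence score)]
COMMON_MISSPELLINGS = {"theatr": [("theater", 0.7), ("heater", 0.4)]}
CULTURE_MISSPELLINGS = {"theatre": [("theater", 0.9)]}

# index classification -> words considered relevant to that classification
CLASS_VOCABULARY = {
    "entertainment": {"theater", "show", "play"},
    "appliances": {"heater"},
}


def first_candidates(word):
    """First set: candidates from the common and culture-based tables."""
    candidates = {}
    for table in (COMMON_MISSPELLINGS, CULTURE_MISSPELLINGS):
        for cand, score in table.get(word, []):
            candidates[cand] = max(score, candidates.get(cand, 0.0))
    return candidates


def second_candidates(first, classifications, boost=0.1):
    """Second set: the subset of the first set that fits the document's
    index classifications, rescored (assumed rule: add a fixed boost)."""
    relevant = set()
    for name in classifications:
        relevant |= CLASS_VOCABULARY.get(name, set())
    return {
        cand: min(1.0, score + boost)
        for cand, score in first.items()
        if cand in relevant
    }


def index_document(doc_id, text, classifications, index, keep_original=True):
    """Index each word; misspelled words are indexed under their
    second-set candidates and, optionally, under the actual spelling."""
    for word in text.lower().split():
        if word in DICTIONARY:
            index[word].add(doc_id)
            continue
        second = second_candidates(first_candidates(word), classifications)
        for cand in second:
            index[cand].add(doc_id)
        if keep_original:
            index[word].add(doc_id)


index = defaultdict(set)
index_document("doc1", "the show at the theatr", ["entertainment"], index)
```

In this sketch the misspelling `theatr` yields two first-set candidates, and the `entertainment` classification narrows them to `theater`, so the document is indexed under `theater` and under the original spelling but not under `heater`.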
Publication number: US2008155399(A1)   Publication date: 2008.06.26
Application number: US20060642476   Filing date: 2006.12.20
Applicant: YAHOO! INC.   Inventor: KOCK AMBLES
Classification: G06F17/00;G06F7/00   Main classification: G06F17/00
Agency:   Agent:
Main claim:
Address: