Title of invention: Method and apparatus for automatic detection of spelling errors in one or more documents
Abstract: Methods and apparatus are provided for automatically detecting spelling errors in one or more documents, such as documents being processed for the creation of a lexicon. According to one aspect of the invention, a spelling error is detected in one or more documents by determining if at least one given word in the one or more documents satisfies a predefined misspelling criteria, wherein the predefined misspelling criteria comprises the at least one given word having a frequency below a predefined low threshold and the at least one given word being within a predefined edit distance of one or more other words in the one or more documents having a frequency above a predefined high threshold; and identifying a given word as a potentially misspelled word if the given word satisfies the predefined misspelling criteria.
Publication number: US9465791(B2) Publication date: 2016.10.11
Application number: US200711673173 Filing date: 2007.02.09
Applicant: International Business Machines Corporation Inventors: Gail H. Richard; Hantler Sidney L.; Laker Meir M.; Lenchner Jonathan; Milch Daniel
Classification: G06F17/27 Main classification: G06F17/27
Agency: Ryan, Mason & Lewis, LLP Agent: Ryan, Mason & Lewis, LLP
Main claim: 1. A method for detecting a spelling error in one or more documents, comprising: obtaining a maximum edit distance at which a word, w, is to be considered a possible misspelling of another word, w′; determining if at least one given word in said one or more documents satisfies a predefined misspelling criteria, wherein said predefined misspelling criteria comprises said at least one given word having a frequency below a predefined low threshold and said at least one given word being within the obtained maximum edit distance of one or more other words in said one or more documents having a frequency above a predefined high threshold; identifying a given word as a potentially misspelled word if said given word satisfies said predefined misspelling criteria; and maintaining a lexicon such that said lexicon will include said given word if said given word does not satisfy said predefined misspelling criteria and will exclude said given word if said given word satisfies said predefined misspelling criteria, wherein one or more of said steps are performed by a processor.
Address: Armonk NY US
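
The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes Levenshtein distance as the edit distance, whitespace-free tokens already extracted from the documents, and strict comparisons for "below" and "above" the thresholds. The function names (edit_distance, find_misspellings, build_lexicon) and the parameter names (low, high, max_dist) are hypothetical and do not appear in the record.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (an assumed choice of edit distance),
    computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def find_misspellings(words, low, high, max_dist):
    """Map each potentially misspelled word to the frequent words it is near.

    A word is flagged when its frequency is below `low` and it lies within
    `max_dist` edits of at least one word whose frequency is above `high`.
    """
    freq = Counter(words)
    frequent = [w for w, n in freq.items() if n > high]
    suspects = {}
    for w, n in freq.items():
        if n >= low:
            continue  # not rare enough to be a candidate misspelling
        near = [v for v in frequent if edit_distance(w, v) <= max_dist]
        if near:
            suspects[w] = near
    return suspects


def build_lexicon(words, low, high, max_dist):
    """Keep every word that does not satisfy the misspelling criteria."""
    suspects = find_misspellings(words, low, high, max_dist)
    return {w for w in set(words) if w not in suspects}


# Hypothetical token stream standing in for "one or more documents".
tokens = ["server"] * 5 + ["servr"] + ["router"] * 4 + ["kernel"]
suspects = find_misspellings(tokens, low=2, high=3, max_dist=1)
lexicon = build_lexicon(tokens, low=2, high=3, max_dist=1)
```

In this example "servr" is rare and one edit away from the frequent word "server", so it is flagged and left out of the lexicon. "kernel" is equally rare but is not close to any frequent word, so it stays in. The sketch compares every rare word against every frequent word, which is quadratic in the vocabulary size; a large corpus would likely need an index over candidate neighbours.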