Title of invention: METHOD AND APPARATUS FOR IDENTIFYING ERRONEOUS CHARACTERS IN TEXT
Abstract: A method and apparatus are provided that identify confused characters in a text written in a language having a large number of distinct characters. To identify the confused characters, a set of characters from the text is segmented into individual characters. A confusable character (332, 342, 344) for at least one of the segmented characters (330, 340) is then retrieved. Lexical information (106) is identified for both the segmented characters (330, 340) and the retrieved confusable characters (332, 342, 344) and is used to parse the segmented characters and the confusable characters. Based on the parse, a segmented character is identified that has been confused with a confusable character.
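The steps in the abstract (segment, retrieve confusable characters, look up lexical information, parse, report the confused character) can be illustrated with a minimal sketch. This is not the patented implementation: the full parser is replaced here by a much simpler lowest-cost word cover over a toy lexicon, and the function name `find_confused`, the confusion table, the lexicon, and the cost values are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def find_confused(
    text: str,
    lexicon: Set[str],
    confusable: Dict[str, List[str]],
    sub_penalty: float = 0.5,   # assumed extra cost per substituted character
    unknown_cost: float = 3.0,  # assumed cost of a character no word covers
) -> List[Tuple[int, str, str]]:
    """Return (position, original, replacement) for each character that the
    lowest-cost word cover replaces with one of its confusable characters."""
    chars = list(text)  # step 1: segment the text into individual characters
    n = len(chars)
    # step 2: each position may hold the original or any confusable character
    cands = [{c, *confusable.get(c, [])} for c in chars]

    inf = float("inf")
    # best[i] = (cost, chosen characters) for the cheapest cover of chars[:i]
    best: List[Tuple[float, List[str]]] = [(inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        cost_i, path_i = best[i]
        if cost_i == inf:
            continue
        # fallback: keep the original character as an unknown one-character word
        options = [(chars[i], unknown_cost)]
        # steps 3-4 (stand-in for the parse): try every lexicon word that can
        # be spelled at position i using original or confusable characters
        for w in lexicon:
            if i + len(w) <= n and all(w[k] in cands[i + k] for k in range(len(w))):
                subs = sum(w[k] != chars[i + k] for k in range(len(w)))
                options.append((w, 1.0 + sub_penalty * subs))
        for w, c in options:
            j = i + len(w)
            if cost_i + c < best[j][0]:
                best[j] = (cost_i + c, path_i + list(w))

    chosen = best[n][1]
    # step 5: report positions where the cover preferred a confusable character
    return [(i, chars[i], chosen[i]) for i in range(n) if chosen[i] != chars[i]]


# Toy data (hypothetical): 门 is listed as confusable with 们.
LEXICON = {"我", "我们", "是", "学生", "门"}
CONFUSABLE = {"门": ["们"]}
result = find_confused("我门是学生", LEXICON, CONFUSABLE)
```

With this toy data, reading the second character as 们 lets the cover use the two-character word 我们, which costs less than treating 我 and 门 as separate words, so position 1 is reported. A real system would replace the word-cover cost with the result of a syntactic parse driven by the lexical information.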
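Publication number: WO0129696(A1) Publication date: 2001.04.26
Application number: WO2000US41218 Filing date: 2000.10.18
Applicant: MICROSOFT CORPORATION Inventors: WU, ANDI; HEIDORN, GEORGE, E.
Classification: G10L15/28;G06F17/21;G06F17/27;G10L15/18 Main classification: G10L15/28
Agency: Agent:
Main claim:
Address: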