Abstract |
A method and apparatus are provided for identifying confused characters in text written in a language that has a large number of distinct characters. To identify the confused characters, a set of characters from the text is segmented into individual characters. A confusable character (332, 342, 344) is then retrieved for at least one of the segmented characters (330, 340). Lexical information (106) is identified for both the segmented characters (330, 340) and the retrieved confusable characters (332, 342, 344), and this information is used to parse the segmented characters together with the confusable characters. Based on the parse, a segmented character that has been confused with a confusable character is identified. |
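
The steps summarized above (segment, retrieve confusables, look up lexical information, parse, report) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The confusable table `CONFUSABLES`, the toy `LEXICON`, the cost constants, and the function `find_confused` are all hypothetical, and a simple lowest-cost dictionary cover stands in for the parser, which the abstract does not specify.

```python
from itertools import product

# Hypothetical toy data: a confusable-character table and a tiny lexicon.
CONFUSABLES = {"己": ["已"], "已": ["己"]}
LEXICON = {"我", "已经", "自己", "来", "了"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

# Assumed costs: known words are cheap, swaps carry a small penalty,
# and characters that fit no lexicon word are expensive.
WORD_COST = 1.0
SWAP_COST = 0.5
UNKNOWN_COST = 10.0


def find_confused(text):
    """Return (index, written_char, likely_intended_char) tuples."""
    chars = list(text)  # segmentation into individual characters
    # Each position offers the written character plus its confusables.
    candidates = [[c] + CONFUSABLES.get(c, []) for c in chars]
    n = len(chars)

    # best[i] = (lowest cost, corrected text) covering chars[:i].
    best = [(0.0, "")] + [(float("inf"), "")] * n
    for end in range(1, n + 1):
        # Fallback: keep the written character as an unknown word.
        prev_cost, prev_text = best[end - 1]
        best[end] = (prev_cost + UNKNOWN_COST, prev_text + chars[end - 1])
        for start in range(max(0, end - MAX_WORD_LEN), end):
            # Try every mix of written and confusable characters in the span.
            for combo in product(*candidates[start:end]):
                word = "".join(combo)
                if word not in LEXICON:
                    continue
                swaps = sum(a != b for a, b in zip(combo, chars[start:end]))
                cost = best[start][0] + WORD_COST + SWAP_COST * swaps
                if cost < best[end][0]:
                    best[end] = (cost, best[start][1] + word)

    corrected = best[n][1]
    return [(i, o, c) for i, (o, c) in enumerate(zip(chars, corrected)) if o != c]


if __name__ == "__main__":
    print(find_confused("我己经来了"))
```

In this sketch a character is reported only when substituting a confusable lets the surrounding characters form a lexicon word that the written text could not form, which mirrors the idea of using the parse to decide which character was confused.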