Title: Identifying misrepresented characters in strings of text
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for identifying misrepresented characters in strings of text. A computer system receives text that includes characters identified as being encoded in UTF-8. The characters are represented as code point values, each code point value representing one character in the text. The computer system determines that the text likely includes characters incorrectly converted from Win-1252 to UTF-8 by comparing the code point values that represent the text with test values. Based on the comparison, the computer system identifies sequences of characters in the text that were likely incorrectly converted.
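The comparison described in the abstract can be illustrated with a minimal sketch. It is not the patent's claimed method, which this record does not reproduce. It assumes the misconversion in question is the common one where UTF-8 bytes were read as Win-1252 characters, so that a character whose Win-1252 byte is a UTF-8 lead byte (0xC2 to 0xF4) is followed by characters whose Win-1252 bytes are continuation bytes (0x80 to 0xBF). The function names `_cp1252_byte` and `find_suspect_sequences` are hypothetical.

```python
def _cp1252_byte(ch):
    """Return the Win-1252 byte value for one character, or None if it has none."""
    try:
        return ch.encode("cp1252")[0]
    except UnicodeEncodeError:
        # Assumption: C1 control code points (U+0080..U+009F) stand for bytes
        # that Win-1252 leaves undefined, so they are treated as raw bytes.
        cp = ord(ch)
        return cp if 0x80 <= cp <= 0x9F else None


def find_suspect_sequences(text):
    """Return (start, end, repaired) for runs that look like misread UTF-8.

    The "test values" here are the UTF-8 lead-byte and continuation-byte
    ranges; this choice is an assumption made for illustration.
    """
    results = []
    i, n = 0, len(text)
    while i < n:
        lead = _cp1252_byte(text[i])
        if lead is None or not (0xC2 <= lead <= 0xF4):
            i += 1
            continue
        # Number of continuation bytes implied by the lead byte.
        need = 1 if lead < 0xE0 else 2 if lead < 0xF0 else 3
        raw = [lead]
        j = i + 1
        while j < n and len(raw) <= need:
            b = _cp1252_byte(text[j])
            if b is None or not (0x80 <= b <= 0xBF):
                break
            raw.append(b)
            j += 1
        if len(raw) == need + 1:
            try:
                repaired = bytes(raw).decode("utf-8")
            except UnicodeDecodeError:
                # Right shape but not valid UTF-8 (for example, overlong).
                i += 1
                continue
            results.append((i, j, repaired))
            i = j
        else:
            i += 1
    return results
```

For example, `find_suspect_sequences("caf\u00c3\u00a9")` returns `[(3, 5, 'é')]`, while correctly encoded text such as `"caf\u00e9"` yields an empty list. A production detector would also need to weigh false positives, since a legitimate accented letter can be followed by a symbol such as `©`.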
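Publication number: US8228215(B1) Publication date: 2012.07.24
Application number: US20100825659 Filing date: 2010.06.29
Applicant: RUNGE NORBERT; GOOGLE INC. Inventor: RUNGE NORBERT
Classification: H03M7/00 Main classification: H03M7/00
Agency: Agent:
Principal claim:
Address: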