Title of invention: Identifying language and character set of data representing text
Abstract: The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization of the series of data values in terms of the occurrence of particular data values in the series. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then compares the retrieved models to the generated characterization, and identifies as the language of the text the language whose model compares most favorably to that characterization.
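The three steps in the abstract (characterize the input, retrieve per-language models, pick the best match) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes single byte values as the "data values", relative byte frequencies as both the characterization and the model, and cosine similarity as the comparison, none of which the abstract specifies. The names `characterize`, `build_model`, `similarity`, and `identify_language` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Profile = Dict[int, float]  # data value (byte) -> relative frequency


def characterize(data: bytes) -> Profile:
    """Characterize a series of data values by how often each value occurs."""
    counts = Counter(data)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


def build_model(samples: List[bytes]) -> Profile:
    """Assumption: a language model is the profile of its sample texts combined."""
    return characterize(b"".join(samples))


def similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity of two profiles (an assumed comparison measure)."""
    dot = sum(weight * b.get(value, 0.0) for value, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_language(data: bytes, models: Dict[str, Profile]) -> str:
    """Return the language whose model compares most favorably to the input."""
    profile = characterize(data)
    return max(models, key=lambda lang: similarity(profile, models[lang]))
```

In use, `models` would be built once from representative samples per language, for example `{"en": build_model(english_samples), "de": build_model(german_samples)}`, and then passed to `identify_language` together with the raw bytes of the unknown text. Single-byte frequencies are a deliberately simple choice here; a practical system would likely need richer statistics to separate closely related languages.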
Publication number: US6157905(A)  Publication date: 2000.12.05
Application number: US19970987565  Filing date: 1997.12.11
Applicant: MICROSOFT CORPORATION  Inventor: POWELL, ROBERT DAVID
Classification: G06F17/22;G06F17/27;(IPC1-7):G06F17/30  Main classification: G06F17/22
Agency:  Agent:
Main claim:
Address: