Title of invention |
Method of identifying the language of a textual passage using short word and/or n-gram comparisons |
Abstract |
A method and system for identifying the language of a textual passage are disclosed. The method and system include parsing the textual passage into n-grams, assigning an initial weight to each n-gram, and adjusting the weight initially assigned to a word or n-gram parsed from the textual passage. The initially assigned weight is adjusted in proportion to the inverse of the number of languages in which the word or n-gram appears. Reducing the weight assigned to such words or n-grams diminishes, without completely eliminating, their importance relative to other words or n-grams parsed from the same textual passage when determining the language of the passage. The method and system of the present invention appropriately weight the short words or n-grams common to multiple languages without affecting the short words or n-grams that are not common to several languages.
|
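The weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes character trigrams, an initial weight of 1.0, per-language profiles stored as plain sets of lower-case n-grams, and a simple sum of adjusted weights as the score. The names `char_ngrams`, `identify_language`, and `profiles` are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Split text into overlapping, lower-cased character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def identify_language(passage, profiles, n=3, initial_weight=1.0):
    """Score each language by summing inverse-language-count weights.

    profiles maps a language code to the set of n-grams known for that
    language (an assumed representation, not taken from the patent).
    Returns (best_language_or_None, scores_by_language).
    """
    counts = Counter(char_ngrams(passage, n))
    scores = defaultdict(float)
    for gram, count in counts.items():
        # Languages whose profile contains this n-gram.
        langs = [lang for lang, grams in profiles.items() if gram in grams]
        if not langs:
            continue  # n-gram unknown to every profile: no evidence
        # Adjust the initial weight by the inverse of the number of
        # languages sharing the n-gram: reduced, but never zero.
        weight = initial_weight / len(langs)
        for lang in langs:
            scores[lang] += weight * count
    best = max(scores, key=scores.get) if scores else None
    return best, dict(scores)


if __name__ == "__main__":
    # Toy profiles: "xyz" is shared by both languages, "abc" by one only.
    toy_profiles = {"a": {"xyz", "abc"}, "b": {"xyz"}}
    print(identify_language("xyzabc", toy_profiles))
```

In the toy profiles, the shared trigram `xyz` contributes half its initial weight to each language, while `abc`, which appears in only one profile, keeps its full weight. This is the "diminish without eliminating" behavior the abstract describes.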
Publication number |
US2005154578(A1) |
Publication date |
2005.07.14 |
Application number |
US20040757313 |
Filing date |
2004.01.14 |
Applicant |
TONG XIANG;GREFENSTETTE GREGORY T.;EVANS DAVID A. |
Inventor |
TONG XIANG;GREFENSTETTE GREGORY T.;EVANS DAVID A. |
Classification |
G06F17/21;G06F17/22;G06F17/27;G06F17/28;(IPC1-7):G06F17/28 |
Main classification |
G06F17/21 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|