Title of invention Determining a natural language shift in a computer document
Abstract Language shift points are determined in a computer document written in a plurality of natural languages. An interval, which contains a portion of the text, is defined on a text document in a computer memory and moved through it. At each position of the interval, a probability that the text in the interval is written in each of a plurality of candidate languages is determined. At the first position of the interval, generally the beginning of the document, a first candidate language is classified as the current language if it has the highest probability of all the candidate languages within the interval. A language shift point in the document is identified where, at a new position of the interval, the relative probability of a second candidate language is higher than that of the current language. The second candidate language is then classified as the current language in the document after the language shift point. The process continues to identify other language shift points in the document.
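The sliding-interval procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the per-language probability is computed, so the sketch assumes a simple stand-in: the fraction of words in the interval that appear in a small common-word list for each language. The names `COMMON_WORDS`, `score_window`, and `find_language_shifts`, the word lists, and the default interval size of four words are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, deliberately tiny word lists, for illustration only.
COMMON_WORDS: Dict[str, Set[str]] = {
    "en": {"the", "is", "and", "of", "in", "a", "on"},
    "fr": {"le", "la", "est", "et", "de", "dans", "un", "sur"},
}


def score_window(words: List[str], lang_words: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each candidate language as the fraction of interval words found
    in its word list (an assumed stand-in for the probability)."""
    return {
        lang: sum(1 for w in words if w in vocab) / len(words)
        for lang, vocab in lang_words.items()
    }


def find_language_shifts(
    text: str,
    lang_words: Dict[str, Set[str]] = COMMON_WORDS,
    window: int = 4,
) -> Tuple[Optional[str], List[Tuple[int, str]]]:
    """Return the initial language and a list of (word_index, new_language)
    pairs, where word_index is the start of the interval at which a shift
    was detected."""
    words = text.lower().split()
    if not words:
        return None, []
    window = min(window, len(words))

    # First position: the highest-scoring candidate becomes the current language.
    first = score_window(words[:window], lang_words)
    current = max(first, key=first.get)
    initial = current

    shifts: List[Tuple[int, str]] = []
    # Move the interval one word at a time through the document.
    for start in range(1, len(words) - window + 1):
        scores = score_window(words[start:start + window], lang_words)
        best = max(scores, key=scores.get)
        # A shift is recorded only when another candidate strictly beats
        # the current language at this position.
        if best != current and scores[best] > scores[current]:
            shifts.append((start, best))
            current = best
    return initial, shifts


initial, shifts = find_language_shifts(
    "the cat is on the mat le chat est sur la table"
)
print(initial, shifts)
```

In this sketch the reported index is the start of the interval at which the second language first overtakes the current one, so it only approximates the true boundary. A finer-grained method would have to locate the exact word within that interval.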
Publication number US5913185(A) Publication date 1999.06.15
Application number US19960772213 Filing date 1996.12.20
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors MARTINO, MICHAEL JOHN;PAULSEN, JR., ROBERT CHARLES
Classification G06F17/27;G06F17/28;(IPC1-7):G06F17/27;G06F17/21 Main classification G06F17/27
Agency Agent
Principal claim
Address