Method for language-independent text tokenization using a character categorization.
Abstract
<p>A computer method is disclosed to isolate linguistically salient strings ("words") from a natural language text stream. The process is applicable to a variety of computer hardware, to any character encoding scheme, and to the idiosyncrasies of most natural languages.</p>
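As a rough illustration of what category-driven tokenization can look like, the sketch below assigns each character to one of three coarse classes and lets the class, plus the neighboring characters, decide where a token ends. The three class names, the use of Unicode general categories, and the rule that punctuation survives only between two word characters are assumptions made for this example; the record above does not describe the patent's actual classification table or rules.

```python
import unicodedata


def categorize(ch):
    """Map a character to a coarse class (hypothetical classes, not the patent's table)."""
    if ch.isspace():
        return "delimiter"          # always ends a token
    major = unicodedata.category(ch)[0]
    if major in ("L", "N", "M"):    # letters, numbers, combining marks
        return "constituent"        # always part of a token
    return "conditional"            # punctuation/symbols: decided by context


def tokenize(text):
    """Split text into tokens using character classes only (illustrative sketch)."""
    tokens = []
    current = []
    for i, ch in enumerate(text):
        kind = categorize(ch)
        if kind == "conditional":
            # Assumed rule: keep the character only when it sits between
            # two token constituents, as in "don't" or "3.14".
            prev_ok = i > 0 and categorize(text[i - 1]) == "constituent"
            next_ok = i + 1 < len(text) and categorize(text[i + 1]) == "constituent"
            kind = "constituent" if prev_ok and next_ok else "delimiter"
        if kind == "constituent":
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("Don't stop, e.g. at 3.14!"))
# prints ["Don't", 'stop', 'e.g', 'at', '3.14']
```

Because the decision depends only on character classes and not on a dictionary, the same loop runs unchanged on accented or non-Latin alphabetic text, which is the sense in which such a scheme can be language-independent. Scripts written without spaces would need additional rules that this sketch does not attempt.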
Publication number
EP0394633(A2)
Publication date
1990.10.31
Application number
EP19900103323
Filing date
1990.02.21
Applicant
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors
FAGAN, JOEL LA VERNE;GUNTHER, MICHAEL DANIEL;OVER, PAUL DOUGLAS;PASSON, GREG;TSAO, CHIEN CHUN;ZAMORA, ANTONIO