Title of Invention: SYSTEM FOR TOKENIZING TEXT IN LANGUAGES WITHOUT INTER-WORD SEPARATION
Abstract: A computerized system for transforming an input string includes a dictionary with tokens and associated scores. A chart parser generates a chart parse of the input string by, for each position within the input string, (i) identifying a string of at least one consecutive character in the input string that begins at that position and matches one of the tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, creating an entry corresponding to the identified string. A partition selection module determines a selected partition of the input string. The selected partition includes an array of tokens selected from the chart parse such that their concatenation matches the input string. The selected partition is a minimum score partition, where the score is based on a sum of the tokens' associated scores from the dictionary.
Publication No.: WO2017042744(A1) Publication Date: 2017.03.16
Application No.: WO2016IB55405 Filing Date: 2016.09.09
Applicant: QUIXEY, INC. Inventors: WANG, Yifu; GLOVER, Eric J.; ZHANG, Chen; CHEN, Zhaohui; DAI, Xueying
Classification: G06F17/30; G06F17/27 Main Classification: G06F17/30
Agency: Agent:
Main Claim:
Address:
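
The two stages described in the abstract (chart parse, then minimum score partition) can be illustrated with a minimal sketch. It is not the patented implementation. The function names build_chart and min_score_partition, the example dictionary, and its scores are all hypothetical. The sketch also assumes one particular reading of the single-character exception: a single-character match is dropped when a longer token starts at the same position. Selection of the partition uses ordinary dynamic programming over the chart, which is one possible way to find a minimum-sum partition; the abstract does not name a method.

```python
from typing import Dict, List, Optional


def build_chart(text: str, scores: Dict[str, float]) -> List[List[str]]:
    """For each position, list the dictionary tokens that start there."""
    max_len = max((len(tok) for tok in scores), default=0)
    chart: List[List[str]] = []
    for start in range(len(text)):
        limit = min(max_len, len(text) - start)
        matches = [
            text[start:start + n]
            for n in range(1, limit + 1)
            if text[start:start + n] in scores
        ]
        # Assumed reading of the abstract's exception: skip a single-character
        # entry when a longer token begins with that same character here.
        if any(len(m) > 1 for m in matches):
            matches = [m for m in matches if len(m) > 1]
        chart.append(matches)
    return chart


def min_score_partition(text: str, scores: Dict[str, float]) -> Optional[List[str]]:
    """Return the chart tokens whose concatenation is `text` with minimum summed score."""
    chart = build_chart(text, scores)
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)                      # best[i]: lowest score covering text[:i]
    back: List[Optional[str]] = [None] * (n + 1)  # back[i]: last token of that cover
    best[0] = 0.0
    for start in range(n):
        if best[start] == inf:
            continue  # no partition reaches this position
        for tok in chart[start]:
            end = start + len(tok)
            cand = best[start] + scores[tok]
            if cand < best[end]:
                best[end] = cand
                back[end] = tok
    if best[n] == inf:
        return None  # the chart entries cannot cover the whole input
    tokens: List[str] = []
    pos = n
    while pos > 0:
        tok = back[pos]
        tokens.append(tok)
        pos -= len(tok)
    tokens.reverse()
    return tokens


# Hypothetical dictionary: lower score means a preferred token.
example_scores = {
    "北京": 1.0, "大学": 1.0, "北京大学": 1.5,
    "北": 3.0, "京": 3.0, "大": 3.0, "学": 3.0,
}
print(min_score_partition("北京大学", example_scores))
```

With these illustrative scores the single four-character token is cheaper than the two-token split; raising its score above 2.0 would make the split win instead. The function returns None when the chart entries cannot cover the input, which the sketch leaves unhandled (for example, no fallback for out-of-dictionary characters).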