发明名称 Automatic segmentation of texts comprising chunks without separators
摘要 Syntagms of a text including individual elements written without separators are segmented into chunks having strings including at least one individual element, such as an ideogram of the Mandarin Chinese language. A lexicon is defined including a set of strings, each string having at least one of the individual elements. The syntagm, being segmented, is orderly searched on an element-by-element basis by searching within the lexicon strings corresponding to any of the chunks. In the case of a positive search result, the corresponding chunk located is stored with an associated cost. A check is made as to whether the chunk located was already present in the lexicon. If the chunk located was already present, the cost associated therewith is reduced. A plurality of candidate segmentation sequences are thus generated, each corresponding to a respective segmentation pattern having associated a corresponding accrued cost. The candidate sequence having the lowest associated cost is selected as the final result of segmentation.
申请公布号 US2007118356(A1) 申请公布日期 2007.05.24
申请号 US20030556940 申请日期 2003.05.28
申请人 BADINO LEONARDO 发明人 BADINO LEONARDO
分类号 G06F17/21;G06F17/27 主分类号 G06F17/21
代理机构 代理人
主权项
地址