Title of invention: Method and apparatus for efficient segmentation of compound words using probabilistic breakpoint traversal
Abstract: A method for segmenting a compound word in an unrestricted natural-language input is disclosed. The method comprises receiving a natural-language input consisting of a plurality of characters. Next, a set of probabilistic breakpoints is constructed in the natural-language input, based on a probabilistic breakpoint analysis. A plurality of linkable components is then identified by traversing the substrings of the natural-language input delimited by the set of probabilistic breakpoints. Finally, a segmented string consisting of a plurality of linkable components spanning the natural-language input is returned. The segmented string can be interpreted as a compound word.
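The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how breakpoint probabilities are computed or how the traversal is ordered, so the sketch assumes a hypothetical character-pair probability table (`pair_prob`), a hypothetical cutoff (`threshold`), a hypothetical lexicon of linkable components (`lexicon`), and a longest-first depth-first traversal.

```python
from typing import Dict, List, Optional, Set


def find_breakpoints(word: str, pair_prob: Dict[str, float],
                     threshold: float) -> List[int]:
    """Return candidate split positions, always including both ends of the word.

    Assumption: a position i is a breakpoint when the character pair that
    straddles it (word[i-1] + word[i]) has probability >= threshold.
    """
    points = [0]
    for i in range(1, len(word)):
        pair = word[i - 1] + word[i]
        if pair_prob.get(pair, 0.0) >= threshold:
            points.append(i)
    points.append(len(word))
    return points


def segment(word: str, lexicon: Set[str], pair_prob: Dict[str, float],
            threshold: float = 0.5) -> Optional[List[str]]:
    """Split `word` into lexicon entries that span it end to end.

    Only substrings delimited by breakpoints are looked up, which keeps the
    number of lexicon probes small. Returns None if no full span exists.
    """
    points = find_breakpoints(word, pair_prob, threshold)
    dead_ends: Set[int] = set()  # breakpoint indices known not to reach the end

    def walk(start: int) -> Optional[List[str]]:
        if points[start] == len(word):
            return []  # reached the end of the input: the span is complete
        if start in dead_ends:
            return None
        # Try the longest candidate component first (an illustrative choice).
        for end in range(len(points) - 1, start, -1):
            piece = word[points[start]:points[end]]
            if piece in lexicon:
                rest = walk(end)
                if rest is not None:
                    return [piece] + rest
        dead_ends.add(start)
        return None

    return walk(0)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_lexicon = {"book", "bind", "binding", "ing", "shelf"}
    toy_pairs = {"kb": 0.9, "di": 0.6, "ks": 0.8}
    print(segment("bookbinding", toy_lexicon, toy_pairs))
```

With the toy data above, the breakpoints for "bookbinding" fall after "book" and after "bookbind", so the traversal checks at most a handful of substrings rather than every possible split of the word.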
Publication number: US7610189(B2)  Publication date: 2009.10.27
Application number: US20010042528  Filing date: 2001.10.18
Applicant: NUANCE COMMUNICATIONS, INC.  Inventor: MACKIE ANDREW WILLIAM
Classification: G06F17/28; G06F17/27  Main classification: G06F17/28
Agency:  Agent:
Main claim:
Address: