Abstract |
A system (100) segments connected text, such as a Japanese or Chinese sentence, into words. The system includes means (110) for reading an input string representing the connected text. Segmentation means (120) identify at least one word sequence in the connected text by iteratively building a tree structure that represents the word sequence(s) in the input string. Initially, the input string is taken as the working string. Each word of a dictionary (122) is compared with the beginning of the working string. Each match is represented by a node in the tree, and the process is repeated with the part of the input string that remains after the matched word as the new working string. The system further includes means (130) for outputting at least one of the identified word sequences. A language model may be used to select among candidate sequences. Preferably, the system is used in a speech recognition system to update the lexicon based on representative texts. |
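
The tree-building step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Node`, `build_tree`, `sequences`, and `segment` are hypothetical, recursion stands in for the iterative procedure, and dropping branches that cannot reach the end of the string is an assumption that the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One matched dictionary word; children are the words that may follow it."""
    word: str
    children: List["Node"] = field(default_factory=list)


def build_tree(working: str, dictionary: List[str]) -> List[Node]:
    """Compare every dictionary word with the beginning of the working string.

    Each match becomes a node, and the remainder of the string becomes the
    new working string. Assumption: a branch is kept only if it can consume
    the whole input.
    """
    nodes = []
    for word in dictionary:
        if word and working.startswith(word):
            rest = working[len(word):]
            if not rest:
                # The word consumes the rest of the input: leaf node.
                nodes.append(Node(word))
            else:
                children = build_tree(rest, dictionary)
                if children:
                    nodes.append(Node(word, children))
    return nodes


def sequences(nodes: List[Node]) -> Iterator[List[str]]:
    """Yield every root-to-leaf path of the tree as a word sequence."""
    for node in nodes:
        if not node.children:
            yield [node.word]
        else:
            for tail in sequences(node.children):
                yield [node.word] + tail


def segment(text: str, dictionary: List[str]) -> List[List[str]]:
    """Return all candidate word sequences that cover the whole text."""
    return list(sequences(build_tree(text, dictionary)))


if __name__ == "__main__":
    # Toy dictionary, made up for illustration only.
    for candidate in segment("abcd", ["ab", "cd", "a", "bcd", "abc"]):
        print(candidate)
```

With the toy dictionary, the string `abcd` yields two candidate sequences; the match `abc` is dropped because no dictionary word covers the remaining `d`. Choosing between the surviving candidates is where a language model, as mentioned in the abstract, would come in.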