Title of invention |
Tokenizer for a natural language processing system |
Abstract |
The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
|
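The two features described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the lexicon, the punctuation ordering, and every function name below (`is_valid`, `split_keep`, `segment_token`, `segment`) are hypothetical stand-ins chosen for illustration. The sketch proposes whitespace-delimited tokens, asks a toy "linguistic knowledge component" to validate each one, and breaks rejected tokens into subtokens according to a punctuation precedence list.

```python
# Hypothetical stand-ins: the abstract specifies neither the lexicon
# contents nor the actual punctuation ordering.
LEXICON = {"the", "cat", "sat", "well-known", "e.g.", "yes", "no"}
PUNCT_PRECEDENCE = ["/", "-", "."]  # split on "/" first, then "-", then "."


def is_valid(token):
    """Toy stand-in for the linguistic knowledge component."""
    return token.lower() in LEXICON


def split_keep(token, mark):
    """Split token on mark, keeping each mark as its own subtoken."""
    parts = []
    for i, piece in enumerate(token.split(mark)):
        if i > 0:
            parts.append(mark)
        if piece:
            parts.append(piece)
    return parts


def segment_token(token, level=0):
    """Validate a proposed token; if rejected, break it at the
    highest-precedence punctuation mark it contains and retry."""
    if is_valid(token) or level >= len(PUNCT_PRECEDENCE):
        return [token]
    mark = PUNCT_PRECEDENCE[level]
    if mark not in token or token == mark:
        return segment_token(token, level + 1)
    result = []
    for sub in split_keep(token, mark):
        result.extend([sub] if sub == mark else segment_token(sub, level + 1))
    return result


def segment(text):
    """Propose whitespace-delimited tokens, then validate or subdivide each."""
    tokens = []
    for proposed in text.split():
        tokens.extend(segment_token(proposed))
    return tokens
```

In this sketch, `well-known` and `e.g.` survive intact because the toy lexicon validates them, whereas `yes/no` is rejected and split at the slash. Validating before splitting is what keeps punctuation inside recognized words from being broken apart.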
Application publication number |
US2003023425(A1) |
Application publication date |
2003.01.30 |
Application number |
US20010822976 |
Application date |
2001.03.30 |
Applicant |
PENTHEROUDAKIS JOSEPH E.;BRADLEE DAVID G.;KNOLL SONJA S. |
Inventor |
PENTHEROUDAKIS JOSEPH E.;BRADLEE DAVID G.;KNOLL SONJA S. |
Classification number |
G06F17/27;(IPC1-7):G06F17/27 |
Main classification number |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|