Title of invention: Tokenizer for a natural language processing system
Abstract: The present invention is a segmenter used in a natural language processing system. The segmenter segments a textual input string into tokens for further natural language processing. In accordance with one feature of the invention, the segmenter includes a tokenizer engine that proposes segmentations and submits them to a linguistic knowledge component for validation. In accordance with another feature of the invention, the segmentation system includes language-specific data that contains a precedence hierarchy for punctuation. If proposed tokens in the input string contain punctuation, they can illustratively be broken into subtokens based on the precedence hierarchy.
Publication number: US2003023425(A1) Publication date: 2003.01.30
Application number: US20010822976 Filing date: 2001.03.30
Applicant: PENTHEROUDAKIS JOSEPH E.;BRADLEE DAVID G.;KNOLL SONJA S. Inventor: PENTHEROUDAKIS JOSEPH E.;BRADLEE DAVID G.;KNOLL SONJA S.
Classification: G06F17/27;(IPC1-7):G06F17/27 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address:
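
The two features named in the abstract, a tokenizer engine whose proposed segmentations are validated by a linguistic knowledge component, and a language-specific punctuation precedence hierarchy used to break tokens into subtokens, can be illustrated with a minimal sketch. This is not the patented implementation: the names `PUNCT_PRECEDENCE`, `LEXICON`, `is_valid`, `split_token`, and `segment` are hypothetical, the precedence order is invented for illustration, and a small word set stands in for the linguistic knowledge component.

```python
from typing import Callable, List, Sequence

# Assumption: hypothetical language-specific data, listing punctuation
# from highest to lowest splitting precedence.
PUNCT_PRECEDENCE: Sequence[str] = (",", "/", "-", ".")

# Assumption: a toy stand-in for the linguistic knowledge component.
LEXICON = {"e-mail", "u.s."}


def is_valid(token: str) -> bool:
    """Accept tokens found in the toy lexicon or made only of letters/digits."""
    return token.lower() in LEXICON or token.isalnum()


def split_token(
    token: str,
    validate: Callable[[str], bool],
    precedence: Sequence[str],
) -> List[str]:
    """Break a rejected token into subtokens, highest-precedence mark first."""
    if validate(token):
        return [token]
    for mark in precedence:
        if mark in token:
            parts = token.split(mark)
            subtokens: List[str] = []
            for i, part in enumerate(parts):
                if part:
                    # Each piece is itself proposed for validation again.
                    subtokens.extend(split_token(part, validate, precedence))
                if i < len(parts) - 1:
                    # Keep the punctuation mark as its own token.
                    subtokens.append(mark)
            return subtokens
    # No known punctuation to split on: pass the token through unchanged.
    return [token]


def segment(
    text: str,
    validate: Callable[[str], bool] = is_valid,
    precedence: Sequence[str] = PUNCT_PRECEDENCE,
) -> List[str]:
    """Propose whitespace-delimited tokens, then validate or subdivide each."""
    tokens: List[str] = []
    for proposed in text.split():
        tokens.extend(split_token(proposed, validate, precedence))
    return tokens


if __name__ == "__main__":
    print(segment("Send e-mail and/or well-known"))
    print(segment("U.S./Canada,"))
```

In this sketch, "e-mail" and "U.S." survive intact because the validator accepts them before any split is attempted, while "and/or" and "well-known" are rejected and then divided at the highest-precedence punctuation mark they contain.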