Title: System and method for tokenization of text using classifier models
Abstract: The present invention pertains to a system and method for the tokenization of text. A tokenizer may include a featurizer configured to receive input text and convert the input text into tokens. According to one aspect of the invention, each token may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer. Furthermore, the classifier may be configured to analyze the tokens received from the featurizer, using a preclassifier, to determine whether the tokens may be input into a predetermined classification model. If one of the tokens passes the preclassifier, then the token is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and may be configured to produce a final output.
Publication number: US7937263(B2) Publication date: 2011.05.03
Application number: US20040001654 Filing date: 2004.12.01
Applicant: DICTAPHONE CORPORATION Inventors: CARRIER JILL;CARUS ALWIN B.;COTE WILLIAM F.;DOWD JOHN;DEL LA FEMINA KATHRYN;FRANKEL ALAN;HAN WENSHENG(VINCENT);LAPSHINA LARISSA;RECHEA BERNARDO;SANTISTEBAN ANA;UHRBACH AMY J.
Classification: G06F17/27;G06F17/20 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address:
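
The abstract describes three stages: a featurizer that splits text into single-character-type tokens, a classifier gated by a preclassifier, and a finalizer that produces the output. The following is a minimal Python sketch of that flow, not the patented implementation. The function names, the regular expression, the punctuation-only preclassifier rule, the `toy_model` stand-in for the predetermined classification model, and the `"unclassified"` label are all assumptions made for illustration; the record does not specify any of them.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumed token pattern: each token is a run of letters, a run of digits,
# or a run of other non-whitespace characters (treated here as punctuation).
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]+")
PUNCT_PATTERN = re.compile(r"[^A-Za-z0-9\s]+")


def featurize(text: str) -> List[str]:
    """Split input text into tokens that each hold one character type."""
    return TOKEN_PATTERN.findall(text)


def preclassify(token: str) -> bool:
    """Hypothetical gate: only punctuation tokens are sent to the model."""
    return PUNCT_PATTERN.fullmatch(token) is not None


def toy_model(token: str) -> str:
    """Hypothetical stand-in for the predetermined classification model."""
    return "sentence_end" if token in {".", "!", "?"} else "other_punct"


def classify(
    tokens: List[str], model: Callable[[str], str] = toy_model
) -> List[Tuple[str, Optional[str]]]:
    """Label tokens that pass the preclassifier; leave the rest as None."""
    return [(tok, model(tok) if preclassify(tok) else None) for tok in tokens]


def finalize(labeled: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Produce the final output, filling in a default label (an assumption)."""
    return [(tok, label if label is not None else "unclassified")
            for tok, label in labeled]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Run featurizer, classifier, and finalizer in sequence."""
    return finalize(classify(featurize(text)))
```

Under these assumptions, `featurize("Dose: 20mg.")` yields `["Dose", ":", "20", "mg", "."]`, and `tokenize("Take 2 pills.")` labels only the final period as `"sentence_end"`, leaving the other tokens `"unclassified"`.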