Title: Method of feature extraction from noisy documents
Abstract: Aspects of the exemplary embodiment relate to a method and apparatus for automatically identifying features that a classifier can use to assign class labels to text sequences extracted from noisy documents. The exemplary method includes receiving a dataset of text sequences, automatically identifying a set of patterns in the text sequences, and filtering the patterns to generate a set of features. The filtering includes filtering out redundant patterns, filtering out irrelevant patterns, or both. The method further includes outputting at least some of the features in the set, optionally after fusing features that are determined not to affect the classifier's accuracy if they are merged.
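The abstract does not say how patterns are mined or how redundancy and relevance are measured. The Python sketch below is a minimal, non-authoritative illustration of the receive → identify patterns → filter → output pipeline under stated assumptions, all hypothetical and not taken from the patent: patterns are character trigrams over a "shape" abstraction of each sequence (`A` = uppercase, `a` = lowercase, `d` = digit); two patterns are treated as redundant when they occur in exactly the same set of sequences; and a pattern is treated as irrelevant when its information gain with respect to the class labels falls below a threshold `min_gain`. The optional feature-fusion step is not shown.

```python
import math
from collections import defaultdict


def shape(text: str) -> str:
    """Hypothetical abstraction: map characters to classes so noisy variants collapse."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def extract_patterns(seq: str, n: int = 3) -> set[str]:
    """Assumed pattern type: character n-grams of the shaped sequence."""
    s = shape(seq)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    if not labels:
        return 0.0
    counts: dict[str, int] = defaultdict(int)
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(occ: frozenset[int], labels: list[str]) -> float:
    """Information gain of the binary feature 'pattern occurs in sequence i'."""
    inside = [y for i, y in enumerate(labels) if i in occ]
    outside = [y for i, y in enumerate(labels) if i not in occ]
    total = len(labels)
    return (entropy(labels)
            - len(inside) / total * entropy(inside)
            - len(outside) / total * entropy(outside))


def select_features(seqs: list[str], labels: list[str],
                    n: int = 3, min_gain: float = 0.1) -> list[str]:
    # Identify patterns: record which sequences each pattern occurs in.
    occurrences: dict[str, set[int]] = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for p in extract_patterns(seq, n):
            occurrences[p].add(idx)

    # Redundancy filter: keep one representative (smallest string) per occurrence set.
    by_occ: dict[frozenset[int], str] = {}
    for p in sorted(occurrences):
        by_occ.setdefault(frozenset(occurrences[p]), p)

    # Irrelevance filter: drop patterns whose information gain is below min_gain.
    return sorted(p for occ, p in by_occ.items() if info_gain(occ, labels) >= min_gain)


seqs = ["12/05/2008", "Page 3", "07/11/1999", "Page 12"]
labels = ["date", "other", "date", "other"]
print(select_features(seqs, labels, n=3, min_gain=0.5))  # ['/dd', 'Aaa']
```

In this toy run the four date trigrams collapse to one representative (`/dd`) because they occur in the same sequences, the four "Page" trigrams collapse to `Aaa`, and ` dd` (only in "Page 12") is dropped because its information gain (about 0.31 bits) is below the threshold.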
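Publication number: US8655803 (B2); Publication date: 2014.02.18
Application number: US20080336872; Filing date: 2008.12.17
Applicants: LECERF LOIC; CHIDLOVSKII BORIS; XEROX CORPORATION; Inventors: LECERF LOIC; CHIDLOVSKII BORIS
Classification: G06F15/18; Main classification: G06F15/18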