Title of invention Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems
Abstract A classification system includes a signature-based duplicate detector and an inductive classifier that share attribute information. Before duplicate detection and classification can be performed, both components are initialized: a lexicon of attributes is generated for the duplicate detector, and a classification model is generated for the classifier. To develop the classification model, the classifier uses a training set of documents of known class to determine which document attributes are most useful in classifying an unknown document, and the model is built from these attributes. Attribute information containing the attributes determined by the classifier is then passed to the duplicate detector, which uses it to generate the lexicon of attributes.
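The initialization flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the scoring rule (difference in per-class document frequency), the two-class restriction, the SHA-1 signature, and all function names (`tokenize`, `select_attributes`, `build_lexicon`, `signature`) are assumptions chosen for illustration only.

```python
import hashlib
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the record does not specify one)."""
    return text.lower().split()


def select_attributes(docs, labels, k):
    """Classifier side: rank words by how differently they occur in the two classes.

    The score is the absolute difference in per-class document frequency,
    a hypothetical stand-in for whatever attribute selection a real classifier uses.
    """
    first, second = sorted(set(labels))  # this sketch assumes exactly two classes
    doc_freq = {first: Counter(), second: Counter()}
    class_size = Counter(labels)
    for text, label in zip(docs, labels):
        doc_freq[label].update(set(tokenize(text)))
    vocab = set(doc_freq[first]) | set(doc_freq[second])
    score = {
        w: abs(doc_freq[first][w] / class_size[first]
               - doc_freq[second][w] / class_size[second])
        for w in vocab
    }
    # Break ties alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda w: (-score[w], w))[:k]


def build_lexicon(attributes):
    """Duplicate-detector side: the lexicon is built from the classifier's attributes."""
    return frozenset(attributes)


def signature(text, lexicon):
    """Hash only the lexicon words present in the document, in sorted order."""
    kept = sorted(set(tokenize(text)) & lexicon)
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Toy training set of documents of known class.
train_docs = [
    "buy cheap pills now",
    "cheap pills discount offer",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

attributes = select_attributes(train_docs, train_labels, k=3)
lexicon = build_lexicon(attributes)

# Documents that differ only in non-lexicon words share a signature.
same = signature("Cheap pills here today", lexicon) == signature("pills cheap for you", lexicon)
```

In this sketch the duplicate detector never builds its own vocabulary statistics; it reuses the attributes the classifier already selected, which is the simplification the title refers to.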
Publication number US7725475(B1) Publication date 2010.05.25
Application number US20040016930 Filing date 2004.12.21
Applicant AOL INC. Inventors ALSPECTOR JOSHUA; KOLCZ ALEKSANDER; CHOWDHURY ABDUR R.
Classification G06F7/00 Main classification G06F7/00
Agency Agent
Principal claim
Address