Title of invention SYSTEM AND METHODS FOR WEB RESOURCE DISCOVERY
Abstract The subject invention comprises a system for data mining, preferably comprising a sample generator component (110), a filtering system component (130), and a buffering component. The sample generator component is preferably configured to communicate with a plurality of search engines (120) and to generate queries based on a sample repository of positive and negative sample documents, and it comprises a feature extraction algorithm. The subject invention also comprises a method for data mining, comprising the steps of (a) identifying candidate sample documents based on a category (125); (b) filtering the candidate documents by applying a categorization model (135); (c) buffering the filtered documents (145); (d) labeling the buffered documents as positive or negative examples of the category (155); (e) retraining the categorization model based on the labeled set of positive and negative example documents (165); (f) repeating steps (b) through (e) until all candidate documents are processed; and (g) storing all labeled documents in a database.
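The method steps in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the application: `KeywordModel` is a toy term-weight scorer standing in for the categorization model, `label_fn` stands in for whatever process labels the buffered documents, `seed` plays the role of the sample repository of positive and negative documents, and `batch_size` and `threshold` are hypothetical parameters. The loop filters, buffers, labels, and retrains on each batch until no candidates remain, then stores the labeled documents in a SQLite table.

```python
from __future__ import annotations

import sqlite3
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Split a document into lowercase whitespace-separated terms."""
    return text.lower().split()


class KeywordModel:
    """Toy categorization model (an assumption, not the patent's model)."""

    def __init__(self) -> None:
        self.weights: dict[str, int] = {}

    def score(self, text: str) -> int:
        # Sum the learned weight of each distinct term in the document.
        return sum(self.weights.get(term, 0) for term in set(tokenize(text)))

    def retrain(self, labeled: list[tuple[str, bool]]) -> None:
        # Rebuild weights from scratch: +1 for each positive document
        # containing a term, -1 for each negative document containing it.
        self.weights = {}
        for text, positive in labeled:
            for term in set(tokenize(text)):
                delta = 1 if positive else -1
                self.weights[term] = self.weights.get(term, 0) + delta


def discover(
    candidates: list[str],
    seed: list[tuple[str, bool]],
    label_fn: Callable[[str], bool],
    threshold: int = 0,
    batch_size: int = 2,
) -> list[tuple[str, bool]]:
    """Run the filter / buffer / label / retrain loop over all candidates."""
    model = KeywordModel()
    labeled = list(seed)
    model.retrain(labeled)
    pending = list(candidates)  # (a) candidates identified for the category
    while pending:  # (f) repeat until every candidate is processed
        batch, pending = pending[:batch_size], pending[batch_size:]
        # (b) filter with the current model, (c) buffer what passes
        buffered = [doc for doc in batch if model.score(doc) > threshold]
        # (d) label each buffered document as positive or negative
        labeled.extend((doc, label_fn(doc)) for doc in buffered)
        # (e) retrain on the enlarged labeled set
        model.retrain(labeled)
    return labeled


def store(labeled: list[tuple[str, bool]], conn: sqlite3.Connection) -> None:
    """(g) Persist all labeled documents in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (text TEXT, positive INTEGER)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(text, int(positive)) for text, positive in labeled],
    )
    conn.commit()
```

In this sketch, documents that the model rejects are simply dropped. Retraining after every batch means that labels gathered early change which later candidates pass the filter, which is the point of interleaving steps (b) to (e) rather than running each step once.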
Publication number WO0206993(A1) Publication date 2002.01.24
Application number WO2001US22350 Filing date 2001.07.17
Applicant ASYMMETRY, INC. Inventor NEVEITT, WILLIAM, T.
Classification G06F7/00;G06F17/30;(IPC1-7):G06F17/00 Main classification G06F7/00
Agency Agent
Principal claim
Address