Title: Data harvesting method apparatus and system
Abstract: A method, apparatus, and system are disclosed for harvesting publicly accessible data from internet web pages. In one embodiment, the invention includes emulating user requests that are consistent with a user operating an industry standard browser, receiving text in response to the generated request, using a set of relevance estimators to select a most relevant candidate from a set of data items, and segmenting text received from a web page into extractable blocks. Relevance estimators may use techniques such as word matching, pattern matching, format matching, context assessment, word proximity, and the like. The extracted data may be aggregated into a database and used in applications such as phone directories or sales catalogs. The present invention facilitates data harvesting from web pages related to one or more specified topics.
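The abstract names several relevance estimators but does not say how their outputs are combined or how blocks are segmented. The Python sketch below illustrates the general idea for the phone-directory use case the abstract mentions. It is not the patented method, and everything in it is a hypothetical assumption: blank-line splitting stands in for block segmentation, a regex stands in for format matching, and the estimator scores are combined as a weighted sum. Browser-request emulation is omitted. The sketch starts from text that has already been received.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence

@dataclass
class Candidate:
    """A data item found in a text block (hypothetical structure)."""
    value: str     # extracted text, e.g. a phone number
    block: str     # the block the value was found in
    position: int  # character offset of the value inside the block

Estimator = Callable[[Candidate], float]  # hypothetical: scores a candidate in [0, 1]

# Loose "looks numeric" pattern used to propose candidates.
CANDIDATE_RE = re.compile(r"[\d(][\d() .-]{5,}\d")
# Stricter North American phone-number shape used by the pattern estimator.
PHONE_RE = re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}")

def segment(text: str) -> List[str]:
    """Split received text into blocks at blank lines (a stand-in for
    segmenting on the page's HTML block structure)."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

def find_candidates(blocks: Iterable[str]) -> List[Candidate]:
    """Collect every numeric-looking run in every block as a candidate."""
    return [Candidate(m.group(), b, m.start())
            for b in blocks for m in CANDIDATE_RE.finditer(b)]

def pattern_match(c: Candidate) -> float:
    """Pattern/format matching: 1.0 if the value has phone-number shape."""
    return 1.0 if PHONE_RE.fullmatch(c.value) else 0.0

def word_match(keywords: Sequence[str]) -> Estimator:
    """Word matching: fraction of topic keywords present in the block."""
    def score(c: Candidate) -> float:
        words = set(re.findall(r"[a-z]+", c.block.lower()))
        return sum(k in words for k in keywords) / len(keywords)
    return score

def word_proximity(keyword: str, scale: int = 40) -> Estimator:
    """Word proximity: decays linearly with the character distance between
    the keyword and the value; 0 if the keyword is absent or too far away."""
    def score(c: Candidate) -> float:
        idx = c.block.lower().find(keyword)
        if idx < 0:
            return 0.0
        return max(0.0, 1.0 - abs(c.position - idx) / scale)
    return score

def most_relevant(cands: List[Candidate], estimators: Sequence[Estimator],
                  weights: Sequence[float]) -> Optional[Candidate]:
    """Hypothetical combination rule: highest weighted sum of scores wins."""
    def total(c: Candidate) -> float:
        return sum(w * est(c) for est, w in zip(estimators, weights))
    return max(cands, key=total, default=None)

page = (
    "Acme Hardware\n"
    "Established 1987. Fax 555-867-5309\n"
    "\n"
    "Contact us: phone (555) 123-4567\n"
    "Order ref 20050202\n"
)

blocks = segment(page)
cands = find_candidates(blocks)
best = most_relevant(
    cands,
    [pattern_match, word_match(["phone", "contact"]), word_proximity("phone")],
    [1.0, 1.0, 1.0],
)
print(best.value if best else None)  # (555) 123-4567
```

Each estimator catches something the others miss. The fax number has the right format but no "phone" context. The order reference sits in the right block but fails the format check. Only the contact number scores well on all three. The weighted sum lets estimators be added or re-weighted independently. The equal weights here are for illustration only.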
Publication No.: US2005192948(A1)  Publication date: 2005.09.01
Application No.: US20050049041  Filing date: 2005.02.02
Applicants: MILLER JOSHUA J.; PUGINA MARCIO  Inventors: MILLER JOSHUA J.; PUGINA MARCIO
Classification: G06F17/30; (IPC1-7): G06F17/30  Main classification: G06F17/30