Title of invention: COMPUTER SYSTEM FOR COLLECTING INFORMATION FROM WEB SITES
Abstract: Computer processing means and a method for searching and retrieving Web pages to collect people and organization information are disclosed. A Web site of potential interest is accessed. A subset of Web pages from the accessed site is determined for processing. Extraction of people and organization information is enabled according to the types of content found on a subject Web page. Internal links of a Web site are collected and recorded in a links-to-visit table. To avoid duplicate processing of Web sites, unique identifiers or Web site signatures are utilized. Respective time thresholds (time-outs) for processing a Web site and for processing a Web page are employed. A database is maintained for storing indications of domain URLs, names of the respective owners of the URLs as identified from the corresponding Web sites, the type of each Web site, processing frequencies, dates of last processing, outcomes of last processing, the size of each domain, and the number of data items found in the last processing of each Web site.
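The crawl loop outlined in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the function names `crawl_site` and `site_signature`, the choice of a SHA-256 hash of the home page as the Web site signature, the default time-out values, and the caller-supplied `fetch(url, timeout)` function that returns HTML text or `None`. The extraction of people and organization information and the database are left out.

```python
import hashlib
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_signature(home_page_html):
    """Hypothetical signature: a hash of the home page text (an assumption)."""
    return hashlib.sha256(home_page_html.encode("utf-8")).hexdigest()


def crawl_site(start_url, fetch, seen_signatures,
               site_timeout=60.0, page_timeout=5.0):
    """Visit the internal pages of one site and return the URLs processed.

    fetch(url, timeout) is supplied by the caller and returns HTML or None.
    """
    home = fetch(start_url, page_timeout)
    if home is None:
        return []

    # Skip a site whose signature was already recorded (duplicate avoidance).
    signature = site_signature(home)
    if signature in seen_signatures:
        return []
    seen_signatures.add(signature)

    domain = urlparse(start_url).netloc
    links_to_visit = deque([start_url])  # the "links-to-visit table"
    queued = {start_url}
    processed = []
    deadline = time.monotonic() + site_timeout  # per-site time threshold

    while links_to_visit and time.monotonic() < deadline:
        url = links_to_visit.popleft()
        html = home if url == start_url else fetch(url, page_timeout)
        if html is None:
            continue
        processed.append(url)

        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            absolute = urljoin(url, href)
            # Only internal links (same host) are recorded for later visits.
            if urlparse(absolute).netloc == domain and absolute not in queued:
                queued.add(absolute)
                links_to_visit.append(absolute)
    return processed
```

In this sketch the per-page time-out is simply handed to `fetch`, so enforcing it is the fetcher's job, while the per-site time-out is checked before each page is taken from the table. A real system would also need the content-type checks and the database records that the abstract mentions.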
Publication number: WO2002010982(A2) Publication date: 2002.02.07
Application number: US2001022426 Filing date: 2001.07.17
Applicant: Inventor:
Classification number: Main classification number:
Agency: Agent:
Principal claim:
Address: