Title of invention: System and method for storing connectivity information in a web database
Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded pages; the page link information includes, for each downloaded page, a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space-efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row, it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row that are not in the respective row, and determines a set of adds representing page identifiers in the respective row that are not in the identified prior row. The second link processing module delta encodes the set of deletes and the set of adds for each respective row, and then Huffman codes the delta-encoded set of deletes and the delta-encoded set of adds for each respective row.
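The row encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the rows are already arranged in their encoding order and that each row is a sorted list of integer page identifiers. The abstract does not specify the "predefined row match criteria", so the sketch substitutes a hypothetical criterion: among the last WINDOW rows, pick the one with the fewest combined deletes and adds, and use no reference row when none beats listing every identifier as an add. The names WINDOW, encode_rows, and decode_rows are illustrative, and the final Huffman coding step is omitted.

```python
from typing import List, Optional, Tuple

# Hypothetical parameter: how many preceding rows are considered as references.
WINDOW = 4

EncodedRow = Tuple[Optional[int], List[int], List[int]]


def delta_encode(ids: List[int]) -> List[int]:
    """Turn identifiers into gaps: first value, then successive differences."""
    gaps, prev = [], 0
    for x in sorted(ids):
        gaps.append(x - prev)
        prev = x
    return gaps


def delta_decode(gaps: List[int]) -> List[int]:
    """Inverse of delta_encode: running sum of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def encode_rows(rows: List[List[int]]) -> List[EncodedRow]:
    """Encode each row as (reference row index, delta-coded deletes, delta-coded adds)."""
    encoded = []
    for i, row in enumerate(rows):
        current = set(row)
        best_ref: Optional[int] = None
        best_cost = len(current)  # cost with no reference: every id is an add
        for j in range(max(0, i - WINDOW), i):
            prior = set(rows[j])
            # Assumed match criterion: total number of deletes plus adds.
            cost = len(prior - current) + len(current - prior)
            if cost < best_cost:
                best_ref, best_cost = j, cost
        prior = set(rows[best_ref]) if best_ref is not None else set()
        deletes = delta_encode(list(prior - current))  # in prior row, not in this row
        adds = delta_encode(list(current - prior))     # in this row, not in prior row
        encoded.append((best_ref, deletes, adds))
    return encoded


def decode_rows(encoded: List[EncodedRow]) -> List[List[int]]:
    """Rebuild the original rows by applying deletes and adds to each reference row."""
    rows: List[List[int]] = []
    for ref, deletes, adds in encoded:
        ids = set(rows[ref]) if ref is not None else set()
        ids -= set(delta_decode(deletes))
        ids |= set(delta_decode(adds))
        rows.append(sorted(ids))
    return rows
```

In a full encoder, the gap lists produced here would then be passed to a Huffman coder, as the abstract describes. Small gaps occur often when similar pages sit next to each other in the row order, which is what makes that final step effective.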
Publication number: US2002138509(A1) Publication date: 2002.09.26
Application number: US20010766336 Filing date: 2001.01.18
Applicant: BURROWS MICHAEL;RANDALL KEITH H.;STATA RAYMOND P.;WICKREMESINGHE RAJIV G. Inventor: BURROWS MICHAEL;RANDALL KEITH H.;STATA RAYMOND P.;WICKREMESINGHE RAJIV G.
Classification: G06F17/30;(IPC1-7):G06F15/173;G06F17/00 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: