Title of invention: IDENTIFYING UNVISITED PORTIONS OF VISITED INFORMATION
Abstract: An illustrative embodiment for identifying unvisited portions of visited information receives information to crawl, wherein the information is representative of one of web-based information and non-web-based information, computes a locality-sensitive hash (LSH) value for the received information, and identifies the most similar information visited thus far. The illustrative embodiment determines whether the LSH value of the received information is equivalent to that of the most similar information visited thus far. Responsive to a determination that the values are not equivalent, the embodiment identifies a visited portion of the received information using information for the most similar information visited thus far and crawls only the unvisited portions of the received information.
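The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract names only "a locality sensitive hash", so the use of SimHash here is an assumption, as are the Hamming-distance similarity measure, the representation of information as a list of text portions, and the names `simhash`, `hamming`, and `portions_to_crawl`.

```python
import hashlib


def simhash(tokens, bits=64):
    """SimHash, one common LSH scheme (an assumption; the source names none)."""
    weights = [0] * bits
    for tok in tokens:
        # Derive a stable 64-bit hash per token from MD5.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def portions_to_crawl(portions, visited):
    """Return the portions of the received information that still need crawling.

    portions: list of strings (hypothetical stand-ins for page fragments).
    visited:  list of {"lsh": int, "portions": set} records, updated in place.
    """
    tokens = [t for p in portions for t in p.split()]
    lsh = simhash(tokens)
    if not visited:
        todo = list(portions)  # nothing seen yet: crawl everything
    else:
        # Most similar information visited thus far (smallest Hamming distance).
        nearest = min(visited, key=lambda v: hamming(lsh, v["lsh"]))
        if nearest["lsh"] == lsh:
            todo = []  # equivalent LSH value: treat as already visited
        else:
            # Skip portions already seen in the most similar information.
            todo = [p for p in portions if p not in nearest["portions"]]
    visited.append({"lsh": lsh, "portions": set(portions)})
    return todo
```

In this sketch a second page that shares its header and footer with an earlier page would have only its differing body portion returned for crawling. Exact string matching of portions is a simplification chosen for brevity.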
Publication number: CA2779235(A1)  Publication date: 2013.12.06
Application number: CA20122779235  Filing date: 2012.06.06
Applicant: IBM CANADA LIMITED - IBM CANADA LIMITEE  Inventors: ISLAM, OBIDUL; ONUT, IOSIF VIOREL; IONESCU, PAUL; KONDRATOVA, EUGENIA
Classification: H04L12/26; G06F7/00; G06F17/00  Main classification: H04L12/26
Agency:  Agent:
Principal claim:
Address: