Near-duplicate document detection for web crawling, Application No. US201213422130

Title of invention	Near-duplicate document detection for web crawling
Abstract	A system generates a hash value for a fetched document and compares the hash value with a set of stored hash values to identify ones of the stored hash values with a sequence of bit positions, less than all of the bit positions, that match a corresponding sequence of bit positions of the hash value. The system also determines whether any of the identified hash values are substantially similar to the hash value and identifies the fetched document as a near-duplicate of another document when one of the identified hash values is substantially similar to the hash value.
Publication No.	US8548972(B1)	Publication date	2013.10.01
Application No.	US201213422130	Filing date	2012.03.16
Applicants	JAIN ARVIND; MANKU GURMEET SINGH; GOOGLE INC.	Inventors	JAIN ARVIND; MANKU GURMEET SINGH
Classification	G06F17/30	Main classification	G06F17/30
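
The abstract describes a two-stage lookup: find stored hash values that agree with the new hash on some subset of bit positions, then keep only those that are "substantially similar" overall. The sketch below illustrates that idea under several assumptions not stated in the abstract: the document hash is a 64-bit simhash over lowercased whitespace tokens (each token hashed with the first 8 bytes of MD5), "substantially similar" means a Hamming distance of at most k bits (k = 3 here), and the "sequence of bit positions" is one of k + 1 disjoint bit blocks. With k + 1 blocks and at most k differing bits, at least one block must match exactly (pigeonhole), so exact lookups per block find every candidate within distance k. The names `simhash`, `NearDupIndex`, and `find_near_duplicate` are hypothetical, and this block-table scheme is a simplified illustration, not the patented method itself.

```python
import hashlib
from collections import defaultdict

BITS = 64  # assumed fingerprint width


def feature_hash(token: str) -> int:
    # Assumption: 64-bit token hash taken from the first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """Unweighted simhash over lowercased whitespace tokens (assumed features)."""
    counts = [0] * BITS
    for token in text.lower().split():
        h = feature_hash(token)
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(BITS):
        if counts[i] > 0:  # bit i is set when most tokens vote for 1
            fp |= 1 << i
    return fp


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class NearDupIndex:
    """Stores fingerprints in k + 1 tables, each keyed by one block of bits."""

    def __init__(self, k: int = 3):
        self.k = k
        blocks = k + 1
        size = BITS // blocks
        self.ranges = []
        start = 0
        for b in range(blocks):
            end = BITS if b == blocks - 1 else start + size  # last block takes the remainder
            self.ranges.append((start, end))
            start = end
        self.tables = [defaultdict(list) for _ in self.ranges]

    @staticmethod
    def _key(fp: int, rng: tuple) -> int:
        # Extract bit positions [start, end) of the fingerprint.
        start, end = rng
        return (fp >> start) & ((1 << (end - start)) - 1)

    def add(self, doc_id: str, fp: int) -> None:
        for table, rng in zip(self.tables, self.ranges):
            table[self._key(fp, rng)].append((doc_id, fp))

    def find_near_duplicate(self, fp: int):
        """Return the id of a stored document within k bits of fp, else None."""
        for table, rng in zip(self.tables, self.ranges):
            # Stage 1: candidates that match fp exactly on this block of bits.
            for doc_id, stored in table.get(self._key(fp, rng), []):
                # Stage 2: keep only candidates that are similar overall.
                if hamming(fp, stored) <= self.k:
                    return doc_id
        return None


if __name__ == "__main__":
    index = NearDupIndex(k=3)
    index.add("page-1", simhash("the quick brown fox jumps over the lazy dog"))
    print(index.find_near_duplicate(simhash("The quick brown fox jumps over the lazy dog")))
```

Run as a script, the example prints `page-1`, because lowercasing makes the two texts produce identical fingerprints. Splitting fingerprints into disjoint blocks trades memory (one table per block) for lookups that touch only a small candidate set instead of every stored hash.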