Title of invention: IDENTIFICATION OF NEAR DUPLICATED USER-GENERATED CONTENT
Abstract: A computer-implemented system and method for identification of near duplicate user-generated content in a networked system are disclosed. The apparatus in an example embodiment includes a data receiver to receive a first instance of user-generated content; a tokenizer to tokenize the first instance into a set of words, create a set of portions from the tokenized first instance, and assign a weight to each portion of the set of portions; a magnitude calculator to calculate a magnitude for the first instance based on the weight of each portion; a resemblance score calculator to search a data store for a second instance with at least one portion in common with the first instance and calculate a resemblance score between the first instance and the second instance; and an account linker to link accounts associated with each of the first instance and the second instance.
Publication number: WO2009126296(A1)   Publication date: 2009.10.15
Application number: WO2009US02231   Filing date: 2009.04.09
Applicant: EBAY INC.; SCHUIL, ROBIN, JOHAN   Inventor: SCHUIL, ROBIN, JOHAN
Classification: G06F17/30; G06Q30/00   Main classification: G06F17/30
Agency:   Agent:
Main claim:
Address:
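
The components named in the abstract (tokenizer, magnitude calculator, resemblance score calculator, account linker) can be illustrated with a minimal sketch. The abstract does not state how portions are formed, how weights are assigned, or which formula produces the resemblance score, so everything below is an assumption made for illustration: portions are taken to be three-word shingles, every portion gets a uniform weight of 1.0, the magnitude is the Euclidean norm of the weights, the resemblance score is a cosine-style ratio, and the 0.5 linking threshold is arbitrary. The names `ContentStore`, `portions`, `weigh`, and `magnitude` are hypothetical and do not come from the patent.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def portions(words, size=3):
    """Assumption: a 'portion' is a run of `size` consecutive words (a shingle)."""
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def weigh(portion_set):
    """Assumption: uniform weight; the source does not give a weighting scheme."""
    return {p: 1.0 for p in portion_set}


def magnitude(weights):
    """Assumption: magnitude is the Euclidean norm of the portion weights."""
    return math.sqrt(sum(w * w for w in weights.values()))


class ContentStore:
    """Hypothetical in-memory data store with an inverted index on portions."""

    def __init__(self):
        self.instances = {}            # content_id -> (account, weights, magnitude)
        self.index = defaultdict(set)  # portion -> ids of instances containing it
        self.links = set()             # frozensets of linked account pairs

    def add(self, content_id, account, text, threshold=0.5):
        """Score a new instance against stored ones, link accounts, then store it."""
        weights = weigh(portions(tokenize(text)))
        mag = magnitude(weights)

        # Only instances sharing at least one portion are candidates.
        candidates = set()
        for p in weights:
            candidates |= self.index.get(p, set())

        scores = {}
        for other_id in candidates:
            other_account, other_weights, other_mag = self.instances[other_id]
            shared = sum(w * other_weights[p]
                         for p, w in weights.items() if p in other_weights)
            score = shared / (mag * other_mag)  # cosine-style resemblance (assumed)
            scores[other_id] = score
            if score >= threshold and other_account != account:
                self.links.add(frozenset((account, other_account)))

        self.instances[content_id] = (account, weights, mag)
        for p in weights:
            self.index[p].add(content_id)
        return scores
```

Calling `add` once per incoming instance returns a resemblance score for every stored instance that shares at least one portion, and records an account link whenever a score reaches the threshold. The inverted index keeps the search limited to instances with a portion in common, which matches the search step the abstract describes.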