Title of invention: Join processing device, data management device, and string similarity join system
Abstract: Provided is a join processing device that performs a similarity join on a plurality of tuples using an edit distance threshold value τ (a positive integer). The join processing device includes a join processing unit that excludes from edit distance calculation any pair of tuples whose join key strings have no character in common within an end portion, namely the portion running from the head character, or from the tail character, to the (τ+1)th character of each tuple's join key string.
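The filter in the abstract is lossless for the following reason. If ed(s, t) ≤ τ and at least one key is longer than τ, an optimal alignment must contain a matched character, because an alignment with no matches costs at least max(|s|, |t|) > τ. The first matched pair is preceded by unmatched characters on both sides, at most τ on each, so it lies within the first τ+1 characters of both keys; the last matched pair lies within the last τ+1 characters by the same argument. The minimal Python sketch below applies that check before an exact Levenshtein computation. The names `edit_distance`, `may_be_similar`, and `similarity_join` are hypothetical, and the guard for keys no longer than τ is an assumption added here (for example, "a" and "b" are within distance 1 yet share no character).

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # match (cost 0) or substitute (cost 1)
            )
        prev = cur
    return prev[-1]


def may_be_similar(s: str, t: str, tau: int) -> bool:
    """Return False only when the pair provably exceeds edit distance tau."""
    # Assumption added for this sketch: two keys that are both no longer than
    # tau can be within distance tau while sharing no character, so the
    # end-portion filter is not applied to them.
    if max(len(s), len(t)) <= tau:
        return True
    n = tau + 1  # length of each end portion
    share_head = bool(set(s[:n]) & set(t[:n]))
    share_tail = bool(set(s[-n:]) & set(t[-n:]))
    # Either test alone is a valid filter; this sketch requires both.
    return share_head and share_tail


def similarity_join(keys: list[str], tau: int) -> list[tuple[int, int]]:
    """Nested-loop join that computes edit distance only for pairs passing the filter."""
    result = []
    for a in range(len(keys)):
        for b in range(a + 1, len(keys)):
            if may_be_similar(keys[a], keys[b], tau) and edit_distance(keys[a], keys[b]) <= tau:
                result.append((a, b))
    return result


print(similarity_join(["kitten", "sitting", "mitten", "abc"], tau=2))  # [(0, 2)]
```

In this example, "abc" is discarded against the other keys without any edit distance computation, because its first three characters share nothing with theirs. The pair ("kitten", "sitting") passes the filter but is rejected at distance 3.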
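Publication number: US9535954 (B2); Publication date: 2017.01.03
Application number: US201113983373; Application date: 2011.11.07
Applicant: NEC CORPORATION; Inventor: Narita Kazuyo
Classification: G06F7/00; G06F17/30; Main classification: G06F7/00
Agency: Sughrue Mion, PLLC; Agent: Sughrue Mion, PLLC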
Main claim: 1. A data management device, which is communicatively connected to a plurality of join processing devices, comprising: a data storage unit that stores a plurality of tuples, each of the tuples comprising a join key string; and a data distributing unit that determines, for each of the tuples stored in the data storage unit, one of the join processing devices as a transmission destination, the tuple being transmitted to the transmission destination determined for the tuple, the join processing device performing a similarity join process on the tuples transmitted thereto using an edit distance threshold value τ (positive integer), wherein the data distributing unit performs the determination so that the tuples transmitted to the same join processing device have at least one character in common with each other in a head portion or a tail portion of their join key strings, the head portion of the join key string ranging from the head character of the join key string to the (τ+1)th character of the join key string, and the tail portion of the join key string ranging from the (N−τ)th character of the join key string to the Nth character of the join key string, where N is the length of the join key string.
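The claim states a property of the distribution (tuples that reach the same device share a character in their head or tail portion) without fixing how it is achieved. One way to obtain it, assumed here and not taken from the patent, is to key each bucket by a single character and send a tuple to every bucket whose character appears among the first τ+1 characters of its join key, treating each bucket as one join processing device. Under that assumption, any pair within distance τ meets in at least one bucket whenever one of its keys is longer than τ; pairs whose keys are both no longer than τ are not guaranteed to meet, and this sketch does not handle them. The names `distribute`, `join_bucket`, and `distributed_join`, and the (id, key) record layout, are hypothetical.

```python
from collections import defaultdict


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct))
        prev = cur
    return prev[-1]


def distribute(tuples: list[tuple[int, str]], tau: int) -> dict[str, list[tuple[int, str]]]:
    """Route each (id, join key) tuple to one bucket per distinct head character.

    Each bucket stands in for one join processing device, so every tuple in a
    bucket contains the bucket's character among its first tau + 1 characters.
    """
    buckets: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for tid, key in tuples:
        # The tail portion key[-(tau + 1):] could be used the same way.
        for c in set(key[:tau + 1]):
            buckets[c].append((tid, key))
    return dict(buckets)


def join_bucket(bucket: list[tuple[int, str]], tau: int) -> set[tuple[int, int]]:
    """Exact similarity join inside one bucket (one simulated device)."""
    pairs = set()
    for i, (a, ka) in enumerate(bucket):
        for b, kb in bucket[i + 1:]:
            if edit_distance(ka, kb) <= tau:
                pairs.add((min(a, b), max(a, b)))
    return pairs


def distributed_join(tuples: list[tuple[int, str]], tau: int) -> set[tuple[int, int]]:
    """Union of per-bucket results; pairs found in several buckets are deduplicated."""
    result: set[tuple[int, int]] = set()
    for bucket in distribute(tuples, tau).values():
        result |= join_bucket(bucket, tau)
    return result


records = [(1, "kitten"), (2, "sitting"), (3, "mitten"), (4, "abc")]
print(sorted(distributed_join(records, tau=2)))  # [(1, 3)]
```

Replicating a tuple to several buckets trades extra transmission for the guarantee that every qualifying pair is co-located somewhere; the set union afterwards removes pairs that were verified in more than one bucket.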
Address: Tokyo, JP