Main claim |
1. A data management device, which is communicatively connected to a plurality of join processing devices, comprising:
a data storage unit that stores a plurality of tuples, each of the tuples comprising a join key string; and
a data distributing unit that determines, for each of the tuples stored in the data storage unit, one of the join processing devices as a transmission destination, the tuple being transmitted to the transmission destination determined for the tuple, the join processing device performing a similarity join process on the tuples transmitted thereto using an edit distance threshold value τ (a positive integer),
wherein the data distributing unit performs the determination so that the tuples transmitted to a same join processing device have at least one character in common with each other in a head portion or a tail portion of the join key strings of the tuples, the head portion of the join key string ranging from a head character of the join key string to a (τ+1)th character of the join key string, and the tail portion of the join key string ranging from an (N−τ)th character of the join key string to an Nth character of the join key string, wherein N is a length of the join key string. |
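
The head and tail portions and the distribution rule recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `head_portion`, `tail_portion` and `distribute` are hypothetical. The claim does not say how a destination is chosen, so the sketch assumes one join processing device per character and routes each tuple by the smallest character found in its head or tail portion. Every tuple on a given device then contains that character in its head or tail portion.

```python
from collections import defaultdict


def head_portion(key: str, tau: int) -> str:
    """Characters 1..(tau+1) of the join key string (1-indexed, as in the claim)."""
    return key[: tau + 1]


def tail_portion(key: str, tau: int) -> str:
    """Characters (N-tau)..N of the join key string, where N = len(key).

    For keys shorter than tau+1 characters the whole key is returned.
    """
    return key[max(0, len(key) - tau - 1):]


def distribute(keys, tau: int) -> dict:
    """Group join key strings by an assumed routing character.

    Assumption (not from the claim): each distinct character identifies
    one join processing device, and a tuple is routed by the smallest
    character occurring in its head or tail portion.
    """
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    devices = defaultdict(list)
    for key in keys:
        candidates = set(head_portion(key, tau)) | set(tail_portion(key, tau))
        if not candidates:
            raise ValueError("join key string must not be empty")
        devices[min(candidates)].append(key)
    return dict(devices)


if __name__ == "__main__":
    # With tau = 1 the head portion is the first two characters and the
    # tail portion is the last two characters of each key.
    print(distribute(["apple", "angle", "kiwi"], tau=1))
```

With τ = 1, "apple" and "angle" both have "a" in their head portions and land on the same device in this sketch, while "kiwi" is routed by "i". A real system would also have to decide how many devices exist and whether several characters share one device, which this sketch leaves out.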