Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying web hosting entities. In one aspect, a system includes one or more computers programmed to perform operations including maintaining an Internet Protocol (IP) address history for each hostname in a plurality of hostnames. Each IP address history is a time series of IP addresses. The operations further include organizing the hostnames into a collection of groups so that each hostname of the plurality of hostnames is a member of exactly one group in the collection of groups. Each group has a kernel calculated from the IP address histories of the members of the group, and the IP address history of each member of the group is within a threshold distance of the kernel of the group.
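The abstract leaves open how distance, kernel, and grouping are defined. The following is a minimal sketch of the two grouping invariants (each hostname in exactly one group, every member within a threshold distance of its group's kernel). It rests on several assumptions that the text does not make. Every history is sampled on a shared time grid. The distance is the fraction of time steps at which two histories differ. The kernel is the most common IP address at each step. Hostnames are assigned greedily. All names (`distance`, `kernel`, `group_hostnames`) are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Assumed representation: one IP string per shared sampling interval
# (None = no resolution observed at that step).
History = Sequence[Optional[str]]


def distance(a: History, b: History) -> float:
    """Assumed metric: fraction of time steps at which two histories differ."""
    if len(a) != len(b):
        raise ValueError("histories must be sampled on the same time grid")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def kernel(histories: List[History]) -> List[Optional[str]]:
    """Assumed kernel: the most common IP address at each time step.

    Ties go to the value seen first, per Counter.most_common ordering.
    """
    return [Counter(step).most_common(1)[0][0] for step in zip(*histories)]


def group_hostnames(histories: Dict[str, History], threshold: float) -> List[List[str]]:
    """Greedy grouping: each hostname lands in exactly one group.

    A hostname joins the first group whose recomputed kernel stays within
    `threshold` of every member, including the newcomer. Otherwise it starts
    a singleton group, whose kernel is its own history (distance 0).
    """
    groups: List[List[str]] = []
    for host in histories:
        for members in groups:
            candidate = members + [host]
            k = kernel([histories[m] for m in candidate])
            if all(distance(histories[m], k) <= threshold for m in candidate):
                members.append(host)
                break
        else:
            groups.append([host])
    return groups


# Illustrative data using documentation-reserved address ranges.
hist = {
    "a.example.com": ["192.0.2.1", "192.0.2.1", "192.0.2.7", "192.0.2.7"],
    "b.example.com": ["192.0.2.1", "192.0.2.1", "192.0.2.7", "198.51.100.3"],
    "c.example.com": ["203.0.113.5"] * 4,
}
print(group_hostnames(hist, threshold=0.25))
# prints [['a.example.com', 'b.example.com'], ['c.example.com']]
```

Before a hostname joins a group, the sketch re-checks every existing member against the recomputed kernel. Adding a member can shift the kernel, so this check keeps the "within a threshold distance of the kernel" property true for the whole group, not just the newcomer. Greedy assignment depends on input order. A production system might use a different clustering method or distance measure.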