摘要 |
<p>Described spam detection techniques including string identification, pre-filtering, and character histogram and timestamp comparison steps facilitate accurate, computationally-efficient detection of rapidly-changing spam arriving in short-lasting waves. In some embodiments, a computer system extracts a target character string from an electronic communication such as a blog comment, transmits it to an anti-spam server, and receives an indicator of whether the respective electronic communication is spam or non-spam from the anti-spam server. The anti-spam server determines whether the electronic communication is spam or non-spam according to certain features of the character histogram of the target string. Some embodiments also perform an unsupervised clustering of incoming target strings into clusters, wherein all members of a cluster have similar character histograms.</p> |