Title: Detection of spam using contextual analysis of data sources
Abstract: Aspects of the disclosure provide for detection of spam business listings. Aspects operate to identify business listing characteristics in trusted sources and in untrusted sources. Because untrusted sources are likely to contain more spam, characteristics that are present in untrusted sources but absent from trusted sources typically indicate spam listings, while characteristics present in trusted sources but absent from untrusted sources typically indicate legitimate listings. Statistical analysis of how often each characteristic occurs within each source can therefore identify characteristics common to spam listings. These characteristics may further be analyzed within specific listing contexts, because different listing contexts (e.g., different types of businesses) typically use different terms and vocabularies; a term that indicates spam in one context may not indicate spam in another. Various methods for leveraging this context-specific statistical information to improve spam detection are disclosed.
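The context-specific frequency comparison described in the abstract can be illustrated with a minimal sketch. It assumes each listing is reduced to a (context, set of terms) pair, that a term's frequency is the fraction of a context's listings containing it, and that the differential is the untrusted frequency minus the trusted frequency. The function names `term_frequencies` and `context_differentials` and the toy data are hypothetical, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical listing representation (an assumption): (context, set of terms).
Listing = Tuple[str, Set[str]]


def term_frequencies(listings: Iterable[Listing]) -> Dict[str, Dict[str, float]]:
    """Per context, the fraction of that context's listings containing each term."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    totals: Dict[str, int] = defaultdict(int)
    for context, terms in listings:
        totals[context] += 1
        for term in set(terms):  # count each term at most once per listing
            counts[context][term] += 1
    return {
        ctx: {term: n / totals[ctx] for term, n in term_counts.items()}
        for ctx, term_counts in counts.items()
    }


def context_differentials(
    trusted: Iterable[Listing], untrusted: Iterable[Listing]
) -> Dict[str, Dict[str, float]]:
    """Per context: untrusted frequency minus trusted frequency (assumed definition)."""
    t_freq = term_frequencies(trusted)
    u_freq = term_frequencies(untrusted)
    result: Dict[str, Dict[str, float]] = {}
    for ctx in set(t_freq) | set(u_freq):
        t = t_freq.get(ctx, {})
        u = u_freq.get(ctx, {})
        result[ctx] = {term: u.get(term, 0.0) - t.get(term, 0.0) for term in set(t) | set(u)}
    return result


# Hypothetical toy data: two contexts, each seen in a trusted and an untrusted source.
trusted = [
    ("locksmith", {"licensed", "keys"}),
    ("locksmith", {"keys", "safes"}),
    ("bar", {"cheap", "drinks"}),
    ("bar", {"drinks", "music"}),
]
untrusted = [
    ("locksmith", {"cheap", "24/7", "keys"}),
    ("locksmith", {"cheap", "24/7"}),
    ("bar", {"cheap", "drinks"}),
    ("bar", {"drinks"}),
]

diffs = context_differentials(trusted, untrusted)
```

With this toy data, `cheap` gets a differential of 1.0 in the `locksmith` context but 0.0 in the `bar` context, which mirrors the abstract's point that the same term can be suspicious in one context and ordinary in another.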
Publication number: US8909591(B1)
Publication date: 2014.12.09
Application number: US201414254335
Filing date: 2014.04.16
Applicant: Google Inc.
Inventors: Grundman Douglas Richard; Yuksel Baris; Adarsh Anurag; Janawadkar Piyush
Classification: G06N7/00; H04L29/06
Main classification: G06N7/00
Agency: Lerner, David, Littenberg, Krumholz & Mentlik, LLP
Agent: Lerner, David, Littenberg, Krumholz & Mentlik, LLP
Main claim: 1. A computer-implemented method for identifying business listings, the method comprising: determining, using one or more processors, a first frequency value of a business listing characteristic within a first plurality of business listings received from a first source, the first plurality of business listings being associated with a particular business listing context; determining, using the one or more processors, a second frequency value of the business listing characteristic within a second plurality of business listings received from a second source, the second plurality of business listings being associated with the particular business listing context; determining, using the one or more processors, a frequency differential between the first frequency value and the second frequency value; in response to the frequency differential exceeding a threshold differential, identifying, using the one or more processors, the business listing characteristic as a differential characteristic; and identifying, using the one or more processors, a particular business listing of the plurality of business listings as a spam listing using the differential characteristic.
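The claimed steps can be sketched for a single listing context under several assumptions the claim itself leaves open: a listing is a set of characteristics, a frequency value is the fraction of listings containing the characteristic, the differential is the first frequency minus the second (treating the first source as the untrusted one), and a listing is flagged as spam if it contains any differential characteristic. The functions `frequency`, `differential_characteristics`, and `is_spam` and the data are hypothetical illustrations, not the patented implementation.

```python
from typing import List, Set


def frequency(characteristic: str, listings: List[Set[str]]) -> float:
    """Fraction of listings containing the characteristic (0.0 for no listings)."""
    if not listings:
        return 0.0
    return sum(characteristic in listing for listing in listings) / len(listings)


def differential_characteristics(
    first: List[Set[str]], second: List[Set[str]], threshold: float
) -> Set[str]:
    """Characteristics whose frequency differential exceeds the threshold.

    Assumption: differential = frequency in first source - frequency in second source.
    """
    candidates = set().union(*first, *second)
    return {
        c for c in candidates
        if frequency(c, first) - frequency(c, second) > threshold
    }


def is_spam(listing: Set[str], differential: Set[str]) -> bool:
    """Assumed rule: flag a listing containing at least one differential characteristic."""
    return bool(listing & differential)


# Hypothetical listings for one context ("locksmith").
first_source = [{"cheap", "24/7", "keys"}, {"cheap", "24/7"}, {"cheap", "keys"}]  # untrusted
second_source = [{"licensed", "keys"}, {"keys", "safes"}, {"licensed", "24/7"}]  # trusted

flags = differential_characteristics(first_source, second_source, threshold=0.5)
```

Here only `cheap` clears a threshold of 0.5 (differential 1.0), while `24/7` (differential of roughly 0.33) would only be picked up at a lower threshold; the threshold therefore trades recall against false positives. A real system might instead weight or score several differential characteristics rather than flag on any single match.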
Address: Mountain View, CA, US