Title of Invention: DOMAIN NAME STATISTICAL CLASSIFICATION USING CHARACTER-BASED N-GRAMS
Abstract: <p>Systems and methods of classifying domain names are disclosed herein. Character-based n-grams are derived from a domain name in order to classify the domain name into one or more pre-established categories. In one aspect, a geometrical approach is used. The character-based n-grams of the domain name are mapped to vector points in a multidimensional space. In addition, vector points for various other domain names, which belong to a domain name classification, can be mapped to the multidimensional space. The number of dimensions in the multidimensional space is the number of different n-grams that can exist for an n-character combination. The relationship between the domain name vector point and the vector points of the various other domain names is used as an indicator of the classification of the domain name vector point. In another aspect, the classification system can be configured to use statistical methods. Relative frequencies of one or more character-based n-grams in various classifications are used as indicators. For example, a dictionary set of character-based n-grams can be derived from one or more domain names. Each character-based n-gram in the dictionary set can be associated with a probability indicative of the likelihood that the n-gram is found in a domain name of a given classification. Such a probability can serve as an estimator of the classification of a new domain name containing that character-based n-gram.</p>
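The two aspects described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the n-gram length, the geometric relationship, or how per-n-gram probabilities are combined. The sketch assumes trigrams (n = 3), cosine similarity against the summed vector of each class, and a simple average of per-n-gram probabilities. All function names and the example domain names are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(domain, n=3):
    """Return the character n-grams of a domain name (lowercased, dots kept)."""
    s = domain.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def to_vector(domain, n=3):
    """Sparse vector point: one dimension per n-gram, value = occurrence count."""
    return Counter(char_ngrams(domain, n))


def cosine(u, v):
    """Cosine similarity of two sparse vectors (an assumed choice of relationship)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify_geometric(domain, classes, n=3):
    """Geometric aspect: pick the class whose summed vector is closest in angle."""
    query = to_vector(domain, n)
    scores = {}
    for label, members in classes.items():
        total = Counter()
        for member in members:
            total.update(to_vector(member, n))
        scores[label] = cosine(query, total)
    return max(scores, key=scores.get)


def ngram_probabilities(classes, n=3):
    """Statistical aspect: dictionary mapping n-gram -> {label: relative frequency}."""
    counts = {}
    for label, members in classes.items():
        for member in members:
            for gram in set(char_ngrams(member, n)):
                counts.setdefault(gram, Counter())[label] += 1
    return {
        gram: {label: c / sum(per_label.values()) for label, c in per_label.items()}
        for gram, per_label in counts.items()
    }


def classify_statistical(domain, probabilities, n=3):
    """Average the per-n-gram probabilities over n-grams found in the dictionary."""
    known = [g for g in char_ngrams(domain, n) if g in probabilities]
    if not known:
        return None  # no evidence: every n-gram is unseen
    totals = Counter()
    for gram in known:
        for label, p in probabilities[gram].items():
            totals[label] += p / len(known)
    return max(totals, key=totals.get)


# Hypothetical training data, for illustration only.
EXAMPLE_CLASSES = {
    "travel": ["cheapflights.com", "flightdeals.com"],
    "food": ["pizzahouse.com", "pizzaplanet.com"],
}
```

With this hypothetical data, `classify_geometric("bestflights.com", EXAMPLE_CLASSES)` and `classify_statistical("bestflights.com", ngram_probabilities(EXAMPLE_CLASSES))` both select `"travel"`, because the new name shares trigrams such as `fli` and `ght` with the travel examples. The sparse `Counter` representation avoids materializing the full space, whose dimension count grows as the alphabet size raised to the power n.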
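Publication No.: WO2009023583(A2) Publication Date: 2009.02.19
Application No.: WO2008US72668 Filing Date: 2008.08.08
Applicant: MICROSOFT CORPORATION Inventors: REZNIK, ILIA; SIMONSON, ROGER N.
Classification: G06F17/00; G06F17/30 Main Classification: G06F17/00
Agency: Agent:
Main Claim:
Address: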