Title: System and method for website categorization
Abstract: Systems and methods for the categorization of websites are presented. A website is categorized using one or a combination of its domain name and its web page content. The domain name is tokenized, and the tokens compared to categories in a category structure to determine probabilities that the token belongs to each category. Combinations of tokens are similarly compared to the categories. A category may be determined with reference to a vector space in which a training set of websites having known categories is converted according to a methodology into reference vectors containing keyword frequencies. A target website is converted to a target vector using the same methodology, and a distance score of the target vector to each reference vector is calculated. The website represented by the target vector is assigned the category of the reference vector having the lowest distance score.
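The vector-space step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the keyword vocabulary, the conversion methodology, or the distance measure, so everything here is an assumption made for illustration: the function names `to_vector`, `distance_score`, and `categorize`, the whitespace tokenization, the fixed vocabulary, and the use of Euclidean distance.

```python
import math
from collections import Counter


def to_vector(text, vocabulary):
    """Convert page text to a keyword-frequency vector (assumed methodology)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def distance_score(a, b):
    """Euclidean distance; the source does not name a specific metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(target_text, training_set, vocabulary):
    """Return the category of the reference vector closest to the target.

    training_set is a list of (page_text, known_category) pairs.
    """
    target = to_vector(target_text, vocabulary)
    best_category, best_score = None, float("inf")
    for text, category in training_set:
        reference = to_vector(text, vocabulary)  # same methodology as target
        score = distance_score(target, reference)
        if score < best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Hypothetical data, for illustration only.
vocabulary = ["pizza", "menu", "loan", "bank"]
training_set = [
    ("pizza menu pizza delivery", "restaurant"),
    ("bank loan bank rates", "finance"),
]
category, score = categorize("order pizza from our menu", training_set, vocabulary)
```

With this toy data the target vector lies closest to the "restaurant" reference vector, so the target page would be assigned that category.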
Publication number: US9311423(B1) Publication date: 2016.04.12
Application number: US201414180249 Filing date: 2014.02.13
Applicant: Go Daddy Operating Company, LLC Inventors: Brown Robert; Kamdar Tapan; Kirkish Ryan; Lai Wei-Cheng; McLellan Jeff
Classification: G06F17/30 Main classification: G06F17/30
Agency: Quarles & Brady LLP Agent: Quarles & Brady LLP
Principal claim: 1. A method, comprising: receiving, by at least one server communicatively coupled to a network, one or more tokens together forming all or part of a string comprising a domain name; comparing, by the at least one server, each of the one or more tokens to each of a plurality of categories in a category structure to determine, for each pairing of one of the tokens with one of the categories, a token probability that the token belongs to the category; for one or more of the token probabilities, increasing or reducing the token probability according to a frequency at which the category associated with the token probability is selected as a correct category or declined as an incorrect category for the token associated with the token probability, the frequency identified from a plurality of domain name searches previously processed by a first of the at least one server; calculating, by the at least one server from the token probabilities, a final probability of the string belonging to each category; and categorizing, by the at least one server, the token in the category having the highest final probability.
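The steps of the claim can be sketched roughly as follows. The claim does not state how a token probability is increased or reduced, nor how token probabilities are combined into a final probability. The adjustment formula, the `weight` parameter, the averaging across tokens, and the names `adjust` and `final_probabilities` are therefore all hypothetical choices made only to show the flow of data.

```python
def adjust(prob, selected, declined, weight=0.5):
    """Raise or lower a token probability from past feedback counts.

    selected/declined: how often this category was accepted or rejected for
    this token in earlier searches. The formula is an assumption.
    """
    total = selected + declined
    if total == 0:
        return prob  # no feedback recorded, leave the probability unchanged
    net = (selected - declined) / total  # ranges from -1 to 1
    return min(1.0, max(0.0, prob * (1 + weight * net)))


def final_probabilities(tokens, token_probs, feedback):
    """Combine adjusted token probabilities into one score per category.

    Averaging followed by normalization is an assumed combination rule.
    """
    categories = sorted({c for probs in token_probs.values() for c in probs})
    scores = {}
    for category in categories:
        values = [
            adjust(
                token_probs.get(token, {}).get(category, 0.0),
                *feedback.get((token, category), (0, 0)),
            )
            for token in tokens
        ]
        scores[category] = sum(values) / len(values)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores


# Hypothetical data, for illustration only.
tokens = ["joes", "pizza"]
token_probs = {
    "joes": {"restaurant": 0.4, "finance": 0.4},
    "pizza": {"restaurant": 0.8, "finance": 0.1},
}
feedback = {("pizza", "restaurant"): (9, 1)}  # (selected, declined) counts
final = final_probabilities(tokens, token_probs, feedback)
best_category = max(final, key=final.get)
```

In this toy run the feedback for the pairing ("pizza", "restaurant") raises that token probability, and "restaurant" ends up with the highest final probability.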
Address: Scottsdale AZ US