Title
System and Method for Automated Classification of Web pages and Domains |
Abstract
Representative sample pages from websites accessible to Internet users are manually selected and classified into pre-defined categories based on page content to create a training set as an input to a classifier. An automated analysis is performed to identify a list of catchwords comprising the most frequently referenced words, tags, and/or links from the classified samples in each category in the training set. A data mining tool generates unique sets of distinctive catchwords and/or distinctive combinations of catchwords that have a high probability of appearing only in a single one of the pre-defined content categories. The classifier utilizes the sets of distinctive catchwords/combinations to classify new pages into one or more of the pre-defined content categories.
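The catchword pipeline in the abstract can be pictured as a minimal Python sketch. It rests on several assumptions: all function names are hypothetical, only page text is tokenized (tags and links are ignored), and "distinctive" is approximated as "present in the catchword list of exactly one category" rather than through the probability estimates of a data mining tool. Combinations of catchwords are not modeled here.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (assumption: tags and links are ignored).
    return re.findall(r"[a-z0-9]+", text.lower())


def catchwords_per_category(training_set, top_n=50):
    """training_set: dict mapping category -> list of manually classified page texts.

    Returns the top_n most frequently referenced words per category,
    counted as the number of sample pages that contain each word.
    """
    result = {}
    for category, pages in training_set.items():
        counts = Counter()
        for page in pages:
            counts.update(set(tokenize(page)))  # count each word once per page
        result[category] = {word for word, _ in counts.most_common(top_n)}
    return result


def distinctive_catchwords(catchwords):
    """Keep only catchwords that appear in exactly one category's list."""
    seen = Counter(w for words in catchwords.values() for w in words)
    return {cat: {w for w in words if seen[w] == 1} for cat, words in catchwords.items()}


def classify(page, distinctive, min_hits=1):
    """Assign a new page to every category with at least min_hits distinctive catchwords."""
    words = set(tokenize(page))
    scores = {cat: len(words & d) for cat, d in distinctive.items()}
    return sorted(cat for cat, s in scores.items() if s >= min_hits)


if __name__ == "__main__":
    training = {
        "sports": ["the football match score goal", "tennis match score"],
        "cooking": ["the recipe oven bake flour", "recipe pasta boil"],
    }
    d = distinctive_catchwords(catchwords_per_category(training))
    print(classify("the final score of the match", d))
```

Because "the" occurs in both categories' samples, it is dropped as non-distinctive, so only category-specific words such as "score" and "match" drive the assignment. Returning a list allows a page to fall into one or more categories, as the abstract describes.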
Publication Number
US2013066814(A1)
Publication Date
2013.03.14
Application Number
US201113230562
Filing Date
2011.09.12
Applicant
BOSCH VOLKER;LEMAITRE YVES MARIE
Inventors
BOSCH VOLKER;LEMAITRE YVES MARIE
Classification
G06F15/18;G06F17/30
Main Classification
G06F15/18
Agency

Agent

Main Claim

Address
