Title of invention: Training set construction for taxonomic classification
Abstract: A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
Publication number: US8484194(B1) Publication date: 2013.07.09
Application number: US201213350213 Filing date: 2012.01.13
Applicants: JUANG PHILO;TESTA CHRISTOPHER;MOTE NICOLAUS;GOOGLE INC. Inventors: JUANG PHILO;TESTA CHRISTOPHER;MOTE NICOLAUS
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address:
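
The pipeline described in the abstract (crawl each top-level site, then apply a site-specific extraction template to the crawl data) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `PAGES` map stands in for the web, the template is hand-written rather than determined by the extractor, URL path prefixes are assumed to be the "portion" of a site that a template associates with a category, and every name here (`TAXONOMY`, `PAGES`, `TEMPLATES`, `crawl`, `extract`, `build_training_set`, the `shop.example` URLs) is hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical taxonomy: category -> parent category (None marks a root).
TAXONOMY = {"Electronics": None, "Cameras": "Electronics", "Laptops": "Electronics"}

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
PAGES = {
    "http://shop.example/": ("Home", ["http://shop.example/cameras/",
                                      "http://shop.example/laptops/"]),
    "http://shop.example/cameras/": ("Cameras", ["http://shop.example/cameras/dslr-1"]),
    "http://shop.example/cameras/dslr-1": ("24MP DSLR body", []),
    "http://shop.example/laptops/": ("Laptops", ["http://shop.example/laptops/ultra-2"]),
    "http://shop.example/laptops/ultra-2": ("13-inch ultrabook", []),
}

# Hypothetical site-specific templates: top-level site -> {path prefix: category}.
TEMPLATES = {
    "http://shop.example/": {"/cameras/": "Cameras", "/laptops/": "Laptops"},
}


def crawl(top_level_site, pages):
    """Breadth-first crawl from a top-level site; returns {url: text} crawl data."""
    seen = {top_level_site}
    queue = deque([top_level_site])
    crawl_data = {}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        crawl_data[url] = text
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return crawl_data


def category_path(category, taxonomy):
    """Return the chain of categories from the taxonomy root down to `category`."""
    path = []
    while category is not None:
        path.append(category)
        category = taxonomy[category]
    return path[::-1]


def extract(crawl_data, template, taxonomy):
    """Apply one site template to crawl data, giving (text, category path) pairs."""
    examples = []
    for url, text in crawl_data.items():
        path = urlparse(url).path
        for prefix, category in template.items():
            # Skip the category index page itself; keep the pages beneath it.
            if path.startswith(prefix) and path != prefix:
                examples.append((text, category_path(category, taxonomy)))
    return examples


def build_training_set(top_level_sites, pages, templates, taxonomy):
    """Crawl every top-level site and categorize its crawl data with its template."""
    training_set = []
    for site in top_level_sites:
        crawl_data = crawl(site, pages)
        training_set.extend(extract(crawl_data, templates[site], taxonomy))
    return training_set


training_set = build_training_set(["http://shop.example/"], PAGES, TEMPLATES, TAXONOMY)
```

Each entry in `training_set` pairs a crawled page's text with its full category path in the taxonomy, which is one plausible shape for the "categorized data" the abstract mentions. A real crawler would fetch pages over the network, and the abstract leaves open how templates are determined for each site.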