Title: Classification of documents
Abstract: Some embodiments provide a method for evaluating a content segment for relevancy to several categories. The method retrieves the content segment. For each of the several categories, the method determines the relevancy of the content segment to the category by using a scoring model for the category. The scoring model accounts for (i) the presence of key word sets in the content segment and (ii) the context of the key word sets in the content segment. For each of the several categories, the method tags the content segment when the content segment is determined to be relevant to the category.
Publication No.: US8805840(B1) Publication Date: 2014.08.12
Application No.: US201012772166 Filing Date: 2010.04.30
Applicant: Firstrain, Inc. Inventors: Joshi Ashutosh; Betz Martin; Arora Rajiv; Srivastava Rakesh Kumar; Cooke David
Classification: G06F7/00; G06F17/30 Main Classification: G06F7/00
Agency: Adeli LLP Agent: Adeli LLP
Main claim: 1. A method for evaluating a content segment for relevancy to a plurality of categories, the method comprising: retrieving the content segment; for each of a plurality of categories, calculating a score for the content segment to determine the relevancy of the content segment to the category by using a context-based model for the category, said context-based model comprising (i) a set of groups of word sets, each group of word sets comprising a key word set and a second word set, (ii) scores for the groups of word sets, and (iii) a definition of context that specifies when a second word set is in a context of a key word set, wherein the context for a particular key word set is based on a specified relationship, within the content segment, between the particular key word set and a second word set, wherein the calculating comprises: identifying each key word set from the context-based model that is in the content segment; for each identified key word set, associating the key word set and each word set within the context of the key word set in the content segment as a different group of word sets; and aggregating scores for each of the associated groups of word sets to calculate the score for the content segment; and for each of the plurality of categories, tagging the content segment with a category tag to signify that the content segment is relevant to the category when the content segment is determined to be relevant to the category.
Address: San Mateo CA US
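
The steps recited in claim 1 (identify key word sets, pair each with word sets in its context, aggregate group scores, tag per category) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `ContextModel`, `find_positions`, `score_segment`, and `tag_segment` are hypothetical, and several details the record does not specify are assumed here: context is taken to mean "within a fixed number of tokens", aggregation is a plain sum with each group counted once, and relevancy is decided by a per-category threshold.

```python
from dataclasses import dataclass


@dataclass
class ContextModel:
    """Hypothetical per-category model: scored (key word set, second word set) groups."""
    category: str
    group_scores: dict       # {(key_word_set, second_word_set): score}, word sets as token tuples
    window: int = 5          # ASSUMPTION: context = at most this many tokens apart
    threshold: float = 1.0   # ASSUMPTION: segment is relevant when score >= threshold


def find_positions(tokens, word_set):
    """Return the start indices at which the word set (a tuple of tokens) occurs."""
    n = len(word_set)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == word_set]


def score_segment(tokens, model):
    """Sum the scores of all groups whose second word set is in context of its key word set."""
    total = 0.0
    for (key, second), score in model.group_scores.items():
        key_positions = find_positions(tokens, key)
        if not key_positions:
            continue  # key word set is not present in the segment
        second_positions = find_positions(tokens, second)
        # ASSUMPTION: a group contributes once, however many times it co-occurs.
        if any(abs(k - s) <= model.window
               for k in key_positions for s in second_positions):
            total += score
    return total


def tag_segment(text, models):
    """Return the category tags for which the segment scores at or above the threshold."""
    tokens = text.lower().split()  # naive whitespace tokenizer, for illustration only
    return [m.category for m in models
            if score_segment(tokens, m) >= m.threshold]


models = [
    ContextModel("semiconductors", {(("chip",), ("fab",)): 1.5}),
    ContextModel("snacks", {(("chip",), ("potato",)): 2.0}),
]
tags = tag_segment("The new chip fab in Arizona will start production next year", models)
print(tags)
```

In this toy example the key word set "chip" appears in both models, but only the semiconductors model finds its second word set ("fab") in context, so only that category tag is applied. This reflects the abstract's point that both the presence of key word sets and their context are scored.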