Title of invention System and method for determining concepts in a content item using context
Abstract The present invention is directed towards systems and methods for indexing one or more items of content. The method of the present invention comprises extracting one or more items of text from a given item of content. The one or more items of extracted text are tokenized into one or more concepts. One or more related concepts associated with the one or more concepts are identified. A support score is generated for the one or more concepts, and the item of content is indexed with the one or more concepts and the one or more associated support scores.
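The indexing steps summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the phrase set `PAST_QUERY_PHRASES`, the `RELATED` map, the greedy longest-match segmentation, and the scoring rule (1.0 if the concept appears, plus 0.5 per related concept that appears) are all assumptions introduced for the example.

```python
import re

# Hypothetical data: phrases seen in past user queries, and related concepts.
PAST_QUERY_PHRASES = {"new york", "new york pizza", "pizza", "subway"}
RELATED = {
    "new york": {"manhattan", "subway"},
    "pizza": {"pepperoni", "mozzarella"},
}

def tokenize_into_concepts(text, phrases, max_len=3):
    """Greedy longest-match segmentation of text against known phrases."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    concepts, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in phrases:
                concepts.append(candidate)
                i += n
                break
        else:
            i += 1  # no known concept starts at this word; skip it
    return concepts

def support_score(concept, text, related, related_weight=0.5):
    """Assumed scoring rule: concept presence plus weighted related-concept presence."""
    lowered = text.lower()
    score = 1.0 if concept in lowered else 0.0
    score += related_weight * sum(1 for r in related.get(concept, ()) if r in lowered)
    return score

def index_item(item_id, text, index):
    """Store the item under each of its concepts, with that concept's support score."""
    for concept in set(tokenize_into_concepts(text, PAST_QUERY_PHRASES)):
        index.setdefault(concept, {})[item_id] = support_score(concept, text, RELATED)

index = {}
index_item("doc1", "Best pepperoni pizza near the subway in New York", index)
```

Here the index maps each concept to the items that contain it, so a related concept that also appears in the text (for example "pepperoni" alongside "pizza") raises the support score for that concept.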
Publication number US8856145(B2) Publication date 2014.10.07
Application number US200611639849 Filing date 2006.12.15
Applicant Yahoo! Inc. Inventors Parikh Jignashu;Thrall John
Classification G06F17/30 Main classification G06F17/30
Agency Pillsbury Winthrop Shaw Pittman LLP Agent Pillsbury Winthrop Shaw Pittman LLP
Main claim 1. A method implemented on at least one machine having at least one processor, storage, and a communication platform connected to a network for indexing one or more items of content, the method comprising: extracting, by the at least one processor, one or more items of text from a given item of content; tokenizing, by the at least one processor, the one or more extracted items of text into one or more concepts based on past queries submitted by one or more users; identifying one or more related concepts associated with the one or more concepts; obtaining, by the at least one processor, a support score for the individual one or more concepts based on whether one or more of the one or more concepts appear in the given item of content and/or whether one or more of the one or more related concepts appear in the given item of content; and generating an index, the index comprising the given item of content associated with the one or more concepts and corresponding support scores for the individual one or more concepts; receiving a search query; identifying, based on the index, a set of items of content responsive to the search query, wherein individual items of content in the set are indexed with one or more concepts that are related to the search query; obtaining, for each individual item of content in the set, a sum of support scores associated with the one or more concepts that are related to the search query; and providing the set, wherein the items of content in the set are sorted based on the sum of support scores.
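The retrieval steps at the end of the claim (identify items indexed with query-related concepts, sum their support scores, sort by the sum) might look like the sketch below. The `INDEX` contents and the `search` function are hypothetical, and the sketch assumes the query has already been mapped to a list of related concepts, since the claim does not say how that mapping is done.

```python
from collections import defaultdict

# Hypothetical index: concept -> {item id -> support score}.
INDEX = {
    "pizza": {"doc1": 1.5, "doc2": 1.0},
    "new york": {"doc1": 1.5, "doc3": 2.0},
    "subway": {"doc3": 1.0},
}

def search(query_concepts, index):
    """Return (item id, summed support score) pairs, highest sum first."""
    totals = defaultdict(float)
    for concept in query_concepts:
        # Every item indexed under a query-related concept is responsive.
        for item_id, score in index.get(concept, {}).items():
            totals[item_id] += score
    # Sort by descending sum; ties are broken by item id for determinism.
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))

results = search(["new york", "pizza"], INDEX)
```

An item that matches several query-related concepts accumulates their scores, so it ranks above items that match only one.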
Address Sunnyvale CA US