Title of invention: Fixed phrase detection for search
Abstract: A set of search requests may be analyzed to detect fixed phrases suitable for inclusion in a search index. Sets of candidate phrases may be identified among the search requests. Fixed phrases may be detected among the candidate phrases using statistical techniques, for example, by identifying phrases having a relatively high pointwise mutual information (PMI) with respect to their component keywords. Fixed phrase detection may include keyword and/or phrase clustering. Clusters may correspond to topics defined using a latent Dirichlet allocation (LDA) procedure. Fixed phrase detection may include identifying phrases having relatively high PMI within particular clusters.
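For reference, the standard definition of pointwise mutual information for a two-keyword candidate phrase can be written as below. The record itself does not print a formula; the symbols x, y, c(.), and N (the number of search requests analyzed) are notation assumed here for illustration, and the count-based estimate is only one common way of approximating the probabilities.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
\approx \log \frac{c(x, y)\, N}{c(x)\, c(y)},
\qquad p(x) \approx \frac{c(x)}{N}, \quad p(x, y) \approx \frac{c(x, y)}{N}
```

A positive value means the two keywords co-occur more often than they would if they were independent, which is the signal the abstract describes as "relatively high PMI".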
Publication number: US8751518(B1)  Publication date: 2014.06.10
Application number: US201213465884  Filing date: 2012.05.07
Applicant: A9.com, Inc.  Inventors: Ahmad Waseem; Jain Deepak
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim: 1. A computing system for facilitating a search, comprising: at least one processor; and memory including instructions that, when executed by the at least one processor, cause the computing system to: cluster search terms extracted from previous searches into a plurality of search term clusters independent of human supervision; for each of the plurality of search term clusters, identify a candidate search phrase comprising a first search term and a second search term; determine a first count of the first search term, a second count of the second search term, and a mutual count of the first search term and the second search term appearing simultaneously in the previous searches; weight each individual count of the first count, the second count, and the mutual count based at least in part on an age of a respective individual count with respect to the previous searches, wherein each age is a difference in time between a current time and when the respective individual count appeared in the previous searches; determine a pointwise mutual information score for the candidate search phrase using the weighted first count of the first search term, the weighted second count of the second search term, and the weighted mutual count of the first search term and the second search term; select the candidate search phrase as a fixed phrase for inclusion in a respective search term cluster based at least in part on the determined pointwise mutual score being greater than a threshold score; in response to receiving a search request, determine a relevance score for each of at least a portion of the search terms, including the fixed phrase, with respect to a collection of content; provide at least one search result for presentation, the at least one search result at least referencing content selected from the collection of content based at least in part on the at least one relevance score; detect interaction with a search result of the provided at least one result from a user; and update the clustered search terms associated with the search result.
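The counting, age-weighting, and thresholding steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how age maps to weight, so the exponential decay and the `HALF_LIFE_DAYS` value are assumptions, as are all function names, the day-based timestamps, and the treatment of "appearing simultaneously" as co-occurrence within one search request. Clustering, relevance scoring, and the feedback update are left out.

```python
import math
from collections import defaultdict
from itertools import combinations

HALF_LIFE_DAYS = 30.0  # assumed decay rate; the claim does not specify one


def decay_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Weight of one observation; age = current time minus observation time."""
    return 0.5 ** (age_days / half_life)


def weighted_counts(searches, now):
    """Accumulate age-weighted term counts, pair counts, and the total weight.

    searches: iterable of (timestamp_in_days, list_of_terms) tuples.
    """
    term_w = defaultdict(float)
    pair_w = defaultdict(float)
    total = 0.0
    for timestamp, terms in searches:
        w = decay_weight(now - timestamp)
        total += w
        unique = sorted(set(terms))  # count each term once per search
        for term in unique:
            term_w[term] += w
        for a, b in combinations(unique, 2):  # pairs in sorted order
            pair_w[(a, b)] += w
    return term_w, pair_w, total


def pmi(a, b, term_w, pair_w, total):
    """Pointwise mutual information (base 2) from the weighted counts."""
    joint = pair_w.get(tuple(sorted((a, b))), 0.0)
    if joint == 0.0:
        return float("-inf")  # the pair never co-occurred
    return math.log2(joint * total / (term_w[a] * term_w[b]))


def fixed_phrases(searches, now, threshold):
    """Return the term pairs whose weighted PMI exceeds the threshold."""
    term_w, pair_w, total = weighted_counts(searches, now)
    return sorted(
        pair for pair in pair_w
        if pmi(pair[0], pair[1], term_w, pair_w, total) > threshold
    )


if __name__ == "__main__":
    log = [
        (0.0, ["harry", "potter"]),
        (0.0, ["harry", "potter", "book"]),
        (0.0, ["book", "shelf"]),
        (0.0, ["red", "book"]),
    ]
    print(fixed_phrases(log, now=0.0, threshold=0.5))
```

In this toy log, "harry" and "potter" always appear together, so their pair clears the threshold, while pairs involving the common term "book" do not. A shorter half-life would let recent search behavior dominate the counts, which appears to be the purpose of the age weighting in the claim.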
Address: Palo Alto, CA, US