Title of invention: Determining word information entropies
Abstract: Determining and using word information entropies includes: determining one or more categories that correspond to a plurality of queries; sorting the plurality of queries into one or more groups based at least in part on the determined categories of the plurality of queries; segmenting queries that correspond to each of the one or more groups into a first plurality of phrases, wherein each phrase includes one or more words; determining occurrence probabilities for the first plurality of phrases; and determining word information entropies for the first plurality of phrases based at least in part on the determined occurrence probabilities.
Publication number: US9342627(B2)  Publication date: 2016.05.17
Application number: US201314024431  Filing date: 2013.09.11
Applicant: Alibaba Group Holding Limited  Inventor: Jin Kaimin
Classification: G06F17/30  Main classification: G06F17/30
Agency: Van Pelt, Yi & James LLP  Agent: Van Pelt, Yi & James LLP
Main claim: 1. A system, comprising: one or more processors configured to: determine one or more categories that correspond to a plurality of queries; sort the plurality of queries into one or more groups based at least in part on the determined one or more categories of the plurality of queries; segment queries that correspond to each of the one or more groups into a first plurality of phrases, wherein each phrase includes one or more words; determine occurrence probabilities for the first plurality of phrases, the determined occurrence probabilities being computed based at least in part on a number of times a phrase occurs in a corresponding group and a number of times the phrase occurs across the one or more groups; determine word information entropies for the first plurality of phrases based at least in part on the determined occurrence probabilities, wherein a word information entropy relates to a degree of uncertainty for a corresponding phrase used in searching; perform a first search using a subsequent query, wherein the subsequent query includes a second plurality of phrases; determine that one or more search results found for the subsequent query do not meet a predetermined rule associated with search results being close matches to the subsequent query; and in response to the determination that the one or more search results returned for the subsequent query do not meet the predetermined rule associated with search results being close matches to the subsequent query: determine a first phrase of the second plurality of phrases of the subsequent query that is associated with a corresponding word information entropy that is less than a threshold value; determine a second phrase of the second plurality of phrases of the subsequent query that is associated with a second corresponding word information entropy that is equal to or greater than the threshold value; generate a new query that includes the first phrase and excludes the second phrase; and perform a second search using the new query; and one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.
Address: KY
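
Illustrative sketch (not part of the patent record): the steps recited in the abstract and main claim can be approximated in a few lines of Python. Several things here are assumptions rather than details taken from the record: the entropy is computed as Shannon entropy over the per-category occurrence probabilities count(phrase in group) / count(phrase across groups); segmentation is plain whitespace splitting; phrases never seen in the grouped queries are treated as having zero entropy; and the names `phrase_entropies` and `rewrite_query` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def phrase_entropies(queries_by_category, segment=str.split):
    """Return {phrase: entropy} from queries already grouped by category.

    Assumption: entropy is Shannon entropy (base 2) of the distribution
    of a phrase's occurrences over the category groups.
    """
    # counts[phrase][category] = times the phrase occurs in that group
    counts = defaultdict(Counter)
    for category, queries in queries_by_category.items():
        for query in queries:
            for phrase in segment(query):
                counts[phrase][category] += 1

    entropies = {}
    for phrase, per_category in counts.items():
        total = sum(per_category.values())  # occurrences across all groups
        entropies[phrase] = sum(
            -(n / total) * math.log2(n / total) for n in per_category.values()
        )
    return entropies


def rewrite_query(query, entropies, threshold, segment=str.split):
    """Keep phrases whose entropy is below the threshold, drop the rest.

    Assumption: phrases with no recorded entropy default to 0.0 (kept).
    """
    kept = [p for p in segment(query) if entropies.get(p, 0.0) < threshold]
    return " ".join(kept)


# Toy data, invented for illustration only.
grouped = {
    "phones": ["new phone case", "cheap phone"],
    "clothing": ["new dress", "cheap dress"],
}
entropies = phrase_entropies(grouped)
new_query = rewrite_query("cheap new phone", entropies, threshold=0.5)
```

In this toy data, "new" and "cheap" are spread evenly over both groups and so carry high entropy, while "phone" occurs in a single group and carries none. A fallback search would therefore retry with only the low-entropy phrase. In a real system the rewrite would run only after the first search fails the close-match rule described in the claim.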