Title of invention: Set similarity selection queries at interactive speeds
Abstract: The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set contains information relevant to the query set. The similarity score is calculated with an inverse document frequency (IDF) similarity measure that is independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets that contain at least one query set token. The length of the query set and the length of the database set are normalized.
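The scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no exact formula, so the weighting used here is an assumption: idf(t) = log2(1 + N / N(t)), where N is the number of database sets and N(t) is the number of database sets containing token t; a set's normalized length is taken as the square root of the sum of its squared token weights. The function names (`idf_weights`, `normalized_length`, `idf_similarity`, `select`) are hypothetical and are not taken from the patent. Query tokens that do not appear in the collection are given weight 0, which is also an assumption.

```python
import math
from collections import Counter


def idf_weights(collection):
    """Assumed weighting: idf(t) = log2(1 + N / N(t)).

    N is the number of database sets; N(t) is the number of database
    sets that contain token t. Term frequency is never used.
    """
    n = len(collection)
    doc_freq = Counter(tok for s in collection for tok in set(s))
    return {tok: math.log2(1 + n / count) for tok, count in doc_freq.items()}


def normalized_length(tokens, idf):
    """Assumed length: sqrt of the sum of squared IDF weights of distinct tokens.

    Tokens unseen in the collection contribute 0 (an assumption).
    """
    return math.sqrt(sum(idf.get(t, 0.0) ** 2 for t in set(tokens)))


def idf_similarity(query, db_set, idf):
    """Score = sum of squared IDF over shared tokens / (len(query) * len(db_set))."""
    q, s = set(query), set(db_set)
    denom = normalized_length(q, idf) * normalized_length(s, idf)
    if denom == 0.0:
        return 0.0
    shared = sum(idf.get(t, 0.0) ** 2 for t in q & s)
    return shared / denom


def select(query, collection, threshold):
    """Return the database sets whose score meets the user-defined threshold."""
    idf = idf_weights(collection)
    return [s for s in collection if idf_similarity(query, s, idf) >= threshold]


if __name__ == "__main__":
    collection = [{"a", "b"}, {"b", "c"}, {"d"}]
    for s in select({"a", "b"}, collection, threshold=0.3):
        print(sorted(s))
```

This sketch scans every database set linearly, so it only illustrates the score and the threshold test; it does not attempt the indexing needed to answer such queries at interactive speeds.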
Publication number: US7921100(B2) Publication date: 2011.04.05
Application number: US20080006332 Filing date: 2008.01.02
Applicant: AT&T INTELLECTUAL PROPERTY I, L.P. Inventors: HADJIELEFTHERIOU MARIOS; CHANDEL AMIT; KOUDAS NICK; SRIVASTAVA DIVESH
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: