Title: System and method for measuring the quality of document sets
Abstract: Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
Publication number: US8874549(B2) Publication date: 2014.10.28
Application number: US200812146267 Filing date: 2008.06.25
Applicant: Oracle OTC Subsidiary LLC Inventors: Tunkelang Daniel;Wang Joyce Jeanpin;Zelevinsky Vladimir
Classification: G06F17/30 Main classification: G06F17/30
Agency: Miles & Stockbridge P.C. Agent: Miles & Stockbridge P.C.
Principal claim: 1. A computer implemented method for presenting a view of a result obtained from interaction with a collection of information, the method comprising: accessing, by a computer system, at least one result set in response to interaction with a collection of information; determining, by the computer system, at least one identifying characteristic within the at least one result set returned from the interaction with a collection of information; determining, by the computer system, a statistical distribution of the at least one identifying characteristic within the at least one result set; generating, by the computer system, a measurement of distinctiveness for the at least one result set based, at least in part, on the statistical distribution of the at least one identifying characteristic within the at least one result set, wherein the distinctiveness of the at least one result set is measured in relation to the collection of information, and wherein the generating comprises determining the measurement of distinctiveness from a statistical distribution of at least one identifying characteristic in the at least one result set against a baseline statistical distribution, and wherein the baseline statistical distribution is determined at a time of the interaction or thereafter; modifying, by the computer system, the at least one result set based at least in part on the measurement of distinctiveness for the at least one result set; and returning the modified result set; wherein the modifying the at least one result set comprises determining a contribution of the at least one identifying characteristic to the measurement of distinctiveness and highlighting the at least one identifying characteristic within the at least one result set based on the determined contribution.
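The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that each record is a list of characteristic values (for example, terms), that the result set is a subset of the collection, and that the measurement of distinctiveness is the unnormalized relative entropy (Kullback-Leibler divergence) of the result-set distribution against the collection-wide baseline. The normalization mentioned in the abstract is not specified in this record and is left out. The function names `distribution`, `contributions`, `distinctiveness`, and `highlight` are hypothetical.

```python
import math
from collections import Counter


def distribution(records):
    """Relative frequency of each characteristic value across the records."""
    counts = Counter(value for record in records for value in record)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def contributions(result_set, collection):
    """Per-characteristic terms p * log2(p / q) of the relative entropy."""
    p = distribution(result_set)
    q = distribution(collection)  # baseline, computed at query time
    # Assumption: result_set is a subset of collection, so q[v] > 0
    # for every value v that appears in p.
    return {v: p[v] * math.log2(p[v] / q[v]) for v in p}


def distinctiveness(result_set, collection):
    """Relative entropy (in bits) of the result set against the baseline."""
    return sum(contributions(result_set, collection).values())


def highlight(result_set, collection, top_k=2):
    """Characteristics contributing most to the distinctiveness score."""
    contrib = contributions(result_set, collection)
    return sorted(contrib, key=contrib.get, reverse=True)[:top_k]


# Hypothetical toy data: four records, two of which form the result set.
collection = [["red", "wine"], ["white", "wine"], ["red", "car"], ["blue", "car"]]
result_set = collection[:2]

print(distinctiveness(result_set, collection))  # 0.75
print(highlight(result_set, collection))        # ['wine', 'white']
```

In this toy example "red" is exactly as frequent in the result set as in the collection, so it contributes nothing, while "wine" and "white" are over-represented and would be the characteristics to highlight.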
Address: Redwood Shores CA US