摘要 |
Discovering a keyword query corresponding to an input collection of documents taken from a candidate pool includes selecting a document from a working set as the input set, and extracting a list of snippets in the selected document. For each snippet, executing a set of proximity queries based on selected terms in that snippet, and finding all possible proximity queries that return less than N query results from the candidate pool. A query is selected from said proximity queries, based on the selected query returning the greatest number of working set documents, and returning the smallest number of documents not in the working set. Documents returned by the selected query are removed from the working set, and the above steps are repeated until no documents remain in the working set. The disjunction of selected queries is returned as the discovered query. |