Title of invention: Systems, methods, and computer program products for fast and scalable proximal search for search queries
Abstract: Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.
Publication number: US8745062(B2) Publication date: 2014.06.03
Application number: US201213587413 Filing date: 2012.08.16
Applicant: International Business Machines Corporation Inventors: Bhatia Sumit; He Bin; He Qi; Spangler William S.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method of information retrieval from multiple documents, comprising: splitting each document into multiple snippets of words; generating a separate index for each snippet; receiving an input search query including at least one sentence; and processing the search query against each separate index of each snippet of the multiple snippets by searching query terms over each of the multiple snippets to implicitly introduce term proximity information in the information retrieval, wherein processing the search query further comprises: creating an OR-Query of all non-stopwords in each sentence; returning a fit value for each OR-Query, wherein a fit value represents a similarity metric that measures the amount of word content overlap between two text units; and aggregating the fit values to provide a score for every document returned by the OR-Queries.
Address: Armonk NY US
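
The steps recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not say how snippets are delimited, which stopword list is used, which similarity metric serves as the fit value, or how fit values are aggregated. The sketch assumes sentence-sized snippets, a tiny hypothetical stopword list, Jaccard overlap as the fit value, and summation as the aggregation. All function names (`split_sentences`, `build_snippet_indexes`, `fit`, `search`) are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the record does not specify one.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "or", "the", "to"})


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter (assumption: a snippet is one sentence)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_snippet_indexes(documents):
    """Split each document into snippets and build one small index per snippet.

    Each "index" here is simply the set of non-stopword terms in the snippet,
    paired with the id of the document it came from.
    """
    indexes = []
    for doc_id, text in documents.items():
        for sentence in split_sentences(text):
            terms = set(tokenize(sentence)) - STOPWORDS
            if terms:
                indexes.append((doc_id, terms))
    return indexes


def fit(query_terms, snippet_terms):
    """Fit value as Jaccard overlap (assumed metric) between two term sets."""
    union = query_terms | snippet_terms
    return len(query_terms & snippet_terms) / len(union) if union else 0.0


def search(query, documents):
    """Score documents by summing snippet-level fit values (assumed aggregation)."""
    indexes = build_snippet_indexes(documents)
    scores = defaultdict(float)
    for sentence in split_sentences(query):
        # OR-Query: the set of all non-stopwords in this query sentence.
        or_query = set(tokenize(sentence)) - STOPWORDS
        for doc_id, snippet_terms in indexes:
            # OR semantics: a snippet matches if it shares any query term.
            if or_query & snippet_terms:
                scores[doc_id] += fit(or_query, snippet_terms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because matching happens per snippet rather than per document, query terms that co-occur in the same snippet raise that snippet's fit value, which is how term proximity enters the score implicitly. For example, `search("Cats chase mice.", {"d1": "Cats chase mice. Dogs bark loudly.", "d2": "Mice eat cheese."})` ranks `d1` above `d2`.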