Title: IDENTIFYING SIMILAR DOCUMENTS USING GRAPHS
Abstract: While a document, such as an e-book, is read by a user on a computing device such as an e-reader, concept phrases are extracted from the document. The extracted concept phrases may be words or phrases that match known concept phrases such as headings. Based on a universal concept phrase graph that includes nodes for each known concept phrase, core concept phrases are determined for the document. These core concept phrases are associated with nodes of the universal concept phrase graph that are located within a predetermined distance of nodes that represent the concept phrases extracted from the document. Each core concept phrase is combined with one or more of the concept phrases to generate multiple queries. These queries are submitted to search engines, and indicators of documents from the corresponding search results are presented to the user with the original document that is being read.
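The core-phrase and query-generation steps described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the record does not state: the universal concept phrase graph is an unweighted adjacency dictionary, "distance" means hop count, and a node counts as a core concept phrase only if it lies within the distance limit of every extracted phrase. All function names are hypothetical.

```python
from collections import deque


def nodes_within(graph, start, max_distance):
    """Breadth-first search: nodes at most max_distance hops from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand past the distance limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def core_concept_phrases(graph, extracted, max_distance):
    """Assumed rule: a core phrase is near every extracted phrase in the graph."""
    neighborhoods = [
        nodes_within(graph, phrase, max_distance)
        for phrase in extracted
        if phrase in graph
    ]
    if not neighborhoods:
        return set()
    return set.intersection(*neighborhoods)


def generate_queries(core, extracted):
    """Combine each core phrase with each extracted phrase into a query string."""
    return sorted(f"{c} {p}" for c in core for p in extracted if c != p)


# Toy universal graph (illustrative data, not from the patent).
universal_graph = {
    "calculus": ["derivative", "integral"],
    "derivative": ["calculus", "chain rule"],
    "integral": ["calculus"],
    "chain rule": ["derivative"],
}
extracted_phrases = ["derivative", "integral"]
core = core_concept_phrases(universal_graph, extracted_phrases, max_distance=1)
queries = generate_queries(core, extracted_phrases)
```

In this toy graph, "calculus" is the only node within one hop of both extracted phrases, so it is paired with each of them to form the queries. Submitting the queries to a search engine is left out because the record names no specific engine or API.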
Publication No.: US2016224547(A1) Publication Date: 2016.08.04
Application No.: US201514610261 Filing Date: 2015.01.30
Applicant: Microsoft Technology Licensing, LLC Inventors: Agrawal Rakesh; Gollapudi Sreenivas; Kannan Anitha; Kenthapadi Krishnaram; Parrish Nathaniel Dion
Classification: G06F17/30 Main Classification: G06F17/30
Agency: Agent:
Main Claim: 1. A method comprising: receiving a document by a computing device; determining a plurality of concept phrases associated with the document by the computing device; generating a concept phrase graph of the received document based on the determined plurality of concept phrases by the computing device; and identifying one or more documents of a plurality of documents that are similar to the received document based on the concept phrase graph of the received document, and concept phrase graphs associated with each of the documents of the plurality of documents by the computing device.
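The steps of the claim (determine concept phrases, build a concept phrase graph, compare graphs) can be sketched as follows. This is only an illustration under assumptions the claim does not specify: phrases are found by substring match against a known-phrase list, an edge joins two phrases that occur in the same sentence, and graph similarity is the Jaccard overlap of edge sets. Every function name and the threshold value are hypothetical.

```python
import re
from itertools import combinations


def extract_concept_phrases(text, known_phrases):
    """Return, per sentence, the known phrases it contains (assumed matching rule)."""
    per_sentence = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        found = sorted(p for p in known_phrases if p in sentence)
        if found:
            per_sentence.append(found)
    return per_sentence


def build_concept_graph(per_sentence):
    """Represent the graph as a set of undirected edges between co-occurring phrases."""
    edges = set()
    for phrases in per_sentence:
        # Phrases are sorted, so each edge has one canonical (a, b) ordering.
        edges.update(combinations(phrases, 2))
    return edges


def graph_similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets (an assumed similarity measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def find_similar(received, corpus, known_phrases, threshold=0.5):
    """Return ids of corpus documents whose graphs are similar to the received one."""
    target = build_concept_graph(extract_concept_phrases(received, known_phrases))
    similar = []
    for doc_id, text in corpus.items():
        graph = build_concept_graph(extract_concept_phrases(text, known_phrases))
        if graph_similarity(target, graph) >= threshold:
            similar.append(doc_id)
    return similar


# Illustrative data, not taken from the patent.
known = {"neural network", "gradient descent", "backpropagation"}
received_doc = (
    "A neural network is trained with gradient descent. "
    "Backpropagation computes gradients for gradient descent."
)
corpus = {
    "a": "Gradient descent trains a neural network.",
    "b": "Cooking pasta requires boiling water.",
}
matches = find_similar(received_doc, corpus, known)
```

Document "a" shares one of the two edges in the received document's graph, giving a similarity of 0.5, while document "b" contains no known phrases and scores 0.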
Address: Redmond WA US