Title: Discovery engine
Abstract: A method that is relatively inexpensive to implement and that permits a user to search electronically stored documents using an entire document, multiple documents, or portions of a document as the search criteria, and to collect, store, and share the relevant documents from the search.
Publication number: US9507867(B2)  Publication date: 2016.11.29
Application number: US201414199985  Filing date: 2014.03.06
Applicant: Enlyton, Inc.  Inventors: Johns Mark Ellingham;McKinzie Chris
Classification: G06F17/30  Main classification: G06F17/30
Agency: The Law Firm of H. Dale Langley, Jr., P.C.  Agent: The Law Firm of H. Dale Langley, Jr., P.C.
Main claim: 1. A system for semantically searching a group of documents containing words, exclusive of stop words of the documents, thereby improving efficiency by flatly looking at the words being searched without attempting to understand the meaning of the words, comprising:
a memory containing a set of instructions; and
a processor for processing the set of instructions, wherein the instructions cause the processor to perform a method comprising:
receiving by the processor a current instance of a search criteria containing words;
determining by the processor a first total number of the words, exclusive of stop words, in the current instance of the search criteria;
storing in the memory by the processor the first total number;
for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, determining by the processor a respective first number of times that the word appears in the current instance of the search criteria;
storing in the memory by the processor the respective first number of times;
for each of the words, exclusive of stop words, respectively, in the current instance of the search criteria, calculating by the processor a first uniqueness score, respectively, for the word, respectively, based on the respective first number and the first total number;
storing in the memory by the processor the first uniqueness score, respectively, for the word, respectively;
for each of the words, exclusive of stop words, respectively, of the current instance of the search criteria and the documents, determining by the processor a respective second number of times that the word appears in the current instance of the search criteria and the documents;
storing in the memory by the processor the respective second number of times, as a first frequency score, respectively;
for each of the words, exclusive of stop words, of the current instance of the search criteria and the each of the documents, respectively, calculating by the processor a respective first significance magnitude factor based on the first frequency score, respectively, and the first uniqueness score, respectively;
storing in the memory by the processor the respective first significance magnitude factor;
determining by the processor a second total number of the words, exclusive of stop words, in the documents of the group;
storing in the memory by the processor the second total number;
for each of the words, exclusive of stop words, respectively, of the documents, respectively, determining by the processor a respective third number of times that the word appears in the documents of the group;
storing in the memory by the processor the respective third number of times;
for each of the words, exclusive of stop words, respectively, of the documents, calculating by the processor a second uniqueness score, respectively, for the word, respectively, based on the respective third number and the second total number;
storing in the memory by the processor the second uniqueness score, respectively, for the word, respectively;
for each of the words, exclusive of stop words of the documents, respectively, in each of the documents, respectively, determining by the processor a respective fourth number of times that the word appears in the document;
storing in the memory by the processor the respective fourth number, as a second frequency score, respectively;
for each of the words, exclusive of stop words, of the documents, calculating by the processor a respective second significance magnitude factor based on the second frequency score, respectively, and the second uniqueness score, respectively;
storing in the memory by the processor the respective second significance magnitude factor; and
for each document of the group, generating by the processor a respective similarity score of contents of the document to the current instance of the search criteria, wherein generating the respective similarity score includes characterizing each document based on the respective second significance magnitude factor compared to the respective first significance magnitude factor.
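The claimed steps can be illustrated with a minimal Python sketch. The claim only states that each score is "based on" certain counts and does not give formulas, so everything numeric here is an assumption made for illustration: the stop-word list, the logarithmic uniqueness formula, the product used for the significance magnitude factor, and cosine similarity as the comparison. The function names (`tokenize`, `uniqueness`, `significance`, `cosine`, `rank_documents`) are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the claim does not enumerate one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


def tokenize(text):
    """Lowercase the text and return its words, exclusive of stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def uniqueness(counts, total):
    """Assumed formula: words that are rarer relative to `total` score higher."""
    return {w: math.log(1 + total / c) for w, c in counts.items()}


def significance(frequency, uniq):
    """Assumed formula: significance magnitude = frequency score * uniqueness score."""
    return {w: frequency[w] * uniq[w] for w in uniq if w in frequency}


def cosine(a, b):
    """Assumed comparison: cosine similarity between two sparse word vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(criteria, documents):
    """Return one similarity score per document, in input order."""
    crit_words = tokenize(criteria)
    doc_words = [tokenize(d) for d in documents]

    # First total number, first number of times, first uniqueness score.
    crit_counts = Counter(crit_words)
    first_uniq = uniqueness(crit_counts, len(crit_words))

    # First frequency score: occurrences in the criteria and the documents.
    combined = Counter(crit_words)
    for words in doc_words:
        combined.update(words)
    first_sig = significance(combined, first_uniq)

    # Second total number, third number of times, second uniqueness score.
    group_counts = Counter(w for words in doc_words for w in words)
    second_uniq = uniqueness(group_counts, sum(group_counts.values()))

    scores = []
    for words in doc_words:
        # Second frequency score: occurrences within this one document.
        second_sig = significance(Counter(words), second_uniq)
        scores.append(cosine(first_sig, second_sig))
    return scores


docs = [
    "the cat sat on the mat",
    "stock markets fell sharply",
    "a cat and a kitten",
]
scores = rank_documents("cat on a mat", docs)
```

In this sketch the search criteria is a short phrase, but because it is only tokenized text, an entire document or several concatenated documents could be passed as `criteria` in the same way, which matches the use described in the abstract.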
Address: Austin TX US