Title of invention: SYSTEMS AND METHODS FOR KEYWORD DETERMINATION AND DOCUMENT CLASSIFICATION FROM UNSTRUCTURED TEXT
Abstract: In various embodiments, documents are searched and retrieved via receipt of a search query, electronically identifying a reference set of relevant documents, providing a search set of documents, creating a database comprising at least some of the documents of the search set and the reference set, computationally classifying the documents in the database, extracting keywords from the search set and one or more classified sets, optionally filtering the extracted keywords, and electronically identifying at least some of the documents from the database that contain one or more of the extracted keywords.
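As a rough illustration of the retrieval bookkeeping described in the abstract (building a reference set from a query keyword, treating other documents as the search set, pooling both into one database, and returning database documents that contain extracted keywords), here is a minimal Python sketch. It assumes an in-memory dict mapping document IDs to text, a naive lowercase tokenizer, and that the search set is simply every document that does not match the query. The names `split_sets` and `retrieve` are hypothetical, and the classification and keyword-extraction steps are deliberately left out: the extracted keywords are passed in by hand.

```python
import re


def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def split_sets(corpus, reference_keywords):
    """Hypothetical helper: reference set = docs matching any query keyword,
    search set = all remaining docs (assumption), database = both pooled."""
    ref_kw = {k.lower() for k in reference_keywords}
    reference, search = {}, {}
    for doc_id, text in corpus.items():
        target = reference if tokenize(text) & ref_kw else search
        target[doc_id] = text
    database = {**reference, **search}
    return reference, search, database


def retrieve(database, reference, extracted_keywords):
    """Hypothetical helper: return the reference-set IDs plus any additional
    database docs that contain at least one extracted keyword."""
    kws = {k.lower() for k in extracted_keywords}
    extra = sorted(
        doc_id
        for doc_id, text in database.items()
        if doc_id not in reference and tokenize(text) & kws
    )
    return sorted(reference), extra
```

For example, with the query keyword `protesters`, documents that mention it form the reference set, and passing an extracted keyword such as `crowd` to `retrieve` surfaces an additional search-set document that never used the query term.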
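Publication number: US2016224662(A1)
Publication date: 2016.08.04
Application number: US201414904853
Filing date: 2014.07.14
Applicant: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
Inventors: KING Gary; ROBERTS Margaret; LAM Patrick
Classification: G06F17/30; G06N99/00
Main classification: G06F17/30
Agency: (none)
Agent: (none)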
Main claim: 1. A computer-implemented method of document searching and retrieval in a corpus of documents stored in a database, the method comprising:
(a) receiving, via a communications interface, a search query (1) comprising at least one reference keyword and (2) pertaining to a pre-determined concept;
(b) electronically identifying, in response to the search query, a reference set of query-responsive documents each relevant to the pre-determined concept and containing text matching the at least one reference keyword;
(c) providing an electronically stored search set of documents, wherein (1) each of the documents in the search set is not within the reference set and (2) one or more of the documents in the search set are relevant to the pre-determined concept;
(d) creating an electronically stored database of documents comprising at least some of the documents from the reference set and at least some of the documents from the search set;
(e) computationally classifying, by a computer processor and without utilizing the at least one reference keyword, documents in the database into a first classified set, documents in the first classified set being predicted to be relevant to the pre-determined concept and comprising documents from the reference set and the search set;
(f) computationally extracting, by the computer processor, from documents in both the search set and the first classified set, one or more keywords each (1) predicted to be relevant to the pre-determined concept and (2) different from the at least one reference keyword;
(g) optionally, filtering the extracted keywords; and
(h) electronically identifying at least some of the documents from the database (1) in response to the search query, (2) via a communications interface, (3) in addition to the electronically identified reference set of documents, and (4) that contain one or more of the extracted keywords.
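The claim does not specify which classifier or keyword-scoring rule is used in steps (e) through (g). The sketch below is one possible reading under stated assumptions: a multinomial Naive Bayes classifier (an assumed choice) is trained to separate reference-set documents from search-set documents with the reference keywords removed from the features, every database document is then predicted, and candidate keywords are scored by how much more often they appear in search-set documents placed in the first classified set than in the remaining search-set documents. `train_nb`, `classify_and_extract`, and the `min_docs` filter are hypothetical names chosen for illustration, not the patented implementation; the sketch also assumes the reference and search sets are non-empty and use disjoint IDs.

```python
import math
import re
from collections import Counter


def tokens(text, exclude=()):
    """Lowercased word tokens with the excluded (reference) keywords dropped."""
    excluded = {e.lower() for e in exclude}
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in excluded]


def train_nb(texts, labels, exclude):
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier).

    labels: 1 = reference set, 0 = search set; both classes must be non-empty.
    Returns a function mapping a text to a predicted label (ties go to 0).
    """
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for text, y in zip(texts, labels):
        counts[y].update(tokens(text, exclude))
    vocab_size = len(set(counts[0]) | set(counts[1]))
    totals = {y: sum(counts[y].values()) for y in (0, 1)}

    def log_posterior(text, y):
        lp = math.log(n_docs[y] / len(labels))
        for t in tokens(text, exclude):
            lp += math.log((counts[y][t] + 1) / (totals[y] + vocab_size))
        return lp

    return lambda text: int(log_posterior(text, 1) > log_posterior(text, 0))


def classify_and_extract(reference, search, reference_keywords, min_docs=1):
    """Hypothetical steps (e)-(g). Returns (classified_ids, ranked_keywords)."""
    ref_ids, search_ids = list(reference), list(search)
    # Step (e): train without the reference keywords, then predict every database doc.
    predict = train_nb(
        [reference[i] for i in ref_ids] + [search[i] for i in search_ids],
        [1] * len(ref_ids) + [0] * len(search_ids),
        reference_keywords,
    )
    database = {**reference, **search}
    classified = {i for i, text in database.items() if predict(text)}
    # Step (f): contrast search-set docs inside the classified set with those outside it.
    inside = [set(tokens(search[i], reference_keywords)) for i in search_ids if i in classified]
    outside = [set(tokens(search[i], reference_keywords)) for i in search_ids if i not in classified]
    df_in = Counter(w for doc in inside for w in doc)
    df_out = Counter(w for doc in outside for w in doc)
    scores = {}
    for w, c in df_in.items():
        if c < min_docs:  # Step (g), optional: drop candidates seen in too few docs.
            continue
        score = c / len(inside) - (df_out[w] / len(outside) if outside else 0.0)
        if score > 0:
            scores[w] = score
    # Highest score first; ties broken alphabetically.
    return classified, sorted(scores, key=lambda w: (-scores[w], w))
```

Search-set documents that the classifier groups with the reference set are the ones that resemble it without using the query term, so words concentrated in them are plausible new keywords. The `min_docs` threshold is only one simple way to realize the optional filtering of step (g).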
Address: Cambridge, MA, US