Title of invention Analytic systems, methods, and computer-readable media for structured, semi-structured, and unstructured documents
Abstract A computer system extracts contender values as positively associated with a pre-defined value from a compilation of one or more electronically stored semi-structured document(s) and/or one or more electronically stored unstructured document(s). The computer system performs a multi-dimensional analysis to narrow the universe of contender values from all words on a page of the compilation to the contender value(s) with the highest likelihood of being associated with the pre-defined value. The system's platform allows every user of the system to customize the system according to the user's needs. Various aspects can enable users to mine document stores for information that can be charted, graphed, studied, and compared to help make better decisions.
Publication number US9384264(B1) Publication date 2016.07.05
Application number US201514960871 Filing date 2015.12.07
Applicant Ephesoft Inc. Inventor Kavas Ilker
Classification G06F7/04; G06F17/30; G06K9/00 Main classification G06F7/04
Agency Knobbe Martens Olson & Bear LLP Agent Knobbe Martens Olson & Bear LLP
Principal claim 1. A computer system to extract contender values as positively associated with a pre-defined value from a compilation of one or more electronically stored documents, the system comprising:
one or more computer readable storage devices configured to store
  one or more software modules including computer executable instructions, and
  the compilation, wherein the electronically stored documents comprise one or more semi-structured document(s), one or more unstructured document(s), or a combination thereof, and each of the one or more electronically stored documents comprises one or more pages;
a network configured to distribute information to a user workstation;
one or more hardware computer processors in communication with the one or more computer readable storage devices and configured to execute the one or more software modules in order to cause the computer system to
  access, from the one or more computer readable storage devices, the compilation;
  receive information regarding the pre-defined value, wherein the pre-defined value has a certain format, has a certain two-dimensional spatial relationship to words in a pre-selected page, and is associated with one or more keywords;
  for each page of the compilation,
    identify words and contender values on the page using optical character recognition (OCR) and post-OCR processing, and
    group the identified words and the identified contender values into anchor blocks based on their spatial positioning on the page, such that the page comprises a plurality of anchor blocks and each anchor block comprises one or more words, one contender value, or a combination thereof;
  on the page, for each of the contender values,
    numerically determine a first confidence that the contender value is associated with the pre-defined value based at least in part on a comparison of a calculated two-dimensional spatial relationship between the contender value and the anchor blocks on the page with the pre-defined two-dimensional spatial relationship between the pre-defined value and the words in the pre-selected page,
    numerically determine a second confidence that the contender value is associated with the pre-defined value based at least in part on a comparison of words in the anchor blocks on the page with the one or more keywords associated with the pre-defined value, and
    numerically determine a third confidence that the contender value is associated with the pre-defined value based at least in part on a comparison of a format of the contender value with the certain format of the pre-defined value;
  over all the pages of the compilation, extract positive contender values as positively associated with the pre-defined value based at least in part on the first confidence, the second confidence, and the third confidence;
  store the positive contender values in the one or more computer readable storage devices; and
  transmit the positive contender values over the network to the user workstation in response to a search for values associated with the pre-defined value at the user workstation.
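The three confidences recited in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Block` type, the exponential distance decay, the `scale` parameter, the equal weights, the 0.8 threshold, and the toy invoice page are all assumptions made for illustration; the claim only requires that extraction be based at least in part on the three confidences.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical anchor block: its text and the centre of its bounding box."""
    text: str
    x: float
    y: float


def spatial_confidence(contender: Block, anchor: Block,
                       expected_offset: tuple[float, float],
                       scale: float = 100.0) -> float:
    """First confidence: how closely the contender-to-anchor offset matches
    the offset observed on the pre-selected page (assumed exponential decay)."""
    dx = (contender.x - anchor.x) - expected_offset[0]
    dy = (contender.y - anchor.y) - expected_offset[1]
    return math.exp(-math.hypot(dx, dy) / scale)


def keyword_confidence(anchor: Block, keywords: set[str]) -> float:
    """Second confidence: fraction of the expected (lowercase) keywords
    that appear in the anchor block's words."""
    if not keywords:
        return 0.0
    seen = {w.lower().strip(":") for w in anchor.text.split()}
    return len(keywords & seen) / len(keywords)


def format_confidence(value: str, pattern: str) -> float:
    """Third confidence: 1.0 if the contender matches the expected format."""
    return 1.0 if re.fullmatch(pattern, value) else 0.0


def score(contender: Block, anchors: list[Block],
          expected_offset: tuple[float, float],
          keywords: set[str], pattern: str,
          weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three confidences with an assumed weighted mean,
    measured against the anchor block nearest to the contender."""
    nearest = min(anchors, key=lambda a: math.hypot(contender.x - a.x,
                                                    contender.y - a.y))
    parts = (
        spatial_confidence(contender, nearest, expected_offset),
        keyword_confidence(nearest, keywords),
        format_confidence(contender.text, pattern),
    )
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


# Toy page: two word-only anchor blocks and two contender values.
anchors = [Block("Invoice No:", 100, 50), Block("Date:", 100, 300)]
contenders = [Block("INV-20431", 220, 50), Block("2015-12-07", 220, 300)]

# Hypothetical pre-defined value: an invoice number sitting 120 units
# to the right of its label, with keywords "invoice" and "no".
scores = {
    c.text: score(c, anchors, (120.0, 0.0), {"invoice", "no"}, r"INV-\d{5}")
    for c in contenders
}
positives = [text for text, s in scores.items() if s >= 0.8]
print(positives)
```

On this toy page only the invoice number clears the threshold: the date sits at the expected offset from its own label, but it fails both the keyword and the format comparison, which is the narrowing effect the multi-dimensional analysis is meant to provide.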
Address Laguna Hills CA US