Title: Systems, methods, and apparatus for processing documents to identify structures
Abstract: In various embodiments, multiple heterogeneous documents are processed to identify structures, such as chemical structures, contained therein, including non-embedded structures. Also described is a graphical user interface that permits a user to search for a structure or substructure within a set of electronic documents, then displays the matching structures as well as the actual pages of the documents on which the matching structures are found. Display of the actual pages allows the user to verify the matches and provides helpful context for the user.
Publication number: US9031977(B2) Publication date: 2015.05.12
Application number: US201313855342 Filing date: 2013.04.02
Applicant: Perkinelmer Informatics, Inc. Inventors: Smith Robin Y.; Ballard William B.; Flicker Scott G.; Greenhow Sean G.
Classification: G06F17/30; G06F19/00; G06K9/00 Main classification: G06F17/30
Agency: Choate, Hall & Stewart LLP Agent: Choate, Hall & Stewart LLP
Main claim: 1. A system for automatically identifying chemical structures found in one or more electronic files, the system comprising: a memory having a set of instructions stored thereon; and a processor, wherein the instructions, when executed by the processor, cause the processor to: (a) identify one or more candidate chemical structures in an electronic file, wherein the electronic file comprises at least one non-embedded image of a chemical structure, and identifying each candidate chemical structure of the one or more candidate chemical structures comprises identifying one or more graphical features common to chemical structures; (b) for each candidate chemical structure of the one or more candidate chemical structures, derive a respective chemical structure object with an associated set of properties, wherein one or more properties of the set of properties is derived from at least a portion of the one or more graphical features common to chemical structures, a first property of the set of properties is a number of carbons, wherein the number of carbons is derived from the one or more graphical features common to chemical structures, and a second property of the set of properties comprises one of the following: (A) number of hetero atoms, (B) number of bonds, (C) number of bonds of a selected bond order, (D) number of rings, and (E) formula weight; (c) for each chemical structure object, apply one or more filters to at least one property of the associated set of properties, wherein the one or more filters includes a filter configured to eliminate chemical structure objects having a value of the first property of the set of properties less than a predetermined number of carbons; and (d) provide, for storage in a searchable electronic compendium of identified chemical structure objects, chemical structure objects not eliminated by the one or more filters.
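Steps (b) through (d) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `ChemicalStructureObject`, the functions `apply_min_carbon_filter` and `build_compendium`, the dictionary used as the compendium, and the threshold of 3 carbons are all hypothetical choices made for illustration. The sketch assumes that step (a), candidate detection from page images, has already produced the property values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ChemicalStructureObject:
    """Hypothetical record for one candidate structure and its derived properties."""
    source_file: str
    page: int
    num_carbons: int  # the claim's "first property"
    num_rings: int    # one of the claim's options for the "second property"


def apply_min_carbon_filter(
    candidates: Iterable[ChemicalStructureObject],
    min_carbons: int,
) -> List[ChemicalStructureObject]:
    """Step (c): eliminate objects with fewer than `min_carbons` carbons."""
    return [c for c in candidates if c.num_carbons >= min_carbons]


def build_compendium(
    candidates: Iterable[ChemicalStructureObject],
    min_carbons: int = 3,  # assumed value; the claim only says "predetermined"
) -> Dict[str, List[ChemicalStructureObject]]:
    """Step (d): store surviving objects, grouped by source file, for later search."""
    compendium: Dict[str, List[ChemicalStructureObject]] = {}
    for obj in apply_min_carbon_filter(candidates, min_carbons):
        compendium.setdefault(obj.source_file, []).append(obj)
    return compendium


# Made-up candidates standing in for the output of steps (a) and (b).
candidates = [
    ChemicalStructureObject("report.pdf", 4, num_carbons=6, num_rings=1),
    ChemicalStructureObject("report.pdf", 7, num_carbons=1, num_rings=0),
    ChemicalStructureObject("notes.docx", 2, num_carbons=3, num_rings=0),
]
compendium = build_compendium(candidates, min_carbons=3)
```

The filter keeps an object whose carbon count equals the threshold, because the claim eliminates only values "less than" the predetermined number. Here the one-carbon candidate on page 7 of `report.pdf` is dropped, presumably the kind of false positive (a stray line or glyph read as a structure) that such a filter is meant to catch.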
Address: Waltham MA US