Title: Constructing custom knowledgebases and sequence datasets with publications
Abstract: Illustrative embodiments of custom knowledgebases and sequence datasets, as well as related methods, are disclosed. In one illustrative embodiment, one or more computer-readable media may comprise a custom knowledgebase and an associated sequence dataset. The custom knowledgebase may comprise a plurality of assertions that have been automatically extracted from a plurality of publications, where each of the plurality of assertions encodes a relationship between a subject and an object. The sequence dataset may comprise a plurality of called biological sequences, where each of the plurality of called biological sequences is associated with one or more of the plurality of assertions of the custom knowledgebase.
Publication number: US9563741(B2) Publication date: 2017.02.07
Application number: US201414280285 Filing date: 2014.05.16
Applicant: BATTELLE MEMORIAL INSTITUTE Inventors: Godbold William Eugene Dunbar; Yang Boyu
Classification: G06F15/18; G06F19/28; G06N5/02; G06F19/24; G06F19/22 Main classification: G06F15/18
Agency: Barnes & Thornburg LLP Agent: Barnes & Thornburg LLP
Main claim: 1. A method comprising: automatically extracting a plurality of assertions from a plurality of publications, wherein each of the plurality of assertions encodes a relationship between a subject and an object; manually editing the plurality of assertions automatically extracted from the plurality of publications to construct a custom knowledgebase for a particular biological field; and constructing a sequence dataset comprising a plurality of called biological sequences, wherein each of the plurality of called biological sequences is associated with one or more of the plurality of assertions of the custom knowledgebase, wherein constructing the sequence dataset comprises: automatically extracting one or more called biological sequences from the plurality of publications; extracting additional called biological sequences from one or more publicly available databases; grouping the additional called biological sequences with the one or more called biological sequences automatically extracted from the plurality of publications in response to one or more predetermined resemblance criteria being met; and associating each group of called biological sequences with one or more of the plurality of assertions of the custom knowledgebase.
Address: Columbus, OH, US
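
Illustrative sketch (not part of the patent record): the dataset-construction steps recited in claim 1 can be pictured roughly as follows. The record does not state what the "predetermined resemblance criteria" are, so this sketch assumes a simple string-similarity threshold computed with Python's standard `difflib.SequenceMatcher`; a real pipeline would more plausibly use a sequence-alignment score. All names here (`Assertion`, `SequenceGroup`, `resembles`, `build_sequence_dataset`) and the 0.9 threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Assertion:
    """A subject-relationship-object triple extracted from a publication."""
    subject: str
    relationship: str
    obj: str


@dataclass
class SequenceGroup:
    """Sequences grouped around one publication-derived seed sequence."""
    seed: str
    assertion_ids: list[int]          # indices into the knowledgebase
    members: list[str] = field(default_factory=list)


def resembles(a: str, b: str, threshold: float = 0.9) -> bool:
    # ASSUMPTION: stand-in resemblance criterion based on string similarity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold


def build_sequence_dataset(
    publication_seqs: dict[str, list[int]],
    database_seqs: list[str],
    threshold: float = 0.9,
) -> list[SequenceGroup]:
    """Group database sequences with publication sequences they resemble.

    publication_seqs maps each sequence extracted from a publication to the
    indices of the knowledgebase assertions it is associated with.
    """
    groups = [
        SequenceGroup(seed=seq, assertion_ids=list(ids), members=[seq])
        for seq, ids in publication_seqs.items()
    ]
    for candidate in database_seqs:
        for group in groups:
            if resembles(group.seed, candidate, threshold):
                # The candidate joins the group and so inherits its assertions.
                group.members.append(candidate)
                break
    return groups


# Hypothetical example data, invented for this sketch.
knowledgebase = [Assertion("geneX", "confers resistance to", "drugY")]
dataset = build_sequence_dataset(
    publication_seqs={"ATGGCGTACGTTAGC": [0]},
    database_seqs=["ATGGCGTACGTTAGG", "TTTTTTTTTTTTTTT"],
)
for group in dataset:
    linked = [knowledgebase[i] for i in group.assertion_ids]
    print(group.members, linked)
```

In this sketch a database sequence joins at most one group (the first whose seed it resembles), and sequences that resemble no seed are simply left out; both are design assumptions of the example rather than anything stated in the claim.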