Title Mining biological networks to explain and rank hypotheses
Abstract An approach is provided to identify important paths in a biological relationship graph for exploration by researchers. In the approach, a biological meaningfulness analysis is performed on the biological relationship graph, which has a number of paths through the graph, each formed by a number of connected nodes. The biological meaningfulness analysis is based on a process similarity calculation of gene ontologies of the nodes in the paths and a contextual similarity calculation of word occurrences from documents in a corpus where references to the respective nodes are found. A biological interestingness analysis is also performed on the biological relationship graph. The paths are screened based on the meaningfulness analysis and the interestingness analysis. The screened paths are displayed to the user.
Publication No. US9536193(B1) Publication Date 2017.01.03
Application No. US201514964530 Filing Date 2015.12.09
Applicant International Business Machines Corporation Inventors Labrie Jacques J.; Perera Pathirage D.; Nagarajan Meenakshi; Ramakrishnan Cartic; Spangler William Scott
Classification G06N5/00 Main Classification G06N5/00
Agency VanLeeuwen & VanLeeuwen Agent VanLeeuwen & VanLeeuwen; Gerhardt Diana R.
Principal Claim 1. A method implemented by an information handling system that includes a memory and a processor, the method comprising: performing a biological meaningfulness analysis on a biological relationship graph that has a plurality of paths through the graph, wherein each of the plurality of paths includes a plurality of connected nodes, and wherein the biological meaningfulness analysis is based on a process similarity calculation of gene ontologies of the nodes in the paths and a contextual similarity calculation of word occurrences from a plurality of documents in a corpus where references to the respective nodes are found; performing a biological interestingness analysis on the biological relationship graph that is based on a path diversity value calculated for each of the paths and a path rarity value calculated for each of the paths, wherein the path diversity value is based on a number of distinct documents in each of the paths and the number of connections in the respective paths, and wherein the path rarity value is based on a total degree of the nodes that form each of the paths; and screening the plurality of paths in the biological relationship graph based on the biological meaningfulness analysis and the biological interestingness analysis, wherein the screened plurality of paths are displayed to a user, and wherein the screening further comprises: identifying one or more meaningful paths through the biological relationship graph based on comparing a path meaningfulness value (PMV) with a threshold; and ranking the meaningful paths by a path interestingness value (PIV) corresponding to each of the meaningful paths.
Address Armonk NY US
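
The screening step recited in the principal claim (threshold on PMV, then rank by PIV) can be illustrated with a minimal Python sketch. The record does not give the actual formulas, so everything here is an assumption for illustration: the `Path` class and its field names are hypothetical, the PMV is taken as precomputed, diversity is assumed to be distinct documents per connection, rarity is assumed to be the inverse of the total node degree, PIV is assumed to be their product, and a path is assumed to be "meaningful" when its PMV is at or above the threshold.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical summary of one path through the relationship graph."""
    name: str
    pmv: float           # path meaningfulness value, assumed precomputed
    distinct_docs: int   # distinct documents supporting the path
    connections: int     # number of connections (edges) in the path
    total_degree: int    # sum of the degrees of the nodes on the path


def diversity(p: Path) -> float:
    # Assumed form: distinct documents per connection.
    return p.distinct_docs / p.connections if p.connections else 0.0


def rarity(p: Path) -> float:
    # Assumed form: paths through low-degree nodes count as rarer.
    return 1.0 / p.total_degree if p.total_degree else 0.0


def piv(p: Path) -> float:
    # Assumed combination: product of diversity and rarity.
    return diversity(p) * rarity(p)


def screen(paths: list, threshold: float) -> list:
    """Keep paths whose PMV meets the threshold, ranked by PIV (highest first)."""
    meaningful = [p for p in paths if p.pmv >= threshold]
    return sorted(meaningful, key=piv, reverse=True)


if __name__ == "__main__":
    candidates = [
        Path("a", pmv=0.9, distinct_docs=4, connections=2, total_degree=10),
        Path("b", pmv=0.8, distinct_docs=3, connections=3, total_degree=4),
        Path("c", pmv=0.2, distinct_docs=9, connections=1, total_degree=2),
    ]
    for p in screen(candidates, threshold=0.5):
        print(p.name, round(piv(p), 3))
```

In this sketch, path `c` is dropped despite its high interestingness because it fails the meaningfulness threshold, which mirrors the claim's ordering: filter on PMV first, then rank the survivors by PIV.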