发明名称 Predictive and descriptive analysis on relations graphs with heterogeneous entities
摘要 A system, method and computer program product provides a random walk model with heterogeneous graphs to leverage multiple source data and accomplish prediction tasks. The system and method components include: 1) A heterogeneous graph formulation including heterogeneous instances of abstract objects as graph nodes and multiple relations as edges connecting those nodes. The different types of relations, such as client-vendor relation and client-product relation, are often quantified as the weights of edges connecting those entities; 2) To accomplish prediction tasks with such information, launching a multi-stage random walk model over the heterogeneous graph. The random walk within a subgraph with homogenous nodes usually produces the relevance between entities of the same type. The random walk across different type of nodes provides the prediction of decisions, such as a client purchasing a product.
申请公布号 US9195941(B2) 申请公布日期 2015.11.24
申请号 US201313868644 申请日期 2013.04.23
申请人 International Business Machines Corporation 发明人 Mojsilovic Aleksandra;Varshney Kush R.;Wang Jun
分类号 G06N5/04;G06N5/02;G06Q10/00;G06N7/00 主分类号 G06N5/04
代理机构 Scully, Scott, Murphy & Presser, P.C. 代理人 Scully, Scott, Murphy & Presser, P.C. ;Morris, Esq. Daniel P.
主权项 1. A system for predicting a relation between entities comprising: a memory storage device; a computer system associated with said memory storage device, said computer system including a processor device configured to perform a method to: construct, at a computer device, a heterogeneous graph representation of multi-source data including: receiving data for forming multiple unipartite sub-graphs, each sub-graph having homogeneous vertices and edges connecting said vertices, andreceiving data for forming bipartite sub-graphs having partially observed edges connecting respective nodes between any two different unipartite sub-graphs, said partially observed edges representing cross-entity links; compute at said computer device, a steady-state relevance matrix for each sub-graph using a homogeneous Markov Random Walk model applied to each said unipartite sub-graph, wherein to apply said homogeneous Markov Random Walk model, said processor device is further configured to compute a transition probability matrix for each said unipartite sub-graph, said computing said steady-state relevance matrix associated with a respective unipartite sub-graph based on a respective computed transition probability matrix; and said processor device is further configured to compute state probability matrices at transition times t and t+1, where a computed state probability matrix at time t+1 is a function of a computed state probability matrix at time t, said computed transition probability matrix, a computed matrix representing a uniform prior state for all the vertices of said a unipartite graph, and for a modeled random walker time transition between vertices in said sub-graph, a restart probability value for said unipartite sub-graph; dynamically generate missing edges connecting vertices between each of two unipartite sub-graphs by applying, using said computed steady-state relevance matrix for each sub-graph, an iterative and heterogeneous Markov Random Walk model to said bipartite sub-graphs to dynamically generate missing edges, wherein a generated missing edge represents a cross-entity connection recommendation or prediction in said heterogeneous graph, and wherein a first unipartite sub-graph type of said multiple unipartite sub-graphs comprises vertices representing a plurality of different users, a second unipartite sub-graph type of said multiple unipartite sub-graph comprises vertices representing a plurality of items, and a third unipartite sub-graph type of multiple unipartite sub-graph comprises vertices representing a plurality of different entity types; wherein edges connecting a sub-set of different vertices in said first type of unipartite sub-graph have different edge weights according to observed user affinity measures between different users, wherein edges connecting a sub-set of different vertices in said second type of unipartite graph have different edge weights according to observed item similarity measures between different items; and edges connecting a sub-set of different vertices of said third type unipartite graph have different edge weights according to observed measures describing relationships between different entities; wherein a computed steady-state relevance matrix associated with said first type of unipartite sub-graph is a function of a transition probability computed for the users, and a computed normalized matrix representing an affinity of the user to other users as represented in said first type unipartite graph, a computed steady-state relevance matrix associated with said second type of unipartite sub-graph is a function of a transition probability matrix for the items, and a computed normalized matrix representing a similarity of an item to other items as represented in said second type unipartite graph, and a restart probability value for said second type sub-graph; and, a computed steady-state relevance matrix associated with said third entity type unipartite sub-graph is a function of a transition probability matrix for the entities, a computed normalized matrix representing a similarity of an entity to other entities in said entity type of unipartite graph, and a restart probability in said entity sub-graph.
地址 Armonk NY US
您可能感兴趣的专利