发明名称 |
Fast personalized page rank on map reduce |
摘要 |
A personalized page rank computation system is described herein that provides a fast MapReduce method for Monte Carlo approximation of personalized PageRank vectors of all the nodes in a graph. The method presented is both faster and less computationally intensive than existing methods, allowing a broader scope of problems to be solved by existing computing hardware. The system adopts the Monte Carlo approach and provides a method to compute single random walks of a given length for all nodes in a graph that it is superior in terms of the number of map-reduce iterations among a broad class of methods. The resulting solution reduces the I/O cost and outperforms the state-of-the-art FPPR approximation methods, in terms of efficiency and approximation error. Thus, the system can very efficiently perform single random walks of a given length starting at each node in the graph and can very efficiently approximate all the personalized PageRank vectors. |
申请公布号 |
US8856047(B2) |
申请公布日期 |
2014.10.07 |
申请号 |
US201113164788 |
申请日期 |
2011.06.21 |
申请人 |
Microsoft Corporation |
发明人 |
Chakrabarti Kaushik;Xin Dong;Bahmani Bahman |
分类号 |
G06F17/10;G06F17/16;G06F17/17;G06F17/30 |
主分类号 |
G06F17/10 |
代理机构 |
|
代理人 |
Choi Dan;Swain Sandy;Minhas Micky |
主权项 |
1. A computer-implemented method to generate random walks from source nodes in a graph, the method comprising:
receiving an input graph; receiving an input walk length that specifies a number of nodes to be visited from a source node to produce a random walk from each source node in an input graph; receiving an input segment length that determines how long each segment generated by the method will be; determining a number of segments of the received input segment length to be generated out of each node of the input graph based on the received input walk length; generating the determined number of segments of the received segment length; and merging the generated segments iteratively, merging multiple segments per node in each iteration, until a random walk of the received input walk length is obtained from each node, wherein at least a portion of the preceding steps are performed by at least one processor. |
地址 |
Redmond WA US |