Title: Categorizing documents
Abstract: Categorizing documents is disclosed. A hierarchy of topics is received. A seed for each topic is determined. One or more documents are received. The seed is used to evaluate the relevance of each document to one or more of the received topics. One or more topics are associated with each document.
Publication No.: US8903808(B2)  Publication date: 2014.12.02
Application No.: US201313757667  Filing date: 2013.02.01
Applicant: Wal-Mart Stores, Inc.  Inventors: Harinarayan Venky; Rajaraman Anand
Classification: G06F17/30  Main classification: G06F17/30
Agency: Stevens Law Group  Agent: Stevens David R.; Stevens Law Group
Principal claim: 1. A method comprising: receiving, by a computer system, a selection of a seed set from a linked document corpus, the seed set relating to a topic; calculating, by the computer system, for each document of the linked document corpus, a destination score according to a biased random walk of the linked document corpus, where the random walk is biased toward the seed set; calculating, by the computer system, for each document of the linked document corpus, a source score according to an effect of the each document on the destination scores of other documents in the linked document corpus according to a link structure of the linked document corpus; receiving a query identifying the topic; selecting one or more documents from the linked document corpus according to topic scores based on a combination of the source and destination scores of the documents of the linked document corpus; and returning the selected one or more documents as a result for the query, wherein calculating, by the computer system, for each document of the linked document corpus, the destination score according to a biased random walk of the linked document corpus further comprises: initializing source scores for the documents of the linked document corpus, such that documents of the seed set have a non-zero source score and other documents have a source score of zero; calculating the destination score for the each document according to a random walk of a link structure of the linked document corpus with random teleportation to documents of the linked document corpus where a probability of teleportation to a document is proportional to a source score thereof.
Address: Bentonville, AR, US
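
Illustrative sketch (not part of the patent record): the scoring steps recited in the principal claim can be read as a random walk with teleportation biased toward the seed set. The Python below is a minimal sketch under stated assumptions, not the patented implementation. The function names, the teleport probability `beta`, the fixed iteration count, the handling of documents without out-links, the source-score rule (mean destination score of the documents a document links to), and the combination rule (a plain sum) are all hypothetical choices; the claim itself does not specify them.

```python
def destination_scores(links, source, beta=0.15, iterations=50):
    """Biased random walk over the link structure.

    With probability 1 - beta the walker follows an out-link; otherwise it
    teleports to a document with probability proportional to that
    document's source score. `beta` and `iterations` are assumed values.
    """
    docs = list(links)
    total = sum(source[d] for d in docs)
    teleport = {d: source[d] / total for d in docs}
    dest = dict(teleport)  # start the walk at the teleport distribution
    for _ in range(iterations):
        nxt = {d: beta * teleport[d] for d in docs}
        for d in docs:
            outs = links[d]
            if outs:
                share = (1 - beta) * dest[d] / len(outs)
                for target in outs:
                    nxt[target] += share
            else:
                # Assumption: a document with no out-links teleports.
                for target in docs:
                    nxt[target] += (1 - beta) * dest[d] * teleport[target]
        dest = nxt
    return dest


def source_scores(links, dest):
    """Assumed rule: a document's source score is the mean destination
    score of the documents it links to (zero if it links to none)."""
    return {
        d: (sum(dest[t] for t in outs) / len(outs) if outs else 0.0)
        for d, outs in links.items()
    }


def select_documents(links, seed, k=2):
    """Rank documents by an assumed combination (sum) of both scores."""
    # Seed documents start with a non-zero source score, all others zero.
    initial = {d: (1.0 if d in seed else 0.0) for d in links}
    dest = destination_scores(links, initial)
    src = source_scores(links, dest)
    topic = {d: dest[d] + src[d] for d in links}
    return sorted(links, key=topic.get, reverse=True)[:k]


# Hypothetical four-document corpus: each key links to the listed documents.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
seed = {"a"}
initial = {d: (1.0 if d in seed else 0.0) for d in links}
dest = destination_scores(links, initial)
results = select_documents(links, seed, k=2)
```

In this toy corpus the walk keeps returning to the seed document "a", so documents reachable from "a" accumulate destination score, while "d", which nothing links to and which is not in the seed set, receives none. A production system would also iterate until convergence rather than for a fixed number of rounds.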