Title: Optimal policy determination using repeated Stackelberg games with unknown player preferences
Abstract: A system, method and computer program product for planning actions in a repeated Stackelberg game, played for a fixed number of rounds, where the payoffs or preferences of the follower are initially unknown to the leader, and a prior probability distribution over follower types is available. In repeated Bayesian Stackelberg games, the objective is to maximize the leader's cumulative expected payoff over the rounds of the game. The optimal plans in such games make intelligent tradeoffs between actions that reveal information regarding the unknown follower preferences, and actions that aim for high immediate payoff. The method solves for such optimal plans according to a Monte Carlo Tree Search method wherein simulation trials draw instances of followers from said prior probability distribution. Some embodiments additionally implement a method for pruning dominated leader strategies.
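The planning idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. It assumes a finite list of candidate leader mixed strategies, a follower who plays a myopic best response to the committed strategy in every round, and plain UCB1 tree search over observed histories. It omits the pruning of dominated leader strategies. All names (`best_response`, `leader_value`, `mcts_plan`) and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


def best_response(strategy, follower_payoff):
    """Follower action with the highest expected payoff against a leader mixed strategy."""
    n_follower_actions = len(follower_payoff[0])

    def value(a_f):
        return sum(p * follower_payoff[a_l][a_f] for a_l, p in enumerate(strategy))

    return max(range(n_follower_actions), key=value)


def leader_value(strategy, leader_payoff, a_f):
    """Leader's expected one-round payoff when the follower plays a_f."""
    return sum(p * leader_payoff[a_l][a_f] for a_l, p in enumerate(strategy))


def mcts_plan(strategies, leader_payoff, follower_types, prior,
              rounds, trials=2000, c=1.4, seed=0):
    """Return (index of the most-visited first-round strategy, mean returns at the root).

    Assumption: each trial draws one follower payoff matrix from `prior` and
    keeps it fixed for all rounds of that trial.
    """
    rng = random.Random(seed)
    visits = defaultdict(int)    # (history, strategy index) -> visit count
    totals = defaultdict(float)  # (history, strategy index) -> sum of returns-to-go
    k = len(strategies)

    for _ in range(trials):
        follower = rng.choices(follower_types, weights=prior)[0]
        history, path, rewards = (), [], []
        for _ in range(rounds):
            untried = [i for i in range(k) if visits[(history, i)] == 0]
            if untried:
                i = rng.choice(untried)
            else:
                n = sum(visits[(history, j)] for j in range(k))
                # UCB1: mean return plus an exploration bonus.
                i = max(range(k), key=lambda j: (
                    totals[(history, j)] / visits[(history, j)]
                    + c * math.sqrt(math.log(n) / visits[(history, j)])))
            a_f = best_response(strategies[i], follower)
            rewards.append(leader_value(strategies[i], leader_payoff, a_f))
            path.append((history, i))
            # The observed follower response becomes part of the history,
            # so later choices can depend on what it reveals about the type.
            history += ((i, a_f),)

        # Back up the cumulative return-to-go along the visited path.
        ret = 0.0
        for key, r in zip(reversed(path), reversed(rewards)):
            ret += r
            visits[key] += 1
            totals[key] += ret

    means = {i: totals[((), i)] / visits[((), i)]
             for i in range(k) if visits[((), i)] > 0}
    best = max(range(k), key=lambda i: visits[((), i)])
    return best, means
```

Because the tree is keyed by the sequence of (leader strategy, follower response) pairs, a strategy whose responses separate follower types leads to different subtrees for different types, which is how the sketch trades off information-revealing play against immediate payoff. The exploration constant `c` is an assumption and would normally be scaled to the range of cumulative payoffs.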
Publication number: US8545332(B2) Publication date: 2013.10.01
Application number: US201213364843 Filing date: 2012.02.02
Applicant: MARECKI JANUSZ;SEGAL RICHARD B.;TESAURO GERALD J.;INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: MARECKI JANUSZ;SEGAL RICHARD B.;TESAURO GERALD J.
Classification: A63F13/00 Main classification: A63F13/00
Agency: Agent:
Main claim:
Address: