Title: Action selection for reinforcement learning using influence diagrams
Abstract: A system and method for online reinforcement learning are provided, including a method for handling the explore-vs.-exploit tradeoff. Although the method is heuristic, it can be applied in a principled manner while simultaneously learning the parameters and/or structure of the model (e.g., a Bayesian network model). The system includes a model that receives an input (e.g., from a user) and provides a decision engine with a probability distribution representing uncertainty about the model's parameters. Based at least in part on the explore-vs.-exploit tradeoff (e.g., a Thompson strategy), the decision engine determines whether to exploit the information it already has or to explore to obtain additional information. A reinforcement learning component can obtain additional information (e.g., feedback from a user) and update the parameter(s) and/or the structure of the model. The system can be employed in scenarios in which an influence diagram is used to make repeated decisions and maximization of long-term expected utility is desired.
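As an illustration of the Thompson strategy named in the abstract, the following is a minimal sketch only, not the patented method. It assumes a much simpler setting than an influence diagram: each action is an independent Bernoulli "arm" with a Beta posterior over its success rate. The function names (`thompson_select`, `run`), the uniform Beta(1, 1) prior, and the simulated reward rates are all hypothetical choices made for this example.

```python
import random


def thompson_select(posteriors, rng):
    """Draw one success rate per action from its Beta posterior and
    return the index of the action whose draw is largest."""
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run(true_rates, steps, seed=0):
    """Simulate repeated decisions against hypothetical Bernoulli rewards.

    Assumption: each action pays 1 with a fixed, unknown probability.
    The posterior over that probability stands in for the 'probability
    distribution associated with uncertainty regarding parameters'.
    """
    rng = random.Random(seed)
    # Beta(1, 1) is a uniform prior over each action's success rate.
    posteriors = [[1.0, 1.0] for _ in true_rates]
    pulls = [0] * len(true_rates)
    for _ in range(steps):
        action = thompson_select(posteriors, rng)
        reward = 1 if rng.random() < true_rates[action] else 0
        # Feedback updates the parameters: successes go to alpha,
        # failures go to beta.
        posteriors[action][0] += reward
        posteriors[action][1] += 1 - reward
        pulls[action] += 1
    return posteriors, pulls


if __name__ == "__main__":
    posteriors, pulls = run([0.2, 0.5, 0.8], steps=2000)
    print("pulls per action:", pulls)
```

Because the action is chosen by sampling from the posterior rather than by taking its mean, uncertain actions are still tried occasionally (exploration), while actions that have paid off are chosen more and more often as evidence accumulates (exploitation).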
Publication No.: US2006224535(A1)  Publication Date: 2006.10.05
Application No.: US20050169503  Filing Date: 2005.06.29
Applicant: MICROSOFT CORPORATION  Inventors: CHICKERING DAVID M.; PAEK TIMOTHY S.; HORVITZ ERIC J.
Classification: G06F15/18  Main Classification: G06F15/18
Agency:  Agent:
Main Claim:
Address: