Title MONTE-CARLO PLANNING USING CONTEXTUAL INFORMATION
Abstract A method, system and computer program product for choosing actions in a state of a planning problem. The system simulates one or more sequences of actions, state transitions and rewards, starting from the current state of the planning problem. While simulating a given action in a given state, it maintains a data record of the observed contextual state information and the observed cumulative reward resulting from the action. The system performs a regression fit on the data records, which enables it to estimate expected reward as a function of contextual state. These estimates of expected reward guide the choice of actions during the simulations. Upon completion of all simulations, the top-level action that obtained the highest mean reward during the simulations is recommended for execution in the current state of the planning problem.
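The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy one-dimensional problem (`step`, `context`), the per-action linear least-squares fit (`fit_line`), the epsilon-greedy selection rule and all parameter names are assumptions chosen for illustration only. The abstract itself specifies only simulation, record keeping, a regression fit and the final choice by mean reward.

```python
import random
from collections import defaultdict

def fit_line(records):
    """Least-squares fit of reward ~ w * context + b over (context, reward) pairs."""
    n = len(records)
    mx = sum(x for x, _ in records) / n
    my = sum(y for _, y in records) / n
    var = sum((x - mx) ** 2 for x, _ in records)
    if var == 0.0:
        return 0.0, my  # all contexts identical: fall back to the mean reward
    w = sum((x - mx) * (y - my) for x, y in records) / var
    return w, my - w * mx

# Hypothetical toy problem: the state is an integer position, and moving in
# direction a (+1 or -1) from position s pays a * s / 10 plus Gaussian noise.
def step(s, a, rng):
    return s + a, a * s / 10 + rng.gauss(0, 0.1)

def context(s):
    return float(s)  # contextual state information observed in state s

def plan(state, actions, step, context, n_sims=300, depth=3, eps=0.2, seed=0):
    rng = random.Random(seed)
    records = defaultdict(list)    # action -> [(context, cumulative reward)]
    top_level = defaultdict(list)  # first action -> [total simulated reward]

    def estimate(a, x):
        if len(records[a]) < 2:
            return float("inf")    # try under-sampled actions first
        w, b = fit_line(records[a])
        return w * x + b

    for _ in range(n_sims):
        s, visited, rewards = state, [], []
        for _ in range(depth):
            x = context(s)
            if rng.random() < eps:   # assumed exploration rule
                a = rng.choice(actions)
            else:                    # guided by the regression estimates
                a = max(actions, key=lambda act: estimate(act, x))
            s, r = step(s, a, rng)
            visited.append((a, x))
            rewards.append(r)
        # Walk backwards so each record holds the reward accumulated from
        # that step to the end of the simulation.
        ret = 0.0
        for (a, x), r in zip(reversed(visited), reversed(rewards)):
            ret += r
            records[a].append((x, ret))
        top_level[visited[0][0]].append(ret)

    # Recommend the top-level action with the highest mean simulated reward.
    return max(top_level, key=lambda a: sum(top_level[a]) / len(top_level[a]))

best = plan(5, [1, -1], step, context)
```

In this toy setting, starting from position 5, moving right pays a positive reward at every step, so the sketch is expected to recommend the action `1`. Refitting the regression on every estimate keeps the code short; an incremental fit would be the more practical choice for larger simulation budgets.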
Publication No. US2013185039(A1) Publication date 2013.07.18
Application No. US201213348993 Filing date 2012.01.12
Applicant TESAURO GERALD J.;BEYGELZIMER ALINA;SEGAL RICHARD B.;WEGMAN MARK N.;INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor TESAURO GERALD J.;BEYGELZIMER ALINA;SEGAL RICHARD B.;WEGMAN MARK N.
Classification G06G7/48 Main classification G06G7/48
Agency Agent
Main claim
Address