Title of invention INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
Abstract There is provided an information processing apparatus including a reward estimator generator using action history data, including state data expressing a state, action data expressing an action taken by an agent, and a reward value expressing a reward obtained as a result of the action, as learning data to generate, through machine learning, a reward estimator estimating a reward value from inputted state data and action data. The reward estimator generator includes: a basis function generator generating a plurality of basis functions; a feature amount vector calculator calculating feature amount vectors by inputting state data and action data in the action history data into the basis functions; and an estimation function calculator calculating an estimation function estimating the reward value included in the action history data from the feature amount vectors according to regressive/discriminative learning. The reward estimator includes the plurality of basis functions and the estimation function.
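The pipeline in the abstract (basis functions, then feature amount vectors, then an estimation function fitted by regression) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and not taken from the patent: the four hand-written basis functions stand in for the basis function generator, states and actions are plain numbers, and ridge-regularized least squares stands in for the "regressive/discriminative learning" step.

```python
# Hypothetical, hand-picked basis functions over (state, action) pairs.
# The patent generates these automatically; here they are fixed by assumption.
BASIS_FUNCTIONS = [
    lambda s, a: 1.0,      # constant (bias) term
    lambda s, a: s,        # state only
    lambda s, a: a,        # action only
    lambda s, a: s * a,    # state-action interaction
]


def feature_vector(state, action):
    """Feed one (state, action) pair through every basis function."""
    return [phi(state, action) for phi in BASIS_FUNCTIONS]


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_estimation_function(history, ridge=1e-6):
    """Fit linear weights on feature vectors from (state, action, reward) rows.

    Solves the ridge-regularized normal equations (X^T X + ridge*I) w = X^T y.
    """
    X = [feature_vector(s, a) for s, a, _ in history]
    y = [r for _, _, r in history]
    k = len(BASIS_FUNCTIONS)
    XtX = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * r for row, r in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def estimate_reward(weights, state, action):
    """Reward estimator: basis functions followed by the estimation function."""
    features = feature_vector(state, action)
    return sum(w * f for w, f in zip(weights, features))
```

As a usage example, fitting on a toy history such as `[(s, a, 2.0 * s - a + 0.5 * s * a) for s in range(4) for a in range(3)]` and then calling `estimate_reward(weights, 2, 1)` should return a value close to the true reward for that pair, since the toy reward is linear in the chosen features.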
Publication number US2015278694(A1) Publication date 2015.10.01
Application number US201514738522 Filing date 2015.06.12
Applicant c/o SONY CORPORATION Inventor KOBAYASHI Yoshiyuki
Classification G06N5/04;G06N99/00 Main classification G06N5/04
Agency Agent
Principal claim 1. A machine learning apparatus comprising: a central processing unit (CPU) operable to: learn from action history data that includes state data, action data, and reward value, wherein the state data expressing a state of an agent, the action data expressing an action taken by the agent in the state, and the reward value expressing a reward obtained by the agent as a result of the action; estimate a plurality of reward values based on current state data and a plurality of action data expressing actions that can be taken next by the agent; select, among the actions, one action that has a highest estimated reward value; and execute the action that has a highest estimated reward value.
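The estimate, select, and execute steps recited in claim 1 amount to greedy action selection over estimated rewards. The following Python sketch shows one possible reading under stated assumptions: `toy_estimator`, `select_best_action`, and `act` are hypothetical names, the estimator is a stand-in for a learned reward estimator, and `execute` is a caller-supplied callback rather than anything defined in the patent.

```python
def toy_estimator(state, action):
    """Hypothetical reward estimator: reward peaks when action equals state."""
    return -float((action - state) ** 2)


def select_best_action(estimator, state, candidate_actions):
    """Estimate a reward for each candidate and return the best one.

    Ties are broken in favor of the earliest candidate in the list.
    """
    estimates = [(action, estimator(state, action)) for action in candidate_actions]
    best_action, _ = max(estimates, key=lambda pair: pair[1])
    return best_action, dict(estimates)


def act(estimator, state, candidate_actions, execute):
    """Select the action with the highest estimated reward, then execute it."""
    best_action, _ = select_best_action(estimator, state, candidate_actions)
    execute(best_action)
    return best_action
```

For example, `act(toy_estimator, 2, [0, 1, 2, 3], print)` selects and passes on action `2`, because that candidate has the highest estimated reward in the toy estimator.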
Address TOKYO JP