Title of invention Information processing apparatus, information processing method, and program
Abstract Provided is an information processing apparatus including: a reward estimator generating unit that uses action history data, which includes state data expressing a state of an agent, action data expressing an action of the agent, and a reward value expressing the reward for that action, as learning data to generate, through machine learning, a reward estimator that estimates the reward value from input state data and action data; an action selecting unit that preferentially selects an action that is not included in the action history data but has a high estimated reward value; and an action history adding unit that causes the agent to perform the selected action and adds to the action history data the state data and action data for the action in association with the reward value of the action. The reward estimator is regenerated when a set of state data, action data, and reward value is added to the action history data.
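The loop described in the abstract (fit an estimator, pick an untried action with a high estimated reward, perform it, record the result, refit) can be illustrated with a small Python sketch. This is not the patented implementation: the abstract does not say which machine-learning method is used, so an inverse-distance-weighted average stands in for it here. The integer states and actions, the function names (`fit_reward_estimator`, `select_action`, `run`), and `toy_environment` are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple

State = int
Action = int
Record = Tuple[State, Action, float]  # (state data, action data, reward value)
Estimator = Callable[[State, Action], float]


def fit_reward_estimator(history: List[Record]) -> Estimator:
    """Build an estimator from the action history.

    Assumption: an inverse-distance-weighted average replaces the
    unspecified machine-learning step of the abstract.
    """
    data = list(history)  # snapshot, so later appends need a refit

    def estimate(state: State, action: Action) -> float:
        num = den = 0.0
        for s, a, r in data:
            dist = abs(s - state) + abs(a - action)
            if dist == 0:
                return r  # exact match: return the recorded reward
            weight = 1.0 / dist ** 2
            num += weight * r
            den += weight
        return num / den if den else 0.0

    return estimate


def select_action(estimate: Estimator, history: List[Record],
                  state: State, actions: Iterable[Action]) -> Action:
    """Prefer actions absent from the history, ranked by estimated reward."""
    actions = list(actions)
    tried = {a for s, a, _ in history if s == state}
    untried = [a for a in actions if a not in tried]
    candidates = untried or actions  # fall back once everything was tried
    return max(candidates, key=lambda a: estimate(state, a))


def run(environment: Callable[[State, Action], float], state: State,
        actions: Iterable[Action], seed_history: List[Record],
        steps: int) -> Tuple[List[Record], Estimator]:
    """Perform the selected action, record it, and regenerate the estimator."""
    actions = list(actions)
    history = list(seed_history)
    estimate = fit_reward_estimator(history)
    for _ in range(steps):
        action = select_action(estimate, history, state, actions)
        reward = environment(state, action)       # agent performs the action
        history.append((state, action, reward))   # add to the action history
        estimate = fit_reward_estimator(history)  # regenerate the estimator
    return history, estimate


def toy_environment(state: State, action: Action) -> float:
    """Hypothetical reward function, highest at action 7."""
    return -float((action - 7) ** 2)


if __name__ == "__main__":
    seed = [(0, 0, toy_environment(0, 0)), (0, 10, toy_environment(0, 10))]
    history, estimate = run(toy_environment, 0, range(11), seed, steps=9)
    print(select_action(estimate, history, 0, range(11)))
```

The estimator is rebuilt after every new record, mirroring the last sentence of the abstract. A real system would presumably use a learned regressor and a richer state representation.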
Publication number JP5879899(B2) Publication date 2016.03.08
Application number JP20110224639 Filing date 2011.10.12
Applicant ソニー株式会社 Inventor 小林 由幸
Classification A63F13/56;A63F13/67;G06N3/08 Main classification A63F13/56
Agency Agent
Principal claim
Address