Title of invention |
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM |
Abstract |
There is provided an information processing apparatus including a reward estimator generator using action history data, including state data expressing a state, action data expressing an action taken by an agent, and a reward value expressing a reward obtained as a result of the action, as learning data to generate, through machine learning, a reward estimator estimating a reward value from inputted state data and action data. The reward estimator generator includes: a basis function generator generating a plurality of basis functions; a feature amount vector calculator calculating feature amount vectors by inputting state data and action data in the action history data into the basis functions; and an estimation function calculator calculating an estimation function estimating the reward value included in the action history data from the feature amount vectors according to regressive/discriminative learning. The reward estimator includes the plurality of basis functions and the estimation function. |
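The three components named in the abstract (basis function generator, feature amount vector calculator, estimation function calculator) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the record: the basis functions are Gaussian bumps with random centers, the estimation function is a ridge-regularized linear regression, and all function names are hypothetical.

```python
import math
import random

def generate_basis_functions(input_dim, count, rng):
    """Create `count` Gaussian bumps with random centers (an assumed basis family)."""
    def make(center):
        return lambda x: math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    return [make([rng.uniform(-1.0, 1.0) for _ in range(input_dim)])
            for _ in range(count)]

def feature_vector(basis_functions, state, action):
    """Feed the joined (state, action) input into every basis function."""
    x = list(state) + list(action)
    return [phi(x) for phi in basis_functions] + [1.0]  # trailing 1.0 is a bias term

def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w

def fit_estimation_function(features, rewards, ridge=1e-6):
    """Assumed regression step: solve (F^T F + ridge * I) w = F^T r."""
    dim = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) + (ridge if i == j else 0.0)
             for j in range(dim)] for i in range(dim)]
    target = [sum(f[i] * r for f, r in zip(features, rewards)) for i in range(dim)]
    return solve_linear_system(gram, target)

def build_reward_estimator(history, basis_count=10, seed=0):
    """history: list of (state, action, reward) tuples, i.e. the action history data."""
    rng = random.Random(seed)
    first_state, first_action, _ = history[0]
    input_dim = len(first_state) + len(first_action)
    basis = generate_basis_functions(input_dim, basis_count, rng)
    features = [feature_vector(basis, s, a) for s, a, _ in history]
    weights = fit_estimation_function(features, [r for _, _, r in history])

    def estimate_reward(state, action):
        # The estimator is the basis functions plus the fitted estimation function.
        z = feature_vector(basis, state, action)
        return sum(w * zi for w, zi in zip(weights, z))

    return estimate_reward
```

The returned `estimate_reward` closure bundles the basis functions with the fitted weights, mirroring the abstract's statement that the reward estimator includes both the plurality of basis functions and the estimation function.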
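Application publication number |
US2015278694(A1) |
Application publication date |
2015.10.01 |
Application number |
US201514738522 |
Filing date |
2015.06.12 |
Applicant |
c/o SONY CORPORATION |
Inventor |
KOBAYASHI Yoshiyuki |
Classification number |
G06N5/04;G06N99/00 |
Main classification number |
G06N5/04 |
Agency |

Agent |

Principal claim |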
1. A machine learning apparatus comprising:
a central processing unit (CPU) operable to:
learn from action history data that includes state data, action data, and reward value,
wherein the state data expressing a state of an agent, the action data expressing an action taken by the agent in the state, and the reward value expressing a reward obtained by the agent as a result of the action;
estimate a plurality of reward values based on current state data and a plurality of action data expressing actions that can be taken next by the agent;
select, among the actions, one action that has a highest estimated reward value; and
execute the action that has a highest estimated reward value. |
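The estimate, select, and execute steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: `toy_estimate_reward` is a hypothetical stand-in for a learned reward estimator, and `execute` is a hypothetical callback representing whatever carries out the action.

```python
def select_best_action(estimate_reward, current_state, candidate_actions):
    """Estimate a reward for every candidate action and keep the highest one."""
    estimated = [(estimate_reward(current_state, action), action)
                 for action in candidate_actions]
    best_value, best_action = max(estimated, key=lambda pair: pair[0])
    return best_action, best_value

def act(estimate_reward, current_state, candidate_actions, execute):
    """Select the action with the highest estimated reward, then execute it."""
    best_action, _ = select_best_action(estimate_reward, current_state,
                                        candidate_actions)
    return execute(best_action)

def toy_estimate_reward(state, action):
    # Hypothetical estimator: reward is higher when the action matches the state.
    return -(state[0] - action[0]) ** 2

executed = []  # records which actions the hypothetical executor was asked to run
act(toy_estimate_reward, [0.3], [[-1.0], [0.0], [0.5], [1.0]], executed.append)
```

With the toy estimator and a current state of `[0.3]`, the candidate `[0.5]` is closest to the state and is therefore the one passed to `execute`.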
Address |
TOKYO JP |