INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM, Application No. US201514738522

Title of invention	INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
Abstract	An information processing apparatus is provided that includes a reward estimator generator. The generator uses action history data (state data expressing a state, action data expressing an action taken by an agent, and a reward value expressing the reward obtained as a result of the action) as learning data to generate, through machine learning, a reward estimator that estimates a reward value from input state data and action data. The reward estimator generator includes: a basis function generator that generates a plurality of basis functions; a feature amount vector calculator that calculates feature amount vectors by inputting the state data and action data in the action history data into the basis functions; and an estimation function calculator that calculates, according to regressive/discriminative learning, an estimation function that estimates the reward value included in the action history data from the feature amount vectors. The reward estimator includes the plurality of basis functions and the estimation function.
Publication No.	US2015278694(A1)	Publication date	2015.10.01
Application No.	US201514738522	Filing date	2015.06.12
Applicant	c/o SONY CORPORATION	Inventor	KOBAYASHI Yoshiyuki
Classification	G06N5/04;G06N99/00	Main classification	G06N5/04
Agency		Agent
Main claim	1. A machine learning apparatus comprising: a central processing unit (CPU) operable to: learn from action history data that includes state data, action data, and reward value, wherein the state data expressing a state of an agent, the action data expressing an action taken by the agent in the state, and the reward value expressing a reward obtained by the agent as a result of the action; estimate a plurality of reward values based on current state data and a plurality of action data expressing actions that can be taken next by the agent; select, among the actions, one action that has a highest estimated reward value; and execute the action that has a highest estimated reward value.
Address	TOKYO JP
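
The pipeline described in the abstract and in claim 1 (basis functions, feature amount vectors, a learned estimation function, then selection of the action with the highest estimated reward) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the patent: the basis functions are fixed by hand rather than generated automatically, ridge-regularised least squares stands in for the "regressive/discriminative learning" step, and the toy environment (`toy_reward`, `make_history`, `ACTIONS`) is hypothetical.

```python
import random

# Hypothetical basis functions of (state, action). The patent describes a
# generator that produces these; here they are hand-picked for illustration.
BASIS_FUNCTIONS = [
    lambda s, a: 1.0,
    lambda s, a: s,
    lambda s, a: a,
    lambda s, a: s * a,
    lambda s, a: s * s,
    lambda s, a: a * a,
]

ACTIONS = [-1, 0, 1]  # assumed discrete action set for the toy problem


def toy_reward(state, action):
    """Assumed toy environment: reward is highest when state + action is 0."""
    return -((state + action) ** 2)


def make_history(n, seed=0):
    """Build action history data as a list of (state, action, reward) tuples."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        s = rng.uniform(-5.0, 5.0)
        a = rng.choice(ACTIONS)
        history.append((s, a, toy_reward(s, a)))
    return history


def feature_vector(state, action):
    """Feed state and action data into every basis function."""
    return [phi(state, action) for phi in BASIS_FUNCTIONS]


def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_reward_estimator(history, ridge=1e-6):
    """Fit a linear estimation function on the feature vectors.

    Ridge least squares via the normal equations is a stand-in for the
    learning step; it is not claimed to be the patent's method.
    """
    X = [feature_vector(s, a) for s, a, _ in history]
    y = [r for _, _, r in history]
    k = len(BASIS_FUNCTIONS)
    XtX = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    weights = solve_linear_system(XtX, Xty)

    def estimate(state, action):
        feats = feature_vector(state, action)
        return sum(w * f for w, f in zip(weights, feats))

    return estimate


def select_action(estimate, state, actions):
    """Pick the candidate action with the highest estimated reward."""
    return max(actions, key=lambda a: estimate(state, a))


if __name__ == "__main__":
    estimator = fit_reward_estimator(make_history(200))
    print(select_action(estimator, 2.0, ACTIONS))
```

In this sketch the reward estimator is the pair (basis functions, fitted weights), mirroring the abstract's statement that the estimator includes the basis functions and the estimation function. Executing the selected action is left to the caller, since it depends on the agent's environment.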