Title of invention: INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
Abstract: PROBLEM TO BE SOLVED: To automatically construct a reward estimator. SOLUTION: Provided is an information processing apparatus that includes a reward estimator generating unit. The unit uses action history data as learning data to generate, through machine learning, a reward estimator that estimates a reward value from input state data and action data. The action history data includes state data expressing a state of an agent, action data expressing the action the agent takes in that state, and a reward value expressing the reward the agent obtains as a result of the action. The reward estimator generating unit generates a plurality of basis functions by combining a plurality of processing functions, calculates feature amount vectors by inputting the state data and the action data included in the action history data into the plurality of basis functions, and calculates, through regression or discriminative learning, an estimation function that estimates the reward value included in the action history data from the feature amount vectors. The reward estimator includes the plurality of basis functions and the estimation function. COPYRIGHT: (C) 2013, JPO & INPIT
Publication number: JP2013084175(A); Publication date: 2013.05.09
Application number: JP20110224638; Filing date: 2011.10.12
Applicant: SONY CORP; Inventor: KOBAYASHI YOSHIYUKI
Classification: G06N3/00; Main classification: G06N3/00
Agency: ; Agent:
Main claim:
Address:
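
The pipeline described in the abstract (combine processing functions into basis functions, map state and action data to feature amount vectors, then fit an estimation function to the recorded rewards) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the pool of processing functions, the random chaining used to combine them, the ridge regression used as the regression step, and the helper names `make_basis_function`, `fit_estimation_function`, and `build_reward_estimator`.

```python
import math
import random

# Hypothetical pool of processing functions; the abstract does not list them.
PROCESSING_FUNCTIONS = [lambda v: v, lambda v: v * v, abs, math.tanh]


def make_basis_function(rng, n_inputs, depth=2):
    """Chain `depth` randomly chosen processing functions on one input element."""
    index = rng.randrange(n_inputs)
    chain = [rng.choice(PROCESSING_FUNCTIONS) for _ in range(depth)]

    def basis(x):
        value = x[index]
        for func in chain:
            value = func(value)
        return value

    return basis


def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_estimation_function(features, rewards, ridge=1e-6):
    """Ridge regression (an assumed stand-in); returns [bias, w_1, ..., w_k]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    moment = [sum(r[i] * y for r, y in zip(rows, rewards)) for i in range(k)]
    return solve(gram, moment)


def build_reward_estimator(history, n_basis=8, seed=0):
    """history: iterable of (state, action, reward); state and action are tuples."""
    rng = random.Random(seed)
    inputs = [list(state) + list(action) for state, action, _ in history]
    rewards = [reward for _, _, reward in history]
    basis_functions = [make_basis_function(rng, len(inputs[0]))
                       for _ in range(n_basis)]
    # Feature amount vectors: one value per basis function for each record.
    features = [[phi(x) for phi in basis_functions] for x in inputs]
    weights = fit_estimation_function(features, rewards)

    def estimator(state, action):
        x = list(state) + list(action)
        return weights[0] + sum(w * phi(x)
                                for w, phi in zip(weights[1:], basis_functions))

    return estimator, basis_functions


# Toy action history (fabricated for illustration only).
history = [((0.1 * i,), (float(i % 2),), 0.1 * i + (i % 2)) for i in range(20)]
estimator, basis_functions = build_reward_estimator(history)
print(estimator((0.5,), (1.0,)))
```

As in the abstract, the returned estimator bundles the basis functions with the fitted estimation function. A fuller implementation would presumably search over basis-function combinations rather than draw them once at random, but the abstract does not say how that search is performed.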