Title of invention INFORMATION PROCESSOR, INFORMATION PROCESSING METHOD, AND PROVIDING MEDIUM
Abstract At step S1, a recurrent neural network carries out, by forward dynamics, a prediction operation aimed at conferring the maximum reward. At step S2, a plan is made by reverse dynamics. This yields an action plan consisting of a sequence of differential values of an action that confers the maximum reward. Steps S1 and S2 are repeated until it is judged at step S3 that the desired action plan has been made. In this way, an action plan that maximizes the reward is generated from only a few action experiences.
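Publication number WO2000010098(P1) Publication date 2000.02.24
Application number JP1999004306 Filing date 1999.08.09
Applicant Inventor
Classification number Main classification number
Agency Agent
Principal claim
Address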
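
The loop described in the abstract (predict the reward forward at S1, plan backward at S2, repeat until the plan is judged good enough at S3) can be illustrated with a minimal sketch. This is not the patented implementation, which this record does not detail. As assumptions for illustration only, a toy one-dimensional linear model stands in for the recurrent neural network, the "reverse dynamics" step is taken to be the gradient of the predicted reward with respect to each action, and the names `GOAL`, `ACTION_COST`, `forward`, `backward`, and `plan` along with all numeric settings are hypothetical.

```python
# Hypothetical toy setup: move a point from x0 toward GOAL with small actions.
GOAL = 1.0          # assumed target state
ACTION_COST = 0.01  # assumed penalty weight on action magnitude


def forward(x0, actions):
    """Step S1 (sketch): roll the model forward and predict the total reward."""
    states = [x0]
    for a in actions:
        # Toy linear dynamics x' = x + a, a stand-in for the recurrent network.
        states.append(states[-1] + a)
    reward = -(states[-1] - GOAL) ** 2 - ACTION_COST * sum(a * a for a in actions)
    return states, reward


def backward(states, actions):
    """Step S2 (sketch): differential of the reward with respect to each action."""
    # In the toy model every action shifts the final state by exactly 1,
    # so d(reward)/d(final state) passes back to each action unchanged.
    d_final = -2.0 * (states[-1] - GOAL)
    return [d_final - 2.0 * ACTION_COST * a for a in actions]


def plan(x0, horizon=5, lr=0.05, tol=1e-6, max_iters=1000):
    """Repeat S1 and S2 until the S3 check judges the plan good enough."""
    actions = [0.0] * horizon
    for _ in range(max_iters):
        states, _ = forward(x0, actions)        # S1: forward prediction
        grads = backward(states, actions)       # S2: backward planning
        if max(abs(g) for g in grads) < tol:    # S3: stop when no further gain
            break
        # Apply the differential values to move the plan toward higher reward.
        actions = [a + lr * g for a, g in zip(actions, grads)]
    return actions, forward(x0, actions)[1]


if __name__ == "__main__":
    planned_actions, predicted_reward = plan(0.0)
    print(planned_actions, predicted_reward)
```

With these assumed settings, the planned actions starting from `x0 = 0.0` sum to roughly the distance to `GOAL`, slightly reduced by the action penalty. A learned recurrent model would replace `forward`, and its backpropagated gradients would replace the hand-derived `backward`.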