Title of invention: INFORMATION PROCESSOR, INFORMATION PROCESSING METHOD, AND PROVIDING MEDIUM
Abstract: At step S1, a recurrent neural network carries out, by forward dynamics, a prediction operation to confer a maximum reward. At step S2, a plan is made by reverse dynamics. Thus, an action plan is generated that consists of a sequence of differential values of an action for conferring the maximum reward. These steps are repeated until it is judged at step S3 that a desired action plan has been made. In this way, an action plan that maximizes the reward is generated from a few action experiences.
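Publication number: WO0010098(A1) Publication date: 2000.02.24
Application number: WO1999JP04306 Filing date: 1999.08.09
Applicant: SONY CORPORATION;TANI, JUN Inventor: TANI, JUN
Classification: G06F15/18;B25J13/00;G05B13/02;G05D1/02;G06N3/00;(IPC1-7):G06F15/18;G05B13/00 Main classification: G06F15/18
Agency: Agent:
Principal claim:
Address: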
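
The loop described in the abstract (steps S1 to S3) can be illustrated with a minimal sketch. It is not the patented method: the recurrent neural network is replaced here by a hypothetical one-dimensional toy model in which each plan entry is an increment added to the state, the reward is assumed to be the negative squared distance to a goal, and the stopping test is assumed to be a small reward improvement. All function names and parameters (`forward_rollout`, `reward`, `make_plan`, `lr`, `tol`) are illustrative assumptions.

```python
def forward_rollout(x0, plan):
    """Step S1 (toy stand-in): predict the states reached by applying the plan."""
    states = [x0]
    for a in plan:
        # Assumed toy dynamics: the state moves by the planned increment.
        states.append(states[-1] + a)
    return states


def reward(final_state, goal):
    """Assumed reward: negative squared distance between final state and goal."""
    return -(final_state - goal) ** 2


def make_plan(x0, goal, horizon=5, lr=0.05, tol=1e-9, max_iters=1000):
    """Alternate forward prediction and backward plan correction until converged."""
    plan = [0.0] * horizon
    prev = float("-inf")
    r = prev
    for _ in range(max_iters):
        # Step S1: forward pass predicts the outcome of the current plan.
        states = forward_rollout(x0, plan)
        r = reward(states[-1], goal)
        # Step S3 (assumed criterion): stop once the reward barely improves.
        if r - prev < tol:
            break
        prev = r
        # Step S2: backward pass. In this toy model the final state depends on
        # every plan entry with derivative 1, so each entry gets the same gradient.
        grad = -2.0 * (states[-1] - goal)
        plan = [a + lr * grad for a in plan]
    return plan, r
```

Calling `make_plan(0.0, 3.0)` returns a five-entry plan whose increments sum to approximately the goal, together with the reward of that plan. A learned recurrent model would replace `forward_rollout`, and the gradient in step S2 would then be obtained by propagating the reward error backward through that model rather than from the closed-form expression used here.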