Title of invention |
INFORMATION PROCESSOR, INFORMATION PROCESSING METHOD, AND PROVIDING MEDIUM |
Abstract |
At step S1, a prediction operation to confer a maximum reward is carried out in a recurrent neural network using forward dynamics. At step S2, a plan is made using reverse dynamics. This yields an action plan consisting of a sequence of differential values of an action for conferring the maximum reward. Steps S1 and S2 are repeated until it is judged at step S3 that a desired action plan has been made. In this way, an action plan that maximizes the reward is generated from only a few action experiences.
|
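The S1 to S3 loop in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the scalar recurrent model, the weights `W_REC`, `W_ACT`, `W_REWARD`, the function names, the step size, and the stopping rule are all hypothetical choices made for illustration. Treating the "differential values of an action" as the gradient of predicted reward with respect to each action is likewise an assumption.

```python
import math

# Hypothetical weights of an already-trained scalar recurrent model (assumed values).
W_REC, W_ACT, W_REWARD = 0.5, 1.0, 1.0


def predict(actions):
    """Step S1 (forward dynamics): roll the model forward over the action
    sequence and return the hidden states and the total predicted reward."""
    h, states = 0.0, []
    for a in actions:
        h = math.tanh(W_REC * h + W_ACT * a)
        states.append(h)
    return states, W_REWARD * sum(states)


def action_gradients(states):
    """Step S2 (backward pass): derivative of the total predicted reward
    with respect to each action, computed by backpropagation through time."""
    grads = [0.0] * len(states)
    dz_next = 0.0  # gradient flowing back from the following time step
    for t in reversed(range(len(states))):
        dh = W_REWARD + W_REC * dz_next       # reward term plus recurrent term
        dz = (1.0 - states[t] ** 2) * dh      # derivative of tanh
        grads[t] = W_ACT * dz
        dz_next = dz
    return grads


def make_plan(horizon=5, step=0.1, max_iters=200, tol=1e-6, limit=1.0):
    """Repeat S1 and S2 until the predicted reward stops improving (S3)."""
    actions = [0.0] * horizon
    states, reward = predict(actions)
    for _ in range(max_iters):
        grads = action_gradients(states)
        # Move each action along its gradient, keeping it within [-limit, limit].
        candidate = [max(-limit, min(limit, a + step * g))
                     for a, g in zip(actions, grads)]
        new_states, new_reward = predict(candidate)
        if new_reward - reward < tol:  # assumed S3 criterion: no further gain
            break
        actions, states, reward = candidate, new_states, new_reward
    return actions, reward
```

Calling `make_plan()` returns the planned action sequence and its predicted total reward. A candidate plan is accepted only when it raises the predicted reward, so the loop ends either at convergence or after `max_iters` iterations.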
Publication number |
WO0010098(A1) |
Publication date |
2000.02.24 |
Application number |
WO1999JP04306 |
Filing date |
1999.08.09 |
Applicant |
SONY CORPORATION;TANI, JUN |
Inventor |
TANI, JUN |
Classification |
G06F15/18;B25J13/00;G05B13/02;G05D1/02;G06N3/00;(IPC1-7):G06F15/18;G05B13/00 |
Main classification |
G06F15/18 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|