Abstract |
PROBLEM TO BE SOLVED: To generate an action plan that maximizes reward from fewer action experiences. SOLUTION: In step S1, prediction processing is executed by forward dynamics in a recurrent neural network so that the maximum reward can be obtained. In step S2, plan generation processing is executed by reverse dynamics. As a result, a series of action difference values for obtaining the maximum reward is generated as the action plan. Steps S1 and S2 are repeated until it is judged that the required action plan has been obtained.
|
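The two-step loop in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: the one-unit recurrent model, its fixed weights `W`, `U`, `V`, the action penalty `COST`, the reading of "reverse dynamics" as backpropagating the predicted reward to the actions, the step size `lr`, and the stopping rule based on `tol`. The abstract does not specify the network, the reward, or the update rule.

```python
import math

# Hypothetical fixed weights of a one-unit recurrent model (not from the source).
W, U, V = 0.5, 1.0, 1.0
COST = 0.1  # hypothetical penalty on action magnitude, keeps the plan bounded


def forward(actions):
    """Step S1 (assumed form): roll the recurrent model forward and predict reward."""
    h, states, reward = 0.0, [], 0.0
    for a in actions:
        h = math.tanh(W * h + U * a)      # recurrent state update
        states.append(h)
        reward += V * h - COST * a * a    # predicted reward for this step
    return states, reward


def backward(actions, states):
    """Step S2 (assumed form): gradient of predicted reward w.r.t. each action."""
    grads = [0.0] * len(actions)
    dh = 0.0  # gradient arriving from later time steps
    for t in reversed(range(len(actions))):
        dh += V                               # direct reward term at step t
        dz = dh * (1.0 - states[t] ** 2)      # back through tanh
        grads[t] = U * dz - 2.0 * COST * actions[t]
        dh = W * dz                           # pass gradient to the previous state
    return grads


def generate_plan(horizon=5, lr=0.1, tol=1e-6, max_iters=500):
    """Repeat S1 and S2 until the action difference values become negligible."""
    actions = [0.0] * horizon
    for _ in range(max_iters):
        states, _ = forward(actions)
        # Action difference values: small corrections that raise predicted reward.
        deltas = [lr * g for g in backward(actions, states)]
        actions = [a + d for a, d in zip(actions, deltas)]
        if max(abs(d) for d in deltas) < tol:
            break
    return actions, forward(actions)[1]


if __name__ == "__main__":
    plan, reward = generate_plan()
    print("plan:", [round(a, 3) for a in plan])
    print("predicted reward:", round(reward, 3))
```

In this sketch the stopping test stands in for the abstract's judgment that the required plan has been obtained; a trained model and a task-specific criterion would replace both the fixed weights and the tolerance check.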