Abstract |
PROBLEM TO BE SOLVED: To enable a robot device to behave more realistically by enlarging its range of action. SOLUTION: The robot device learns the real sensor value St detected by a detector. Thereafter, when a real sensor value St from the detector is input, the device outputs a predicted sensor value St+1 obtained from the learning result for that input, and outputs a homing reward RHt+1 that becomes larger as the difference between the real sensor value St+1 at the next time step and the predicted sensor value St+1 becomes smaller. The RNN 103 sets the homing reward RHt+1 in this way because the closer the robot comes to the home 203, the more familiar (that is, the better learned) its surroundings are, so the predicted sensor value comes out close to the measured sensor value.
|
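The relationship between prediction error and homing reward described in the abstract can be illustrated with a minimal sketch. The abstract states only that RHt+1 grows as the difference between the real and predicted St+1 shrinks; the exponential form, the Euclidean distance, the `scale` parameter, and the function name `homing_reward` are all assumptions made for illustration, not the method disclosed in the patent.

```python
import math
from typing import Sequence


def homing_reward(
    actual_next: Sequence[float],
    predicted_next: Sequence[float],
    scale: float = 1.0,
) -> float:
    """Return a reward that increases as the prediction error decreases.

    Hypothetical form: exp(-error / scale). The source does not specify
    the actual function, only that it is larger for smaller differences.
    """
    if len(actual_next) != len(predicted_next):
        raise ValueError("sensor vectors must have the same length")
    if scale <= 0:
        raise ValueError("scale must be positive")

    # Euclidean distance between the real and predicted sensor values (assumed metric).
    error = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual_next, predicted_next))
    )
    # Maps error 0 to reward 1.0 and large errors toward 0.
    return math.exp(-error / scale)


# Near the learned "home", predictions are accurate, so the reward is high.
near_home = homing_reward([0.50, 0.20], [0.52, 0.21])
# In an unfamiliar place, predictions are poor, so the reward is low.
far_from_home = homing_reward([0.50, 0.20], [1.40, 0.90])
```

Under these assumptions the reward is bounded between 0 and 1, which keeps it comparable across sensor readings; any other monotonically decreasing function of the error would satisfy the abstract's description equally well.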