Abstract
A novel, highly adaptable agent learning machine comprises a plurality of learning modules (3). Each module pairs a reinforcement learning system (1) with an environment predicting system (2). The reinforcement learning system (1) acts on an environment (4) and determines a behavior output that maximizes the reward obtained as a result. The environment predicting system (2) predicts changes in the environment. For each learning module (3), a responsibility signal is computed that grows larger as the prediction error of its environment predicting system (2) grows smaller. The behavior outputs of the reinforcement learning systems (1) are weighted in proportion to these responsibility signals and combined into the behavior applied to the environment. The environment may be nonlinear or nonstationary, such as a control object or system, and no specific teacher signal is given. In such an environment, the behaviors best suited to its various states and operating modes are switched and combined. Behavior can thus be learned flexibly without prior knowledge.
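The weighting step can be illustrated with a minimal sketch. The abstract only states that smaller prediction errors yield larger responsibility signals and that behavior outputs are weighted in proportion to them. The specific formula below is an assumption: a Gaussian soft-max over squared prediction errors with a hypothetical scale parameter `sigma`, normalized so the signals sum to 1. The function names `responsibility_signals` and `combined_action` are hypothetical, and actions are treated as scalars for simplicity.

```python
import math
from typing import List, Sequence


def responsibility_signals(pred_errors: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Map each module's prediction error to a responsibility weight.

    Assumption (not specified in the abstract): a Gaussian soft-max,
    lambda_i = exp(-e_i^2 / (2 sigma^2)) / sum_j exp(-e_j^2 / (2 sigma^2)),
    so smaller errors give larger weights and the weights sum to 1.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in pred_errors]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def combined_action(actions: Sequence[float], responsibilities: Sequence[float]) -> float:
    """Weight each module's (scalar) behavior output by its responsibility and sum."""
    return sum(lam * u for lam, u in zip(responsibilities, actions))


# Three hypothetical modules: module 0 predicts the environment best.
lam = responsibility_signals([0.1, 1.0, 2.0])
u = combined_action([1.0, -1.0, 0.5], lam)
print(lam, u)
```

Because the weights are normalized, the combined action is a convex mixture of the modules' outputs. A smaller `sigma` makes the mixture approach hard switching to the single best-predicting module, while a larger `sigma` blends modules more evenly.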