Title of Invention |
REINFORCEMENT LEARNING DEVICE, CONTROL DEVICE, AND REINFORCEMENT LEARNING METHOD |
Abstract |
PROBLEM TO BE SOLVED: To address the problem that trade-offs arising among the multiple terms of a reward function have conventionally impeded motion learning in a robot. SOLUTION: A reinforcement learning device comprises: first-kind environmental parameter acquisition means for acquiring the value(s) of one or more first-kind environmental parameters concerning the environment of a control target; control parameter value calculation means for substituting the value(s) of the one or more first-kind environmental parameters into a reward function to calculate the value(s) of one or more control parameters that maximize the reward output by the reward function; control parameter value output means for outputting the value(s) of the one or more control parameters to the control target; second-kind environmental parameter acquisition means for acquiring the value(s) of one or more second-kind environmental parameters concerning a virtual external force; virtual external force calculation means for substituting the value(s) of the one or more second-kind environmental parameters into a virtual external force function to calculate the virtual external force; and virtual external force output means for outputting the virtual external force to the control target. With this reinforcement learning device, motion learning of a robot can be performed quickly and stably. COPYRIGHT: (C)2013, JPO&INPIT |
Publication Number |
JP2012208789(A) |
Publication Date |
2012.10.25 |
Application Number |
JP20110074694 |
Filing Date |
2011.03.30 |
Applicant |
ADVANCED TELECOMMUNICATION RESEARCH INSTITUTE INTERNATIONAL;HONDA MOTOR CO LTD |
Inventor |
SUGIMOTO TOKUKAZU;UEDA YUGO;HASEGAWA TADAAKI;IBA NOBUMOTO;AKATSUKA KOJI |
Classification |
G05B13/02;G06N3/00 |
Main Classification |
G05B13/02 |
Agency |
|
Agent |
|
Main Claim |
|
Address |
|
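The abstract describes two separate computations that both feed the control target: control parameters chosen to maximize a reward function, and a virtual external force computed from its own function. The following is a minimal Python sketch of that data flow only, under stated assumptions. The names `best_control`, `toy_reward`, `toy_virtual_force`, and `step`, the quadratic reward, the inverse-distance force, and the grid of candidate commands are all hypothetical illustrations and do not come from the patent. How the reward function is learned is not covered here.

```python
from typing import Callable, Sequence, Tuple


def best_control(reward: Callable[[float, float], float],
                 env_value: float,
                 candidates: Sequence[float]) -> float:
    """Return the candidate control value that yields the highest reward.

    Plain grid search over a finite candidate set (an assumption; the
    abstract does not say how the maximization is carried out).
    """
    return max(candidates, key=lambda u: reward(env_value, u))


def toy_reward(position_error: float, command: float) -> float:
    # Hypothetical reward: largest (zero) when the command cancels the error.
    return -(position_error + command) ** 2


def toy_virtual_force(distance_to_obstacle: float, gain: float = 2.0) -> float:
    # Hypothetical virtual external force: grows as the distance shrinks.
    # The small floor avoids division by zero.
    return gain / max(distance_to_obstacle, 1e-6)


def step(position_error: float,
         distance_to_obstacle: float,
         candidates: Sequence[float]) -> Tuple[float, float]:
    """Compute both outputs that would be sent to the control target."""
    # First-kind environmental parameter -> reward-maximizing control value.
    command = best_control(toy_reward, position_error, candidates)
    # Second-kind environmental parameter -> virtual external force.
    force = toy_virtual_force(distance_to_obstacle)
    return command, force


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
command, force = step(position_error=0.5,
                      distance_to_obstacle=4.0,
                      candidates=candidates)
print(command, force)
```

In this sketch the force term is kept outside the reward, so the reward has a single objective and the two outputs are computed independently. This is one possible reading of how the separation described in the abstract could avoid trade-offs between reward terms.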