Title of invention: REINFORCEMENT LEARNING DEVICE, CONTROL DEVICE, AND REINFORCEMENT LEARNING METHOD
Abstract: PROBLEM TO BE SOLVED: To address the problem that trade-offs arising among the multiple terms that make up a reward function have conventionally impeded motion learning in robots. SOLUTION: A reinforcement learning device comprises: first-kind environmental parameter acquisition means for acquiring the values of one or more first-kind environmental parameters concerning the environment of a control target; control parameter value calculation means for substituting the values of the first-kind environmental parameters into a reward function to calculate the values of one or more control parameters that maximize the reward output by the reward function; control parameter value output means for outputting the values of the control parameters to the control target; second-kind environmental parameter acquisition means for acquiring the values of one or more second-kind environmental parameters concerning a virtual external force; virtual external force calculation means for substituting the values of the second-kind environmental parameters into a virtual external force function to calculate the virtual external force; and virtual external force output means for outputting the virtual external force to the control target. With this reinforcement learning device, motion learning of a robot can be performed quickly and stably. COPYRIGHT: (C)2013, JPO&INPIT
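The two paths named in the abstract (reward maximization over control parameters, and a separately computed virtual external force) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the quadratic form of `reward`, the grid search in `best_control`, the obstacle-repulsion form of `virtual_external_force`, and all function and parameter names are hypothetical.

```python
# Illustrative sketch only: the reward form, the force form, and every name
# below are hypothetical and are not specified by the patent abstract.

def reward(env1, u):
    """Hypothetical reward: reach a target position with little effort.

    env1 is a (target, position) pair standing in for the first-kind
    environmental parameters; u is a scalar control parameter.
    """
    target, position = env1
    return -((position + u - target) ** 2) - 0.1 * u ** 2


def best_control(env1, candidates):
    """Return the candidate control value that maximizes the reward."""
    return max(candidates, key=lambda u: reward(env1, u))


def virtual_external_force(distance, gain=1.0, radius=0.5):
    """Hypothetical virtual external force function.

    distance stands in for a second-kind environmental parameter (distance
    to an obstacle). The force is zero outside `radius` and grows as the
    obstacle gets closer.
    """
    if distance >= radius:
        return 0.0
    distance = max(distance, 1e-6)  # guard against division by zero
    return gain * (1.0 / distance - 1.0 / radius)


def control_step(env1, distance, candidates):
    """Compute both outputs that would be sent to the control target."""
    u = best_control(env1, candidates)
    f = virtual_external_force(distance)
    return u, f


if __name__ == "__main__":
    grid = [i / 10 for i in range(-10, 11)]  # candidate controls in [-1, 1]
    print(control_step((1.0, 0.5), 0.25, grid))
```

In this sketch the obstacle-avoidance term is kept out of `reward` and handled by the separate force function, so the reward contains only the goal and effort terms. That is one reading of how a virtual external force could reduce trade-offs among reward terms; the abstract itself does not state the mechanism.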
Publication number: JP2012208789(A) Publication date: 2012.10.25
Application number: JP20110074694 Filing date: 2011.03.30
Applicant: ADVANCED TELECOMMUNICATION RESEARCH INSTITUTE INTERNATIONAL; HONDA MOTOR CO LTD Inventors: SUGIMOTO TOKUKAZU; UEDA YUGO; HASEGAWA TADAAKI; IBA NOBUMOTO; AKATSUKA KOJI
Classification: G05B13/02; G06N3/00 Main classification: G05B13/02
Agency: Agent:
Principal claim:
Address: