Abstract |
<P>PROBLEM TO BE SOLVED: To provide an action control technology that, in cases other than a desired action sequence, determines actions that follow the statistics of the learning data more closely than conventional technology. <P>SOLUTION: This action control device acquires the previous action a<SB POS="POST">t-1</SB> from an action storage part. Using the previous action a<SB POS="POST">t-1</SB> and the current observation value o<SB POS="POST">t</SB>', it refers to a POMDP probability/reward table storage part and acquires the state transition probability P(s'|s,a) of changing from a state s to a state s' under an action a, and the observation value output probability P(o'|s',a) of observing an observation value o' in the state s' under the action a. It also acquires the probability distribution b<SB POS="POST">t-1</SB>(s) over the previous state from a state probability distribution storage part, and calculates the probability distribution over the current state as follows. <P>COPYRIGHT: (C)2012,JPO&INPIT |
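The formula that the abstract refers to is not reproduced in this record. As a rough illustration only, the sketch below implements the standard textbook POMDP belief update, in which the new belief over s' is proportional to P(o'|s',a) multiplied by the sum over s of P(s'|s,a) b(s). It is an assumption that the device's calculation has this general form; the actual expression in the patent may differ. The function name `update_belief`, the nested-list table layout, and the numeric values are hypothetical and chosen only for the example.

```python
def update_belief(b_prev, trans, obs, a, o):
    """Standard POMDP belief update (illustrative sketch, not the patent's formula).

    b_prev : list of floats, b_prev[s] = previous state probability b_{t-1}(s)
    trans  : trans[a][s][s2] = P(s2 | s, a), state transition probability
    obs    : obs[a][s2][o]   = P(o | s2, a), observation output probability
    a, o   : indices of the previous action and the current observation
    """
    n = len(b_prev)
    # Prediction step: sum over s of P(s' | s, a) * b_{t-1}(s)
    predicted = [sum(trans[a][s][s2] * b_prev[s] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight each predicted state by P(o | s', a)
    unnormalized = [obs[a][s2][o] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the result is a probability distribution over states
    return [u / total for u in unnormalized]


# Hypothetical two-state, one-action, two-observation tables
trans = [[[0.9, 0.1], [0.2, 0.8]]]
obs = [[[0.7, 0.3], [0.4, 0.6]]]
b_prev = [0.5, 0.5]

b_now = update_belief(b_prev, trans, obs, a=0, o=0)
print(b_now)
```

In this sketch the two tables stand in for the contents of the POMDP probability/reward table storage part, and `b_prev` stands in for the value read from the state probability distribution storage part. Normalizing at the end keeps the stored belief a valid distribution from one step to the next.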