Abstract |
The invention concerns a method of reinforcement learning, the method comprising the steps of perceiving (101) a current state from a fuzzy set of states of an environment; based on the current state and a policy, choosing (102) an action from a fuzzy set of actions, wherein the policy associates each state from the fuzzy set of states with an action from the fuzzy set of actions and, for each state from the fuzzy set of states, is based on a probability distribution on the fuzzy set of actions; receiving (103) from the environment a new state and a reward; and, based on the reward, optimizing (104) the policy. The invention further concerns a computer program product and a device therefor.
|
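The four steps named in the abstract (perceiving 101, choosing 102, receiving 103, optimizing 104) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the source: the state and action labels, the `ToyEnvironment` class, the reward rule, the softmax parameterisation of the policy, and the gradient-style preference update. The fuzzy sets are reduced to plain lists of linguistic labels, and membership degrees are not modelled.

```python
import math
import random

# Hypothetical labels; the source does not enumerate states or actions.
STATES = ["cold", "warm", "hot"]
ACTIONS = ["heat", "idle", "cool"]


def softmax(prefs):
    """Turn a list of action preferences into a probability distribution."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyEnvironment:
    """Assumed toy environment that jumps randomly between three states."""

    def __init__(self, rng):
        self.rng = rng
        self.level = rng.randrange(len(STATES))

    def perceive(self):
        return STATES[self.level]

    def step(self, action):
        # Assumed reward rule: 1.0 if the action suits the current state, else 0.0.
        suited = {"cold": "heat", "warm": "idle", "hot": "cool"}[STATES[self.level]]
        reward = 1.0 if action == suited else 0.0
        self.level = self.rng.randrange(len(STATES))
        return STATES[self.level], reward


def train(episodes=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    env = ToyEnvironment(rng)
    # One preference per (state, action); the policy is softmax(prefs[state]).
    prefs = {s: [0.0] * len(ACTIONS) for s in STATES}
    state = env.perceive()                                    # perceiving (101)
    for _ in range(episodes):
        probs = softmax(prefs[state])
        idx = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # choosing (102)
        new_state, reward = env.step(ACTIONS[idx])            # receiving (103)
        for i in range(len(ACTIONS)):                         # optimizing (104)
            grad = (1.0 if i == idx else 0.0) - probs[i]
            prefs[state][i] += lr * reward * grad
        state = new_state
    # Return the learned probability distribution over actions for each state.
    return {s: softmax(p) for s, p in prefs.items()}
```

Calling `train()` returns, for each state label, a list of probabilities aligned with `ACTIONS`. The update used here is a simple policy-gradient rule chosen for brevity; the source does not say which optimization procedure the claimed method uses.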