Title of Invention |
SELECTING REINFORCEMENT LEARNING ACTIONS USING GOALS AND OBSERVATIONS |
Abstract |
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning using goals and observations. One of the methods includes receiving an observation characterizing a current state of the environment; receiving a goal characterizing a target state from a set of target states of the environment; processing the observation using an observation neural network to generate a numeric representation of the observation; processing the goal using a goal neural network to generate a numeric representation of the goal; combining the numeric representation of the observation and the numeric representation of the goal to generate a combined representation; processing the combined representation using an action score neural network to generate a respective score for each action in the predetermined set of actions; and selecting the action to be performed using the respective scores for the actions in the predetermined set of actions. |
Publication Number |
US2016292568(A1) |
Publication Date |
2016.10.06 |
Application Number |
US201615091840 |
Filing Date |
2016.04.06 |
Applicant |
Google Inc. |
Inventors |
Schaul Tom;Horgan Daniel George;Gregor Karol;Silver David |
Classification |
G06N3/08;G06N99/00 |
Main Classification |
G06N3/08 |
Patent Agency |
|
Patent Agent |
|
Main Claim |
1. A method for selecting an action to be performed by a reinforcement learning agent that interacts with an environment by receiving observations characterizing a current state of the environment and, in response, performing actions from a predetermined set of actions, wherein the method comprises:
receiving an observation characterizing a current state of the environment; receiving a goal characterizing a target state from a set of target states of the environment; processing the observation using an observation neural network to generate a numeric representation of the observation; processing the goal using a goal neural network to generate a numeric representation of the goal; combining the numeric representation of the observation and the numeric representation of the goal to generate a combined representation; processing the combined representation using an action score neural network to generate a respective score for each action in the predetermined set of actions; and selecting the action to be performed using the respective scores for the actions in the predetermined set of actions. |
Address |
Mountain View CA US |
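
The data flow recited in the main claim (observation network, goal network, combination, action score network, action selection) can be illustrated with a minimal sketch. It is not the applicant's implementation. Everything the claim leaves open is an assumption made here for illustration: the class name `GoalConditionedScorer`, the single-layer networks with random untrained weights, the element-wise product used to combine the two representations, and the greedy (highest-score) selection rule.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation=None):
    """Apply a dense layer to vector x, optionally followed by tanh."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation == "tanh":
        out = [math.tanh(v) for v in out]
    return out


class GoalConditionedScorer:
    """Hypothetical illustration of the claimed pipeline (untrained weights)."""

    def __init__(self, obs_dim, goal_dim, embed_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # Observation neural network: observation -> numeric representation.
        self.obs_net = make_layer(obs_dim, embed_dim, rng)
        # Goal neural network: goal -> numeric representation.
        self.goal_net = make_layer(goal_dim, embed_dim, rng)
        # Action score neural network: combined representation -> scores.
        self.score_net = make_layer(embed_dim, num_actions, rng)

    def scores(self, observation, goal):
        """Return one score per action in the predetermined action set."""
        obs_repr = dense(self.obs_net, observation, "tanh")
        goal_repr = dense(self.goal_net, goal, "tanh")
        # Assumption: combine by element-wise product; the claim only
        # requires that the two representations be combined.
        combined = [o * g for o, g in zip(obs_repr, goal_repr)]
        return dense(self.score_net, combined)

    def select_action(self, observation, goal):
        """Assumption: greedy selection of the highest-scoring action index."""
        s = self.scores(observation, goal)
        return max(range(len(s)), key=s.__getitem__)


if __name__ == "__main__":
    agent = GoalConditionedScorer(obs_dim=4, goal_dim=3,
                                  embed_dim=5, num_actions=3, seed=1)
    observation = [0.1, 0.2, 0.3, 0.4]  # characterizes the current state
    goal = [1.0, 0.0, 0.0]              # characterizes a target state
    print(agent.scores(observation, goal))
    print(agent.select_action(observation, goal))
```

Keeping the observation and goal networks separate means the same observation representation can be scored against several goals without recomputing it. Other combination or selection rules (for example concatenation, or sampling in proportion to the scores) would fit the same structure.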