Title of invention: Method and system for mapping states and actions of an intelligent agent
Abstract: A method and system comprise providing means and method for producing, modifying, and/or exploiting the structure of a policy manifold. Each of the policies at least comprises information for mapping state and/or sensory information as input to action preferences as output. One or more processing units assign each of the policies a policy coordinate on a policy manifold. The policy coordinate may in part be determined by a dissimilarity matrix or other means for organizing the coordinates of the policies on the policy manifold according to the properties of the policies and the topology of the policy manifold. The policy manifold comprises a dimensionality that is lower than a combined dimensionality of the input and the output, wherein the policy manifold at least in part determines a behavior of the intelligent artificial agent.
Publication number: US9311600(B1) Publication date: 2016.04.12
Application number: US201313907936 Filing date: 2013.06.02
Applicant: Inventor: Ring Mark Bishop
Classification: G06N5/02; G06N99/00 Main classification: G06N5/02
Agency: Innovation Capital Law Group, LLP Agent: Innovation Capital Law Group, LLP; Lin Vic
Principal claim: 1. A method for mapping states and actions of an intelligent artificial agent, the method comprising the steps of:
creating, by said at least one or more processors, at least a policy defining a behavior of said intelligent artificial agent, said policy comprising a set of policies for said intelligent artificial agent, each of said policies comprising at least a full or partial agent state information for mapping to an agent's action;
creating a policy manifold, said policy manifold comprising a point on a surface associated with a policy, said surface comprising a set of surface points where each of said surface point is associated with each of said policies, said policy manifold further comprising at least a policy coordinate for each of said surface points;
associating each of said policies to said policy coordinate on said policy manifold;
organizing the policy coordinates of said policy manifold based on a property and distance of each of said policies on said policy manifold, wherein distances between policy coordinates are configured to reflect policy dissimilarities among each of said policies;
comparing actions produced by each policy having the same state information to determine dissimilarity between neighboring policies;
applying a learning update to the coordinates of at least one of said policies having dissimilarities among neighboring policies, said learning update being configured to modify said policy coordinate to have a shorter distance to policies with lesser dissimilarities, wherein said policy manifold is configured to show greater smoothness when dissimilarity between policies is smaller between policies whose coordinates in the policy manifold have a shorter distance.
Address:
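
The steps recited in claim 1 (comparing the actions that policies produce for the same state, then updating policy coordinates so that less dissimilar policies sit closer together) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the toy policies in `make_policy`, the sample states, the mean Euclidean distance used as the dissimilarity measure, the one-dimensional manifold, and the update rule, which is a multidimensional-scaling-style stress gradient step swapped in as one plausible reading of the claimed "learning update".

```python
import math

def make_policy(w):
    """Hypothetical policy: maps a 2-D state to two action preferences."""
    def policy(state):
        return (w * state[0], (1.0 - w) * state[1])
    return policy

def dissimilarity_matrix(policies, states):
    """Compare the actions each pair of policies produces for the same states.

    Assumed measure: mean Euclidean distance between action preferences.
    """
    return [[sum(math.dist(p(s), q(s)) for s in states) / len(states)
             for q in policies] for p in policies]

def stress(coords, dissim):
    """Sum of squared gaps between manifold distance and dissimilarity."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - dissim[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def update_coordinates(coords, dissim, rate=0.05):
    """One assumed learning update (a stress gradient step).

    A pair that is farther apart than its dissimilarity is pulled together;
    a pair that is closer than its dissimilarity is pushed apart.
    """
    n, dim = len(coords), len(coords[0])
    updated = [list(c) for c in coords]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            if i == j or d == 0.0:
                continue
            error = d - dissim[i][j]
            for k in range(dim):
                updated[i][k] -= rate * error * (coords[i][k] - coords[j][k]) / d
    return updated

# Four toy policies; input (2-D state) plus output (2 preferences) is 4-D,
# while the manifold used here is 1-D.
policies = [make_policy(w) for w in (0.0, 0.1, 0.9, 1.0)]
states = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dissim = dissimilarity_matrix(policies, states)

# Start with all policy coordinates bunched together on the 1-D manifold.
coords = [[0.00], [0.01], [0.02], [0.03]]
initial_stress = stress(coords, dissim)
for _ in range(200):
    coords = update_coordinates(coords, dissim)
final_stress = stress(coords, dissim)
```

In this toy setting the dissimilarities are proportional to the difference in `w`, so a one-dimensional manifold can represent them exactly and `final_stress` falls far below `initial_stress`. The starting coordinates are deliberately simple; an arbitrary starting layout could settle in a poorer arrangement under this particular update rule.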