Title of invention: Searching for safe policies to deploy
Abstract: A method is disclosed for searching for policies to replace policies deployed in advertising campaigns, the policies being used to choose advertisements. The method includes searching the policies to select a policy that is deemed safe, by applying reinforcement learning and a concentration inequality on the deployed policies, using the policies to estimate values of a measure of performance of the policies and to calculate one or more statistical guarantees of the estimated values (1004). One or more of the new policies are deployed (1006) when the policy is deemed safe. A system is also provided that uses a policy space to select policies to replace deployed policies. The system accesses high-dimensional vectors expressing the policies, computes the direction in the policy space that is expected to point towards a region that is expected to be safe, that is, a region of policies whose measure of performance is greater than a threshold measure of performance within a defined level of confidence, and selects one of the policies accordingly (Fig. 11).
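Publication number: GB2532539(A) Publication date: 2016.05.25
Application number: GB20150013017 Filing date: 2015.07.23
Applicant: Adobe Systems Incorporated Inventors: Philip S Thomas; Georgios Theocharous; Mohammad Ghavamzadeh
Classification: G06Q30/02 Main classification: G06Q30/02
Agency: Agent:
Main claim:
Address: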
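The safety test described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say which concentration inequality is used, so Hoeffding's inequality is assumed here, and the function names, the `[0, upper]` bound on the estimates, and the parameters `threshold` and `delta` are all hypothetical. The per-episode performance estimates for the candidate policy are taken as given; how they are derived from data gathered under the deployed policy is outside this sketch.

```python
import math


def hoeffding_lower_bound(samples, upper, delta):
    """Lower bound on the true mean that holds with probability >= 1 - delta.

    Assumes the samples are independent and each lies in [0, upper]
    (an assumption of this sketch, not stated in the source).
    """
    n = len(samples)
    mean = sum(samples) / n
    # Hoeffding: P(mean - E[X] >= t) <= exp(-2 * n * t^2 / upper^2)
    margin = upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - margin


def is_safe(samples, upper, threshold, delta):
    """Deem a candidate policy safe only if its guaranteed performance
    (the lower confidence bound) meets the threshold."""
    return hoeffding_lower_bound(samples, upper, delta) >= threshold


if __name__ == "__main__":
    # Hypothetical per-episode performance estimates for a candidate policy.
    estimates = [0.5] * 1000
    if is_safe(estimates, upper=1.0, threshold=0.4, delta=0.05):
        print("deploy candidate policy")
    else:
        print("keep deployed policy")
```

With few samples the margin is wide, so the same estimates may fail the test; under this sketch a candidate is replaced into deployment only when enough evidence has accumulated to support the guarantee.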