Title of invention: Online asynchronous reinforcement learning from concurrent customer histories
Abstract: In one embodiment, an indication of a Decision Request or an Update Request may be received, where the Update Request is activated independent of user activity. A user state pertaining to at least one user may be received, obtained, accessed or constructed. For the Decision Request, one or more actions may be scored according to one or more value functions associated with a computing device, a policy associated with the computing device may be applied to identify one of the scored actions as a decision, and an indication of the decision may be provided or applied. For the Update Request, the one or more value functions and/or the policy may be updated. An indication of updates to the one or more value functions and/or an indication of updates to the policy may be provided.
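The two request types in the abstract can be illustrated with a minimal Python sketch. Everything named here (`Agent`, `handle_request`, the request dictionary keys, the tabular value function, and the epsilon-greedy policy) is an assumption made for illustration and is not taken from the patent. The update rule is a simple running estimate of immediate reward, a deliberate simplification of a value function that approximates expected reward over time.

```python
import random
from collections import defaultdict


class Agent:
    """Toy agent: tabular value function plus an epsilon-greedy policy (both assumed)."""

    def __init__(self, actions, epsilon=0.1, learning_rate=0.5, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.values = defaultdict(float)  # (user_state, action) -> estimated reward
        self.rng = random.Random(seed)

    def score(self, state):
        # Score every candidate action with the current value function.
        return {a: self.values[(state, a)] for a in self.actions}

    def decide(self, state):
        # Apply the policy to the scored actions to pick one as the decision.
        scores = self.score(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: scores[a])  # exploit

    def update(self, state, action, reward):
        # Move the stored estimate a fraction of the way toward the observed reward.
        key = (state, action)
        old = self.values[key]
        self.values[key] = old + self.learning_rate * (reward - old)
        return {key: self.values[key]}  # indication of what was updated


def handle_request(agent, request):
    """Dispatch a Decision Request or an Update Request (hypothetical dict format)."""
    kind = request["kind"]
    state = request["user_state"]
    if kind == "decision":
        return {"decision": agent.decide(state)}
    if kind == "update":
        return {"updates": agent.update(state, request["action"], request["reward"])}
    raise ValueError(f"unknown request kind: {kind!r}")
```

In this sketch an Update Request carries its own state, action, and reward, so it can be issued by a scheduler rather than by user activity.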
Publication number: US8909590(B2) Publication date: 2014.12.09
Application number: US201213631032 Filing date: 2012.09.28
Applicant: Nice Systems Technologies UK Limited Inventors: Newnham Leonard Michael; McFall Jason Derek; Barker David J; Silver David
Classification: G06N7/02; G06N7/06; G06F9/44; G06N99/00 Main classification: G06N7/02
Agency: Pearl Cohen Zedek Latzer Baratz LLP Agent: Pearl Cohen Zedek Latzer Baratz LLP
Principal claim: 1. A computer implemented method, comprising: obtaining an indication that a decision has been requested or selected with respect to one or more users; determining whether to schedule, request, or perform a set of one or more activities, the set of one or more activities including performing one or more updates and selecting one or more decisions, wherein the one or more updates are performed with respect to a value function approximating an expected reward over time for the one or more users and a policy for selecting additional decisions, and wherein the one or more decisions pertain to the one or more users; and scheduling, requesting, or performing the set of one or more activities according to a result of the determining step, wherein scheduling, requesting, or performing the set of one or more activities comprises: generating a sequence of requests, wherein the sequence of requests includes one or more Update Requests and one or more Decision Requests, wherein each request in the sequence of requests pertains to the one or more users; and providing or transmitting each request in the sequence of requests or indication thereof according to a particular schedule, wherein each of the one or more Decision Requests indicates a request to select an additional decision with respect to the at least one user, wherein each of the Update Requests indicates at least one of: a request to update a value function approximating an expected reward over time for the one or more users and a request to update a policy for selecting additional decisions.
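The "generating a sequence of requests" and "providing or transmitting ... according to a particular schedule" steps of the claim can be sketched as follows. This is only an illustration under stated assumptions: the `Request` record, the `build_sequence` and `dispatch` helpers, and the use of numeric time offsets as the schedule are all hypothetical, since the claim does not say how the schedule is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    due: float        # time offset at which the request should be sent (assumed unit: seconds)
    kind: str         # "decision" or "update"
    user_ids: tuple   # the users this request pertains to


def build_sequence(user_ids, decision_times, update_times):
    """Generate Decision and Update Requests for the same users, ordered by due time."""
    users = tuple(user_ids)
    requests = [Request(t, "decision", users) for t in decision_times]
    requests += [Request(t, "update", users) for t in update_times]
    # sorted() is stable, so requests with equal due times keep their build order.
    return sorted(requests, key=lambda r: r.due)


def dispatch(sequence, send, now):
    """Send every request due at or before `now`; return those still pending."""
    pending = []
    for request in sequence:
        if request.due <= now:
            send(request)
        else:
            pending.append(request)
    return pending
```

A caller might build the sequence once and then invoke `dispatch` periodically with the current time offset, passing whatever transport function it uses as `send`.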
Address: Southampton GB