Title of invention |
Online temporal difference learning from incomplete customer interaction histories |
Abstract |
In one embodiment, an indication that a decision has been requested, selected, or applied with respect to one or more users may be obtained. After the indication is obtained, a value function may be updated, where the value function approximates an expected reward associated with the one or more users over time since the decision was requested, selected, or applied with respect to the one or more users. The value function may be updated by performing or providing one or more updates to it, where the time at which each update is performed or provided is independent of activity of the one or more users. |
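As an illustration only, the following Python sketch shows one way the idea in the abstract could be realized with TD(0) learning and a linear value function. The feature map `features`, the step size `alpha`, the discount `gamma`, the class `LinearValueFunction`, and the helper `run_updates` are hypothetical choices, not taken from the patent. Updates fire at scheduled clock times rather than on user events, so an interval in which the user did nothing still produces an update (with zero reward). That is what lets learning continue when a customer's interaction history is incomplete.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def features(elapsed: float) -> List[float]:
    # Hypothetical state features: a bias term and the time elapsed since the decision.
    return [1.0, elapsed]


@dataclass
class LinearValueFunction:
    # One weight per feature; V(s) = w . phi(s).
    weights: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def value(self, elapsed: float) -> float:
        return sum(w * x for w, x in zip(self.weights, features(elapsed)))

    def td_update(self, elapsed: float, next_elapsed: float, reward: float,
                  alpha: float = 0.1, gamma: float = 0.9) -> float:
        # TD(0): move V(s) toward reward + gamma * V(s') along the feature vector of s.
        td_error = reward + gamma * self.value(next_elapsed) - self.value(elapsed)
        phi = features(elapsed)
        self.weights = [w + alpha * td_error * x for w, x in zip(self.weights, phi)]
        return td_error


def run_updates(vf: LinearValueFunction, decision_time: float,
                update_times: List[float],
                reward_events: List[Tuple[float, float]]) -> LinearValueFunction:
    # One TD update per scheduled time, whether or not the user acted in between.
    prev = decision_time
    for t in update_times:
        # Sum the rewards observed in (prev, t]; this is zero if the user was inactive.
        reward = sum(r for (te, r) in reward_events if prev < te <= t)
        vf.td_update(prev - decision_time, t - decision_time, reward)
        prev = t
    return vf


vf = run_updates(LinearValueFunction(), 0.0, [1.0, 2.0], [(0.5, 1.0)])
print(vf.weights)
```

Here the second update at time 2.0 still runs although the user generated no reward after time 0.5; the value estimate is simply bootstrapped from the current weights.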
Publication number |
US9367820(B2) |
Publication date |
2016.06.14 |
Application number |
US201414571403 |
Application date |
2014.12.16 |
Applicant |
NICE SYSTEMS TECHNOLOGIES UK LIMITED |
Inventors |
Newnham Leonard Michael; McFall Jason Derek; Barker David J; Silver David |
Classification |
G06N99/00; G06N5/04 |
Main classification |
G06N99/00 |
Agency |
Pearl Cohen Zedek Latzer Baratz LLP |
Agent |
Pearl Cohen Zedek Latzer Baratz LLP |
Main claim |
1. A computer implemented method, comprising:
obtaining an indication that a decision has been requested, selected, or applied with respect to one or more users; and
after obtaining the indication, updating a value time dependent function, including performing or providing one or more updates to the value time dependent function, wherein a time at which each of the one or more updates is performed or provided is independent of activity of the one or more users;
wherein the one or more updates to the value time dependent function indicate update(s) to one or more weights associated with one or more parameters of the value time dependent function, wherein the update(s) to the one or more weights include a modification or replacement value for each of the one or more weights, wherein performing or providing the one or more updates comprises performing or providing a plurality of updates according to a varying interval, wherein the varying interval is determined based, at least in part, upon a random component that varies for each update. |
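The claim's two specific elements, weight updates expressed as either a modification or a replacement value, and update times separated by an interval with a random component that varies for each update, can be sketched as follows. This is an assumed illustration: `jittered_update_times`, `apply_weight_update`, the uniform jitter, and all parameter names are hypothetical and show only one possible reading of the claim language.

```python
import random
from typing import List, Optional


def jittered_update_times(start: float, horizon: float, base_interval: float,
                          jitter: float,
                          rng: Optional[random.Random] = None) -> List[float]:
    # Each gap is base_interval plus a fresh uniform draw in [-jitter, +jitter],
    # so the interval between consecutive updates varies from update to update.
    if not 0 <= jitter < base_interval:
        raise ValueError("jitter must be in [0, base_interval) to keep gaps positive")
    if rng is None:
        rng = random.Random()
    times: List[float] = []
    t = start
    while True:
        t += base_interval + rng.uniform(-jitter, jitter)
        if t > start + horizon:
            break
        times.append(t)
    return times


def apply_weight_update(weights: List[float], update: List[float],
                        mode: str = "modify") -> List[float]:
    # "modify": treat update[i] as a delta added to weights[i].
    # "replace": treat update[i] as the new value of weights[i].
    if len(update) != len(weights):
        raise ValueError("update must have one entry per weight")
    if mode == "modify":
        return [w + u for w, u in zip(weights, update)]
    if mode == "replace":
        return list(update)
    raise ValueError(f"unknown mode: {mode!r}")


schedule = jittered_update_times(0.0, 10.0, base_interval=1.0, jitter=0.25,
                                 rng=random.Random(42))
print([round(t, 2) for t in schedule])
```

Requiring `jitter < base_interval` keeps every gap positive; the seeded `random.Random(42)` is used only to make the example reproducible.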
Address |
Hampshire GB |