Title of invention ONLINE TEMPORAL DIFFERENCE LEARNING FROM INCOMPLETE CUSTOMER INTERACTION HISTORIES
Abstract In one embodiment, an indication that a decision has been requested, selected, or applied with respect to one or more users may be obtained. After this indication is obtained, a value function may be updated, where the value function approximates an expected reward associated with the one or more users over time since the decision was requested, selected, or applied. The value function may be updated by performing or providing one or more updates, where the time at which each update is performed or provided is independent of activity of the one or more users.
Publication number WO2013059517(A1) Publication date 2013.04.25
Application number WO2012US60904 Filing date 2012.10.18
Applicant CAUSATA INC.;NEWNHAM, LEONARD MICHAEL;MCFALL, JASON DEREK;BARKER, DAVID J.;SILVER, DAVID Inventor NEWNHAM, LEONARD MICHAEL;MCFALL, JASON DEREK;BARKER, DAVID J.;SILVER, DAVID
Classification G06Q30/02 Main classification G06Q30/02
Agency Agent
Principal claim
Address
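Illustrative sketch (not part of the record). The abstract describes updating a value function at times that do not depend on user activity. One way to picture this is a TD(0) update applied on every clock tick after a decision, with zero reward on ticks where the user did nothing. Everything in the block below is an assumption made for illustration: the tabular value store, the state encoding `(decision, ticks_since_decision)`, the constants `ALPHA` and `GAMMA`, and the function names. The record itself does not specify any of these.

```python
from collections import defaultdict

GAMMA = 0.9  # assumed discount factor per clock tick
ALPHA = 0.5  # assumed learning rate


def td0_update(values, state, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


def run_fixed_interval_updates(values, decision, rewards_by_tick, num_ticks):
    """Apply one update per clock tick after a decision.

    The update schedule depends only on the clock, not on the user.
    rewards_by_tick maps a tick number to an observed reward; ticks that
    are absent mean no user activity was seen and contribute zero reward.
    The state is the hypothetical pair (decision, ticks_since_decision).
    """
    for tick in range(1, num_ticks + 1):
        state = (decision, tick - 1)
        next_state = (decision, tick)
        reward = rewards_by_tick.get(tick, 0.0)
        td0_update(values, state, reward, next_state)
    return values


if __name__ == "__main__":
    values = defaultdict(float)
    # Hypothetical history: the user is silent for two ticks, then a
    # reward of 10.0 is observed on tick 3.
    for _ in range(2):
        run_fixed_interval_updates(values, "offer_A", {3: 10.0}, num_ticks=3)
    for state in sorted(values):
        print(state, values[state])
```

Encoding elapsed ticks in the state lets the table approximate expected reward as a function of time since the decision, so repeated passes propagate the later reward back toward earlier ticks even though those ticks saw no user activity.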