Title of invention: METHOD AND APPARATUS FOR CONTEXTUAL LINEAR BANDITS
Abstract: A method of selection that maximizes an expected reward in a contextual multi-armed bandit setting gathers rewards from randomly selected items in a database of items, where the items correspond to arms in a contextual multi-armed bandit setting. Initially, an item is selected at random and is transmitted to a user device which generates a reward. The items and resulting rewards are recorded. Subsequently, a context is generated by the user device which causes a learning and selection engine to calculate an estimate for each arm in the specific context, the estimate calculated using the recorded items and resulting rewards. Using the estimate, an item from the database is selected and transferred to the user device. The selected item is chosen to maximize a probability of a reward from the user device.
Publication number: US2015095271(A1) Publication date: 2015.04.02
Application number: US201314402324 Filing date: 2013.06.14
Applicant: THOMSON LICENSING Inventors: Ioannidis Stratis; Yan Jinyun; Bento Ayres Pereira Jose
Classification: G06N99/00; G06N7/00 Main classification: G06N99/00
Agency: Agent:
Main claim: 1. A method of selection that maximizes an expected reward in a contextual multi-armed bandit setting, the method comprising: (a) training a learning and selection engine having access to a plurality of items corresponding to arms in the contextual multi-armed bandit setting; (b) receiving, by the learning and selection engine from a user device, a context in which to select one item from a plurality of items, the plurality of items corresponding to arms in the contextual multi-armed bandit setting; (c) calculating an estimate for each arm in the context, the estimate calculated using a history of past events; (d) selecting an arm that maximizes the expected reward; (e) providing a selection item corresponding to the selected arm for the context received, the selection item transferred to the user device; and (f) receiving and displaying a reward, sent by the user device to the learning and selection engine.
Address: Issy de Moulineaux FR
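Illustrative sketch (not part of the patent record): steps (a) through (f) of the main claim can be outlined in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation. It assumes each arm's expected reward is linear in the context vector and estimates it by per-arm ridge regression; the record itself does not specify the estimator. All names (`LearningAndSelectionEngine`, `solve`, `run_demo`, the `ridge` parameter, and the `true_theta` values) are hypothetical and introduced only for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


class LearningAndSelectionEngine:
    """Hypothetical engine: one ridge-regression model per arm (assumption)."""

    def __init__(self, n_arms, dim, ridge=1.0, seed=0):
        self.n_arms, self.dim = n_arms, dim
        self.rng = random.Random(seed)
        # Per arm: A = ridge * I + sum(x x^T), b = sum(reward * x).
        self.A = [[[ridge if i == j else 0.0 for j in range(dim)]
                   for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def select_random(self):
        """Step (a): pick an arm uniformly at random during training."""
        return self.rng.randrange(self.n_arms)

    def estimate(self, arm, context):
        """Step (c): estimated reward of `arm` in `context` from the history."""
        theta = solve(self.A[arm], self.b[arm])
        return sum(t * x for t, x in zip(theta, context))

    def select(self, context):
        """Step (d): choose the arm with the highest estimated reward."""
        return max(range(self.n_arms), key=lambda a: self.estimate(a, context))

    def record(self, arm, context, reward):
        """Step (f): store the reward returned by the user device."""
        for i in range(self.dim):
            self.b[arm][i] += reward * context[i]
            for j in range(self.dim):
                self.A[arm][i][j] += context[i] * context[j]


def run_demo(rounds=200):
    """Train on random selections against a simulated, noiseless user device."""
    true_theta = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, hidden from the engine
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    engine = LearningAndSelectionEngine(n_arms=2, dim=2)
    for t in range(rounds):
        context = contexts[t % 2]
        arm = engine.select_random()
        reward = sum(th * x for th, x in zip(true_theta[arm], context))
        engine.record(arm, context, reward)
    return engine
```

After `engine = run_demo()`, calling `engine.select(context)` corresponds to steps (b) through (e): a context arrives, each arm is scored, and the index of the best-scoring arm is returned so the matching item can be sent to the user device. Purely random training followed by greedy selection is the simplest reading of the abstract; a practical system would likely keep some exploration after training.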