Title: Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction
Abstract: The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
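The abstract does not name a specific NLDR method, distance measure, or reward-based learner, so the following is only a minimal sketch under stated assumptions. It uses Isomap (Euclidean distances, a k-nearest-neighbour graph, classical MDS) as a stand-in NLDR method and LSTD-Q with a linear Q-function over the embedded coordinates as a stand-in learner. It also assumes SARSA-style exemplars `(i, reward, j)`, in which (state, action) pair `i` is followed by pair `j`, and approximates the "cross-validation Bellman error" as the mean squared one-step TD error on every fourth exemplar. Isomap's `k` and output dimension `dim` are tuned by grid search against that holdout error. All function and parameter names are hypothetical. For brevity the embedding is computed over all pairs, holdout included, so only the learner's fit is actually held out.

```python
"""Hypothetical sketch: Isomap + LSTD-Q stand in for the unnamed NLDR method and learner."""
import math


def geodesic_distances(points, k):
    """Isomap step 1: shortest-path distances over a symmetric k-nearest-neighbour graph."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    g = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda m: d[i][m])[1:k + 1]:
            g[i][j] = g[j][i] = d[i][j]
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                g[i][j] = min(g[i][j], g[i][m] + g[m][j])
    far = 2 * max(x for row in g for x in row if x < math.inf)  # crude fallback if disconnected
    return [[x if x < math.inf else far for x in row] for row in g]


def jacobi_eigh(a, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix; eigenvectors are columns of v."""
    n = len(a)
    a, v = [row[:] for row in a], [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-18:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for m in (a, v):  # right-multiply a and v by the rotation (columns p, q)
                    for row in m:
                        row[p], row[q] = c * row[p] - s * row[q], s * row[p] + c * row[q]
                a[p], a[q] = ([c * x - s * y for x, y in zip(a[p], a[q])],
                              [s * x + c * y for x, y in zip(a[p], a[q])])  # then rows p, q
    return [a[i][i] for i in range(n)], v


def isomap(points, k, dim):
    """Isomap: classical MDS (double-centred squared distances) on geodesic distances."""
    sq = [[x * x for x in row] for row in geodesic_distances(points, k)]
    n = len(sq)
    mu = [sum(row) / n for row in sq]  # row means = column means, since sq is symmetric
    tot = sum(mu) / n
    gram = [[-0.5 * (sq[i][j] - mu[i] - mu[j] + tot) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(gram)
    top = sorted(range(n), key=lambda i: -vals[i])[:dim]
    return [[vecs[r][c] * math.sqrt(max(vals[c], 0.0)) for c in top] for r in range(n)]


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lstd_q(emb, transitions, gamma, ridge=1e-6):
    """LSTD-Q: solve sum phi (phi - gamma phi')^T w = sum r phi, with phi(z) = [z, 1]."""
    n = len(emb[0]) + 1
    a = [[ridge if r == c else 0.0 for c in range(n)] for r in range(n)]
    b = [0.0] * n
    for i, rew, j in transitions:
        f, g = emb[i] + [1.0], emb[j] + [1.0]
        for r in range(n):
            b[r] += rew * f[r]
            for c in range(n):
                a[r][c] += f[r] * (f[c] - gamma * g[c])
    return solve(a, b)


def tune_nldr(points, transitions, ks, dims, gamma=0.9, holdout_every=4):
    """Grid-search Isomap's (k, dim) for the lowest mean squared TD error on a holdout set."""
    train = [t for n, t in enumerate(transitions) if n % holdout_every]
    hold = [t for n, t in enumerate(transitions) if n % holdout_every == 0]
    best = None
    for k in ks:
        full = isomap(points, k, max(dims))  # top-d coordinates are a prefix of top-max(dims)
        for d in dims:
            emb = [z[:d] for z in full]
            w = lstd_q(emb, train, gamma)
            q = [sum(x * y for x, y in zip(z + [1.0], w)) for z in emb]
            err = sum((r + gamma * q[j] - q[i]) ** 2 for i, r, j in hold) / len(hold)
            if best is None or err < best[0]:
                best = (err, k, d)
    return best
```

Given a list `points` of (state, action) feature vectors and exemplars `(i, reward, j)` that index into it, `tune_nldr(points, transitions, ks=(4, 6, 8), dims=(1, 2))` returns the `(holdout_error, k, dim)` triple with the lowest holdout error. Refitting `lstd_q` on all exemplars with that embedding would then give the learned value function. The pure-Python eigensolver only keeps the sketch dependency-free and is practical for a few dozen exemplars at most.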
Publication No.: US8060454(B2)  Publication date: 2011.11.15
Application No.: US20070870698  Filing date: 2007.10.11
Applicants: DAS RAJARSHI; TESAURO GERALD J.; WEINBERGER KILIAN Q.; INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: DAS RAJARSHI; TESAURO GERALD J.; WEINBERGER KILIAN Q.
Classification: G06F15/18  Main classification: G06F15/18
Agency:  Agent:
Main claim:
Address: