Abstract |
In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves supplying a first policy and a reward mechanism. The first policy maps states of at least one component of a data processing system to selected management actions, while the reward mechanism generates numerical measures of value responsive to particular actions (e.g., management actions) performed in particular states of the component(s). The first policy and the reward mechanism are applied to the component(s), and results achieved through this application (e.g., observations of corresponding states, actions and rewards) are processed in accordance with reward-based learning to derive a second policy having improved performance relative to the first policy in at least one state of the component(s).
|
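The loop described in the abstract (apply a first policy, record states, actions and rewards, then derive a second policy by reward-based learning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the source: the three load levels, the three capacity actions, the reward formula, the occasional random exploratory action during data collection, and the choice of tabular batch Q-learning as the reward-based learning method.

```python
import random

STATES = [0, 1, 2]   # hypothetical load levels: low, medium, high
ACTIONS = [0, 1, 2]  # hypothetical management actions: capacity units to allocate


def first_policy(state):
    """Initial policy (assumed): always allocate one unit, whatever the load."""
    return 1


def reward(state, action):
    """Reward mechanism (assumed): penalize mismatch between capacity and load."""
    return -abs(action - state)


def collect(policy, steps, epsilon, rng):
    """Apply the policy and log (state, action, reward, next_state) tuples.

    With probability epsilon a random action is taken instead of the policy's
    action, so that the log also covers actions the policy never picks
    (an assumption of this sketch).
    """
    log = []
    state = rng.choice(STATES)
    for _ in range(steps):
        action = rng.choice(ACTIONS) if rng.random() < epsilon else policy(state)
        next_state = rng.choice(STATES)  # toy dynamics: load ignores the action
        log.append((state, action, reward(state, action), next_state))
        state = next_state
    return log


def learn_policy(log, gamma=0.5, alpha=0.1, sweeps=50):
    """Batch Q-learning over the logged observations, then act greedily."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, r, s2 in log:
            target = r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
    # The second policy maps each state to the action with the highest Q-value.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


rng = random.Random(0)
log = collect(first_policy, steps=2000, epsilon=0.3, rng=rng)
second_policy = learn_policy(log)

for s in STATES:
    print(s, first_policy(s), second_policy[s],
          reward(s, first_policy(s)), reward(s, second_policy[s]))
```

In this toy setting the second policy is compared with the first one state by state, which mirrors the abstract's criterion of improved performance "in at least one state". A real systems-management deployment would likely need function approximation and measured rewards rather than a lookup table and a closed-form reward.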