Main claim
1. A method for computerized control, regulation, or control and regulation of a technical system, the method comprising:
characterizing a dynamic behavior of the technical system for multiple points in time, in each case by a state of the technical system and an action executed on the technical system, wherein a respective action at a respective point in time results in a new state of the technical system at the next point in time;
providing, generating, or providing and generating action selection policies, wherein a respective action selection policy specifies an action to be executed on the technical system at a corresponding point in time in dependence on at least the state of the technical system at the corresponding point in time, and wherein each action selection policy is associated with a complexity measure that describes a complexity of the respective action selection policy, the complexity measure being less than or equal to a predetermined complexity threshold;
ascertaining, from the provided, generated, or provided and generated action selection policies, the action selection policy having the highest evaluation measure by calculating evaluation measures, each evaluation measure describing the suitability of an action selection policy for the regulation, control, or regulation and control of the technical system, wherein a higher evaluation measure describes a better suitability of the action selection policy for the regulation, control, or regulation and control of the technical system, and wherein the evaluation measure of a respective action selection policy is dependent on:
a distance measure between the respective action selection policy and a predefined optimum action selection policy, wherein smaller distance measures result in higher evaluation measures;
a reward measure that results from executing the respective action selection policy in a simulation of the technical system, wherein higher reward measures result in higher evaluation measures;
a quality measure for the respective action selection policy that is determined by an action selection policy evaluation method, wherein higher quality measures result in higher evaluation measures; or
any combination thereof; and
regulating, controlling, or regulating and controlling the technical system based on the ascertained action selection policy.
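The selection step of the claim (filter candidate policies by the complexity threshold, then pick the one with the highest evaluation measure) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the linear policy class `LinearPolicy`, the parameter-count complexity measure, the toy integrator simulation, the action-space distance to the optimum policy and the weighted-sum combination of the three measures are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical types for a one-dimensional technical system.
State = float
Action = float
Step = Callable[[State, Action], State]  # simulation model: (state, action) -> next state


@dataclass(frozen=True)
class LinearPolicy:
    """Hypothetical action selection policy: action = gain * state + offset."""
    gain: float
    offset: float = 0.0

    def __call__(self, state: State) -> Action:
        return self.gain * state + self.offset

    def complexity(self) -> int:
        # Assumed complexity measure: number of non-zero parameters.
        return int(self.gain != 0.0) + int(self.offset != 0.0)


def distance_to_optimum(policy, optimum, probe_states: Sequence[State]) -> float:
    """Mean squared difference between the actions of two policies on probe states."""
    return sum((policy(s) - optimum(s)) ** 2 for s in probe_states) / len(probe_states)


def simulated_reward(policy, step: Step, initial_state: State, horizon: int) -> float:
    """Roll the policy out in the simulation; the reward penalizes squared state deviation."""
    state, total = initial_state, 0.0
    for _ in range(horizon):
        state = step(state, policy(state))
        total -= state ** 2
    return total


def evaluation_measure(policy, *, optimum, step: Step, probe_states: Sequence[State],
                       initial_state: State = 1.0, horizon: int = 20,
                       w_distance: float = 1.0, w_reward: float = 1.0,
                       quality: Optional[Callable[[LinearPolicy], float]] = None,
                       w_quality: float = 1.0) -> float:
    """Assumed weighted combination: smaller distance, higher reward and higher quality raise the score."""
    score = -w_distance * distance_to_optimum(policy, optimum, probe_states)
    score += w_reward * simulated_reward(policy, step, initial_state, horizon)
    if quality is not None:
        score += w_quality * quality(policy)
    return score


def ascertain_policy(candidates: Sequence[LinearPolicy], max_complexity: int,
                     **kwargs) -> LinearPolicy:
    """Return the candidate within the complexity threshold with the highest evaluation measure."""
    admissible = [p for p in candidates if p.complexity() <= max_complexity]
    if not admissible:
        raise ValueError("no candidate policy satisfies the complexity threshold")
    return max(admissible, key=lambda p: evaluation_measure(p, **kwargs))


if __name__ == "__main__":
    def integrator(state: State, action: Action) -> State:
        return state + action  # toy system: the action is added to the state

    best = ascertain_policy(
        [LinearPolicy(-0.5), LinearPolicy(-1.0), LinearPolicy(-1.5), LinearPolicy(-1.0, 0.2)],
        max_complexity=1,
        optimum=LinearPolicy(-1.0),
        step=integrator,
        probe_states=[-2.0, -1.0, 1.0, 2.0],
    )
    print(best)  # LinearPolicy(gain=-1.0, offset=0.0)
```

Filtering by the complexity threshold before maximizing keeps the search restricted to admissible policies, so `LinearPolicy(-1.0, 0.2)` is never scored in the example. The weights `w_distance`, `w_reward` and `w_quality` are hypothetical knobs standing in for the "any combination thereof" alternative; setting a weight to zero (or passing no `quality` callable) drops that measure.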