Abstract |
An information processing apparatus optimizes actions in a transition model in which the number of objects in each state transitions according to the action. A cost constraint acquisition unit acquires multiple cost constraints, including at least one that constrains the total cost of the action over multiple timings and/or multiple states. A processing unit treats the action distribution in each state at each timing as the decision variable of an optimization problem and, subject to the multiple cost constraints, maximizes an objective function obtained by subtracting an error term from the total reward over the whole period. The error term is based on the difference between the actual number of objects with the action in each state at each timing and the number of objects in that state at that timing estimated from state transitions under the transition model. An output unit outputs the action distribution in each state at each timing that maximizes the objective function. |
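
The optimization described in the abstract can be read, under some assumptions, as a linear program. The sketch below is one possible formalization and is not taken from the source. Every symbol is an assumption introduced for illustration: n_{t,s,a} is the number of objects in state s at timing t that receive action a, r and c are per-object reward and cost, p(s | s', a) is the transition probability, sigma_{t,s} bounds the absolute error between the actual and estimated object counts, eta is a penalty weight, Z_i is the set of (timing, state) pairs covered by the i-th cost constraint with budget C_i, and N_s is the initial number of objects in state s.

```latex
% Illustrative formalization only; all symbols are assumptions, not the source's notation.
\begin{aligned}
\max_{n \ge 0,\ \sigma \ge 0}\quad
  & \sum_{t=1}^{T} \sum_{s \in S} \sum_{a \in A} r_{t,s,a}\, n_{t,s,a}
    \;-\; \eta \sum_{t=2}^{T} \sum_{s \in S} \sigma_{t,s} \\
\text{s.t.}\quad
  % error between actual count and count estimated by the transition model
  & -\sigma_{t,s} \le \sum_{a \in A} n_{t,s,a}
      - \sum_{s' \in S} \sum_{a \in A} p(s \mid s', a)\, n_{t-1,s',a}
      \le \sigma_{t,s},
      && t = 2,\dots,T,\ s \in S \\
  % each cost constraint may span several timings and/or states
  & \sum_{(t,s) \in Z_i} \sum_{a \in A} c_{t,s,a}\, n_{t,s,a} \le C_i,
      && i = 1,\dots,I \\
  % initial population in each state
  & \sum_{a \in A} n_{1,s,a} = N_s,
      && s \in S
\end{aligned}
```

In this reading, penalizing the error rather than enforcing the transition as a hard equality keeps the problem feasible even when the cost constraints are tight, and the optimal n_{t,s,a} is the action distribution that the output unit would report.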