Optimizing Local Satisfaction of Long-Run Average Objectives in Markov Decision Processes

    Abstract: Long-run average optimization problems for Markov decision processes (MDPs) require constructing policies with optimal steady-state behavior, i.e., optimal limit frequency of visits to the states. However, such policies may suffer from local instability in the sense that the frequency of states visited in a bounded time horizon along a run differs significantly from the limit frequency. In this work, we propose an efficient algorithmic solution to this problem.
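    The local instability described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the algorithm proposed in the paper: it assumes a hypothetical two-state Markov chain (as induced by some fixed policy), and the function names, transition matrix, and window length are all chosen only for illustration. It compares the limit frequency of a state with the frequency observed in sliding windows of bounded length along a single run.

```python
import random


def stationary_distribution(P, iterations=1000):
    """Approximate the limit visit frequencies by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


def simulate(P, start, steps, rng):
    """Sample one run of the chain with `steps` states, beginning in `start`."""
    state = start
    run = [state]
    for _ in range(steps - 1):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        run.append(state)
    return run


def window_frequencies(run, state, window):
    """Frequency of `state` within every sliding window of length `window`."""
    count = sum(1 for s in run[:window] if s == state)
    freqs = [count / window]
    for t in range(window, len(run)):
        count += (run[t] == state) - (run[t - window] == state)
        freqs.append(count / window)
    return freqs


# Hypothetical chain: each state tends to repeat, so runs contain long streaks.
P = [[0.9, 0.1],
     [0.1, 0.9]]

limit = stationary_distribution(P)
run = simulate(P, start=0, steps=100_000, rng=random.Random(0))
overall = run.count(0) / len(run)
freqs = window_frequencies(run, state=0, window=10)
worst_gap = max(abs(f - limit[0]) for f in freqs)

print("limit frequency of state 0:", limit[0])
print("frequency over the whole run:", overall)
print("largest gap over windows of length 10:", worst_gap)
```

    In this sketch the frequency over the whole run stays close to the limit frequency, while individual short windows can deviate from it substantially, which is the kind of local instability the abstract refers to.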

    Authors: David Klašk, Antonín Kučera, Vojtěch Kůr, Vít Musil, and Vojtěch Řehák

    Published: Proceedings of the AAAI Conference on Artificial Intelligence, 38(18), 20143-20150.

    DOI: https://doi.org/10.1609/aaai.v38i18.29993