A Distributed Policy Iteration Scheme for Cooperative Multi-Agent Policy Approximation


We propose Stable Emergent Policy (STEP) approximation, a distributed policy iteration scheme to stably approximate decentralized policies for cooperative, partially observable multi-agent systems. STEP offers a novel training architecture in which function approximation is used to learn from the action recommendations of a decentralized planning algorithm. Planning exploits a training simulator, which is assumed to be available during centralized learning, and is further enhanced by reintegrating the learned policies into the planner. We experimentally evaluate STEP in two challenging stochastic domains and compare its performance with state-of-the-art multi-agent reinforcement learning algorithms.
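The training loop sketched in the abstract can be illustrated with a minimal toy example. The two-agent matching game, the tabular "approximator", and the rollout planner below are illustrative assumptions for exposition only, not the paper's actual domains or architecture; they show the plan-train-reintegrate cycle: a decentralized planner recommends actions by simulating against the other agent's current learned policy, and each agent's policy is trained to imitate those recommendations.

```python
import random
from collections import defaultdict

# Hypothetical toy domain: two agents earn reward 1 iff they pick the same action.
ACTIONS = [0, 1]


def simulate(joint_action):
    """Stand-in training simulator (assumed available during centralized learning)."""
    return 1.0 if joint_action[0] == joint_action[1] else 0.0


class TabularPolicy:
    """Stand-in for a function approximator: Laplace-smoothed action counts."""

    def __init__(self):
        self.counts = defaultdict(lambda: [1, 1])

    def train(self, obs, action):
        # Supervised update toward the planner's recommended action.
        self.counts[obs][action] += 1

    def act(self, obs, rng):
        # Sample an action proportionally to the learned counts.
        return rng.choices(ACTIONS, weights=self.counts[obs])[0]


def plan(agent, obs, policies, rng, rollouts=32):
    """Decentralized planner: score each own action via simulator rollouts that
    sample the other agent from its learned policy (policy reintegration)."""
    best_a, best_v = None, float("-inf")
    for a in ACTIONS:
        v = 0.0
        for _ in range(rollouts):
            other = policies[1 - agent].act(obs, rng)
            joint = (a, other) if agent == 0 else (other, a)
            v += simulate(joint)
        if v > best_v:
            best_a, best_v = a, v
    return best_a


def step_training(iterations=200, seed=0):
    rng = random.Random(seed)
    policies = [TabularPolicy(), TabularPolicy()]
    for _ in range(iterations):
        obs = "o"  # single shared observation in this toy example
        for agent in (0, 1):
            rec = plan(agent, obs, policies, rng)  # planner recommendation
            policies[agent].train(obs, rec)        # imitate the recommendation
    return policies
```

Because each planner evaluates its actions against the other agent's current learned policy, the two policies reinforce whichever action first gains an edge and stably converge to a matching joint action.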

12th Adaptive and Learning Agents Workshop (ALA 2020)