Training a Reinforcement Learning Policy in an AnyLogic Simulation Environment Using Pathmind


Coffee shop operations have previously been studied with a focus on optimizing employee availability and customer table placement and on reducing customer service time. An earlier coffee shop simulation model concluded that increasing the number of baristas reduced the average service time; however, adding baristas also increased the coffee shop's overall operational cost.

In this paper, the researchers study the operations of an imaginary coffee shop with a focus on the barista's actions. Using reinforcement learning (RL), with a simulation model as the policy training environment, they show how the sequence of the barista's actions affects the overall performance of the coffee shop. The model serves as a guiding example of how easily RL can be applied to AnyLogic models using the Pathmind Library.

Reinforcement Learning and Simulation Training Environment

Reinforcement learning is a core area of machine learning that deals with sequential decision-making. In RL, the agent is the decision-maker in the model, and everything outside the agent that can influence its decisions is its environment.

The goal of RL optimization is to maximize a cumulative reward (a cost to be minimized can be expressed as a negative reward), and the reward is designed so that maximizing it produces the desired outcome. The RL agent chooses its actions dynamically, based only on its observations of the environment (a simulation model during training, or the real system in deployment).
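The observation, action, reward cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `ToyCoffeeShop`, `rule_based_agent`, and the reward numbers are hypothetical and not taken from the paper, and the agent here follows a fixed rule rather than a learned policy, so the sketch shows only the interaction loop, not training.

```python
from dataclasses import dataclass


@dataclass
class ToyCoffeeShop:
    """Hypothetical environment: only a queue of customers waiting to order."""
    queue: int = 3

    def observe(self) -> tuple[int]:
        # What the agent is allowed to see about the environment.
        return (self.queue,)

    def step(self, action: str) -> float:
        """Apply the agent's action and return this step's reward."""
        reward = 0.0
        if action == "take_order" and self.queue > 0:
            self.queue -= 1
            reward += 1.0  # reward for serving a customer
        reward -= 0.1 * self.queue  # penalty for every customer still waiting
        return reward


def rule_based_agent(observation: tuple[int]) -> str:
    """A fixed (not learned) policy that decides from the observation alone."""
    (queue,) = observation
    return "take_order" if queue > 0 else "clean_kitchen"


def run_episode(env: ToyCoffeeShop, agent, steps: int) -> float:
    """Run the agent-environment loop and return the cumulative reward."""
    total = 0.0
    for _ in range(steps):
        action = agent(env.observe())  # observation -> action
        total += env.step(action)  # action -> reward and a new environment state
    return total


print(run_episode(ToyCoffeeShop(queue=3), rule_based_agent, steps=4))
```

An RL algorithm would replace `rule_based_agent` with a policy whose choices are adjusted to increase the cumulative reward returned by `run_episode`.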

A simulation model is required only to train the model-free RL policy. Once trained, the policy can work as a stand-alone "oracle" for decision making in a real deployment, without the simulation model.
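The split between training in simulation and deploying the policy on its own can be illustrated with a small sketch. It swaps in tabular Q-learning as a simple model-free method (the text does not say which algorithm Pathmind uses), and the toy simulator `simulate_step`, its rewards, and the 50% arrival probability are all hypothetical. The simulator is needed only inside `train_q_learning`; the result is frozen into a plain lookup table that maps an observed queue length to an action.

```python
import random

ACTIONS = ("serve", "clean")
MAX_QUEUE = 3


def simulate_step(queue: int, action: str, rng: random.Random) -> tuple[int, float]:
    """Hypothetical toy simulator: one barista decision, then a possible arrival."""
    if action == "serve" and queue > 0:
        queue, reward = queue - 1, 1.0
    elif action == "clean" and queue == 0:
        reward = 0.5  # useful work while no one is waiting
    else:
        reward = -1.0  # wasted action: serving nobody, or cleaning while customers wait
    if rng.random() < 0.5 and queue < MAX_QUEUE:
        queue += 1  # a new customer joins the queue
    return queue, reward


def train_q_learning(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Model-free training: learn action values purely from simulated experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MAX_QUEUE + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(MAX_QUEUE + 1)  # random start so every queue length is visited
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = simulate_step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def extract_policy(q):
    """Freeze the learned values into a lookup table: observation -> action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_QUEUE + 1)}


policy = extract_policy(train_q_learning())

# Deployment: no simulator involved, only the observed queue length.
observed_queue = 2
print(policy[observed_queue])
```

With these settings the learned table is expected to clean when the queue is empty and serve otherwise, which is the optimal behavior for this toy reward. A deep RL policy, as used by tools such as Pathmind, replaces the table with a neural network, but the deployment idea is the same.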

Coffee Shop Simulation Model

The simulation model was created using AnyLogic Professional 8.5. The coffee shop model has two agent types: Customer and Server. The customers are modeled as a population of agents that flow through a discrete event simulation (DES) process, while the server agent, the barista, is modeled using agent-based modeling (ABM) principles.

The customer agents are modeled using AnyLogic's Pedestrian Library. The Pedestrian Library works much like the Process Modeling Library, except that pedestrians move according to the physical rules defined in the simulation environment and can also make decisions based on the situation around them.
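To show what a discrete event process for customers looks like in general, here is a minimal event-list simulation of a first-come, first-served queue in plain Python. It is not AnyLogic's Process Modeling or Pedestrian Library: pedestrian movement is ignored, and the arrival and service distributions (exponential, with means of 2 and 3 minutes) are assumptions chosen only for illustration. The same customers are replayed for each staffing level, which mirrors the earlier finding that more baristas shorten customer waits.

```python
import heapq
import random
from collections import deque


def average_wait(num_baristas: int, num_customers: int = 200, seed: int = 1) -> float:
    """Average time (minutes) customers queue before a barista starts serving them."""
    rng = random.Random(seed)
    # Sample the same customers for every staffing level: arrival and service times.
    arrivals, clock = [], 0.0
    for _ in range(num_customers):
        clock += rng.expovariate(1 / 2.0)  # assumed mean of 2 minutes between arrivals
        arrivals.append(clock)
    service = [rng.expovariate(1 / 3.0) for _ in range(num_customers)]  # assumed mean of 3 minutes

    # Future event list ordered by time; the second field breaks ties deterministically.
    events = [(t, i, "arrival", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    next_id = num_customers
    queue = deque()
    free_baristas = num_baristas
    total_wait = 0.0

    while events:
        now, _, kind, customer = heapq.heappop(events)
        if kind == "arrival":
            queue.append(customer)
        else:  # "departure": the barista who served this customer becomes free
            free_baristas += 1
        # Start service for waiting customers while a barista is free (first come, first served).
        while free_baristas > 0 and queue:
            nxt = queue.popleft()
            free_baristas -= 1
            total_wait += now - arrivals[nxt]
            heapq.heappush(events, (now + service[nxt], next_id, "departure", nxt))
            next_id += 1

    return total_wait / num_customers


for n in (1, 2, 3):
    print(n, round(average_wait(n), 2))
```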

The barista is the decision-maker in the model and, when idle, can take up tasks such as taking orders, preparing and delivering orders, billing customers, and cleaning the kitchen. The researchers modeled the barista's actions using AnyLogic statecharts.
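AnyLogic statecharts are drawn graphically, but the underlying idea can be sketched as a small state machine in code. The state and task names below are hypothetical labels for the tasks listed above, not the names used in the paper's model. The key property is that a new task can start only from the idle state, which is where a decision about the next action is needed.

```python
from enum import Enum, auto


class BaristaState(Enum):
    IDLE = auto()
    TAKING_ORDER = auto()
    PREPARING_AND_DELIVERING = auto()
    BILLING = auto()
    CLEANING = auto()


# Hypothetical task names mapped to the state the barista enters for each task.
TASKS = {
    "take_order": BaristaState.TAKING_ORDER,
    "prepare_and_deliver": BaristaState.PREPARING_AND_DELIVERING,
    "bill_customer": BaristaState.BILLING,
    "clean_kitchen": BaristaState.CLEANING,
}


class BaristaStatechart:
    def __init__(self) -> None:
        self.state = BaristaState.IDLE

    def start_task(self, task: str) -> None:
        # Transitions out of IDLE are the decision points for the barista.
        if self.state is not BaristaState.IDLE:
            raise RuntimeError(f"barista is busy in state {self.state.name}")
        self.state = TASKS[task]

    def finish_task(self) -> None:
        # Every task returns the barista to IDLE when it completes.
        if self.state is BaristaState.IDLE:
            raise RuntimeError("no task in progress")
        self.state = BaristaState.IDLE


barista = BaristaStatechart()
barista.start_task("take_order")
print(barista.state.name)  # TAKING_ORDER
barista.finish_task()
print(barista.state.name)  # IDLE
```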

Pathmind Helper

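There are five important elements when implementing reinforcement learning in an AnyLogic simulation environment with the Pathmind Library: the Observation Function, Reward Variables, Action Function, Action Triggers, and Reward Function. Each element plays a role in making sure the RL agent learns and behaves effectively: the observations define what the agent sees, the action function and action triggers define what the agent can do and when it is asked to decide, and the reward variables and reward function define what counts as good performance.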
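As a rough illustration of how the five elements fit together, here is a sketch in plain Python for a simplified coffee shop. It is not Pathmind's API: in the actual model these elements are filled in through Pathmind Helper inside AnyLogic, and every name here (`ShopState`, `observation`, `reward_variables`, `action_trigger`, `do_action`, `reward`) and the 0.1 waiting penalty are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical discrete action space: the policy outputs an index into this tuple.
ACTIONS = ("take_order", "prepare_and_deliver", "bill_customer", "clean_kitchen")


@dataclass
class ShopState:
    waiting_to_order: int = 0
    orders_pending: int = 0
    waiting_to_pay: int = 0
    kitchen_dirty: bool = False
    customers_served: int = 0
    barista_idle: bool = True


def observation(s: ShopState) -> list[float]:
    """Observation function: the numbers the policy sees."""
    return [s.waiting_to_order, s.orders_pending, s.waiting_to_pay, float(s.kitchen_dirty)]


def reward_variables(s: ShopState) -> dict[str, float]:
    """Reward variables: quantities tracked so the reward can use their change."""
    return {"served": s.customers_served, "queue": s.waiting_to_order + s.waiting_to_pay}


def action_trigger(s: ShopState) -> bool:
    """Action trigger: ask the policy for a decision only when the barista is idle."""
    return s.barista_idle


def do_action(s: ShopState, action: int) -> None:
    """Action function: apply the chosen action to the model (no-op if not applicable)."""
    task = ACTIONS[action]
    if task == "take_order" and s.waiting_to_order:
        s.waiting_to_order -= 1
        s.orders_pending += 1
    elif task == "prepare_and_deliver" and s.orders_pending:
        s.orders_pending -= 1
        s.waiting_to_pay += 1
    elif task == "bill_customer" and s.waiting_to_pay:
        s.waiting_to_pay -= 1
        s.customers_served += 1
    elif task == "clean_kitchen":
        s.kitchen_dirty = False


def reward(before: dict[str, float], after: dict[str, float]) -> float:
    """Reward function: reward newly served customers, penalize those still waiting."""
    return (after["served"] - before["served"]) - 0.1 * after["queue"]


shop = ShopState(waiting_to_pay=1)
before = reward_variables(shop)
if action_trigger(shop):
    do_action(shop, ACTIONS.index("bill_customer"))
print(reward(before, reward_variables(shop)))  # 1.0
```

Keeping the reward variables separate from the reward function means the reward can be reshaped (for example, by changing the waiting penalty) without touching the model logic.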

These RL elements are configured in Pathmind Helper, an AnyLogic library provided by Pathmind. All RL-related functions are added to Pathmind Helper before the model is exported for training in the Pathmind web application.


This paper demonstrated the ease of applying reinforcement learning to AnyLogic simulation models using the Pathmind Library. A hybrid simulation model (DES and ABM) of a conceptual coffee shop, with a single barista as the learning agent, was used to train the barista's action policy with RL.

The RL policy could help coffee shop owners improve the efficiency of their operations and save on the time and cost of training new baristas.

Figure: Simulation environment – barista's statechart

Figure: Barista's action statechart
