Deep Reinforcement Learning Approach for Inventory Policy Tested in Simulation Environment

Vendor Managed Inventory (VMI) is a mainstream supply chain collaboration model. It is commonly understood as a strategy in which the stock replenishment decision shifts from the customer to the supplier.

In this work, the researchers develop a VMI performance measurement approach that enables root-cause analysis, so that responsibility for poor performance can be assigned. The work also proposes a solution methodology based on reinforcement learning for determining an optimal replenishment policy in a VMI setting. Using a simulation model as the training environment, different demand scenarios are generated from real data from Infineon Technologies AG and compared on key performance indicators.

It is important to define clearly who is responsible in a VMI application whenever the defined Min/Max inventory limits are violated. To that end, a metric is outlined and further developed to monitor stock violations and assign responsibilities.

Developing such a metric begins with an analysis of the underlying VMI configuration.
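To make the idea of monitoring Min/Max violations concrete, here is a minimal Python sketch that labels each day's stock level as below Min, within limits, or above Max. The names `StockState`, `classify_stock`, and `violation_days` are hypothetical and are not taken from the paper or from AnyLogic; the limits and stock levels are made-up illustration values.

```python
from enum import Enum


class StockState(Enum):
    UNDERSTOCK = "below Min"
    IN_RANGE = "within Min/Max"
    OVERSTOCK = "above Max"


def classify_stock(level, min_limit, max_limit):
    """Classify one day's stock level against the agreed Min/Max limits."""
    if min_limit > max_limit:
        raise ValueError("min_limit must not exceed max_limit")
    if level < min_limit:
        return StockState.UNDERSTOCK
    if level > max_limit:
        return StockState.OVERSTOCK
    return StockState.IN_RANGE


def violation_days(levels, min_limit, max_limit):
    """Return (day_index, state) for every day outside the limits."""
    violations = []
    for day, level in enumerate(levels):
        state = classify_stock(level, min_limit, max_limit)
        if state is not StockState.IN_RANGE:
            violations.append((day, state))
    return violations


# Illustrative data only: four days of stock against Min=20, Max=100.
daily_levels = [50, 120, 80, 10]
print(violation_days(daily_levels, min_limit=20, max_limit=100))
```

A list of violating days like this is only the monitoring half of the metric; deciding who is responsible for each violation is a separate step.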

Figure: Typical VMI configuration

The collaboration starts with the customer providing demand forecasts to the supplier. Taking the current stock information into account, the supplier plans and delivers replenishments, which the customer may pull from stock at any time. Note that the supplier receives no information about how the demand forecasts are generated.

In the current setup, the supplier is held responsible for any failed delivery. This calls for a VMI performance measurement approach that enables root-cause analysis and can adequately assign responsibility for any kind of stock violation.
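One way to picture root-cause responsibility assignment is sketched below in Python. The article states only that forecast accuracy is taken into account; the specific rule here (a relative forecast error above a tolerance shifts responsibility to the customer, otherwise it stays with the supplier), the 20% default tolerance, and the function names are all assumptions made for illustration, not the paper's actual metric.

```python
def forecast_error(forecast, actual):
    """Absolute forecast error relative to actual demand."""
    if actual == 0:
        # No actual demand: the error is zero only if nothing was forecast.
        return 0.0 if forecast == 0 else float("inf")
    return abs(forecast - actual) / actual


def assign_responsibility(violated, forecast, actual, tolerance=0.2):
    """Assumed rule: blame the forecast (customer) when it was inaccurate,
    otherwise blame the replenishment (supplier)."""
    if not violated:
        return "none"
    if forecast_error(forecast, actual) > tolerance:
        return "customer"
    return "supplier"


# Illustrative values only.
print(assign_responsibility(True, forecast=100, actual=150))  # large forecast miss
print(assign_responsibility(True, forecast=100, actual=105))  # forecast was close
```

Under this assumed rule, the supplier is no longer blamed by default: a violation that coincides with a badly missed forecast is attributed to the party that produced the forecast.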

Reinforcement learning and a simulation environment

The performance measurement model is developed in AnyLogic. The simulation model is then extended with a reward function, a state (observation) space, and an action space so that it can serve as a reinforcement learning training environment in an external integrated development environment (IDE), IntelliJ IDEA.
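The three ingredients named above (reward function, state space, action space) can be sketched as a toy environment. The Python class below is a stand-in, not the AnyLogic model, and it does not use the RL4J API. The class name `VmiEnv`, the state `(stock, forecast)`, the discrete replenishment quantities, the +1/-1 reward, and the random demand are all assumptions chosen for illustration.

```python
import random


class VmiEnv:
    """Toy VMI environment: state, actions, and reward are illustrative."""

    def __init__(self, min_limit=20.0, max_limit=100.0, horizon=30, seed=0):
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.horizon = horizon
        # Action space: index into a fixed set of replenishment quantities.
        self.actions = (0.0, 10.0, 20.0, 30.0)
        self.rng = random.Random(seed)
        self.reset()

    def _observe(self):
        # State (observation) space: current stock and today's forecast.
        return (self.stock, self.forecast)

    def reset(self):
        self.stock = (self.min_limit + self.max_limit) / 2
        self.day = 0
        self.forecast = float(self.rng.randint(5, 25))
        return self._observe()

    def step(self, action_index):
        replenishment = self.actions[action_index]
        # Actual pull deviates randomly from the forecast (made-up noise).
        pull = max(0.0, self.forecast + self.rng.randint(-5, 5))
        self.stock = max(0.0, self.stock + replenishment - pull)
        self.day += 1
        # Reward function: +1 inside the Min/Max band, -1 on a violation.
        in_range = self.min_limit <= self.stock <= self.max_limit
        reward = 1.0 if in_range else -1.0
        done = self.day >= self.horizon
        self.forecast = float(self.rng.randint(5, 25))
        return self._observe(), reward, done


# Usage: run one episode with a random policy.
env = VmiEnv(seed=42)
state, done, total = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(random.randrange(len(env.actions)))
    total += reward
print("episode return:", total)
```

A learning agent would replace the random action choice with a policy that maps the observed state to a replenishment quantity and is updated from the rewards it collects.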

The simulation model is exported as a standalone Java application and imported into IntelliJ IDEA. The Reinforcement Learning for Java (RL4J) library is used to have the agent learn a policy. The trained model is then imported back into the AnyLogic simulation model, which acts as a testbed. The extended model thus serves as the environment in which the learning agent is taught to take appropriate actions to reach the desired state.

The VMI performance measurement approach is validated in an AnyLogic discrete event simulation environment. The sensitivity of the approach is tested using different parameters, which include forecast information, daily replenishments, and actual demand (pull) over 853 days.


In this paper, the VMI performance measurement approach was extended with root-cause analysis to measure responsibility for poor performance. This was done by taking into account the forecast accuracy of the demand mutually agreed upon between the collaborating partners.

The approach was tested and validated via simulation on a set of company data. Because there was room to reduce stock violations from the supplier’s perspective, optimization of the replenishment policy was studied and implemented using a deep reinforcement learning algorithm in a simulation environment.

Figure: [a] Max Z, Min z, and Daily Stock Level and [b] Responsibility and Inventory States
