Train AI agents with Microsoft Project Bonsai

Train AI agents with Microsoft Project Bonsai and AnyLogic simulation.

AnyLogic has joined forces with Microsoft to bring the deep reinforcement learning and machine teaching capabilities of Project Bonsai to practical business applications. Together, we have developed an easy-to-use connector that lets you use AnyLogic models as simulators for Microsoft’s Project Bonsai.

Deep Reinforcement Learning 101

Reinforcement learning frames a problem as a Markov decision process in which an AI agent (a specialized algorithm) learns a control policy: a rule that picks, for any given state of the system, the action expected to lead to the best long-term outcome.

The successful application of neural networks in conjunction with reinforcement learning (hence the name “deep reinforcement learning”) made it possible to tackle complex scenarios that were previously considered out of reach. Initially, deep reinforcement learning (DRL) received the most attention for teaching AI agents how to play games like Pong and other Atari classics. Later, when DeepMind’s AlphaGo defeated the Go world champion, Lee Sedol, it took many skeptics by surprise and provided solid evidence of the potential of DRL.

There are two sides to any DRL setup: the environment (simulator) and the learning agent (artificial brain). The learning agent experiments in the environment and, by reinforcing desirable outcomes, learns a control policy.
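To make the two sides concrete, here is a minimal Python sketch of an environment and a learning agent interacting. It is only an illustration: the `CorridorEnv` toy environment, the tabular Q-learning update (a simple stand-in for the neural networks used in DRL), and every parameter value are assumptions chosen for brevity, and none of it is part of Project Bonsai or AnyLogic.

```python
import random


class CorridorEnv:
    """Toy environment (hypothetical): walk along cells 0..4; reaching the last cell ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action 0 = move left, action 1 = move right
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.01  # small penalty per step, bonus at the goal
        return self.position, reward, done


def train(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learning agent: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < 100:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward, done = env.step(action)
            # Reinforce: move the estimate toward the observed outcome.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            steps += 1
    return q


def greedy_policy(q):
    """Control policy: the preferred action (0 or 1) for each state."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train(CorridorEnv()))` returns one action per cell; after training, the agent should prefer moving right in every non-terminal cell, since that is the shortest path to the reward.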

Learning environment (simulator)

Early on, it became evident to reinforcement learning (RL) researchers that developing solid environments for testing their learning agents’ capabilities is not an easy task. OpenAI filled that gap by providing a library of example environments that could be used to learn more about RL and to benchmark the latest RL algorithms. Soon thereafter, OpenAI added more examples of simple physics-based models, either simple robotic systems or games that took advantage of built-in physics engines. To this day, however, the majority of those examples remain toy environments specialized for testing and benchmarking AI learning capabilities.

Learning agent (artificial brain)

Proper understanding and design of learning algorithms and associated neural network architecture is not a trivial task. It requires extensive knowledge across several domains such as programming, algorithms, deep learning, and reinforcement learning itself. With the exception of a relatively small number of AI experts whose sole area of research is RL, the sheer number of prerequisites makes RL one of the most challenging areas of machine learning (ML) for newcomers.

Project Bonsai and its machine teaching approach help SMEs unlock the power of RL in their domain of expertise

Project Bonsai enables subject matter experts, even those with no AI background, to incorporate their expertise directly into an AI model and teach it how to solve real-world business problems. Project Bonsai uses an approach called machine teaching, which allows subject matter experts to use their domain knowledge to break complex tasks down into simpler concepts or lessons. Project Bonsai's AI engine first learns each lesson individually before combining and orchestrating the individual lessons to achieve the end objective. This approach significantly decreases model training time and allows each individual concept to be reused.

In short, Project Bonsai provides the fundamental abstraction needed to combine the subject matter expert’s knowledge with machine learning algorithms in an efficient and application-centered setup. Besides abstracting away the nuanced complexities of RL, Project Bonsai is built on Microsoft Azure, which allows it to seamlessly scale up the simulations needed for training.

AnyLogic and Project Bonsai collaboration brings next-gen RL to business problems

As mentioned, almost all available learning environments (simulators) are either games, toy examples, or physics-based models. However, simulation modelers who specialize in dynamic simulation know very well that there is a mature and established breed of simulations used for business applications. As the market leader in simulation modeling for businesses, AnyLogic has a large user base among industry leaders in multiple domains who have already used simulation for their most complex problems. The addition of deep reinforcement learning and the machine teaching capabilities of Project Bonsai opens a new dimension of simulation-based solutions that are geared toward adaptive control policies.

To simplify the conversion of simulation models into learning environments (simulators), we have developed a simple-to-implement wrapper model: a customized AnyLogic model that natively meets all the Project Bonsai connectivity requirements. After adding the connector library to AnyLogic, you can convert a regular AnyLogic model into a Project Bonsai “ready simulator” with a simple drag-and-drop.

There are “Simulation Models” and there are “RL Simulation Models”

To put it briefly, a simulation model that is used for reinforcement learning delegates some of the decisions (actions) taken throughout its execution to the learning agent (in this case, the Project Bonsai AI agent, also called the brain). Through these delegated actions, the brain takes control of the simulated environment at certain points and uses trial and error to gain experience in making those decisions optimally.

Therefore, any RL-ready model should be able to:

  • Set a desired configuration as the initial state of each simulation run (training episode). This will let the brain gain experience from a variety of situations, as expected in the real system.
  • Pause itself at the moments when a delegated decision must be made (episode step). These pauses can occur at predefined time intervals (e.g., every 6 hours) or at specific events in the model (e.g., call-back fields of process blocks, condition-based events, transitions of statecharts).
  • Communicate its current state to the brain (observations). Each observation is one or more numerical values that communicate the current state of the simulator in a succinct form.
  • Implement the action chosen by the brain in the model. This means that the model should have the necessary logic to carry out the actions that are delegated to the brain.
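
The four requirements above can be sketched as a small simulator interface. The following Python toy is only an illustration under stated assumptions: it is not the AnyLogic model or the Bonsai connector API, and the `InventorySimulator` class, the `Config` fields, and the demand logic are all hypothetical names and values invented for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical initial configuration for one training episode."""
    initial_stock: int = 20
    max_hourly_demand: int = 8


class InventorySimulator:
    """Toy stand-in for an RL-ready simulation model (all names are illustrative)."""

    STEP_HOURS = 6  # pause for a decision every 6 simulated hours

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset(Config())

    def reset(self, config):
        """Requirement 1: start the episode from a chosen initial state."""
        self.config = config
        self.clock_hours = 0
        self.stock = config.initial_stock
        self.unmet_demand = 0
        return self.observe()

    def observe(self):
        """Requirement 3: report the current state as a few numeric values."""
        return {
            "clock_hours": self.clock_hours,
            "stock": self.stock,
            "unmet_demand": self.unmet_demand,
        }

    def step(self, order_quantity):
        """Requirement 4: apply the delegated action (an order quantity).
        Requirement 2: then run until the next pause and hand control back."""
        self.stock += max(0, int(order_quantity))
        for _ in range(self.STEP_HOURS):
            demand = self.rng.randint(0, self.config.max_hourly_demand)
            served = min(demand, self.stock)
            self.stock -= served
            self.unmet_demand += demand - served
            self.clock_hours += 1
        return self.observe()
```

A training loop would call `reset` with a fresh configuration at the start of each episode and then alternate between reading the observation and calling `step` with the action the brain selects. In the actual workflow, the wrapper model and connector library handle this exchange for you.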

The final step before using a model that is customized for RL is to add the Bonsai connector library to the AnyLogic development environment and wrap the model in the provided wrapper model.

The BEST place to start…

We have prepared two AnyLogic example models that were refactored to be used as a learning environment (simulator) with Project Bonsai. They already have the wrapper incorporated into them and are ready to be used for training:

Activity-Based Costing Analysis

A simple factory floor model in which the cost associated with product processing is calculated and analyzed using Activity-Based Costing (ABC). Each incoming product seizes some resources, is processed by a machine, conveyed, and later releases the resources. The cost accumulated by a product is broken down into several categories for analysis and optimization. The goal is to reduce the cost per product while maintaining a high overall throughput.

Product Delivery

A supply chain includes three manufacturing facilities and fifteen distributors that order random amounts of a product every 2 to 10 days. Upon receiving an order from a distributor, a manufacturing facility waits until enough product has been produced to fulfill the order (if it does not have enough in its current inventory) and then sends a loaded truck to deliver it. The goal is to find the policy that results in the lowest cost per product while also keeping the average delivery time to a minimum. The brain pursues this goal by varying which manufacturing centers are open, as well as the production rate and number of trucks at each center.
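
As a rough illustration of what the brain controls and optimizes in this example, the sketch below expresses the action for each manufacturing center and a single score that trades off the two objectives. This is an assumption-laden sketch in Python, not the example model's actual interface: the `CenterAction` fields, the weighted-sum `reward` function, the `time_weight` parameter, and the rule that at least one center must stay open are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CenterAction:
    """Hypothetical per-center decision delegated to the brain."""
    is_open: bool
    production_rate: float  # units per day (assumed unit)
    num_trucks: int


def is_valid(actions: List[CenterAction]) -> bool:
    """Check one action per center (three centers, per the model description).
    Requiring at least one open center is an assumption of this sketch."""
    return (
        len(actions) == 3
        and any(a.is_open for a in actions)
        and all(a.production_rate >= 0 and a.num_trucks >= 0 for a in actions)
    )


def reward(cost_per_product: float, avg_delivery_days: float,
           time_weight: float = 1.0) -> float:
    """Higher is better: penalize both cost and delivery time.
    The weighted sum and its weight are illustrative choices only."""
    return -(cost_per_product + time_weight * avg_delivery_days)
```

The `time_weight` value expresses how much one day of average delivery time is worth relative to one unit of cost per product. Choosing it is a modeling decision for the subject matter expert rather than something the sketch can prescribe.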

The Project Bonsai connector library, the wrapper model with its user guide (which explains the steps involved in preparing a Project Bonsai simulator), and the two example models are available for download [ZIP]. Comprehensive documentation is also provided in the companion README.md file.

To gain access to a preview of Project Bonsai, please request access.

After gaining access to the Project Bonsai Preview, please refer to the relevant sections of this Bonsai-AnyLogic example read-me document to learn about the process of:

  • Creating a brain in Project Bonsai
  • Running the model (simulator) locally, connected to the brain that you created in the first step
  • Exporting your model and scaling the training on Microsoft Azure (as an alternative to running your model on your local machine)

In conclusion, we at AnyLogic are committed to bringing together business-oriented simulation models and state-of-the-art deep reinforcement learning, and to applying them to real-world manufacturing and operations. By simplifying the steps needed to design or repurpose RL-ready simulators, we aim to let analysts and engineers use advanced artificial intelligence without needing to become AI experts.

We will add more example models, documentation, and tutorials on this topic in the coming months. Please subscribe to our newsletter and stay tuned!

Learn more about combining machine learning and simulation, August 25, at our joint webinar with the H2O team!
