Solving the Bin Packing Problem in warehousing and logistics – strategy comparison

A blog post inspired by Decision Lab. You can read the company’s original post from Damien Lopez on Medium – Who wants to figure out how to pack anyway?


The 3D Bin Packing Problem (3D-BPP) is one of the most frequent problems in warehousing and logistics. Solving it means filling containers (boxes or pallets) with items packed as tightly as possible, so that the fewest containers are needed.

What if I told you that you’d started solving the Bin Packing Problem before you packed bags for college? Remember the famous Tetris game – a great exercise in using space under time pressure?

Or remember that feeling when you needed to pack because you were moving to a new place or about to go on vacation? In both cases, the more empty space in that box or suitcase you have, the more boxes or suitcases you need.

With just a few boxes or suitcases, you can rely on intuition and experience (as we all usually do). But what if there are dozens of boxes? In logistics, gut feeling and human miscalculation can mean paying for an extra pallet or truck – a substantial expense for the company.

To think ahead of any business challenge, check out the Developing Disruptive Business Strategies white paper and leverage simulation modeling.

So, Decision Lab decided to test three computer-based packing techniques – mathematical optimization, reinforcement learning, and a rules-based algorithm – to identify the most effective one.

Optimization algorithm

For mathematical optimization, Decision Lab specified an objective function and constraints and used a mathematical optimizer to find the solution. The company set the objective function as minimizing unused container space, which also reduces the number of containers used.
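As a rough illustration of the objective-plus-constraints idea, here is a minimal sketch. It is not Decision Lab's model: it assumes a simplified one-dimensional version of the problem in which each item is just a volume and all containers share one capacity, and it swaps the mathematical optimizer for a small exhaustive search. The function name `min_bins` and the sample numbers are made up for the example. With identical containers, minimizing the number of containers is the same as minimizing unused space in the containers that are used.

```python
from typing import List


def min_bins(volumes: List[int], capacity: int) -> int:
    """Smallest number of equal containers that hold all item volumes.

    Constraint: the volumes placed in one container never exceed `capacity`.
    Objective: use as few containers as possible.
    """
    items = sorted(volumes, reverse=True)  # big items first prunes faster
    if any(v > capacity for v in items):
        raise ValueError("an item is larger than the container")
    best = len(items)  # trivial upper bound: one container per item

    def place(i: int, loads: List[int]) -> None:
        nonlocal best
        if len(loads) >= best:
            return  # this branch cannot beat the best packing found so far
        if i == len(items):
            best = len(loads)
            return
        tried = set()  # containers with equal load are interchangeable
        for b in range(len(loads)):
            if loads[b] + items[i] <= capacity and loads[b] not in tried:
                tried.add(loads[b])
                loads[b] += items[i]
                place(i + 1, loads)
                loads[b] -= items[i]
        loads.append(items[i])  # or open a new container for this item
        place(i + 1, loads)
        loads.pop()

    place(0, [])
    return best


if __name__ == "__main__":
    print(min_bins([4, 8, 1, 4, 2, 1], capacity=10))  # 2
```

An exhaustive search like this only scales to a handful of items, which is why real projects hand the same formulation to a dedicated solver.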


Illustration of the Bin Packing Problem. Source: Decision Lab on Medium

Reinforcement learning and simulation

Reinforcement learning (RL) is a machine learning approach in which a decision made in the current state affects the decisions made in the next one. It suits scenarios where context matters. This sets it apart from mathematical optimization, which computes its solution from the complete problem statement rather than reacting to a changing context.

In any reinforcement learning scenario, you will need a state, an action, and a reward function.

An RL agent learns which action is best in a given state. When it receives information about the environment (the state), it takes an action. Depending on the effect of this action on the environment, the agent gets a reward – either positive or negative. This process is repeated many times so that the agent learns to maximize its total reward.
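The state, action, and reward loop can be sketched in a few lines. The example below uses tabular Q-learning, a textbook RL method chosen here only because it is short; it is not what Bonsai does internally. The toy environment is also an assumption: one open bin of `CAPACITY` units, items of the sizes in `SIZES`, and two actions, packing into the open bin or closing it and starting a new one. The reward is the negative of the space wasted when a bin is closed.

```python
import random
from collections import defaultdict

CAPACITY = 4          # hypothetical bin size
SIZES = (1, 2, 3)     # hypothetical item sizes
PACK, NEW_BIN = 0, 1  # the two available actions


def apply_action(free, size, action):
    """Return (free space afterwards, reward) for one packing decision."""
    if action == PACK and size <= free:
        return free - size, 0.0
    # Closing the open bin wastes whatever space was still free in it.
    # An item that does not fit also forces a new bin.
    return CAPACITY - size, -float(free)


def train(episodes=2000, items_per_episode=20, alpha=0.1, gamma=0.9,
          epsilon=0.1, seed=0):
    """Learn a table mapping state -> estimated value of each action."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state = (CAPACITY, rng.choice(SIZES))  # (free space, incoming item)
        for _ in range(items_per_episode):
            if rng.random() < epsilon:
                action = rng.choice((PACK, NEW_BIN))  # explore
            else:
                action = max((PACK, NEW_BIN), key=lambda a: q[state][a])
            free, reward = apply_action(state[0], state[1], action)
            next_state = (free, rng.choice(SIZES))
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for state in sorted(q_table):
        print(state, [round(v, 2) for v in q_table[state]])
```

After training, the action with the larger value in `q_table[state]` is the agent's learned choice for that state.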

To train the RL agent, Decision Lab used Microsoft Bonsai. Bonsai integrates well with AnyLogic and helps simulation and subject matter experts without an AI background build, train, and deploy reinforcement learning agents in their projects.


AI training workflow with AnyLogic and Bonsai. Source: Decision Lab on Medium

To solve the Bin Packing Problem, the company developed a simulation model of a conveyor belt that delivered items to the packing area. Decision Lab integrated the model with the Bonsai platform and used it to train the RL agent to pack items as efficiently as possible.

One of the rules the RL agent followed was that each item on the conveyor belt had to be dealt with on arrival and placed into a container before the agent turned to the next item. Decision Lab also allowed the agent to see one item ahead, so that it could do some limited planning.

So this is much more challenging than the mathematical optimisation method, where that has complete knowledge of all items to be packed — our RL agent only sees one ahead of a randomly ordered sequence
– Decision Lab.
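To make the online setting concrete, the sketch below shows what such an agent gets to observe: the current item, the one item behind it, and the free space in the open containers. Everything here is an assumption for illustration. Items are reduced to one-dimensional sizes, the helper names are made up, and `first_fit` is a placeholder policy that ignores the preview. A trained agent would take its place.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# A policy sees (current item, upcoming item or None, free space per bin)
# and returns the index of the bin to use, or None to open a new one.
Policy = Callable[[int, Optional[int], List[int]], Optional[int]]


def with_lookahead(items: Iterable[int]) -> Iterator[Tuple[int, Optional[int]]]:
    """Yield (current, upcoming) pairs; upcoming is None for the last item."""
    it = iter(items)
    try:
        current = next(it)
    except StopIteration:
        return
    for upcoming in it:
        yield current, upcoming
        current = upcoming
    yield current, None


def first_fit(current: int, upcoming: Optional[int],
              free: List[int]) -> Optional[int]:
    """Placeholder policy: ignores `upcoming`, picks the first bin that fits."""
    for index, space in enumerate(free):
        if current <= space:
            return index
    return None


def pack_online(items: Iterable[int], capacity: int,
                policy: Policy = first_fit) -> List[List[int]]:
    """Place every item on arrival; earlier placements are never revised."""
    bins: List[List[int]] = []
    for current, upcoming in with_lookahead(items):
        free = [capacity - sum(b) for b in bins]
        choice = policy(current, upcoming, free)
        if choice is None:
            bins.append([current])
        else:
            bins[choice].append(current)
    return bins


if __name__ == "__main__":
    print(list(with_lookahead([3, 1, 2])))  # [(3, 1), (1, 2), (2, None)]
    print(pack_online([6, 5, 4, 5], capacity=10))  # [[6, 4], [5, 5]]
```

The policy never sees more than two items at a time, whereas an offline optimizer receives the whole list up front.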

A simulation of a conveyor belt transporting items to the packing area. Source: Decision Lab on Medium

When all was set, Decision Lab ran experiments to compare the optimization and reinforcement learning algorithms with an established rules-based algorithm. The results helped identify which strategy achieved the highest packing density when packing a list of items within a limited time.
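For a feel of how a density comparison can be scored, here is a small sketch. The post does not say which rules-based algorithm was used, so first-fit decreasing, a well-known packing rule, stands in for it here. Items are again simplified to one-dimensional volumes, and density is assumed to mean packed volume divided by the total capacity of the containers used.

```python
from typing import List


def first_fit_decreasing(volumes: List[int], capacity: int) -> List[List[int]]:
    """Rule: take items largest first, put each in the first bin with room."""
    bins: List[List[int]] = []
    for volume in sorted(volumes, reverse=True):
        for contents in bins:
            if sum(contents) + volume <= capacity:
                contents.append(volume)
                break
        else:
            bins.append([volume])  # no open bin had room
    return bins


def packing_density(bins: List[List[int]], capacity: int) -> float:
    """Packed volume as a fraction of the capacity of all bins used."""
    if not bins:
        return 0.0
    return sum(sum(contents) for contents in bins) / (len(bins) * capacity)


if __name__ == "__main__":
    packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
    print(packed)                       # [[8, 2], [4, 4, 1, 1]]
    print(packing_density(packed, 10))  # 1.0
```

Scoring every strategy with the same density function on the same item lists keeps the comparison fair.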

In this post, we’ve covered two out of three computer-based strategies. Watch the video to learn which of them proved to be the most successful one. Project details and the comparison results:


YouTube video and presentation slides

After learning the project’s results, you’ll know how to tackle the Bin Packing Problem by combining the winning strategy with your Tetris and packing-for-vacation experience.


If you want to dig deeper into how simulation modeling is used in warehousing projects, check out our materials about Material Handling Libraries.
