In the middle of the COVID-19 outbreak, simulation models trying to predict its behavior and outcomes are a dime a dozen. Are there any differences between them or is it a “seen one, seen them all” type of deal?
In this series of articles, we present the difficulties of developing a good epidemic model and why it is so hard to achieve, what we believe are the shortcomings of the classic SEIR approach to this problem, and the advantages of an Agent-Based Model (ABM). We will also share our own fully parametrized, fully adaptable ABM for you to use and adjust as you wish, as well as the results we’ve obtained from it.
Part I will focus on explaining the development of the model, its different components, and the difference with the SEIR approach.
Part II presents the construction of different scenarios and the results we’ve obtained by running Monte Carlo experiments with them.
The model is fully adaptable to any region by changing the input files and can be found both on GitHub and on AnyLogic Cloud, so you can use it and share your feedback for its continuous development.
Special thanks to Gastón, Cecilia, Gabriel, Mauricio and Tatianna who helped analyze and obtain the data and develop the model, Tomás for his visual designs, and to Brian and Agustin for their support and advice.
- Why is it so hard to model the COVID-19 epidemic?
- Advantages of an Agent-Based approach over the SIR Model
- So how do we handle the data?
1. Why is it so hard to model the COVID-19 epidemic?
Struggles of a good COVID-19 Simulation: It’s all about the data (and its interdependency)
The first step to understanding any system is asking ourselves what it is we are trying to understand and, as the team at FiveThirtyEight explains well in their highly recommended article, understanding COVID-19’s evolution is no easy task.
So, what does COVID-19's evolution mean?
The focus of every epidemic model is to determine the variation of one (or, better said, both) of the following factors:
- Peak Number of Infected People
- Total Number of Deaths
The thing is, these are two sides of the same coin and can’t be studied independently:
The number of infected people will determine the number of deaths from the illness, and this will end up affecting the number of infected people in return.
These two variables are largely determined by two factors:
- Fatality Rate
- Infection Rate
The Case Fatality Rate (not to be confused with mortality rate) is determined by the relation between the total number of deaths caused by a certain illness and the total number of infected people during a certain period.
CFR = number of deaths / number of infected
Right off the bat we encounter our first problem: data is not always objective. It is, in fact, determined in great measure by the way it was collected. We will not be able to access the total number of infected people, only the total known number of infected people.
CFR = number of deaths / known number of infected
And what’s the difference between the total number and the total known number? The testing and diagnostic policies. What is more, testing policies are heavily influenced by the testing capacity of the system under study and the symptomatic ratio of the illness.
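To make the gap between the two formulas concrete, here is a minimal sketch. The numbers and the `detection_ratio` parameter are made up for illustration only; they are not taken from any dataset or from the model's input file.

```python
def case_fatality_rate(deaths: int, infected: int) -> float:
    """Deaths divided by infections over the same period."""
    if infected <= 0:
        raise ValueError("infected must be positive")
    return deaths / infected


# Hypothetical numbers, chosen only to illustrate the effect of under-detection.
deaths = 50
true_infected = 10_000
detection_ratio = 0.25  # assumed share of infections that testing actually finds

known_infected = int(true_infected * detection_ratio)

true_cfr = case_fatality_rate(deaths, true_infected)
observed_cfr = case_fatality_rate(deaths, known_infected)

print(f"CFR over all infections:   {true_cfr:.2%}")
print(f"CFR over known infections: {observed_cfr:.2%}")
```

With the same number of deaths, the fewer infections the testing policy detects, the higher the CFR we observe.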
Well, what about the numerator? The total number of deaths, although it can be determined with greater fidelity than the total number of infected, also depends on other factors.
On one hand, we know that fatality rate is heavily influenced by age (and comorbidities), which will vary between each system analyzed (and within that system’s divisions too!).
On the other hand, fatality rate is also determined by the healthcare attention received during the infection, where a fatal case could be prevented with the adequate services. This will also be determined by the system studied.
See what happened? From having to determine one variable, we have now ended up with four. Furthermore, some of these variables can only be determined reliably through rigorous testing in control groups.
Meanwhile, most of them are heavily dependent on the system under study. This makes results and data collection vary significantly from one place to another and contributes to the difficulty of understanding the underlying problems.
The problem of data collection biases is not restricted to the fatality rate but will affect the definition of all the other variables that make up the model, such as the infection rate.
By now we have all probably at least heard of the term R0, that is, the Basic Reproduction Number (pronounced R nought or R zero, in case you were wondering).
R0 is an epidemiology term that refers to the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection.
From the get-go, we can clearly see that R0 has a bunch of other variables embedded inside its definition. These are the contact rate, the transmission rate and the sickness duration.
Furthermore, these variables are all influenced by each other.
The contact rate refers to the number of people the infected person encounters while sick.
This will differ widely from person to person for a variety of reasons: from population density and how their household is composed, to their social lifestyle and commuting methods. Prevention methods such as face masks will also affect a person’s contact rate.
This can be thought of as the exposure the virus receives.
The transmission rate refers to the number of people the sick person will actually infect from the ones he or she encounters.
This can be considered as the probability of being infected if you have been exposed to the virus.
Finally, the sickness duration determines how long the other two factors will come into play. This will vary from person to person according to the severity of the illness and their own immune system’s response (though it still hasn’t been shown that medical attention reduces the sickness’ timespan).
At least until a cure for COVID-19 is developed, the only one of these factors we can somehow affect is the contact rate whilst the other two depend on the virus and the exposed or infected person’s immune responses.
This is why most non-pharmaceutical interventions are focused on affecting this variable in particular.
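A common textbook simplification (not necessarily how our model computes it) is to treat R0 as the product of these three factors. The sketch below assumes the contact rate is expressed per day and the transmission rate as a probability per contact; all values are illustrative.

```python
def basic_reproduction_number(
    contacts_per_day: float,
    transmission_probability: float,
    infectious_days: float,
) -> float:
    """Textbook approximation: contacts x chance of infection per contact x duration."""
    return contacts_per_day * transmission_probability * infectious_days


# Illustrative values only (assumptions, not measured data).
baseline = basic_reproduction_number(10, 0.03, 8)
# Same virus and same sickness duration, but fewer daily contacts.
distanced = basic_reproduction_number(4, 0.03, 8)

print(f"baseline:  {baseline:.2f}")
print(f"distanced: {distanced:.2f}")
```

Only the first argument changes between the two calls, which is exactly the lever that distancing measures pull.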
Let’s go back to the R0 definition we mentioned before:
R0 is an epidemiology term that refers to the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection.
Notice the phrase ‘where all individuals are susceptible to infection’.
For instance, if there weren’t any more susceptible people to be infected, the reproduction number would naturally decrease. Furthermore, we’ve seen how contact rate regulations would also change its value.
The number of infections per infected person is not a fixed or static number, but rather an ever changing variable.
We’ll call this the Effective Reproduction Number or Re (also called just R or Rt) and, unlike R0, which can only be estimated through calculations, we can at least try to measure Re.
But again we now find ourselves immersed in the context of the data collection biases that naturally plague this problem.
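As a rough illustration of why Re moves over time, a frequently used approximation scales R0 by the share of the population that is still susceptible. The sketch below assumes homogeneous mixing, and `contact_reduction` is a hypothetical parameter standing in for the effect of contact regulations.

```python
def effective_reproduction_number(
    r0: float,
    susceptible: int,
    population: int,
    contact_reduction: float = 0.0,
) -> float:
    """Approximate Re under homogeneous mixing (a simplifying assumption).

    contact_reduction is the assumed fraction of contacts removed by
    interventions, between 0 (no change) and 1 (no contacts at all).
    """
    if not 0.0 <= contact_reduction <= 1.0:
        raise ValueError("contact_reduction must be between 0 and 1")
    return r0 * (susceptible / population) * (1.0 - contact_reduction)


everyone_susceptible = effective_reproduction_number(2.5, 1_000, 1_000)
half_susceptible = effective_reproduction_number(2.5, 500, 1_000)
half_contacts = effective_reproduction_number(2.5, 1_000, 1_000, contact_reduction=0.5)

print(everyone_susceptible, half_susceptible, half_contacts)
```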
Well, now that we know what we are dealing with, how do we go about modelling it?
2. How should we approach this?
Why the classic SIR Model doesn’t seem to cut the mustard: Agent-Based Modelling as more faithful representation of the problem
After all the difficulties we’ve seen, you may be asking yourself: how are all these different COVID-19 models being made, then?
The vast majority of the coronavirus simulators that have arisen in the past couple of weeks are based on what is known as the SEIR Model. This stands for Susceptible, Exposed, Infectious and Recovered.
The SEIR Model — Overview and shortfalls of the classic model when it comes to analysing real cases
This model is based on System Dynamics and as such is composed of a series of stocks that represent different populations and flows that generate the variations between them.
In its classical form, the system consists of four stocks (susceptible, exposed, infectious and recovered) that give the model its name and represent its current state at any given point in time.
The dynamics of this model are characterized by a set of four ordinary differential equations that correspond to the stages of the progression for each of the four populations:
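In the usual textbook notation (the symbols `beta`, `sigma` and `gamma` are the standard ones, not names from this article), the equations read dS/dt = -beta*S*I/N, dE/dt = beta*S*I/N - sigma*E, dI/dt = sigma*E - gamma*I and dR/dt = gamma*I. Below is a minimal sketch that integrates them with a simple Euler step; the parameter values are illustrative assumptions, not fitted to any data.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate the classic SEIR equations with a fixed-step Euler scheme.

    beta:  transmission rate per day
    sigma: 1 / incubation period (days)
    gamma: 1 / infectious period (days)
    Returns a list of (time, S, E, I, R) tuples.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # The three flows between the four stocks.
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((step * dt, s, e, i, r))
    return history


# Illustrative parameters only (assumptions).
history = simulate_seir(
    population=1_000_000, initial_infectious=10,
    beta=0.5, sigma=1 / 5, gamma=1 / 10, days=180,
)
peak_time, _, _, peak_infectious, _ = max(history, key=lambda row: row[3])
print(f"peak of about {peak_infectious:,.0f} infectious people around day {peak_time:.0f}")
```

Four stocks, three flows, and the same output every time you run it with the same inputs.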
What are the advantages that make the SEIR model so appealing?
The rise in popularity of the SEIR model is not without reason. Credit where credit is due, we’ve compiled these reasons into two main categories:
Its alluring simplicity — K.I.S.S. (Keep it Simple, Stupid!)
There’s something undeniably beautiful that comes with simplicity: just four stocks, just three flows. It’s easy to adapt, easy to build and easy to operate, hence its proliferation.
Also, it’s easy to make more complex. The SEIR model with four stocks is the classic approach, but there are countless versions and adaptations. This is a great advantage for trying out different tweaks, ideas and hypotheses fast. The problem is that, taken too far, this defeats the purpose of having a simple model while still lacking the complexity required for a heavier analysis.
Its charming personality — Easy to read, what you see is what you get.
The SEIR model does not speak in riddles: what he shows you is all he knows, doesn’t matter how many times you ask.
Using confidence levels to compare means is no way to make conversation in your elevator rides and the SEIR model knows that. That’s why he won’t pester you with that stuff and cut to the chase with the number you want to know. Easy to remember, easy to compare, easy to explain.
Top this off with not needing to run a Monte Carlo experiment (at least in most cases) and you’ve got the formula for the most popular model in the whole school.
So, what’s the problem with this model?
Nothing comes without a cost, and SEIR’s simplicity and its approachability are no exception.
To keep consistency and help you ponder your own conclusions on the matter, we’ve also broken down the answer to this question into two main groups:
Playing with loaded dice — Based on a deterministic approach
What happened with all subtleties and variations we talked about before? Did they just disappear?
The answer is yes… but not really.
The SEIR Model (in most cases) assumes a specific value for each of its variables. As seen before, these variables are made up of a variety of other factors, and by assuming their value we are taking away the effects of their interactions.
By doing this, you strip the problem of all the variation and randomness it has by nature.
But what if we don’t want to see just one possible outcome to the problem? What if we wanted to see the combination of all the different possible outcomes? Well, with a deterministic SEIR Model that’s just not possible.
This transforms the simulation into nothing more than a calculator, where for a given input you’ll always obtain the same answer (and talking about epidemic calculators, check out this beautiful one right here).
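To see the difference between a calculator and a stochastic simulation, the sketch below runs the same toy outbreak both ways. It uses a Reed-Frost style chain-binomial model, which is a stand-in chosen for brevity and not the model presented in this series; every parameter value is an assumption.

```python
import random


def infection_probability(infected: float, population: int, r0: float) -> float:
    """Chance that one susceptible person is infected in a generation (Reed-Frost form)."""
    return 1.0 - (1.0 - r0 / population) ** infected


def deterministic_outbreak(population, seeds, r0, generations):
    """Expected-value version: always adds the average number of new cases."""
    susceptible = float(population - seeds)
    infected = float(seeds)
    total = float(seeds)
    for _ in range(generations):
        new_cases = susceptible * infection_probability(infected, population, r0)
        susceptible -= new_cases
        infected = new_cases
        total += new_cases
    return total


def stochastic_outbreak(population, seeds, r0, generations, rng):
    """Same recursion, but each susceptible person is infected by chance."""
    susceptible = population - seeds
    infected = seeds
    total = seeds
    for _ in range(generations):
        if infected == 0:
            break  # the outbreak died out
        p = infection_probability(infected, population, r0)
        new_cases = sum(rng.random() < p for _ in range(susceptible))
        susceptible -= new_cases
        infected = new_cases
        total += new_cases
    return total


rng = random.Random(42)  # fixed seed so the experiment can be repeated
single_answer = deterministic_outbreak(500, 5, 1.5, 40)
outcomes = [stochastic_outbreak(500, 5, 1.5, 40, rng) for _ in range(100)]

print(f"deterministic total cases: {single_answer:.0f}")
print(f"stochastic total cases: min {min(outcomes)}, max {max(outcomes)}")
```

The deterministic version returns one number; the stochastic one returns a spread of outcomes, including runs where the outbreak fizzles out early.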
Coarser than sandpaper — A top-down analysis
The other limitation the model has comes from its coarse-grain nature.
By having populations as the base of the model, the outputs and insights obtained are also restrained to population level.
Much of the variability embedded in the system comes from the variation found in the individuals that make up said populations.
For example, we’ve talked about how age and health services received during the illness affect an individual’s fatality rate. When working at population level these interactions are lost, as your smallest unit of analysis (and therefore of interaction in the model) will also be at population level.
With such a coarse granularity we can’t represent the interactions of finer variables and their impact on the system and end up working solely with averages.
Additionally, we are unable to integrate specifically targeted policies or insights to the model since it is based on populations and not individuals.
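A small sketch of what working solely with averages hides: two hypothetical regions with the same number of infected people but different age structures. The fatality rates below are placeholders invented for the example, not real estimates.

```python
# Placeholder age-specific fatality rates, for illustration only.
FATALITY_BY_AGE_GROUP = {"0-39": 0.001, "40-69": 0.01, "70+": 0.10}
AVERAGE_FATALITY = 0.02  # assumed single population-level value


def expected_deaths_population_level(total_infected: int) -> float:
    """Coarse-grained estimate: one average rate for everybody."""
    return total_infected * AVERAGE_FATALITY


def expected_deaths_by_age(infected_by_group: dict) -> float:
    """Finer-grained estimate: each age group keeps its own rate."""
    return sum(
        count * FATALITY_BY_AGE_GROUP[group]
        for group, count in infected_by_group.items()
    )


young_region = {"0-39": 700, "40-69": 250, "70+": 50}
old_region = {"0-39": 400, "40-69": 350, "70+": 250}

for name, region in (("young", young_region), ("old", old_region)):
    coarse = expected_deaths_population_level(sum(region.values()))
    fine = expected_deaths_by_age(region)
    print(f"{name} region: population-level {coarse:.1f}, age-specific {fine:.1f}")
```

The population-level estimate is identical for both regions, while the age-specific one is not.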
Agent-Based Modeling | What is it and how can it be applied to simulate our problem
After seeing the good and the bad, it’s about time we dive head-first into the mud to meet the ugly: the Agent-Based Model.
In stark contrast to its more popular cousin, the agent-based approach lacks charm and simplicity but compensates with versatility and sturdiness.
But no worries, we’ve done the heavy cleaning for you already (and just as the model is open to you, we are open to your feedback, criticism and contributions, because good work is never finished).
How can we avoid falling back into the same traps that tormented the SEIR model?
The answer is building a model the other way around:
Instead of using whole populations as our building blocks, and therefore as our base of analysis, we should use the infected person as our construction unit (with all the necessary variations this implies).
With this we’ve generated a stochastic approach through a bottom-up analysis of the system.
This type of simulation, where different entities interact with each other and impact the system as a whole, is what is known as Agent-Based Modeling (ABM).
Here, each agent that makes up the system is characterized by a series of states. An agent will change states through different transitions that can be triggered by certain events, such as meeting another agent while in a given state.
In ABM, the interaction of different entities with each other can trigger events and state transitions that impact on the system as a whole.
The different states the agent goes through since its creation can be represented by what are known as state charts. These charts show us not only all the possible states, but also their hierarchy, the transitions between them, and how they are triggered.
Our agent represents an infected person and can be characterized by two major state charts:
- The first one represents the agent’s transition through the different stages of the sickness. We’ll call this the Sickness Statechart
- On the other hand, we have the agent’s relation to the healthcare system and their isolation status. This is the Healthcare Statechart
The different stages of these two state charts will trigger transitions between them and determine the agent’s effect on the other agents it encounters, as well as on itself.
Sickness State Chart
The sickness evolution starts when a person is infected. It begins in the asymptomatic phase that corresponds to the sickness’ incubation period. After the incubation period, the illness will develop symptoms and thus enter the symptomatic phase.
From the symptomatic ratio, we can determine the number of cases that will develop symptoms and the number that won't.
Those who don’t develop symptoms proceed to the recovered state and end their sickness evolution. These correspond to asymptomatic cases, where the infected agent went through the illness with no visible symptoms.
The symptomatic cases are categorized by three different states corresponding to the intensity of the illness, starting from mild up to severe.
Finally, we have our two remaining transitions. Both moderate and severe cases present the risk of not recovering and therefore transitioning to the deceased state.
On the other hand, the infected can recover and transition from the symptomatic state into the recovered state.
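The transitions described above can be sketched as a tiny state machine. This is our reading of the chart in simplified form, not the AnyLogic implementation: timing is ignored, the mild to moderate to severe ordering is an assumption, and every probability is a placeholder rather than a value from the input file.

```python
import random
from enum import Enum, auto


class Sickness(Enum):
    ASYMPTOMATIC = auto()  # incubation period
    MILD = auto()
    MODERATE = auto()
    SEVERE = auto()
    RECOVERED = auto()
    DECEASED = auto()


TERMINAL = {Sickness.RECOVERED, Sickness.DECEASED}

# Placeholder probabilities (assumptions, not the model's inputs).
P_SYMPTOMATIC = 0.6
P_MILD_TO_MODERATE = 0.2
P_MODERATE_TO_SEVERE = 0.25
P_DEATH = {Sickness.MODERATE: 0.02, Sickness.SEVERE: 0.3}


def next_state(state: Sickness, rng: random.Random) -> Sickness:
    """Draw the next sickness state for an agent currently in `state`."""
    if state is Sickness.ASYMPTOMATIC:
        return Sickness.MILD if rng.random() < P_SYMPTOMATIC else Sickness.RECOVERED
    if state is Sickness.MILD:
        return Sickness.MODERATE if rng.random() < P_MILD_TO_MODERATE else Sickness.RECOVERED
    if state is Sickness.MODERATE:
        roll = rng.random()
        if roll < P_DEATH[Sickness.MODERATE]:
            return Sickness.DECEASED
        if roll < P_DEATH[Sickness.MODERATE] + P_MODERATE_TO_SEVERE:
            return Sickness.SEVERE
        return Sickness.RECOVERED
    if state is Sickness.SEVERE:
        return Sickness.DECEASED if rng.random() < P_DEATH[Sickness.SEVERE] else Sickness.RECOVERED
    return state  # RECOVERED and DECEASED are final


def run_course(rng: random.Random) -> list:
    """Follow one agent from infection to recovery or death."""
    path = [Sickness.ASYMPTOMATIC]
    while path[-1] not in TERMINAL:
        path.append(next_state(path[-1], rng))
    return path


rng = random.Random(0)
paths = [run_course(rng) for _ in range(1000)]
print(" -> ".join(state.name for state in paths[0]))
```

Each call to `run_course` gives one possible individual history, which is the kind of variation a population-level stock cannot express.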
As long as the agent is in one of the infected states, it has a chance of creating another infected agent from its encounters with other people. This chance of spreading the illness is also affected by which state of the sickness evolution the agent is in (for example, an asymptomatic infected agent is less likely to infect new people than a moderate symptomatic one) and by its isolation status, which brings us to the Healthcare State Chart.
Healthcare State Chart
In the healthcare chart we have two major states that refer to the isolation status of the agent: Free Roaming and Isolated.
The transition from free roaming to isolated will be determined by testing policies and non-pharmaceutical interventions, as well as the agent’s location in the sickness state chart. For example, even if it has not been tested, an agent that finds itself in a severe symptomatic state will look for medical attention nonetheless and therefore isolate.
The isolated population is composed of two inner states that refer to the level of healthcare attention received: Non-Hospitalized and Hospitalized.
While the sickness state chart determines the agent’s healthcare attention requirements (depending on the severity of the illness), the healthcare state chart is the one that finally determines the healthcare attention received depending on the region’s policies and availability.
Once inside the hospitalized cases, we have the distinction between the type of healthcare attention the infected person is receiving.
As mentioned, these transitions will depend on an agent’s particular needs (associated with the severity of the illness) and the system’s availability.
Similarly, inside the intensive care patients we can find those with and without ventilator.
Finally, we have the transitions to the Deceased and Recovered states.
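To illustrate how needs and availability interact, here is a rough sketch of assigning care to an agent that is entering isolation. The state names, the `admit` function and the rule mapping severity to care level are all assumptions made for this example; the actual chart may resolve these cases differently.

```python
from dataclasses import dataclass
from enum import Enum


class Care(Enum):
    FREE_ROAMING = "free roaming"
    NON_HOSPITALIZED = "isolated, non-hospitalized"
    ISOLATION_BED = "hospitalized, isolation bed"
    ICU = "hospitalized, intensive care"
    ICU_VENTILATOR = "hospitalized, intensive care with ventilator"


@dataclass
class RegionCapacity:
    isolation_beds: int
    icu_units: int
    ventilators: int


def admit(severity: str, needs_ventilator: bool, capacity: RegionCapacity) -> Care:
    """Give an isolating agent the best care still available and consume it.

    Assumed rule: severe cases try intensive care first, moderate and severe
    cases fall back to an isolation bed, everyone else isolates at home.
    """
    if severity == "severe" and capacity.icu_units > 0:
        capacity.icu_units -= 1
        if needs_ventilator and capacity.ventilators > 0:
            capacity.ventilators -= 1
            return Care.ICU_VENTILATOR
        return Care.ICU
    if severity in ("moderate", "severe") and capacity.isolation_beds > 0:
        capacity.isolation_beds -= 1
        return Care.ISOLATION_BED
    return Care.NON_HOSPITALIZED


demo_capacity = RegionCapacity(isolation_beds=1, icu_units=1, ventilators=1)
for severity in ("severe", "severe", "moderate", "mild"):
    print(severity, "->", admit(severity, True, demo_capacity).value)
```

Once the region runs out of units, an agent with the same needs receives a lower level of care, which is how capacity ends up influencing lethality.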
If we put these two state charts together, we have a representation of the agent’s transition through the whole illness, starting with its infection and finishing in its recovery or demise.
There you have it, seems the ugly wasn’t so bad after all.
3. So how do we handle data?
Main inputs and assumptions and their application
So now that we’ve met our protagonists, how do we manage to handle the data?
Because yes, although ABM gives us a better understanding of our problem, it doesn’t actually solve the problems stated in section one. For this, we’ve divided the required inputs into five categories:
- Regions
- Movements
- Healthcare Capacity
- Sickness Characteristics
- Initial State
The information used (along with its sources) can be found inside the input file of the repository. While making this we tried to be as open and as clear as possible about what is used and how, and about the assumptions made, so that you can decide which ones you’ll accept and which you’d rather modify.
Alongside this, the model is fully parametrized: nothing is hardcoded. By doing this, not only can you adapt the model to the region you’d like, but also change the data used, say for example, in the sickness characteristics, for your own sources and information.
The Regions input consists of the number of regions, their population, area (and therefore population density) and name.
In a (not so distant) future update, industries and their number of employees by age will be added so that policies can be targeted at specific job sectors.
The age structure of each region is also loaded to the model.
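As a sketch of what one row of the Regions input could look like once loaded, here is a small data class. The field names and the age brackets are hypothetical; check the input file in the repository for the actual layout.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Region:
    name: str
    population: int
    area_km2: float
    age_shares: Dict[str, float]  # age bracket -> share of the population

    def __post_init__(self):
        # The age structure should describe the whole population.
        if abs(sum(self.age_shares.values()) - 1.0) > 1e-6:
            raise ValueError(f"age shares of {self.name} must add up to 1")

    @property
    def density(self) -> float:
        """Population density, derived from population and area."""
        return self.population / self.area_km2


# Hypothetical region used only as an example.
example = Region(
    name="Example Region",
    population=1_000,
    area_km2=50.0,
    age_shares={"0-39": 0.5, "40-69": 0.3, "70+": 0.2},
)
print(example.name, example.density)
```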
There are two main types of movements in the model: external movements, made up of people arriving from the rest of the world, and interregional movements.
1. External Movements
The external movements are generated by the daily average of incoming persons and the destination probability of each of the regions.
This way, every day a certain number of external newcomers are generated and arrive at their destination determined by said probability.
Naturally, not all the people arriving in our system will be infected. This is determined by another input: the probability of finding an infected person in an incoming movement. This value decreases after a certain period of time, as the number of external infected people is bound to decrease.
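A minimal sketch of this mechanism follows. The shape of the decrease is not specified here, so the halving rule is an assumption, as are the function names, the region names and every number.

```python
import random


def infected_probability(day, initial_probability, decay_start_day, halving_days):
    """Chance that an arriving person is infected on a given day.

    Assumed shape: constant until decay_start_day, then halving every
    halving_days days.
    """
    if day <= decay_start_day:
        return initial_probability
    return initial_probability * 0.5 ** ((day - decay_start_day) / halving_days)


def infected_arrivals(day, daily_arrivals, destinations, rng,
                      initial_probability=0.1, decay_start_day=30, halving_days=10):
    """Count the infected newcomers arriving in each region on one day."""
    regions = list(destinations)
    weights = [destinations[region] for region in regions]
    p_infected = infected_probability(day, initial_probability, decay_start_day, halving_days)
    counts = {region: 0 for region in regions}
    for _ in range(daily_arrivals):
        if rng.random() < p_infected:
            # Destination drawn according to each region's probability.
            counts[rng.choices(regions, weights=weights)[0]] += 1
    return counts


# Hypothetical destination probabilities.
destinations = {"Capital": 0.7, "Coast": 0.2, "Inland": 0.1}
rng = random.Random(1)
early = infected_arrivals(10, 500, destinations, rng)
late = infected_arrivals(90, 500, destinations, rng)
print("day 10:", early)
print("day 90:", late)
```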
2. Interregional Movements
The interregional movements are likewise generated from an input, the daily interregional movement probability. This time, the number is used as the probability that a person of a given region moves on a given day.
Once more, we have an origin-destination matrix representing the different probabilities for each combination.
But how long should said movement last? That’s determined by the average stay for each type of movement and their corresponding probability. The types of movements used are ‘Leisure’, ‘Work’, ‘Family & Friends’ and ‘Other’, but you could add your own type.
With this, each (infected) person, based on their region, will have a daily chance of travelling to another region, a destination probability, and a stay period depending on the movement type.
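Putting those three inputs together, a daily travel decision could be sketched as below. The region names, probabilities and stays are invented for the example, and using the average stay directly (rather than drawing from a distribution) is a simplification.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    origin: str
    destination: str
    purpose: str
    stay_days: float


# Hypothetical inputs (assumptions, not the repository's values).
DAILY_MOVE_PROBABILITY = {"Capital": 0.02, "Coast": 0.01}
ORIGIN_DESTINATION = {
    "Capital": {"Coast": 1.0},
    "Coast": {"Capital": 1.0},
}
MOVEMENT_TYPES = {
    "Leisure": {"probability": 0.3, "average_stay": 4.0},
    "Work": {"probability": 0.4, "average_stay": 1.0},
    "Family & Friends": {"probability": 0.2, "average_stay": 2.0},
    "Other": {"probability": 0.1, "average_stay": 1.0},
}


def maybe_travel(origin, rng, move_probability=DAILY_MOVE_PROBABILITY) -> Optional[Trip]:
    """Decide whether an agent travels today and, if so, where and for how long."""
    if rng.random() >= move_probability[origin]:
        return None  # the agent stays in its region today
    row = ORIGIN_DESTINATION[origin]
    destination = rng.choices(list(row), weights=list(row.values()))[0]
    purposes = list(MOVEMENT_TYPES)
    weights = [MOVEMENT_TYPES[p]["probability"] for p in purposes]
    purpose = rng.choices(purposes, weights=weights)[0]
    return Trip(origin, destination, purpose, MOVEMENT_TYPES[purpose]["average_stay"])


rng = random.Random(7)
trips = [t for t in (maybe_travel("Capital", rng) for _ in range(10_000)) if t is not None]
print(f"{len(trips)} trips out of 10,000 agent-days")
```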
The healthcare capacity input comprises the number of isolation beds, the number of intensive care units, the number of ventilators, and the daily testing capabilities of each region.
Isolation beds were calculated by adding the following bed availability of each region: general purpose, common, special use, prolonged stay, and undetermined beds. Maternity, neonatology, and paediatric beds have not been taken into account.
For intensive care units, paediatric beds were also not taken into account.
If needed, the possibility of increasing the healthcare capacity on a certain date could easily be added to the model.
Until now we’ve seen input that will vary from one system to another: the population, movements and healthcare system will depend on which regions you’re trying to model.
On the other hand, although it has been parametrized so you can tweak it as you desire (or swap it out completely for your own values if that’s what you wish), the sickness’ characteristics should be roughly the same whichever regions you decide to model.
This input consists of (the sources for this data can be found in the input file):
- Maximum and minimum basic reproduction number — (R0Max and R0Min)
- Incubation period duration — (distribution)
- Mild case duration — (distribution)
- Severe/moderate case duration — (distribution)
Symptoms and Severity
- Symptomatic rate — (probability distributed by age)
- Moderate evolution probability — (probability distributed by age)
- Severe evolution probability — (probability distributed by age)
- Ventilator requirement — (probability distributed by age)
Lethality and healthcare effect
- Lethality rate for moderate cases — (rate distributed by age)
- Lethality rate for severe cases — (rate distributed by age)
- Effect of healthcare assistance over lethality rates
- Diagnosis time — (distribution)
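Several of these inputs are probabilities distributed by age. One simple way to look such a value up for an individual agent is a bracket search, sketched below. The brackets and rates are placeholders, not the values from the input file.

```python
from bisect import bisect_right

# Lower bound of each age bracket and a placeholder rate per bracket.
AGE_BRACKET_STARTS = [0, 20, 40, 60, 80]
SYMPTOMATIC_RATE = [0.2, 0.3, 0.4, 0.5, 0.6]  # illustrative values only


def value_for_age(age: int, values, starts=AGE_BRACKET_STARTS) -> float:
    """Return the value of the age bracket that contains `age`."""
    if age < starts[0]:
        raise ValueError("age is below the first bracket")
    if len(values) != len(starts):
        raise ValueError("one value per bracket is required")
    return values[bisect_right(starts, age) - 1]


for age in (5, 20, 59, 95):
    print(age, value_for_age(age, SYMPTOMATIC_RATE))
```

The same helper works for any of the age-distributed inputs in the list, as long as each one supplies a value per bracket.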
The last step in setting up the model is deciding when and how it’ll start. For that the user must define a starting date and the initial infected agents per region (for now, these initial agents will start in the asymptomatic stage).
With all this set, we are ready to go!
Thank you for reading! We hope you enjoyed it and that it may be useful.
Did you like the article? Want to see what results you can obtain from a model like this? Stay tuned for the sequel, and in the meantime we encourage you to try the model out for yourself.
Once more, we are open to your comments and feedback and are eager to see what changes you’d make.