Background: The OptimAgentProject
•Focus on public health decision making
•Large-scale model (full German population, ~84m)
•Project team consisting of 14 universities and Institutions
•Goal: Enable simulation-based evaluation of NPI-effectiveness*
in heterogeneous populations and provide estimations for
associated economic-, social-, and health costs.
*Non-pharmaceutical interventions
OptimAgentCollaborators
SP4
Population Model
SP2
Behavioral
Adaptation
SP5
Simulation
(GEMS)
SP6
Policy Making
SP3
Epidemiologic
Parametrization
SP1
Contact
Parametrization
6
The German Epidemic Microsimulation System
+ Contact Structures
+ Behavioral Adaptation
+ Intervention Planning
+ Economic Aspects
+ Open Julia Package
+ High Performance
+ 80m Population Model
Written in
Simulating one year of
infection dynamics with
80 million agents takes
roughly 30 minutes*
*On a 24-core server with 128GB Memory
The German Epidemic Microsimulation System
Population Model
Contact Model
Disease Model
Scenario Model
With the complexityand sizeof GEMS, runtime is one of THEcritical success factors.
Therefore, we started thinking about how to build an efficientmodel early on.
But How to Make Code Fast?
Do as few calculations as possible
Do the remaining ones in parallel
S
1S
2S
3S
4S
5
Running multiple concurrent
instances of the simulation
Simulation
”True” parallelized
simulations
Parallelcomputing:
Carrying out calculations simultaneously
Prerequisite:
Causal independence of individual tasks
But How to Make Code Fast?
S
1S
2S
3S
4S
5
Running multiple concurrent
instances of the simulation
S
1
”True” parallelized
simulations
Natural causal independence
Individual runs cannot be sped up
Scalaberuntime. More cores = faster execution
Implementation is complex
We decided to
try building a
parallel ABM
Two Views
Conceptual View Technical View
How to designthe model How to codethe model
This you do
throughout the
project
This you do
before writing a
single line of code
Building a model first and go “we‘ll make it fast later…” is a common pitfall
https://www.amazon.com/Have-Baby-Month-Nine-Women/dp/B096TTV3F9
If parallelization is part of
your “making-it-fast”-scheme
you need to organize your
problem (model) in a way
that can benefit from
distributing workloads
Conceptual View Technical View
How to designthe model How to codethe model
A thought-through conceptual design saves you a lot of headaches later on
The First Thing We Did Back in 2022
Two-day workshop with the OptimAgentconsortium
(infectious disease modelers, physicians, policy-makers, psychologists,
computer scientists, health economists, physicists, mathematicians,…)
Main question: What featuresdoes such an ABM for
public health decision-support need?
Collecting Requirements
Based on the workshop, we
derived 113 required features of
the framework across 23
categories grouped by:
•ProjectFeatures
•SystemFeatures
•ModelFeatures
•SimulationFeatures
We Then Did This For the Next Six Months…
Turns Out, Building a Fast Model of That Size and Complexity is Hard
•After weeks, we produced four variants that we considered viable
•However, they all had pros and cons and were mutually exclusive
•We struggled to agree on one of the options and decided to develop an evaluation framework to objectify
the choice for a parallelization strategy and the implications for the conceptual model along seven criteria:
Single-core
Performance
Load
Balancing
Memory
Efficiency
Model
Flexibility
Synchronization
Overhead
Scalability
Implementation
Complexity
Ponge, Johannes, et al. "Evaluating Parallelization Strategies for Large-Scale Individual-based Infectious Disease Simulations." 2023 Winter Simulation Conference (WSC). IEEE, 2023.
Some „Preflight“ Considerations We Initially Agreed On
•Simulating individual mobility trajectories requires extensive computational resources
•Infections are driven by contacts, not (necessarily) by movement
•Covid infections were 18 times more likely to happen indoors
•We simulate contacts in so-called settings(e.g.households or workplaces) and disregard physical mobility
•Individuals can be assigned to mobility hubs (e.g.airports) to account for “long-distance infections”
•Our model is an individual-based microsimulation with a superimposed network structure
03 | Infect Individual 5
04 | Infect Individual 217
Discrete Event Simulation
Event Queue
Time Action
2
3
1
7
10
8
9
5
6
4
07 | Infect Individual 3592
10 | Recover Individual 9
Enqueue new events
Very efficient; only simulating infectious individuals
Hard to parallelize; events must be processed in order
2
3
1
7
10
8
9
5
6
4
→Process events in a “safe” time window ??????
??????
Parallelization Strategies | Conservative Discrete Event Simulation (CDES)
Event Queue 1 Event Queue 2
A
Event Queue 3
B
C
D
→Segment the model „geographically“
→Give each segment its own event queue
→Threads exchange events through messages
Thread 2Thread 1 Thread 3
D
C
Parallelization Strategies | Conservative Discrete Event Simulation (CDES)
Event Queue 1 Event Queue 3
B
Event Queue 2
→Process events in a “safe” time window ??????
??????
→Segment the model „geographically“
→Give each segment its own event queue
→Threads exchange events through messages
2
3
1
7
10
8
9
5
6
4
Thread 2Thread 1 Thread 3
Parallelization with few to none conflicts as threads wait for each other
Load-balancing might be a serious issue
2
3
1
7
10
8
9
5
6
4
Parallelization Strategies | Optimistic Discrete Event Simulation (ODES)
Event Queue 1 Event Queue 2
A
Event Queue 3
B
C
D
→Give each section its own “history stack”
Thread 2Thread 1 Thread 3
History Stack 1 History Stack 2 History Stack 3
2
3
1
7
10
8
9
5
6
4
Parallelization Strategies | Optimistic Discrete Event Simulation (ODES)
Event Queue 1 Event Queue 2
A
Event Queue 3
B
C
D
→Give each section its own “history stack”
→Store events after processing
Thread 2Thread 1 Thread 3
History Stack 1 History Stack 2 History Stack 3
→Message events as they appear; no time-sync
2
3
1
7
10
8
9
5
6
4
Parallelization Strategies | Optimistic Discrete Event Simulation (ODES)
Event Queue 1 Event Queue 2
A
Event Queue 3
B
C
D
→Give each section its own “history stack”
→Store events after processing
Thread 2Thread 1 Thread 3
History Stack 1 History Stack 2 History Stack 3
→Message events as they appear; no time-sync
→“Rollback“ events to resolve conflicts
ROLLBACK
No idling threads
Expensive rollbacks; requires large memory
2
3
1
7
10
8
9
5
6
4
Parallelization Strategies | Generative Turns Fixed-Time Intervals (GTFTI)
→Tick-based time model (e.g. 1 tick = 1 day)
??????
??????
→Loop over all (susceptible) individuals in parallel
→Evaluate their infection risk based on their contacts
→Update their state in a new generation of the model
Near perfect parallelization
Many irrelevant calculations;
requires twice as much memory
2
3
1
7
10
8
9
5
6
4
??????
??????+1
Parallelization Strategies | Active Settings in a Fixed-Time-Interval (ASFTI)
Household
Layer
Workplace
Layer
School
Layer
→Tick-based time model (e.g. 1 tick = 1 day)
→Partition model by “setting” type
→Marking settings with infectious individuals “active”
→Calculate infections (in parallel) for active settings
Much fewer irrelevant calculations
Inducesmodel assumptions on
disjunctive nature of settings
The Active Setting approach allows us to “push” infections (instead of “pulling” them) while still being able to parallelize theinfection procedure. We discuss these ideas in:
Ponge, Johannes, et al. "Evaluating Parallelization Strategies for Large-Scale Individual-based Infectious Disease Simulations." 2023 Winter Simulation Conference (WSC). IEEE, 2023.
Results
Criteria/
Strategy
Sg.-Core
Perf.
Load
Balanc.
Memory
Eff.
Model
Flex.
Sync.
Overh.
Scala-
bility
Implem.
Comp.
Sum
CDES 3 1 3 3 2 2 2 16
ODES 3 1 1 3 1 2 1 12
GTFTI 1 3 1 2 3 3 3 16
ASFTI 2 3 2 2 2 3 2 16
Sum 9 8 7 10 8 10 8
No strategy that
„does it all“
The Active Settings in a Fixed-
Time-Interval (ASFTI) approach is
the only strategy without a “non-
acceptable” ratingin any category
1 point (non-acceptable losses*) to 3 points (little to no loss*)
*compared to a non-parallel implementation
The presence of an
infectious individual
“activates” a setting
Settings without
infectious individuals are
considered “inactive”; To
improve performance, we
do not simulate them
We Eventually Adopted the Active-Setting Approach for GEMS
Infection Procedure
Iterate over all active
settings
Iterate over all
infected individuals
(in that setting)
Draw contacts from
setting-dependent
contact structure
Evaluate contact
infection probability
Evaluate setting
infection probability
Conceptual View Technical View
How to designthe model How to codethe model
Choice of Programing Language
•The choice of programming language is
probably one of the most important when
it comes to runtime.
•While languages like R and Python are the
most common candidates (There’s a Stack
Overflow article for literally any problem
you might have), they are not particularly
“fast”.
•We decided to write GEMS in Julia.
Gao, Kaifeng, et al. "Julia language in machine learning: Algorithms, applications, and open issues." Computer Science Review37(2020): 100254.
Thinking Vectors
Example: You have a dataframewhich holds the ages and infection statuses of
10 million agents. You now want to extract the ages of infected individuals.
df=DataFrame(
age =rand(1:100, 10_000_000),
infected =rand(Bool, 10_000_000))
res =[]
foriin1:nrow(df) # iterate through dataframe
age =df[i,1]
inf =df[i,2]
ifinf ==true
push!(res, age) # extract age to result array, if infected
end
end
Naiveapproach:
Benchmark: 5.103593 seconds
res =filter(row ->row.infected==true, df).age;
Thinking Vectors
Betterapproach:
Using inbuilt methods
(such as filter) which are
often runtime optimized
Benchmark: 1.617579 seconds
res = df.age[df.infected]
Bestapproach:
Using pure vector
operations
Benchmark: 0.051579 seconds
Vectorization
•Modern CPUs can handle vectorizedinstructions.
•This is an effective way of achieving parallelismeven with
a single-core machine.
•However, your code must be organizedin way that can
benefit from vectorization.
https://www.dornerworks.com/blog/write-vectorized-code-and-optimize-your-cpu-performance/
Object Oriented Implementation vs. Vector-Based Implementation
“For further efficiency, agents are not
represented as individual objects, but rather
as indices of one-dimensional state arrays.
This avoids the need to use an explicit for-
loop over each agent on every integration
timestep, increasing performance by more
than an order of magnitude.”
-Kerr et al. 2021
Kerr, Cliff C., et al. "Covasim: an agent-based model of COVID-19 dynamics and interventions."PLOS Computational Biology17.7 (2021): e1009149.
Object Oriented Implementation vs. Vector-Based Implementation
Kerr, Cliff C., et al. "Covasim: an agent-based model of COVID-19 dynamics and interventions."PLOS Computational Biology17.7 (2021): e1009149.
This implementation only works if you have
no nested objects of arbitrary length
Let’s say, you want to store the
agents’ vaccination history:
Person A
-age: 65
- vaccinations
- day 10
- day 46
- day 205
Person B
-age: 17
- vaccinations
- day 35
- day 68
Person C
-age: 6
- vaccinations
How to “flatten” these
nested objects into
state-arrays?
Preventing Bottlenecks During Parallelization
Example: You have a vector of agent ages and want
to count the number of people younger than 20.
ages =rand(1:100, 10_000_000)
cnt=0
foriinages
ifi<=20
cnt+=1
end
end
Naiveapproach:
Benchmark: 0.957924 seconds
Preventing Bottlenecks During Parallelization
cnt=[0]
rlock=ReentrantLock()
Threads.@threadsforiinages
ifi<=20
lock(rlock) do
cnt[1] =cnt[1] +1
end
end
end
Parallelapproach:
Benchmark: 1.098423 seconds
Add a “lock” to prevent
multiple threads (cores)
accessing the same variable
simultaneously
Parallelize the loop
Preventing Bottlenecks During Parallelization
cnt=zeros(Int, Threads.nthreads())
Threads.@threadsforiinages
ifi<=20
cnt[Threads.threadid()] +=1
end
end
sum(cnt)
Bottleneck-freeapproach:
Benchmark: 0.133476 seconds
Instantiate a vector of
zerosfor the number of
threads (cores) you have
Let each thread use
its own counter
Finally, sum up all
counters
Using Profilers
Profilers visualizehow much the
individual parts of your code
contribute to the overall runtime.
It also visualizes where runtime is
caused by “fixable” problems (red
and yellow areas).
Total Runtime
Infect
Individual
Sample
Contacts
Collect
“Present”
Individuals
Calculate
Disease
Progression
Update an Individual’s Disease State
GEMS Profile (single core)
@profviewrun!(sim)
Book Recommendation
(if you plan to use Julia…)
Avik Sengupta, “Julia High Performance”, PacktPublishing, ISBN: 978-1788298117
Conceptual View Technical View
How to designthe model How to codethe model
Conclusion –Conceptual View
•Get a thorough understanding of what you want to achieve with your model before starting to design it
•Design a fast model rather than design a model and “make-it-fast-later”
•Understand the implicationsof your model architecture choice. Are you “closing” certain doors?
•Exploit characteristics exclusive to infectious disease models (e.g., by using incubation periods as slack)
•Discuss your ideas with as many peersas possible ☺
Conclusion –Technical View
•Think in vectors
•Work with flat data and dataframes
•Use inbuiltfunctions
•Prevent loops
•Prevent bottlenecks
•Prevent memoryallocations
•Use benchmarkingtools
•Use profilers
Main Takeaway
•If you want a fast model,
•Do as few calculations as possible
•Do the remaining ones in parallel