EuroPython 2024 - Streamlining Testing in a Large Python Codebase

Streamlining Testing in a Large
Python Codebase
Jimmy Lai, Staff Software Engineer, Zip

July 12, 2024

Outline
01 Python Testing: pytest, coverage, and continuous integration
02 The Slow Test Challenges
03 Optimization Strategies
04 Results
05 Recap

Zip is the world’s leading Intake & Procurement Orchestration Platform
● 450+ global customers
● $4.4 billion total customer savings
● Top talent from tech disruptors
● $181 million raised at $1.5 billion valuation

A Large Python Codebase
1. 100 developers (we’re hiring fast)
2. 2.5 million lines of Python code, doubling every year

Scaling Challenges
1. 100 developers (we’re hiring)
2. 2.5 million lines of Python code, doubling every year
3. Number of tests and tech debt increase fast

Why Tests?
1. Quality Assurance
2. Refactoring Confidence
3. Documentation

Useful Test Metrics
01 Test Execution Time
02 Test Reliability
03 Test Coverage

Simple Testing using pytest
https://pypi.org/project/pytest/
# in helper.py
def is_even(number: int) -> bool:
    if number % 2 == 0:
        return True
    else:
        return False

# in test_helper.py
from helper import is_even

def test_is_even_with_even_number():
    assert is_even(4) is True

def test_is_even_with_zero():
    assert is_even(0) is True

> pytest . -vv

======= test session starts =======
collected 2 items

test_helper.py::test_is_even_with_even_number PASSED
test_helper.py::test_is_even_with_zero PASSED

======= 2 passed in 0.03s =======

Metrics shown: Test Execution Time, Test Reliability

Measure Test Coverage
> pytest --cov . -vv

======= test session starts =======
collected 2 items

test_helper.py::test_is_even_with_even_number PASSED
test_helper.py::test_is_even_with_zero PASSED

------------- coverage -------------
Name             Stmts   Miss  Cover
------------------------------------
helper.py            5      1    80%
test_helper.py       6      0   100%
------------------------------------
TOTAL               11      1    91%

======= 2 passed in 0.03s =======

To increase the test coverage, add a new test case for odd numbers.
https://pypi.org/project/pytest-cov/
Metric shown: Test Coverage
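
A minimal sketch of the missing test case, assuming the `is_even` helper from helper.py (repeated here so the snippet is self-contained). The test name is illustrative. It exercises the `return False` branch, which the two existing tests never reach.

```python
def is_even(number: int) -> bool:
    if number % 2 == 0:
        return True
    else:
        return False


def test_is_even_with_odd_number():
    # Covers the else branch that the even-number and zero tests skip.
    assert is_even(3) is False
```

With this test added to test_helper.py, every statement in helper.py is executed at least once during the test run.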

Continuous Integration
Practice: continuously merge changes into the shared codebase while ensuring quality

● Developers submit a pull request (PR) for code review
● Run tests to verify the code changes
● Merge a PR after all tests have passed and the PR is approved

Ensure that test reliability and test coverage meet the required thresholds
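
One way to picture the merge rule is a small gate function. This is a sketch under stated assumptions: the `CiResult` fields, the `can_merge` name, and the default thresholds (100% pass rate, 80% coverage) are hypothetical and not from the talk. In practice, pytest-cov's `--cov-fail-under` option can enforce the coverage part directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiResult:
    tests_passed: int
    tests_total: int
    coverage_percent: float


def can_merge(result: CiResult, approved: bool,
              min_pass_rate: float = 1.0,
              min_coverage_percent: float = 80.0) -> bool:
    """Return True only if the PR is approved and both thresholds are met."""
    if result.tests_total == 0:
        return False  # no tests ran, so nothing was verified
    pass_rate = result.tests_passed / result.tests_total
    return (approved
            and pass_rate >= min_pass_rate
            and result.coverage_percent >= min_coverage_percent)
```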

Continuous Integration using GitHub Workflows
# File: .github/workflows/ci.yml
name: CI

on:
  pull_request: # on updating a pull request
    branches:
      - main
  push: # on merging to the main branch
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install -r requirements.txt
      - run: pytest
https://docs.github.com/en/actions/using-workflows

Challenge: Test Execution Time Increases Over Time
1. Number of tests increases. Pain point: long test execution time
2. Codebase size increases. Pain point: test coverage overhead
3. Number of dependencies (requirements.txt) increases. Pain point: slow test startup

Strategy #1: Parallel Execution

Run Tests in Parallel on multiple CPUs
https://pypi.org/project/pytest-xdist/
pytest -n 8 # use 8 worker processes

# use all available CPU cores
pytest -n auto
N: number of CPUs (e.g. 8 cores)
Test Execution Time ÷ N

10,000 tests ÷ N is still slow

Run Tests in Parallel on Multiple Runners
https://pypi.org/project/pytest-split/
# Split tests into 10 parts and run the 1st part
pytest --splits 10 --group 1

# Assumption: all tests have the same execution time.
# Unbalanced test execution times can lead to
# unbalanced runner durations.

# To collect test execution time
pytest --store-durations

# To use the collected time
pytest --splits 10 --group 1 --durations-path .test_durations

N: number of CPUs (Test Execution Time ÷ N)
M: number of runners
10,000 tests ÷ N ÷ M
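
To see why recorded durations matter, the sketch below compares a naive split that ignores timing with a greedy split that always hands the longest remaining test to the least-loaded group. It illustrates the idea only; it is not pytest-split's actual algorithm, and the test names and durations are made up.

```python
import heapq


def split_by_count(durations: dict[str, float], groups: int) -> list[list[str]]:
    """Naive split: deal tests round-robin, ignoring how long each takes."""
    names = list(durations)
    return [names[i::groups] for i in range(groups)]


def split_by_duration(durations: dict[str, float], groups: int) -> list[list[str]]:
    """Greedy split: give the longest remaining test to the least-loaded group."""
    heap = [(0.0, i) for i in range(groups)]  # (total seconds, group index)
    result: list[list[str]] = [[] for _ in range(groups)]
    for name in sorted(durations, key=durations.get, reverse=True):
        load, index = heapq.heappop(heap)
        result[index].append(name)
        heapq.heappush(heap, (load + durations[name], index))
    return result


def slowest_group(durations: dict[str, float], split: list[list[str]]) -> float:
    """The slowest group determines when the whole CI run finishes."""
    return max(sum(durations[name] for name in group) for group in split)


# Hypothetical durations in seconds.
durations = {"test_a": 60.0, "test_b": 5.0, "test_c": 50.0, "test_d": 5.0}
naive = slowest_group(durations, split_by_count(durations, 2))        # 110.0
balanced = slowest_group(durations, split_by_duration(durations, 2))  # 60.0
```

Both splits run the same four tests on two runners, but the naive split puts the two slow tests on the same runner, so the run takes almost twice as long.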

Use Multi-Runners and Multi-CPUs in a GitHub Workflow
python-test-matrix:
  runs-on: ubuntu-latest-8-cores # needs larger runner configuration
  strategy:
    fail-fast: false # to collect all failed tests
    matrix:
      group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  steps:
    - run: pytest -n auto --splits 10 --group ${{ matrix.group }} ...
10 x 8 = 80 concurrent test worker processes
https://docs.github.com/en/actions/using-workflows
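
The arithmetic behind the 80-worker figure can be sketched as follows. The 0.5-second average test duration is a made-up number, and the estimate assumes perfectly balanced work with no startup overhead, so real runs will be slower.

```python
def ideal_wall_time(num_tests: int, avg_seconds: float,
                    cpus_per_runner: int, runners: int) -> float:
    """Best-case wall-clock seconds: total work divided by all workers."""
    workers = cpus_per_runner * runners
    return num_tests * avg_seconds / workers


# 10,000 tests at a hypothetical 0.5 s each.
serial = ideal_wall_time(10_000, 0.5, cpus_per_runner=1, runners=1)     # 5000.0
parallel = ideal_wall_time(10_000, 0.5, cpus_per_runner=8, runners=10)  # 62.5
```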

Strategy #2: Cache

Cache Python Dependency Installation
pip install -r requirements.txt
# resolve dependency versions
# download and install dependencies

# In GitHub Workflow
steps:
  - uses: actions/cache@v3
    id: dependency-cache
    with:
      path: … # directory to cache; actions/cache requires this input
      key: ${{ hashFiles('requirements.txt') }}

  - if: steps.dependency-cache.outputs.cache-hit != 'true'
    run: uv pip install -r requirements.txt --system

Save 5-10 minutes on each CI run in a large codebase

Use uv to install faster
https://pypi.org/project/uv/
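
The cache key idea can be illustrated in plain Python: the key is derived from the contents of requirements.txt, so it changes only when the listed dependencies change. This is a sketch of the concept, not GitHub's `hashFiles` implementation; the function name and the `deps-` prefix are hypothetical.

```python
import hashlib
from pathlib import Path


def dependency_cache_key(requirements: Path, prefix: str = "deps-") -> str:
    """Build a cache key from the SHA-256 digest of the requirements file."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return prefix + digest
```

An unchanged file yields the same key, so the cache is hit and the install step is skipped. Any edit to the file yields a new key and triggers a fresh install.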

Cache Non-Python Dependency Installation
Common non-Python dependencies:
● Python and Node interpreters
● Database: Postgres
● System packages: protobuf-compiler, graphviz, etc.
● Browsers for end-to-end tests: Playwright

# Dockerfile
FROM … # a base image
RUN sudo apt-get install -y postgresql-16 protobuf-compiler

# After publishing the image to a registry
# GitHub Workflow
jobs:
  run-in-container:
    runs-on: ubuntu-latest
    container:
      image: …

Save 10 minutes or more on each CI run in a large codebase
https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container

Strategy #3: Skip Unnecessary Computing

Skip Unnecessary Tests and Linters
Only run specific tests when only specific code is changed
# GitHub workflow
jobs:
  changed-files:
    runs-on: ubuntu-latest
    outputs:
      has-py-changes: ${{ steps.find-py-changes.outputs.any_changed }}
    steps:
      - uses: actions/checkout@v4
      - uses: tj-actions/changed-files@v44
        id: find-py-changes
        with:
          files: '**/*.py'
  run-pytest:
    needs: changed-files
    if: needs.changed-files.outputs.has-py-changes == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: pytest

● Linters can also run only on the updated files
✨ Modularize code and use build systems to run even fewer tests
https://github.com/marketplace/actions/changed-files
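
The decision made by the changed-files job can be sketched in Python. The function name and the file lists are hypothetical; in the workflow itself this check is performed by the `tj-actions/changed-files` action.

```python
from pathlib import PurePosixPath


def has_python_changes(changed_files: list[str]) -> bool:
    """Return True if any changed path is a Python source file."""
    return any(PurePosixPath(path).suffix == ".py" for path in changed_files)


docs_only = has_python_changes(["README.md", "docs/setup.md"])       # False
with_code = has_python_changes(["README.md", "app/models/user.py"])  # True
```

A PR that only touches documentation skips the pytest job entirely, while any `.py` change triggers it.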

Skip Coverage Analysis for Unchanged Files
# pytest --cov by default measures coverage for all files,
# which is slow in a large codebase

# Add --cov=UPDATED_PATH1 --cov=UPDATED_PATH2 … to only
# measure the updated files
Save 1 minute or more on each CI run in a large codebase
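
A sketch of how the `--cov` arguments could be assembled from a list of changed files. The helper name and the example paths are hypothetical, and the sketch assumes each `--cov` path is the directory containing a changed file. Producing the list of changed files (for example from git or a changed-files step) is left out.

```python
from pathlib import PurePosixPath


def build_cov_args(changed_files: list[str]) -> list[str]:
    """Turn changed Python files into --cov options, one per directory."""
    directories = sorted({
        str(PurePosixPath(path).parent)
        for path in changed_files
        if path.endswith(".py")
    })
    return [f"--cov={directory}" for directory in directories]


changed = ["app/billing/invoice.py", "README.md",
           "app/users/models.py", "app/billing/tax.py"]
command = ["pytest", *build_cov_args(changed)]
# ['pytest', '--cov=app/billing', '--cov=app/users']
```

Note that a changed file at the repository root maps to `--cov=.`, which measures everything again, so a real implementation would probably handle that case separately.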

Strategy #4: Modernize Runners

Use Faster and Cheaper Runners
Use new-generation CPUs and memory to run faster and cheaper
Third-party hosted runner providers:
● Namespace
● BuildJet
● Actuated
● …

Use self-hosted runners with auto-scaling
https://github.com/actions/actions-runner-controller/
Use Actions Runner Controller to deploy auto-scaling runners using
Kubernetes with custom hardware specifications (e.g. AWS EC2)
5X+ Cost Saving and 2X+ Faster Test Speed compared to Github runners

Results
● Continuously optimizing CI test execution time to improve developer experience
● Increasing test coverage with better quality assurance

Recap: Strategies for Scaling Slow Tests in a Large Codebase
01 Parallel Execution
02 Cache
03 Skip Unnecessary Computing
04 Modernize Runners

Rujul Zaparde
Co-Founder and CEO
Lu Cheng
Co-Founder and CTO
Engineering Blog
https://engineering.ziphq.com

Job Opportunities
https://ziphq.com/careers

Thank You!
