PyCon JP 2024: Streamlining Testing in a Large Python Codebase

jimmy_lai, Sep 27, 2024

About This Presentation

Maintaining code quality in a growing codebase is challenging. We faced issues like increased test suite execution time, slow test startups, and coverage reporting overhead. By leveraging open-source tools, we significantly enhanced testing efficiency. We utilized pytest-xdist for parallel test execution...


Slide Content

Streamlining Testing in a Large Python Codebase
Jimmy Lai, Staff Software Engineer, Zip

Sept. 28, 2024
https://www.slideshare.net/jimmy_lai/presentations

Outline
01 Python Testing: pytest, coverage, and continuous integration
02 The Slow Test Challenges
03 Optimization Strategies
04 Results
05 Recap

Zip is the world’s leading
Intake & Procurement Orchestration Platform

A Large Python Codebase
1. 100 developers (we're hiring fast)
2. 2.5 million lines of Python code, doubling every year

Scaling Challenges
3. The number of tests and the amount of tech debt increase fast

Why Tests?
1. Quality Assurance
2. Refactoring Confidence
3. Documentation

Useful Test Metrics
01 Test Execution Time
02 Test Reliability
03 Test Coverage

Simple Testing using pytest
https://pypi.org/project/pytest/

# in helper.py
def is_even(number: int) -> bool:
    if number % 2 == 0:
        return True
    else:
        return False

# in test_helper.py
from helper import is_even

def test_is_even_with_even_number():
    assert is_even(4) == True


def test_is_even_with_zero():
    assert is_even(0) == True

> pytest . -vv

======= test session starts =======
collected 2 items

test_helper.py::test_is_even_with_even_number PASSED
test_helper.py::test_is_even_with_zero PASSED

======= 2 passed in 0.03s =======

The pytest output already reports two of the useful metrics: test reliability (passed/failed) and test execution time.

Measure Test Coverage
https://pypi.org/project/pytest-cov/

> pytest --cov . -vv

======= test session starts =======
collected 2 items

test_helper.py::test_is_even_with_even_number PASSED
test_helper.py::test_is_even_with_zero PASSED

------------- coverage -------------
Name             Stmts   Miss  Cover
------------------------------------
helper.py            5      1    80%
test_helper.py       6      0   100%
------------------------------------
TOTAL               11      1    91%

======= 2 passed in 0.03s =======

To increase the test coverage: add a new test case for odd numbers (see the sketch below).
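A minimal sketch of that missing test case (the test name is illustrative, not from the slides); it exercises the odd-number branch of is_even and brings helper.py to full coverage:

# in test_helper.py (illustrative addition)
def test_is_even_with_odd_number():
    assert is_even(3) == False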

Continuous Integration
Practice: continuously merge changes into the shared codebase while ensuring quality through automated testing.

● Developers submit a pull request (PR) for code review
● Run tests to verify the code changes
● Merge a PR after all tests have passed and the PR is approved

Ensure that test reliability and test coverage meet the required thresholds.

Continuous Integration using GitHub Workflows
https://docs.github.com/en/actions/using-workflows

# File: .github/workflows/ci.yml
name: CI

on:
  pull_request:   # on updating a pull request
    branches:
      - main
  push:           # on merging to the main branch
    branches:
      - main

jobs:
  run-pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - run: pip install -r requirements.txt
      - run: pytest

Challenge: Test Execution Time Increases Over Time
1. Number of tests increases → Pain Point: Long Test Execution Time
2. Codebase size increases → Pain Point: Test Coverage Overhead
3. Number of dependencies (requirements.txt) increases → Pain Point: Slow Test Startup

Strategy #1: Parallel Execution

Run Tests in Parallel on multiple CPUs
https://pypi.org/project/pytest-xdist/

pytest -n 8     # use 8 worker processes

# use all available CPU cores
pytest -n auto

N: number of CPUs (e.g. 8 cores)
Test Execution Time ÷ N

10,000 tests ÷ N is still slow
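Because pytest-xdist runs tests in separate worker processes, tests need to avoid shared mutable state such as a fixed file path. A minimal sketch (file and test names are illustrative, not from the slides) of keeping per-test state isolated with pytest's built-in tmp_path fixture:

# in test_report.py (illustrative example)
def test_writes_report(tmp_path):
    # each test gets its own temporary directory, so parallel workers
    # never collide on a shared path
    report = tmp_path / "report.txt"
    report.write_text("ok")
    assert report.read_text() == "ok"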

Run Tests in Parallel on multiple Runners
https://pypi.org/project/pytest-split/

# Split tests into 10 parts and run the 1st part
pytest --splits 10 --group 1

# Assumption: all tests have the same test execution time.
# Unbalanced test execution times can lead to
# unbalanced runner durations.

# To collect test execution times
pytest --store-durations

# To use the collected times
pytest --splits 10 --group 1 --durations-path .test_durations

N: number of CPUs
M: number of runners
Test Execution Time ÷ N ÷ M

10,000 tests ÷ N ÷ M
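The .test_durations file has to come from a previous run. One way to keep it fresh, sketched below with an illustrative job name and an artifact-based handoff that is an assumption rather than the slides' setup, is a periodic job that regenerates and uploads it:

# GitHub workflow sketch (not from the slides)
update-test-durations:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: pip install -r requirements.txt
    - run: pytest --store-durations        # writes .test_durations
    - uses: actions/upload-artifact@v4
      with:
        name: test-durations
        path: .test_durations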

Use Multi-Runners and Multi-CPUs in a GitHub Workflow
https://docs.github.com/en/actions/using-workflows

python-test-matrix:
  runs-on: ubuntu-latest-8-cores   # needs larger runner configuration
  strategy:
    fail-fast: false               # to collect all failed tests
    matrix:
      group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  steps:
    - run: pytest -n auto --splits 10 --group ${{ matrix.group }} ...

10 x 8 = 80 concurrent test worker processes
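With fail-fast disabled, the ten matrix jobs report ten separate statuses. A common pattern, sketched here with an illustrative job name (not from the slides), is a single summary job to use as the required status check:

python-test-summary:
  needs: python-test-matrix
  if: always()                     # run even when some matrix jobs failed
  runs-on: ubuntu-latest
  steps:
    - run: |
        # fail the summary job unless every matrix job succeeded
        if [ "${{ needs.python-test-matrix.result }}" != "success" ]; then
          exit 1
        fi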

Strategy #2: Cache

Cache Python Dependency Installation

pip install -r requirements.txt
# resolve dependency versions
# download and install dependencies

# In a GitHub workflow
steps:
  - uses: actions/cache@v3
    id: dependency-cache
    with:
      key: ${{ hashFiles('requirements.txt') }}

  - if: steps.dependency-cache.outputs.cache-hit != 'true'
    run: uv pip install -r requirements.txt --system

Save 5-10 minutes on each CI run in a large codebase.

Use uv to install faster:
https://pypi.org/project/uv/
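actions/cache also needs a path entry telling it what to save and restore, which the snippet above leaves out. A fuller sketch, assuming dependencies are installed into a project-local virtualenv (the .venv layout and key prefix are illustrative, not from the slides):

steps:
  - uses: actions/cache@v3
    id: dependency-cache
    with:
      path: .venv                  # the directory to save and restore
      key: venv-${{ runner.os }}-${{ hashFiles('requirements.txt') }}

  - if: steps.dependency-cache.outputs.cache-hit != 'true'
    run: |
      python -m venv .venv
      .venv/bin/pip install -r requirements.txt

Later steps would then invoke .venv/bin/pytest (or activate the virtualenv) so the cached environment is actually used.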

Cache Non-Python Dependency Installation

Common non-Python dependencies:
● Python and Node interpreters
● Database: PostgreSQL
● System packages: protobuf-compiler, graphviz, etc.
● Browsers for end-to-end tests: Playwright

# Dockerfile
FROM …   # a base image
RUN apt-get update && apt-get install -y postgresql-16 protobuf-compiler

# After publishing the image to a registry

# GitHub workflow
jobs:
  run-in-container:
    runs-on: ubuntu-latest
    container:
      image: …

Save 10 minutes or more on each CI run in a large codebase.
https://docs.github.com/en/actions/using-jobs/running-jobs-in-a-container
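A sketch of publishing such an image so test jobs can run inside it; the registry path, action versions, and job name are assumptions for illustration, not from the slides:

# GitHub workflow sketch: rebuild the CI base image when the Dockerfile changes
build-ci-image:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - uses: docker/build-push-action@v6
      with:
        context: .
        push: true
        tags: ghcr.io/${{ github.repository }}/ci-base:latest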

Strategy #3: Skip Unnecessary Computations

Skip Unnecessary Tests and Linters
Only run specific tests when the relevant code has changed.

# GitHub workflow
jobs:
  changed-files:
    outputs:
      has-py-changes: ${{ steps.find-py-changes.outputs.any_changed }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: tj-actions/changed-files@v44
        id: find-py-changes
        with:
          files: "**/*.py"

  run-pytest:
    needs: changed-files
    if: needs.changed-files.outputs.has-py-changes == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: pytest

Linters can also run only on updated files (see the sketch below).
✨ Modularize code and use build systems to run even fewer tests.
https://github.com/marketplace/actions/changed-files
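A sketch of the linter variant; the linter command (ruff) and the use of the all_changed_files output are assumptions based on the changed-files action, not shown on the slides:

run-lint:
  needs: changed-files
  if: needs.changed-files.outputs.has-py-changes == 'true'
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: tj-actions/changed-files@v44
      id: find-py-changes
      with:
        files: "**/*.py"
    - run: ruff check ${{ steps.find-py-changes.outputs.all_changed_files }}
      # lint only the updated Python files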

Skip Coverage Analysis for Unchanged Files

# pytest --cov by default measures coverage for all files,
# and it's slow in a large codebase.

# Add --cov=UPDATED_PATH1 --cov=UPDATED_PATH2 … to only
# measure the updated files.

Save 1 minute or more on each CI run in a large codebase.
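A sketch of wiring this to the changed-files output, assuming the test job also runs a changed-files step with id find-py-changes as in the earlier examples; the shell loop is illustrative, not from the slides:

- run: |
    # build one --cov flag per updated Python file (illustrative)
    cov_args=""
    for f in ${{ steps.find-py-changes.outputs.all_changed_files }}; do
      cov_args="$cov_args --cov=$f"
    done
    pytest $cov_args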

Strategy #4: Modernize Runners

Use Faster and Cheaper Runners
Utilize next-generation instances with optimized CPU and memory configurations to achieve faster and more cost-effective performance.

Or use 3rd-party-hosted runner providers:
● Namespace
● BuildJet

Use self-hosted runners with auto-scaling
https://github.com/actions/actions-runner-controller/

Use Actions Runner Controller to deploy auto-scaling runners on Kubernetes with custom hardware specifications (e.g. AWS EC2).

Achieved 5x+ cost savings and 2x+ faster test speeds compared to using GitHub-hosted runners.
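Once a runner scale set has been installed by Actions Runner Controller, jobs opt in through runs-on; a minimal sketch in which the scale set name "arc-runner-set" is an illustrative assumption:

run-pytest:
  runs-on: arc-runner-set          # name of the installed runner scale set
  steps:
    - uses: actions/checkout@v4
    - run: pytest -n auto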

Results
● Continuously optimizing CI test execution time to improve developer experiences
● Increasing test coverage with better quality assurance

Recap: Strategies for Scaling Slow Tests in a Large Codebase
01 Parallel Execution
02 Cache
03 Skip Unnecessary Computations
04 Modernize Runners

Rujul Zaparde, Co-Founder and CEO
Lu Cheng, Co-Founder and CTO

Engineering Blog
https://engineering.ziphq.com

Job Opportunities
https://ziphq.com/careers

Thank You!