Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI

Vladimir Iglovikov, 23 slides, Jun 04, 2024

About This Presentation

Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/


This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.

Created out of a necessity for ...


Slide Content

Vladimir Iglovikov. May 30, 2024
Enhancing adoption of Open Source Libraries
A case study on Albumentations.ai

People
Contributors

Request to the audience

https://github.com/albumentations-team/albumentations

Vladimir Iglovikov
Decorated Veteran of Airborne Special Forces
PhD in Theoretical Physics
Ex Staff ML Engineer, Lyft Level 5 (Self Driving)
Kaggle Competitions Grandmaster
Co-creator of Albumentations.AI
20+ scientific papers in Physics and Deep Learning

Use library in their GitHub repos

What is the library used for?
Data augmentation: generating new data from existing data on the fly
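
As a rough illustration of "on the fly", here is a minimal sketch in plain Python. It does not use the Albumentations API; the helper names `horizontal_flip` and `augment` are made up for this example, and the image is a tiny list of pixel rows rather than a real array.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment(image, rng, p=0.5):
    """Return a flipped copy with probability p, otherwise an unchanged copy."""
    if rng.random() < p:
        return horizontal_flip(image)
    return [row[:] for row in image]


# A tiny 2x3 "image"; real pipelines work on arrays of pixels.
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(0)  # seeded so that runs are repeatable
# Each request draws a fresh variant instead of storing extra copies on disk.
variants = [augment(image, rng) for _ in range(4)]
```

The point of the sketch is that new training samples are produced at load time from the same stored image, so the dataset on disk never grows.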

Metric?
•Money
•Daily Active users
•Downloads
•Used by repositories
•Stars on GitHub
•Visitors to the website

Metric?
I know how to make money as an employee, but I do not yet know how to make money from open source
•Money
•Daily Active users
•Downloads
•Used by repositories
•Stars on GitHub
•Visitors to the website
$72,000 / month as ML Engineer
$25 / month from donations
$$$ working for the company vs working on open source

Metric?
I do not know how to measure Daily Active users
•Money
•Daily Active users
•Downloads
•Used by repositories
•Stars on GitHub
•Visitors to the website

Metric
We know how to measure Downloads
Downloads in the past 120 days
Source: https://pypistats.org/packages/albumentations
PyPI download leaderboard as of May 1, 2024
Source: https://pypilb.vercel.app
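
A minimal sketch of how a "downloads in the past 120 days" number could be computed from daily counts. The row shape (`date` and `downloads` keys) and the helper name `downloads_in_window` are assumptions for illustration, and the numbers are made up, not real Albumentations statistics.

```python
from datetime import date, timedelta


def downloads_in_window(daily_rows, end_day, days=120):
    """Sum downloads over the `days` days ending on `end_day` (inclusive)."""
    start_day = end_day - timedelta(days=days - 1)
    total = 0
    for row in daily_rows:
        day = date.fromisoformat(row["date"])
        if start_day <= day <= end_day:
            total += row["downloads"]
    return total


# Made-up daily counts, for illustration only.
rows = [
    {"date": "2024-01-01", "downloads": 100},  # falls outside the 120-day window
    {"date": "2024-04-30", "downloads": 250},
    {"date": "2024-05-01", "downloads": 300},
]
total = downloads_in_window(rows, end_day=date(2024, 5, 1))
```

Tracking the same window at a regular cadence makes it possible to compare periods, which is what makes downloads usable as a metric.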

Albumentations was created out of Desperation
•Similar products were unsatisfactory for winning on Kaggle
•We were early adopters
•For months we used the code without releasing it
Performance benchmark. Higher is better. Mar 25, 2024
Source: https://albumentations.ai/docs/benchmarking_results

Hint 1: Create artifacts
It should be easy to share your work with others
1. GitHub repo with README
2. Website
   1. Landing page
   2. Automatically generated documentation
3. Scientific paper or preprint
These can be of ultra horrible quality at first. That is not important: you will fix them later.

Hint 2: Automate everything
Move fast, break things with stable infrastructure
•Code hygiene:
  •ruff
  •mypy
  •pre-commit hooks
  •Sourcery.ai
•Tests
  •manual
  •test suites
  •automatic checks that new code is tested
•CI/CD
  •rebuild website
  •run all checks and tests on
    •Linux, macOS, Windows
    •Python 3.8-3.12
Why?
1. You will be able to create 1000+ line Pull Requests
2. Faster iterations
3. Easier for people to contribute or build on top of the library
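
One way to read "automatic checks that new code is tested" is a small script that CI runs on every Pull Request. The sketch below is an assumption about how such a check could look, not the check Albumentations uses: the function name `untested_modules` and the `tests/test_<module>.py` naming convention are hypothetical.

```python
from pathlib import Path


def untested_modules(src_dir, tests_dir):
    """Return names of source modules with no matching tests/test_<name>.py file."""
    missing = []
    for module in sorted(Path(src_dir).rglob("*.py")):
        if module.name == "__init__.py":
            continue  # package markers carry no logic worth testing here
        expected = Path(tests_dir) / f"test_{module.stem}.py"
        if not expected.exists():
            missing.append(module.name)
    return missing
```

In CI, a wrapper could call `untested_modules("src", "tests")` and fail the job when the returned list is not empty, so a missing test file blocks the merge instead of relying on a reviewer to notice.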

Hint 3: Make adoption easy
Show value fast
•Dream outcome / perceived likelihood of achievement
  •Show what you achieved with the library. (Ex: win competitions at Kaggle)
•Effort and sacrifice
  •Tutorials on how to move from existing libraries (Ex: torchvision, keras => albumentations)
•Time/delay
  •Clear and easy installation and integration process.
Source: “100 Million offer”
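
The bullet headings above match the four terms of the value equation from the cited book, as far as I recall it. Written out, with the slide's terms as the variable names:

```latex
\text{Value} =
  \frac{\text{Dream outcome} \times \text{Perceived likelihood of achievement}}
       {\text{Time delay} \times \text{Effort and sacrifice}}
```

Read this way, showing results raises the numerator, while migration tutorials and an easy installation shrink the denominator.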

Hint 4: Iterate fast
Better done than perfect. Volume beats focus. Quantity has a quality of its own.
•Small improvements a few times per day are better than big ones that never happen.
•Do not overpolish. If you break all the code, it is not a big deal. Just fix it.
•Do not overthink planning.
•In the long run, any 100 improvements every day are much better than 1 carefully planned improvement per day.

Hint 5: Be friends with the community
Volume beats focus. More people, more volume.
•Contributing should be:
  •Fun! People like being a part of a dynamic, successful story.
  •Easy:
    •Clear guidelines and examples
    •Fast response in direct messages and in issues
    •Be forgiving and helpful to contributors, as in 90% of cases their code will be of low quality
  •Low pressure:
    •People contribute for free, so they can drop any work at any time. No hard feelings, no questions asked.
  •Pleasing:
    •Thank people for their work (code, blog posts, mentions) as much and as often as possible.

Hint 6: Mental Health
You do not owe anyone anything
•Until users pay you money, you do not owe them anything. You do it because it is fun.
•They
  •have an urgent Pull Request to merge -> they can wait.
  •are submitting a critical bug -> the bug can wait.
  •want a particular feature -> it is up to you to decide if you want to implement it and when.
•People push you to do something -> ask them to become sponsors.

Hint 7: Marketing Hygiene (low impact)
Real marketing happens offline
•Create LinkedIn, Twitter, etc. pages for the package.
•Post there anything related to the library: releases, testimonials, someone referencing you in a good way
•Try not to use words and phrases you would not use talking to your mom
  •bad: “Great news! We are excited to announce our new amazing feature”
  •dev2dev: “We implemented XXX, we think it is pretty cool, it helps with YYY, unlocks PPP and we did it because of ZZZ. Here is how you can use it.”

Hint 8: Offline Marketing (high impact)
Real marketing happens offline
•Direct sales or being annoying (especially valuable with influencers):
  •“What do you use now?”
  •“What stops you from moving to Albumentations?”
  •“If we do XXX and YYY, that you said are missing now, will you give it a try?”
  •“We added XXX and YYY. I checked your GitHub repo, and here is how you can update your pipeline with Albumentations, which removes 1000 lines of code and gives a 10x speedup. Let’s merge? And here is a link to the tutorial that is similar to your task.”

Hint 9: Integrations
1 + 1 = 3, or even 5
•Your library should be almost a drop-in replacement for the standard.
•Other packages should benefit from using you vs the standard.
•Keep the number of dependencies to a minimum, as they create dependency hell.
•Others are interested in integrations even more than you are. Do not be shy: reach out to them. Cold and warm outreach works like magic.

Hint 10: Collaboration
Make marketing people of other companies benefit from your work
Your resources are limited, but marketing departments are searching for high quality content.
=>
•Do not organize meetups => speak at one organized by others
•Do not publish in your own blog => publish in a popular blog
•Do not create your own podcast => talk on another person's podcast.

Summary
•Move fast, break things with stable infrastructure
•To move fast, you need to move slow.
•Small improvements compound.
•Do not optimize for focus, optimize for volume (fast iterations with regular cadence)
•Real marketing happens offline. Cold message people, learn from them, sell to them.
•Have fun!

Thanks!
https://albumentations.ai/
https://www.linkedin.com/company/100504475
@albumentations
@iglovikov
@ternaus
@viglovikov
Vladimir Iglovikov