Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Who is this guy?!?
Been writing code since 1984 (on a VIC-20)
PhD in CS from UC Berkeley
Prof of CS at Harvard
Wrote one of the first books about Linux
Engineering leadership at Google, Apple, OctoAI
Founded a couple of AI startups
Speaker at Craft Conference 2024
This is everyone here
Large Language Models
*** COMPUTER SCIENCE IS DOOMED ***
Computer Science has always been about one thing: translating ideas into programs. CS is the study of how to take a problem and map it onto instructions that can be executed by a Von Neumann machine.
*** COMPUTER SCIENCE IS DOOMED ***
Critically, the goal of CS has always been that programs are implemented, maintained, and understood by humans. But -- spoiler alert! -- humans suck at all of these things.
Let’s just make programming easier!
Fifty years of programming language research has done nothing to improve the state of affairs. No amount of improvement to type systems, debugging, static analysis, linters, or documentation is going to magically solve this problem.
Let’s just make programming easier!
FORTRAN (1957)
BASIC (1964)
APL (1966)
Rust (2010)
This is how I program now...
What is the algorithm being implemented here?
How would you write it down as code?
What, if anything, could you prove about this program?
(Not Donald Knuth)
How much does it cost to replace one human with AI?
Let’s assume (for now) that LLMs will be able to do most, or all, of the programming work done by a human software developer. What would it cost to replace a human SWE with LLM calls?
Typical SWE salary: $220,000
Benefits, taxes, free breakfast, lunch, dinner, snacks, masseuse, shuttle bus, on-site doctor, bowling alley, ...: $92,000
Total: $312,000
Number of working days per year: 260
Total cost for one-human-SWE-day: $1200
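Sketching the human-side arithmetic as code, using the figures from the slides:

```python
# Fully loaded cost of one software engineer, per the slide's figures.
salary = 220_000           # typical SWE salary, USD/year
overhead = 92_000          # benefits, taxes, perks, ...
total = salary + overhead  # $312,000/year fully loaded

working_days = 260
print(f"Cost per human-SWE-day: ${total / working_days:,.0f}")  # → $1,200
```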
Average lines of finalized code checked in per day ~= 100
Average number of GPT-4o tokens per line ~= 10
Price for GPT-4o: $0.005 / 1K tokens (input), $0.015 / 1K tokens (output)
Assume a 5:1 input-to-output token ratio
Total cost for one-human-SWE-day equivalent work: $0.04
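The LLM side of the comparison works out as follows (same figures and GPT-4o prices as quoted on the slides):

```python
# LLM-side cost of producing one human-SWE-day of code.
lines_per_day = 100        # finalized lines of code per SWE-day
tokens_per_line = 10       # rough GPT-4o tokens per line

output_tokens = lines_per_day * tokens_per_line  # 1,000 output tokens
input_tokens = 5 * output_tokens                 # assumed 5:1 input:output ratio

input_price = 0.005 / 1000   # $ per input token
output_price = 0.015 / 1000  # $ per output token

llm_cost = input_tokens * input_price + output_tokens * output_price
print(f"LLM cost per SWE-day equivalent: ${llm_cost:.2f}")  # → $0.04
print(f"Cheaper by a factor of {round(1200 / llm_cost):,}x")
```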
$1200 / day (human) vs. $0.04 / day (LLM): 30,000x cheaper
The robot does not take breaks. The robot does not require catered lunches or on-site massage. The robot takes the same length of time whether it’s a prototype or final production code. The robot makes plenty of mistakes, but makes them incredibly quickly.
IBM 607 ad, 1953: “150 Extra Engineers. An IBM Electronic Calculator speeds through thousands of intricate computations so quickly that on many complex problems it’s just like having 150 EXTRA Engineers.”
Beyond Programming
Conventional programs are about giving direct instructions to simple symbolic machines. But what if the machine can understand natural language and perform complex reasoning directly?
Teaching, not programming
Gradually, programming will be replaced by models that can:
- Understand problem definitions in natural language
- Perform sophisticated reasoning and cognition
- Find novel solutions to problems
- Learn how to do new things on their own
>> This radically changes how we think about computation. <<
The Natural Language Computer
[Diagram, built up across several slides: a natural-language “program” drives a Large Language Model (ChatGPT); the model calls external tools (“peripherals”) through APIs, spawns tasks, and is backed by short-term memory and by long-term memory in a vector DB.]
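The loop behind this architecture can be sketched in a few lines. Everything below is illustrative, not a real API: `call_model()` stubs out the LLM, and `run_program()` plays the role of the orchestrator.

```python
# A minimal sketch of the "natural language computer" loop.

def call_model(program: str, context: list) -> dict:
    """Stub for an LLM call. A real system would hit a model API here."""
    # If no tool result is in the context yet, ask for a tool invocation.
    if not any(line.startswith("TOOL:") for line in context):
        return {"action": "tool", "tool": "get_weather", "arg": "Budapest"}
    # Otherwise, produce a final answer from the tool's output.
    return {"action": "finish", "answer": "It is sunny in Budapest."}

# External tools ("peripherals") the model may invoke via an API.
TOOLS = {
    "get_weather": lambda city: f"TOOL: sunny in {city}",
}

def run_program(program: str) -> str:
    """Execute a natural-language 'program' against the model."""
    context = [program]  # short-term memory: the growing context
    while True:
        step = call_model(program, context)
        if step["action"] == "tool":
            # Run the requested peripheral and feed its output back in.
            context.append(TOOLS[step["tool"]](step["arg"]))
        else:
            return step["answer"]

print(run_program("What is the weather in Budapest?"))
```

A production version would add long-term memory (the vector DB in the diagram) and let the model spawn subtasks, but the control flow is the same: the “program” is natural language, and the LLM is the CPU.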
Two key questions...
1. What is the extent of LLMs’ reasoning abilities?
2. What is the right way to “program” in natural language?
How well can LLMs reason?
Measuring LLM reasoning: Google’s BIG-Bench (https://arxiv.org/abs/2206.04615)
214 benchmarks measuring a wide range of reasoning tasks
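BIG-Bench distributes most tasks as JSON files pairing inputs with targets. The snippet below is a simplified, illustrative scorer with an invented toy task and a stub model, not the benchmark’s actual harness:

```python
import json

# A toy task in a BIG-Bench-like shape: examples with input/target pairs.
# Real BIG-Bench tasks carry more metadata and support several scoring methods.
task_json = """
{
  "name": "checkmate_in_one (toy)",
  "examples": [
    {"input": "Position A. Find the checkmate-in-one move.", "target": "Qxe7#"},
    {"input": "Position B. Find the checkmate-in-one move.", "target": "Nd6#"}
  ]
}
"""

def model(prompt: str) -> str:
    """Stand-in for an LLM; always answers Qxe7#."""
    return "Qxe7#"

task = json.loads(task_json)
correct = sum(model(ex["input"]) == ex["target"] for ex in task["examples"])
accuracy = correct / len(task["examples"])
print(f"{task['name']}: accuracy = {accuracy:.0%}")  # → 50%
```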
As models get larger, they get better (but still not great overall).
Example: BIG-Bench “checkmate_in_one” task
Prompt: “In the following chess position, find a checkmate-in-one move.”
Model response: “In the given chess position, the move for White to checkmate in one is: 15. Qxe7#. This move places the Black king in check and there is no legal move for Black to escape the check, resulting in checkmate.”
What if we add an extra “step” to the prompt?
Model response: “In the given chess position, the checkmate-in-one move is: 15. Nd6#. This move delivers checkmate because the knight on d6 controls key squares around the Black king, and there is no way for Black to capture or block the check.”
What’s the right way to program
in natural language?
How things are done today...
Natural language != plain text
Existing frameworks treat LLM prompts and responses as plain strings.
But natural language...
- Has a great deal of ambiguity
- Can encode complex algorithmic concepts
- Carries detailed semantic information
- Requires different syntactic structures depending on the language used (Chinese, English, and Yoruba are not the same!)