LLM GenAI - the new processing unit for developers

Jacques de Vos · 26 slides · Oct 13, 2025

About This Presentation

In this talk I explain LLMs as a new kind of processor and compare them to traditional CPUs. The audience is software engineers who are familiar with traditional development and how CPUs work.


Slide Content

LLM GenAI: the new processing unit. Jacques de Vos, in collaboration with GenAI.

Like mainframes, microprocessors, and the Internet, GenAI adds a new level of capability to computing without removing anything. In this talk I want to show a software engineer's view of GenAI rather than a data scientist's perspective (which most online material focuses on).

This talk is my (subjective) take on the question: What does the LLM GenAI revolution mean for software application developers like myself (as opposed to data scientists or the general public)?

Agenda:
- Computing: past & future
- The LLM breakthrough
- LLMs: the new processing unit
- The language of LLMs
- Programming with LLMs
- Additional ideas (by LLM)

Computing eras: past & future
Each breakthrough adds new development tools. Technologies build upon each other:
- Previous tools remain valuable
- New capabilities enhance, don't replace

Mainstream Computing Eras
1. Transistor-Mainframe Era (mid 1960s): large-scale batch processing of instructions on data. Killer app: the database.
2. CPU/Microprocessor Era (early 1970s)
   - Early: Embedded CPU Era (early 1970s). Small CPUs executing instructions & I/O. Killer apps: specialised hardware controllers.
   - Late: PC Era (late 1970s). Innovation: personal computing with a friendly UI/UX. Killer app: the spreadsheet (1979).
3. Internet Era (mid 1990s)
   - Early: WWW Era (mid 1990s). Breakthrough: global interconnection of computers. Killer app: search engines.
   - Late: Mobile Era (late 2000s). Expansion: always-on connectivity via smartphones. Killer app: social media.
4. GenAI LLM Era (2023): a multi-task processor capable of zero- or few-shot learning. Killer app: personal assistants.

New LLM Era Dev Opportunities
Like other eras, this tech gives opportunities for developers at many levels. E.g.:
- Industry-specific applications
  - Developer IDE AI like Cursor or Copilot
  - CoCounsel and other legal tools
  - Midjourney for visual designers
  - Many industries will get one: many opportunities
- General assistant applications
  - ChatGPT & others: plugin development
  - A textual/voice interface can become a mainstream application UI (finally Siri and Alexa can become a proper UI)
- Use the capabilities in any application: a new normal
- Yes, robots and all that hyped jazz as well

Dev Tools & Ecosystems
Many new tools:
- Dev environment (IDE) AI like Cursor or Copilot
- Local operating system SDKs (hopefully in the next few years!)
  - O/S Apple Intelligence: writing, image, Siri, etc. becomes just a simple capability to use in an app
  - Will the O/S provide access to a raw local LLM soon?
- Open source models & SDKs
  - Can run locally or in the cloud
  - Big or small, with the ability to compose, fine-tune, etc. Watch this space!
- Industry-specific local SDKs and cloud APIs: a wide ecosystem of speciality APIs & SDKs for domains that require accuracy or speed. Use and build!
- Powerful cloud models via API (paid) (see the sketch after this list)
  - Terabyte-scale models with massive knowledge, and very smart
  - Ability to prompt, fine-tune, build pipelines on top, etc.
  - Incentivised to be overhyped by the big players!
- Many frameworks that help you compose systems
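To make the "cloud models via API" item concrete, here is a minimal sketch of calling a paid cloud model from Python using the OpenAI SDK; the model name and prompt are only illustrative, and other providers' SDKs follow the same pattern.

```python
# Minimal sketch: calling a cloud-hosted LLM via the OpenAI Python SDK.
# Assumes the OPENAI_API_KEY environment variable is set; the model name
# "gpt-4o-mini" is just an example of a current small cloud model.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant for developers."},
        {"role": "user", "content": "Summarise what a context window is in one sentence."},
    ],
)

print(response.choices[0].message.content)
```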

The Transformer Breakthrough: What just happened? What made deep learning suddenly work?

Great ideas! Early 1980s & before: the idea of machine learning with a neural network was developed and refined. Big expectations were set! The 2024 Nobel Prize in Physics was awarded to John J. Hopfield and Geoffrey E. Hinton for their 1980s contributions to neural nets! https://www.nobelprize.org/prizes/physics/2024/press-release/

The specialists: 1980s until 2017. It worked, but only for specialised neural nets, trained by specialists with lots of data. In the 2010s Google showed what is possible, e.g. with photo tagging. But who else has that amount of specialised data and processing? Disappointingly narrow success! It couldn't even pass the Turing test.

The breakthrough! 2017: the Transformer, a generalist LLM architecture. The paper by 8 Google engineers, "Attention Is All You Need", introduced the Transformer, a neural net architecture that:
- Delivered state-of-the-art translations with minimal training
- Generalized well to other tasks
- Was a big simplification and took much less time to train
A "Fleming-penicillin" moment! Architecture matters less after the Transformer. "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence (RNN) and convolutions entirely."

GenAI = quick learner. 2020, OpenAI: "Language Models are Few-Shot Learners", with GPT-3 (pre-ChatGPT). Few-shot learning means a few examples rather than millions of specialized samples; zero-shot means just a question. It showed that an LLM can be used for many language tasks without specialised skills, training, or data science, i.e. it can be used by anyone to perform many tasks! Understated conclusion: "…large language models may be an important ingredient in the development of adaptable, general language systems."
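To illustrate the difference, here is a small sketch of a zero-shot prompt versus a few-shot prompt; the task and examples are made up, and either string can be sent as-is to any chat or completion API.

```python
# Zero-shot: just the question/instruction, no examples.
zero_shot = "Translate to French: 'The memory is full.'"

# Few-shot: the same instruction plus a handful of worked examples placed
# directly in the prompt. No retraining, no specialised data science, just text.
few_shot = """Translate to French.

English: 'Good morning.' -> French: 'Bonjour.'
English: 'The processor is fast.' -> French: 'Le processeur est rapide.'

English: 'The memory is full.' -> French:"""

print(few_shot)
```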

The scaling law mystery. 2020: the same OpenAI paper hinted at something unexpected. The more data used for training, the smarter (more accurate) the model got. It just kept on improving and didn't flatten out as expected. This was a complete surprise! This law is still holding, which is why we are seeing more and more data and processing. Current models cost more than $100M to train.

The generalist (not just text). 2020 and beyond: top new LLMs (GPT-4o, Llama 3.2) are multimodal and can handle text, images, and audio at once. Transformers even excel in other fields like chemistry: the 2024 Nobel Prize in Chemistry went to Demis Hassabis and John Jumper of Google DeepMind (AlphaFold) for predicting proteins' structures. https://www.nobelprize.org/prizes/chemistry/2024/press-release/ (My final-year project in the early 2000s was on protein structure prediction; it sucked then.)

Evolving fast: 2025 and beyond. The potential and the underlying workings have shown themselves by now. Models are becoming smarter, smaller, faster, and locally available on devices. This will probably continue for years. Ecosystems are becoming more and more developer friendly, e.g. Ollama and Transformers.js. Prompt engineering and AI applications are still a "dark art" rather than structured engineering, but there are signs it is maturing, e.g. DSPy (https://github.com/stanfordnlp/dspy, see the sketch below). A great time to start learning!
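As an example of that more structured style, here is a tiny DSPy sketch: you declare a task "signature" instead of hand-writing a prompt. The API names follow DSPy's documented basics and may differ between versions, and the model name is only an assumption.

```python
# A tiny DSPy sketch: declare the task, let the framework handle prompting.
import dspy

lm = dspy.LM("openai/gpt-4o-mini")        # any supported LLM backend
dspy.configure(lm=lm)

qa = dspy.Predict("question -> answer")   # declarative signature, no prompt string
result = qa(question="Why did the Transformer architecture matter?")
print(result.answer)
```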

GenAI LLMs: a new processing unit
Traditional CPUs and GenAI LLMs both have:
- A processor that follows instructions
- A temporary memory with instructions + data
- Output data that feeds back into memory and alters the flow
[Diagram: a PROCESSOR connected to MEMORY holding instructions and data]

LLM Inference Loop
- PROMPT (INPUT): the string is first parsed into tokens, then each token & its position is turned into an embedding vector (to capture meaning); these "compiled" embeddings feed the model. (More detail later.)
- CONTEXT (MEMORY): instructions + data as just one long text sequence, that's all. Context = prompt + generated response so far + last generated token (max = context window).
- PREDICT: the most likely next token is predicted from the token sequence by running it through the model (a transformer neural network); the generated token is appended to the context and the loop runs again.
- END: inference ends when an end token is generated.
- RESPONSE (OUTPUT): the generated text, also called the "completion".
- CHAT HISTORY: the response is added to saved history on file/in the cloud; more like a file than part of the processing memory.
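As a rough code sketch of this loop (not how production inference engines are optimised), here is a greedy next-token loop using the Hugging Face transformers library; the model name "gpt2" is only a small example model.

```python
# A minimal, greedy version of the LLM inference loop with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# PROMPT (INPUT): the string is "compiled" into token ids; the embedding
# lookup happens inside the model.
context = tokenizer("A CPU and an LLM are alike because", return_tensors="pt").input_ids

# LLM INFERENCE LOOP: predict the most likely next token, append it to the
# context, and repeat until an end token (or a length limit) is generated.
with torch.no_grad():
    for _ in range(40):
        logits = model(context).logits          # run the transformer over the context
        next_token = logits[0, -1].argmax()     # most likely next token (greedy)
        context = torch.cat([context, next_token.view(1, 1)], dim=1)
        if next_token.item() == tokenizer.eos_token_id:
            break                               # end inference if end token generated

# RESPONSE (OUTPUT): prompt + generated completion.
print(tokenizer.decode(context[0]))
```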

LLM Inference Loop: context-aware prompting, the trick to steering an LLM (the real "Prompt Engineering")
The prompt (initial context) is just one long string; just "gooi" everything in:
- PREFERENCES: personality, conciseness, style, summary of past interactions, etc.
- "CHAT" HISTORY: previous prompts + responses in the current chat.
- SEARCHED DOCS / SUMMARY: instructions + examples/data, which can be machine generated or anything; sources include a customer semantic-embedding vector database, top Google search results, or a tool adapter that can add/change content.
- "USER" PROMPT: the final part, which gets the highest priority.
Retrieval Augmented Generation (RAG) means "add some search results". You can build your own RAG system to make an LLM specialised.
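A minimal sketch of assembling such a prompt: everything really is just one long string, and the retrieval function here is a placeholder you would replace with a vector database or search call.

```python
# Context-aware prompting: build one long prompt string from preferences,
# chat history, retrieved documents, and the user's question.
def retrieve_docs(query: str) -> list[str]:
    # Placeholder: in a real RAG system this queries a semantic vector
    # database, a search engine, or some other tool adapter.
    return ["[doc] Returns are accepted within 30 days.", "[doc] Shipping takes 3-5 days."]

def build_prompt(preferences: str, chat_history: list[str], question: str) -> str:
    docs = "\n".join(retrieve_docs(question))
    history = "\n".join(chat_history)
    # The final part of the context carries the highest priority, so the
    # user's question goes last.
    return (
        f"Preferences:\n{preferences}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Searched docs:\n{docs}\n\n"
        f"User question:\n{question}\n"
    )

prompt = build_prompt(
    preferences="Answer concisely, in a friendly tone.",
    chat_history=["User: Hi", "Assistant: Hello! How can I help?"],
    question="Can I still return the laptop I bought two weeks ago?",
)
print(prompt)
```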

Trad CPU vs LLM as processor
- Memory
  - Trad CPU: RAM with programs and data (text, vids, ...), e.g. 16 GB RAM
  - LLM: context with instructions, examples, and other context (text, vids, ...), e.g. a 4,096-token context window
- Instructions
  - Trad CPU: programming-language programs
  - LLM: human-language instructions
- Machine language
  - Trad CPU: assembly with a fixed instruction set (e.g. x86/ARM)
  - LLM: token vector embedding sequences with limitless potential instructions
- Process
  - Trad CPU: executes the program, computing each operation from the program sequentially
  - LLM: runs inference, generating the next token based on the input context; appends the generated token to the next input context and generates the next token
- Process completes when
  - Trad CPU: reaching the end of the program; an operation can jump/loop to avoid completion
  - LLM: the predicted task/answer has been completed (i.e. after creating an end token); can't jump/loop (although you can loop by piping inferences, like o1)

Performance Benchmarking
- Speed: a CPU is measured in instructions/second (MIPS); an LLM in tokens/second. Also LLM stream start (seconds): it takes a while to "warm up". (A measurement sketch follows below.)
- Accuracy: key for an LLM, N/A for a CPU. Measured in %, like a school report. Aspects: knowledge, reasoning, math, coding, vision.
https://klu.ai/llm-leaderboard
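A rough sketch of measuring those two speed numbers yourself against a local model, using the ollama Python client with streaming; the model name is an assumption, and counting streamed chunks only approximates tokens.

```python
# Measure LLM "stream start" (time to first token) and throughput (tokens/s)
# against a model served locally by Ollama. "llama3.2" is just an example model;
# each streamed chunk is treated as roughly one token.
import time
import ollama

start = time.time()
first_token_time = None
chunks = 0

for chunk in ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain the transformer in one paragraph."}],
    stream=True,
):
    if first_token_time is None:
        first_token_time = time.time()
    chunks += 1

total = time.time() - first_token_time
print(f"Stream start: {first_token_time - start:.2f} s")
print(f"Throughput:   {chunks / total:.1f} tokens/s (approx.)")
```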

The 2 Ways To Program an LLM Processor
- Prompt Engineering
  - Mechanism: simply pass data and instructions
  - Benefits: general-purpose tasks; stays up to date
- LLM Fine Tuning
  - Mechanism: change some of the weights of the neural network, given a few hundred examples (not millions!)
  - Benefits: runs cheaper and faster at scale; can potentially embed more info (not necessarily)
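To contrast the two mechanisms: prompt engineering puts the instructions and examples into the prompt itself (as in the few-shot sketch earlier), while fine-tuning supplies a few hundred worked examples as training data. Below is a small sketch of what that fine-tuning data typically looks like, written as a JSONL file of chat examples (the format follows the common OpenAI-style convention; the labels, texts, and file name are made up).

```python
# Fine-tuning data sketch: a few hundred examples, not millions, usually as
# JSONL chat transcripts uploaded to a fine-tuning job.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Classify the ticket: 'My screen is cracked.'"},
        {"role": "assistant", "content": "hardware-issue"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify the ticket: 'I forgot my password.'"},
        {"role": "assistant", "content": "account-issue"},
    ]},
    # ...a few hundred more of these
]

with open("finetune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```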

The language of a language model

https://platform.openai.com/tokenizer
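What the tokenizer page above shows can also be reproduced locally with the tiktoken library; the encoding name "cl100k_base" is the one used by GPT-3.5/GPT-4-era models.

```python
# Inspect how a string is split into tokens, using OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "LLM GenAI: the new processing unit"
token_ids = enc.encode(text)

print(token_ids)                               # integer ids into the vocabulary
print(len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])    # the text piece behind each id
```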

What is a Token Vector Embedding? Every possible token makes up the token vocabulary, and each token has an index, e.g. "cat" could have index 12345. There is a table for the vocabulary where each index maps to a vector, e.g. 12345: [0.99, 0.01, 0.75, 0.50, 0.35]. The vectors are called "embeddings" since the "meaning" and context of a token are captured in them. E.g. the first dimension 0.99 could stand for "animalness" and the second dimension 0.01 for "verbness"; we could define it that way. But actual vector embeddings are created through training (which can be separate from the LLM), so the components don't have nice direct meanings like that. E.g. OpenAI's embedding models use 1,536 dimensions! We don't know exactly what they mean. Very useful in your own RAG, or for querying by meaning (see the sketch below).
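As a sketch of "querying by meaning", here is a small semantic search example with the sentence-transformers library; the model name "all-MiniLM-L6-v2" is a common small embedding model and the documents are made up. A real RAG retriever would store these vectors in a vector database instead of a Python list.

```python
# Semantic search with embedding vectors: rank documents by cosine
# similarity to the query's embedding.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "The cat sat on the mat.",
    "Quarterly revenue grew by 12 percent.",
    "Dogs are loyal animals.",
]
doc_vectors = model.encode(docs)                      # one embedding per document

query_vector = model.encode("Which sentences are about pets?")
scores = util.cos_sim(query_vector, doc_vectors)[0]   # cosine similarity per doc

for doc, score in sorted(zip(docs, scores), key=lambda pair: float(pair[1]), reverse=True):
    print(f"{float(score):.2f}  {doc}")
```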

Developing with LLMs. Examples: RAG (show WordPress), a few Ollama examples, Hugging Face Transformers.js.
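As one of those Ollama examples, here is a minimal sketch of chatting with a locally running model through the ollama Python client; it assumes Ollama is installed and a model (the name below is only an example) has already been pulled.

```python
# Minimal local LLM call via Ollama's Python client.
# Assumes `ollama pull llama3.2` (or another model) has been run locally.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Give me three ideas for a RAG demo on a WordPress blog."}],
)

print(response["message"]["content"])
```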