MemGPT: Introduction to Memory Augmented Chat

chloewilliams62 · 201 views · 50 slides · Jun 06, 2024

About This Presentation

Read more: https://zilliz.com/blog/introduction-to-memgpt-and-milvus-integration
Why we need memory-augmented LLMs


Slide Content

MemGPT
why we need memory-augmented LLMs

Charles Packer
●PhD candidate @ Sky / BAIR, focus in AI
●Author of MemGPT
○First paper demonstrating how to give GPT-4 self-editing memory (AI that can learn over time)
●Working on agents since 2017
○“the dark ages”
○5 BC = Before ChatGPT
[email protected]
@charlespacker

Agents in 2017

For LLMs, “memory” is everything
memory is context
context includes long-term memory, tool use, in-context learning (ICL), RAG, …

For LLMs, “memory” is everything
“memory” =

MemGPT - giving LLMs real “memory”

Why is this the “best” AI product?

What about this?

Search engine AI assistant


tl;dr
LLMs doing constrained Q/A

tl;dr
LLMs doing long-range, open-ended tasks

90%+ of questions are related to one project
No shared context! Why?
We don’t know how to do it…

How to get an LLM to use
●hundreds of chats
●+ code base (1M+ LoC)
●+ …

●…RAG?
●Lots of retrieval?
●Multi-step retrieval?
●Retrieval that works?
●What about writing?

…long-context LLMs?
Cost + latency
Context pollution

No shared context! Why?
We don’t know how to do it…

Search engine AI assistant
state management

MemGPT -> giving LLMs real “memory”

MemGPT -> memory via tools
LLM
tools
Memory

Text
User message
GPT-4
Context window
8k max token limit
ChatGPT
Text
Agent reply
Standard LLM setup
e.g., ChatGPT UI + GPT-4 model

Event
User message
Document upload
System alert
Function
Send message
Query database
Pause interrupts
LLM
Virtual context
Main context
External context
∞ tokens
Max token limit
MemGPT
parse
parse
MemGPT LLM OS setup
Event loop + functions + memory hierarchy

Fixed-context LLM
e.g., GPT-4 with 8k max tokens

LLM inputs are “events” (JSON)
System alerts help the LLM manage memory

{ "type": "user_message",
  "content": "how to undo git commit -am?" }

{ "type": "document_upload",
  "info": "9 page PDF",
  "summary": "MemGPT research paper" }

{ "type": "system_alert",
  "content": "Memory warning: 75% of context used." }
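To make the event format concrete, here is a minimal Python sketch (illustrative, not MemGPT's actual code) that parses and validates the three event types shown above:

```python
import json

# Illustrative: every input to the LLM is a JSON "event" with a "type" field,
# so user messages, document uploads, and system alerts all arrive through
# one uniform channel.
EVENT_TYPES = {"user_message", "document_upload", "system_alert"}

def parse_event(raw: str) -> dict:
    event = json.loads(raw)
    if event.get("type") not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event.get('type')}")
    return event

alert = parse_event(
    '{"type": "system_alert", "content": "Memory warning: 75% of context used."}'
)
```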

LLM outputs are functions (JSON)
Event loop + functions that allow editing memory

Function
Send message
Query database
Pause interrupts
Agent can query out-of-context information with functions
{
  "function": "archival_memory_search",
  "params": {
    "query": "Berkeley LLM Meetup",
    "page": "0"
  }
}
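A toy Python sketch of the idea: the in-memory list below stands in for MemGPT's real archival store (a vector database), and the paging keeps each result set small enough to fit in context. Everything except the `archival_memory_search` name is illustrative:

```python
import json

# Toy stand-in for the archival store; real MemGPT uses embedding search
# over a vector database rather than substring matching.
ARCHIVE = [
    "Berkeley LLM Meetup: MemGPT talk.",
    "Milvus is an open-source vector database.",
    "MemGPT paper: towards LLMs as operating systems.",
]
PAGE_SIZE = 2  # cap on results returned per call, so pages fit in context

def archival_memory_search(query: str, page: str) -> list:
    hits = [t for t in ARCHIVE if query.lower() in t.lower()]
    start = int(page) * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]

# Execute the function-call JSON emitted by the LLM
call = json.loads(
    '{"function": "archival_memory_search",'
    ' "params": {"query": "Berkeley LLM Meetup", "page": "0"}}'
)
results = archival_memory_search(**call["params"])
```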

Pages into (finite) LLM context

Function
Send message
Edit context
Pause interrupts
Agents can edit their own memory, including their own context

{
  "function": "core_memory_replace",
  "params": {
    "old_content": "OAI Assistants API",
    "new_content": "MemGPT API"
  }
}
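A minimal sketch of what `core_memory_replace` amounts to: a literal string edit applied to the agent's reserved in-context memory block. The memory contents below are illustrative, not MemGPT's actual persona format:

```python
# Illustrative in-context memory block (a reserved region of the prompt).
core_memory = "Persona: helpful assistant built on the OAI Assistants API."

def core_memory_replace(old_content: str, new_content: str) -> None:
    """Replace a literal substring inside the in-context memory block."""
    global core_memory
    if old_content not in core_memory:
        raise ValueError("old_content not found in core memory")
    core_memory = core_memory.replace(old_content, new_content)

# Apply the function call shown on the slide
core_memory_replace("OAI Assistants API", "MemGPT API")
```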

Function
Send message
Edit context
Pause interrupts
Core memory is a reserved block
System prompt
In-context memory block
Working context queue

Function
Send message
Query database
Pause interrupts
{
  "function": "send_message",
  "params": {
    "message": "How may I assist you?"
  }
}
Messages to the user are a function
Allows the agent to act autonomously, without user input
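A sketch of this point in Python: replying to the user is just another function call, so any step that does not call `send_message` produces no user-visible output at all. Names besides `send_message` are illustrative:

```python
# User-visible output only happens through the send_message function;
# everything else (memory edits, searches) runs silently.
outbox = []

def send_message(message: str) -> None:
    """Deliver a message to the user (here: append to an outbox)."""
    outbox.append(message)

def execute(call: dict) -> None:
    if call["function"] == "send_message":
        send_message(**call["params"])
    # other functions would run here without touching the outbox

execute({"function": "send_message",
         "params": {"message": "How may I assist you?"}})
```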

{ "type": "user_message",
  "content": "what’s happening on may 21 2024?" }
{
  "function": "archival_memory_search",
  "params": {
    "query": "may 21 2024"
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Have you heard about Milvus?"
  }
}
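The chain above can be sketched as a loop: the LLM keeps emitting function calls, each result is fed back into context, and the turn ends when it calls `send_message`. The scripted `llm` stub below stands in for the real model; all names and the search result are illustrative:

```python
# Scripted stand-in for the model: first search archival memory, then reply.
SCRIPT = [
    {"function": "archival_memory_search",
     "params": {"query": "may 21 2024", "page": "0"}},
    {"function": "send_message",
     "params": {"message": "Have you heard about Milvus?"}},
]

def llm(context: list) -> dict:
    # Pick the next scripted call based on how many results came back so far.
    return SCRIPT[sum(1 for m in context if m.get("role") == "function_result")]

def step(event: dict) -> list:
    """Run one agent turn: loop over function calls until the agent replies."""
    context = [event]
    replies = []
    while True:
        call = llm(context)
        if call["function"] == "send_message":
            replies.append(call["params"]["message"])
            return replies  # yielding to the user ends the chain
        # execute the call and feed its result back into context
        context.append({"role": "function_result",
                        "content": "may 21 2024: Milvus meetup"})

replies = step({"type": "user_message",
                "content": "what's happening on may 21 2024?"})
```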

what’s happening on may 21 2024?
Have you heard about Milvus?
(User’s POV)

MemGPT LLM OS setup
Event loop + functions + memory hierarchy

MemGPT -> Building LLM Agents
Calling & executing custom tools
Long-term memory management
Loading external data sources (RAG)

MemGPT
= the OSS platform for building and hosting LLM agents

Developer / User
MemGPT
Dev Portal
MemGPT CLI
$ memgpt run

MemGPT server
User-facing application
REST API
Users
Agents
Tools
Sources
user_id: …
agent_id: …
Personal Assistant
State
Memories
Documents
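The server's data model on this slide, sketched as hypothetical Python dataclasses (field names are illustrative, not MemGPT's actual schema): users own agents, and each agent bundles state, memories, and attached document sources.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str            # e.g. a set of uploaded documents

@dataclass
class Agent:
    agent_id: str
    name: str
    state: dict = field(default_factory=dict)      # agent state
    memories: list = field(default_factory=list)   # long-term memories
    sources: list = field(default_factory=list)    # attached Sources

@dataclass
class User:
    user_id: str
    agents: list = field(default_factory=list)

user = User(user_id="user-1",
            agents=[Agent(agent_id="agent-1", name="Personal Assistant")])
```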

MemGPT server
+ Webhooks

MemGPT
may 21 developer update

Docker integration - the fastest way to create a MemGPT server
Step 1: docker compose up
Step 2: create/edit/message agents using the MemGPT API
MemGPT ❤
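A hedged sketch of step 2 in Python: the port, route, and payload shape below are assumptions for illustration only; consult the MemGPT API reference for the exact endpoints your server version exposes.

```python
import json
from urllib import request

BASE = "http://localhost:8283"  # assumed default port for the Docker server

def message_agent(agent_id: str, text: str) -> request.Request:
    """Build (not send) a POST request messaging an agent; route is assumed."""
    payload = {"role": "user", "message": text, "stream": False}
    return request.Request(
        f"{BASE}/api/agents/{agent_id}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = message_agent("agent-123", "hello")
# request.urlopen(req) would send it to a running server
```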

MemGPT streaming API - token streaming
CLI: memgpt run --stream
REST API: use the stream_tokens flag [PR #1280 - staging]

MemGPT streaming API - token streaming
MemGPT API works with both non-streaming + streaming endpoints
If the true LLM backend doesn’t support streaming, the server falls back to “fake streaming”
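“Fake streaming” can be sketched as chunking an already-complete reply, so clients see one uniform streaming interface whether or not the backend streams (illustrative code, not MemGPT's implementation):

```python
# Yield an already-finished reply in small chunks, imitating token streaming.
def fake_stream(full_reply: str, chunk_size: int = 4):
    for i in range(0, len(full_reply), chunk_size):
        yield full_reply[i:i + chunk_size]

chunks = list(fake_stream("Hello from MemGPT"))
```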

MemGPT /chat/completions proxy API
Connect your MemGPT server to any /chat/completions service!
For example - voice call your MemGPT agents using VAPI!
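Because the proxy speaks the standard OpenAI /chat/completions wire format, any OpenAI-compatible client can address a MemGPT agent. A sketch of the request body; the model name below is illustrative, not a real route:

```python
import json

# Standard chat-completions request shape; "memgpt-agent" is an assumed
# placeholder for however the proxy routes requests to a specific agent.
payload = {
    "model": "memgpt-agent",
    "messages": [
        {"role": "user", "content": "what's happening on may 21 2024?"}
    ],
    "stream": True,
}
body = json.dumps(payload)  # what an OpenAI-compatible client would POST
```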

MemGPT