MemGPT: Introduction to Memory Augmented Chat

chloewilliams62 · 201 views · 50 slides · Jun 06, 2024

About This Presentation

Read more: https://zilliz.com/blog/introduction-to-memgpt-and-milvus-integration
Why we need memory-augmented LLMs


Slide Content

MemGPT
why we need memory-augmented LLMs

Charles Packer
●PhD candidate @ Sky / BAIR, focus in AI
●Author of MemGPT
○First paper demonstrating how to give GPT-4 self-editing memory (AI that can learn over time)
●Working on agents since 2017
○“the dark ages”
○5 BC = Before ChatGPT
[email protected]
@charlespacker

Agents in 2017

For LLMs, “memory” is everything
memory is context
context includes long-term memory, tool use, in-context learning (ICL), RAG, …

For LLMs, “memory” is everything
“memory” =

MemGPT - giving LLMs real “memory”

Why is this the “best” AI product?

What about this?

Search engine AI assistant


tl;dr
LLMs doing constrained Q/A

tl;dr
LLMs doing long-range, open-ended tasks

90%+ of questions are related to one project
No shared context! Why?
We don’t know how to do it…

How to get an LLM to use
●hundreds of chats
●+ code base (1M+ LoC)
●+ …

●…RAG?
●Lots of retrieval?
●Multi-step retrieval?
●Retrieval that works?
●What about writing?

…long-context LLMs?
Cost + latency
Context pollution

No shared context! Why?
We don’t know how to do it…

Search engine AI assistant
state management

MemGPT -> giving LLMs real “memory”

MemGPT -> memory via tools
LLM
tools
Memory

Text
User message
GPT-4
Context window
8k max token limit
ChatGPT
Text
Agent reply
Standard LLM setup
e.g., ChatGPT UI + GPT-4 model

Event
User message
Document upload
System alert
Function
Send message
Query database
Pause interrupts
LLM
Virtual context
Main context
External context
∞ tokens
Max token limit
MemGPT
parse
parse
MemGPT LLM OS setup
Event loop + functions + memory hierarchy

Fixed-context LLM
e.g., GPT-4 with 8k max tokens

LLM inputs are “events” (JSON)
System alerts help the LLM manage memory

{ "type": "user_message",
  "content": "how to undo git commit -am?" }

{ "type": "document_upload",
  "info": "9 page PDF",
  "summary": "MemGPT research paper" }

{ "type": "system_alert",
  "content": "Memory warning: 75% of context used." }
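To make the event format concrete, here is a minimal Python sketch (illustrative, not MemGPT's actual code) that parses and validates the three event types shown above:

```python
import json

# Illustrative: every input to the LLM is a JSON "event" with a "type" field,
# so user messages, document uploads, and system alerts all arrive through
# one uniform channel.
EVENT_TYPES = {"user_message", "document_upload", "system_alert"}

def parse_event(raw: str) -> dict:
    event = json.loads(raw)
    if event.get("type") not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event.get('type')}")
    return event

alert = parse_event(
    '{"type": "system_alert", "content": "Memory warning: 75% of context used."}'
)
```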

LLM outputs are functions (JSON)
Event loop + functions that allow editing memory

Function
Send message
Query database
Pause interrupts
Agent can query out-of-context information with functions
{
  "function": "archival_memory_search",
  "params": {
    "query": "Berkeley LLM Meetup",
    "page": "0"
  }
}
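A toy Python sketch of the idea: the in-memory list below stands in for MemGPT's real archival store (a vector database), and the paging keeps each result set small enough to fit in context. Everything except the `archival_memory_search` name is illustrative:

```python
import json

# Toy stand-in for the archival store; real MemGPT uses embedding search
# over a vector database rather than substring matching.
ARCHIVE = [
    "Berkeley LLM Meetup: MemGPT talk.",
    "Milvus is an open-source vector database.",
    "MemGPT paper: towards LLMs as operating systems.",
]
PAGE_SIZE = 2  # cap on results returned per call, so pages fit in context

def archival_memory_search(query: str, page: str) -> list:
    hits = [t for t in ARCHIVE if query.lower() in t.lower()]
    start = int(page) * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]

# Execute the function-call JSON emitted by the LLM
call = json.loads(
    '{"function": "archival_memory_search",'
    ' "params": {"query": "Berkeley LLM Meetup", "page": "0"}}'
)
results = archival_memory_search(**call["params"])
```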

Pages into (finite) LLM context

Function
Send message
Edit context
Pause interrupts
Agents can edit their own memory, including their own context

{
  "function": "core_memory_replace",
  "params": {
    "old_content": "OAI Assistants API",
    "new_content": "MemGPT API"
  }
}
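A minimal sketch of what `core_memory_replace` amounts to: a literal string edit applied to the agent's reserved in-context memory block. The memory contents below are illustrative, not MemGPT's actual persona format:

```python
# Illustrative in-context memory block (a reserved region of the prompt).
core_memory = "Persona: helpful assistant built on the OAI Assistants API."

def core_memory_replace(old_content: str, new_content: str) -> None:
    """Replace a literal substring inside the in-context memory block."""
    global core_memory
    if old_content not in core_memory:
        raise ValueError("old_content not found in core memory")
    core_memory = core_memory.replace(old_content, new_content)

# Apply the function call shown on the slide
core_memory_replace("OAI Assistants API", "MemGPT API")
```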

Function
Send message
Edit context
Pause interrupts
Core memory is a reserved block
System prompt
In-context memory block
Working context queue

Function
Send message
Query database
Pause interrupts
{
  "function": "send_message",
  "params": {
    "message": "How may I assist you?"
  }
}
Messages to the user are a function
Allows the agent to act autonomously, without user input
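A sketch of this point in Python: replying to the user is just another function call, so any step that does not call `send_message` produces no user-visible output at all. Names besides `send_message` are illustrative:

```python
# User-visible output only happens through the send_message function;
# everything else (memory edits, searches) runs silently.
outbox = []

def send_message(message: str) -> None:
    """Deliver a message to the user (here: append to an outbox)."""
    outbox.append(message)

def execute(call: dict) -> None:
    if call["function"] == "send_message":
        send_message(**call["params"])
    # other functions would run here without touching the outbox

execute({"function": "send_message",
         "params": {"message": "How may I assist you?"}})
```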

{ "type": "user_message",
  "content": "what’s happening on may 21 2024?" }
{
  "function": "archival_memory_search",
  "params": {
    "query": "may 21 2024"
  }
}
{
  "function": "send_message",
  "params": {
    "message": "Have you heard about Milvus?"
  }
}
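The chain above can be sketched as a loop: the LLM keeps emitting function calls, each result is fed back into context, and the turn ends when it calls `send_message`. The scripted `llm` stub below stands in for the real model; all names and the search result are illustrative:

```python
# Scripted stand-in for the model: first search archival memory, then reply.
SCRIPT = [
    {"function": "archival_memory_search",
     "params": {"query": "may 21 2024", "page": "0"}},
    {"function": "send_message",
     "params": {"message": "Have you heard about Milvus?"}},
]

def llm(context: list) -> dict:
    # Pick the next scripted call based on how many results came back so far.
    return SCRIPT[sum(1 for m in context if m.get("role") == "function_result")]

def step(event: dict) -> list:
    """Run one agent turn: loop over function calls until the agent replies."""
    context = [event]
    replies = []
    while True:
        call = llm(context)
        if call["function"] == "send_message":
            replies.append(call["params"]["message"])
            return replies  # yielding to the user ends the chain
        # execute the call and feed its result back into context
        context.append({"role": "function_result",
                        "content": "may 21 2024: Milvus meetup"})

replies = step({"type": "user_message",
                "content": "what's happening on may 21 2024?"})
```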

what’s happening on may 21 2024?
Have you heard about Milvus?
(User’s POV)

MemGPT LLM OS setup
Event loop + functions + memory hierarchy

MemGPT -> Building LLM Agents
Calling & executing custom tools
Long-term memory management
Loading external data sources (RAG)

MemGPT
= the OSS platform for building and hosting LLM agents

Developer / User
MemGPT
Dev Portal
MemGPT CLI
$ memgpt run

MemGPT server
User-facing application
REST API
Users
Agents
Tools
Sources
user_id: …
agent_id: …
Personal Assistant
State
Memories
Documents
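The server's data model on this slide, sketched as hypothetical Python dataclasses (field names are illustrative, not MemGPT's actual schema): users own agents, and each agent bundles state, memories, and attached document sources.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str            # e.g. a set of uploaded documents

@dataclass
class Agent:
    agent_id: str
    name: str
    state: dict = field(default_factory=dict)      # agent state
    memories: list = field(default_factory=list)   # long-term memories
    sources: list = field(default_factory=list)    # attached Sources

@dataclass
class User:
    user_id: str
    agents: list = field(default_factory=list)

user = User(user_id="user-1",
            agents=[Agent(agent_id="agent-1", name="Personal Assistant")])
```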

MemGPT server
+ Webhooks

MemGPT
may 21 developer update

Docker integration - the fastest way to create a MemGPT server
Step 1: docker compose up
Step 2: create/edit/message agents using the MemGPT API
MemGPT ❤
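A hedged sketch of step 2 in Python: the port, route, and payload shape below are assumptions for illustration only; consult the MemGPT API reference for the exact endpoints your server version exposes.

```python
import json
from urllib import request

BASE = "http://localhost:8283"  # assumed default port for the Docker server

def message_agent(agent_id: str, text: str) -> request.Request:
    """Build (not send) a POST request messaging an agent; route is assumed."""
    payload = {"role": "user", "message": text, "stream": False}
    return request.Request(
        f"{BASE}/api/agents/{agent_id}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = message_agent("agent-123", "hello")
# request.urlopen(req) would send it to a running server
```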

MemGPT streaming API - token streaming
CLI: memgpt run --stream
REST API: use the stream_tokens flag [PR #1280 - staging]

MemGPT streaming API - token streaming
MemGPT API works with both non-streaming + streaming endpoints
If the true LLM backend doesn’t support streaming, the server falls back to “fake streaming”
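“Fake streaming” can be sketched as chunking an already-complete reply, so clients see one uniform streaming interface whether or not the backend streams (illustrative code, not MemGPT's implementation):

```python
# Yield an already-finished reply in small chunks, imitating token streaming.
def fake_stream(full_reply: str, chunk_size: int = 4):
    for i in range(0, len(full_reply), chunk_size):
        yield full_reply[i:i + chunk_size]

chunks = list(fake_stream("Hello from MemGPT"))
```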

MemGPT /chat/completions proxy API
Connect your MemGPT server to any /chat/completions service!
For example - voice call your MemGPT agents using VAPI!
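Because the proxy speaks the standard OpenAI /chat/completions wire format, any OpenAI-compatible client can address a MemGPT agent. A sketch of the request body; the model name below is illustrative, not a real route:

```python
import json

# Standard chat-completions request shape; "memgpt-agent" is an assumed
# placeholder for however the proxy routes requests to a specific agent.
payload = {
    "model": "memgpt-agent",
    "messages": [
        {"role": "user", "content": "what's happening on may 21 2024?"}
    ],
    "stream": True,
}
body = json.dumps(payload)  # what an OpenAI-compatible client would POST
```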

MemGPT