Turbocharging MCP: Speed, Smarts, and Scale by Viraj Sharma

ScyllaDB · 16 slides · Oct 16, 2025

About This Presentation

Learn how to speed up Model Context Protocol (MCP) tools using async servers, caching, batching, and smart data handling, making your AI tool calls faster, smoother, and more efficient.


Slide Content

A ScyllaDB Community
Turbocharging MCP:
Speed, Smarts, and Scale
Viraj Sharma
Student

Viraj Sharma (he/him)

Student at Presidium School Indirapuram, Delhi, India
■ Spoke about MCP at Python conferences (Delhi, Lithuania, LinuxFest)
■ Awesome gathering of performance experts
■ Interested in Indian mythology, science, and poetry
■ Usually coding or brainstorming ideas with my father

Primer: Model Context Protocol

Primer: What is MCP (Anthropic)
A standard way for language models to interact with tools, APIs, and live data.
■ Core Mechanism:
● Based on JSON-RPC 2.0
■ How It Works:
● Model → MCP → Tool → Response
■ Why It Matters:
● Brings live, external capabilities directly into model conversations.

Why Performance Matters in MCP
Performance = a better, more accurate LLM user experience
■ Real-Time Needs:
● Users expect instant responses
■ Impact of Slowness:
● Slow tool calls increase server costs and infrastructure strain.
■ Goal:
● Make MCP interactions faster, lighter, and more reliable without major code changes

MCP Performance Strategies

Elicitation-Driven Filtering
Additional filtering before execution using elicitation

■ Additional filtering can be applied before execution
■ The amount of data transferred can be reduced
■ Response times can be improved

{
  "type": "elicitation",
  "parameters": [
    { "name": "dateRange", "value": "2025-01-01 to 2025-01-31" },
    { "name": "region", "value": "APAC" },
    { "name": "metrics", "value": ["sales", "conversionRate"] }
  ]
}
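As a rough sketch of the idea above, a server can apply the elicited parameters as filters before executing the tool, so only matching rows ever leave the data source. The record layout and helper name here are illustrative assumptions, not part of any MCP SDK.

```python
from datetime import date

# Hypothetical records a tool would otherwise return in full.
RECORDS = [
    {"date": date(2025, 1, 15), "region": "APAC", "sales": 120},
    {"date": date(2025, 1, 20), "region": "EMEA", "sales": 300},
    {"date": date(2025, 2, 3),  "region": "APAC", "sales": 80},
]

def apply_elicited_filters(records, params):
    """Filter records using elicited parameters before execution,
    so only matching rows are transferred back to the model."""
    filters = {p["name"]: p["value"] for p in params}
    start_s, end_s = filters["dateRange"].split(" to ")
    start, end = date.fromisoformat(start_s), date.fromisoformat(end_s)
    return [
        r for r in records
        if start <= r["date"] <= end and r["region"] == filters["region"]
    ]

# Parameters matching the elicitation payload above.
params = [
    {"name": "dateRange", "value": "2025-01-01 to 2025-01-31"},
    {"name": "region", "value": "APAC"},
]
filtered = apply_elicited_filters(RECORDS, params)
```

Only one of the three records survives the filter, so two-thirds of the payload never crosses the wire.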

Smart Tool Discovery & Updates
Data boundaries and parameters can be revealed during the tool discovery phase itself

■ Data boundaries can be revealed upfront, avoiding null or empty responses
■ Data ranges can be provided
■ Discovery can support partial updates




{
  "type": "tool/discovery",
  "tool": "DataAnalyzer",
  "metadata": {
    "data_boundaries": {
      "start": "2025-01-01T00:00:00Z",
      "end": "2025-08-12T00:00:00Z"
    },
    "data_ranges": [
      { "min": 1000, "max": 5000 }
    ]
  }
}
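A minimal sketch of how a client might use the advertised boundaries: check a requested window against `data_boundaries` before sending the call, rejecting queries that are guaranteed to come back empty. The function names are illustrative, not SDK APIs.

```python
from datetime import datetime

# data_boundaries as advertised in the tool/discovery metadata above.
BOUNDARIES = {
    "start": "2025-01-01T00:00:00Z",
    "end": "2025-08-12T00:00:00Z",
}

def parse_ts(ts):
    # Older fromisoformat versions reject a trailing 'Z'; normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def within_boundaries(requested_start, requested_end, boundaries=BOUNDARIES):
    """Reject a query client-side if it falls outside the advertised
    data boundaries, avoiding a round trip for an empty response."""
    lo, hi = parse_ts(boundaries["start"]), parse_ts(boundaries["end"])
    return lo <= parse_ts(requested_start) and parse_ts(requested_end) <= hi

ok = within_boundaries("2025-03-01T00:00:00Z", "2025-04-01T00:00:00Z")
bad = within_boundaries("2024-06-01T00:00:00Z", "2024-07-01T00:00:00Z")
```

The second request is entirely before the advertised start, so it can be refused locally without touching the server.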

Predefined Prompt Optimization
Using the prompts feature, predefined optimized queries and URLs can be set up

■ Uses consistent query patterns
■ Triggers correct processing paths
■ Avoids redundant interpretation steps


{
  "type": "request",
  "method": "prompt",
  "params": {
    "message": "Summarize the latest performance metrics",
    "maxTokens": 200,
    "temperature": 0.7
  },
  "id": 42
}
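One way to realize this, sketched here with a hypothetical in-memory registry: store the optimized prompt templates once and expand them by name, so every call uses the same query pattern and skips re-interpreting free-form text.

```python
# Hypothetical registry of predefined, pre-optimized prompts. Serving a
# stored template keeps query patterns consistent across calls and
# avoids redundant interpretation of free-form prompt text.
PROMPTS = {
    "summarize_metrics": {
        "message": "Summarize the latest performance metrics",
        "maxTokens": 200,
        "temperature": 0.7,
    },
}

def build_prompt_request(name, request_id):
    """Expand a predefined prompt into a full request payload."""
    if name not in PROMPTS:
        raise KeyError(f"unknown prompt: {name}")
    return {
        "type": "request",
        "method": "prompt",
        "params": dict(PROMPTS[name]),  # copy so callers can't mutate the template
        "id": request_id,
    }

req = build_prompt_request("summarize_metrics", 42)
```

`req` reproduces the JSON request shown above, built from the stored template rather than parsed fresh each time.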

Resource-Aware Context Selection
Use resources and resource templates to expose only what is needed

■ Only the needed parts of a resource are fetched (partial retrieval)
■ Subscriptions reactively load resources instead of polling
■ Lazy loading reduces the memory footprint until data is actually required



"resources": [
  {
    "uri": "file:///project/src/main.rs",
    "name": "main.rs",
    "description": "Primary application entry point",
    "mimeType": "text/x-rust"
  }
],
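The lazy-loading and partial-retrieval points above can be sketched as a small wrapper class (an illustrative assumption, not an MCP SDK type): the resource body is only fetched on first access, and callers can ask for just a slice of it.

```python
class LazyResource:
    """Defer fetching a resource's contents until first access, and
    support partial retrieval so only the needed slice is returned."""

    def __init__(self, uri, loader):
        self.uri = uri
        self._loader = loader   # called only when data is first needed
        self._data = None

    def read(self, start=0, end=None):
        if self._data is None:  # lazy load on first access
            self._data = self._loader(self.uri)
        return self._data[start:end]

calls = []
def fake_loader(uri):
    calls.append(uri)           # record how many real fetches happen
    return 'fn main() { println!("hello"); }'

res = LazyResource("file:///project/src/main.rs", fake_loader)
first_chunk = res.read(0, 7)    # triggers the single real fetch
res.read()                      # served from cache, no second fetch
```

Two reads, one real fetch: memory and I/O are only spent once the data is actually required.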

Clean Lifecycle Shutdown
A graceful termination where the client sends a shutdown request

■ Connections can be closed gracefully
■ Memory and CPU can be freed
■ Performance degradation from leaked connection handles and HTTP resources can be avoided



{
  "type": "notification",
  "method": "shutdownInitiated",
  "params": {
    "reason": "maintenance_window",
    "timestamp": "2025-08-12T08:10:00Z"
  }
}
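A toy sketch of the shutdown path, assuming a hypothetical server that tracks its open connections: on a `shutdownInitiated` notification, every tracked connection is closed and released instead of being leaked.

```python
class Connection:
    """Stand-in for a client connection holding a socket handle."""
    def __init__(self, name):
        self.name, self.closed = name, False
    def close(self):
        self.closed = True

class McpServer:
    """Track open connections so a shutdown notification can release
    them all, rather than leaking handles until the process dies."""
    def __init__(self):
        self.connections = []

    def connect(self, name):
        conn = Connection(name)
        self.connections.append(conn)
        return conn

    def handle_notification(self, msg):
        if msg.get("method") == "shutdownInitiated":
            for conn in self.connections:
                conn.close()        # free sockets and buffers gracefully
            self.connections.clear()

server = McpServer()
a = server.connect("client-a")
b = server.connect("client-b")
server.handle_notification({
    "method": "shutdownInitiated",
    "params": {"reason": "maintenance_window"},
})
```

After the notification both connections are closed and the tracking list is empty, so nothing lingers to degrade later requests.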

Chunked Data via Streamable HTTP
Using chunked data in responses with the streamable HTTP feature

■ Data can be sent as segments
■ Memory usage spikes can be reduced
■ Perceived response times can be lowered


HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

4
{"pa
7
rt": 1}
5
{"mor
9
e": "da"}
0
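The chunk framing above can be demonstrated with a minimal encoder/decoder pair: each chunk is a hex length, CRLF, the data, CRLF, and a zero-length chunk terminates the stream. This is a from-scratch illustration of the wire format, not a library API.

```python
def encode_chunked(payload, chunk_size=4):
    """Encode a payload with HTTP/1.1 chunked transfer encoding:
    '<hex length>\\r\\n<data>\\r\\n' per chunk, ended by a zero chunk."""
    out = []
    for i in range(0, len(payload), chunk_size):
        piece = payload[i:i + chunk_size]
        out.append(f"{len(piece):x}\r\n{piece}\r\n")
    out.append("0\r\n\r\n")            # terminating zero-length chunk
    return "".join(out)

def decode_chunked(stream):
    """Reassemble the original body from a chunked stream."""
    body, rest = [], stream
    while True:
        size_line, rest = rest.split("\r\n", 1)
        size = int(size_line, 16)      # chunk sizes are hexadecimal
        if size == 0:
            break
        body.append(rest[:size])
        rest = rest[size + 2:]         # skip the data and trailing CRLF
    return "".join(body)

encoded = encode_chunked('{"part": 1}')
round_trip = decode_chunked(encoded)
```

The receiver can start parsing each segment as it arrives instead of waiting for the whole body, which is what lowers the perceived response time.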

Cancellation-Driven Resource Reclamation
In-flight data access / user operation cancellation

■ Connections can be closed gracefully
■ Memory and CPU can be freed
■ Performance degradation from excessive wasteful workload on data sources can be avoided

{
  "jsonrpc": "2.0",
  "method": "notifications/cancelled",
  "params": {
    "requestId": "123",
    "reason": "User requested cancellation"
  }
}
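A sketch of how a server might honor that notification, using asyncio task cancellation to reclaim the worker (the request-tracking dict and task names are illustrative assumptions):

```python
import asyncio

async def expensive_query():
    """Stand-in for a long-running data-source operation."""
    await asyncio.sleep(10)
    return "result"

async def main():
    in_flight = {}                          # requestId -> running task
    in_flight["123"] = asyncio.create_task(expensive_query())
    await asyncio.sleep(0)                  # let the task start

    # On receiving notifications/cancelled for requestId "123":
    task = in_flight.pop("123")
    task.cancel()                           # stop the wasted work now
    try:
        await task
    except asyncio.CancelledError:
        pass                                # CPU and memory reclaimed
    return task.cancelled()

cancelled = asyncio.run(main())
```

The ten-second query is abandoned almost immediately, so the data source never pays for work whose result nobody will read.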

MCP Ecosystem at a Glance

Metric                         Statistic
SDK Downloads (weekly)         > 8 million
MCP Servers                    > 10,000
MCP Tools Available            527
New Servers Added (weekly)     ~800
Major Adopters                 OpenAI, Replit, DeepMind, etc.

Thank you! Let’s connect.
Viraj Sharma
[email protected]
@virajsharma2000
sharmaviraj.com