What's New in Delta 4.0? [September 16, 2025]

opensourceeventsoss · 22 slides · Sep 16, 2025

About This Presentation

Delta 4.0 brings powerful new features that can reshape how you design and optimize your data workflows. 🚀


Slide Content

What’s New in
Delta 4.0
w/ Youssef Mrini & Scott Haines

Delta Connect
Why this matters.

● Decoupling your Spark applications from the JVM enables you to continue to run older versions of your application code without the need to upgrade “everything all at once”.

● Through the “plugin” ecosystem, you can extend Spark Connect’s capabilities. For example, Delta Connect is an extension for Spark Connect.

“Delta Table support for Spark Connect”

Delta Connect
Spark Connect
It’s a gRPC service. The client is a thin proxy, and the backend acts as the brains behind the operation (the remote driver).

● Your local SparkSession controls remote execution

● When you trigger a Spark action, the plan is encoded as Protobuf and sent via gRPC to the remote server

● The server is where the magic happens: it actually executes the encoded logical plan

Exploring the Connect ecosystem: Delta Connect is an extension of Spark Connect.
Client: Thin API. Server: Real Work Happens Here.
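To make the client/server split concrete, here is a minimal PySpark sketch. It assumes a Spark Connect server is already listening at sc://localhost:15002 (the endpoint is illustrative); nothing executes locally until an action is called:

```python
from pyspark.sql import SparkSession

# The local session is a thin proxy to the remote driver.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()  # assumed endpoint

# Transformations only build an unresolved logical plan on the client side.
df = spark.range(1_000).filter("id % 2 = 0")

# The action encodes the plan as Protobuf, ships it over gRPC,
# and the server executes it and streams the result back.
df.count()
```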

Delta Connect
Connect Server
Enabling the Delta Connect plugin.
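The slide shows this as a screenshot; as a rough sketch, enabling the Delta Connect plugins on a Spark Connect server looks like the following. The package coordinates, versions, and plugin class names are assumptions based on the Delta Connect preview and may differ for your Spark/Delta versions:

```shell
# Start a Spark Connect server with the Delta Connect server plugins enabled.
# Coordinates are illustrative; check the Delta 4.0 release notes.
./sbin/start-connect-server.sh \
  --packages io.delta:delta-connect-server_2.13:4.0.0 \
  --conf "spark.connect.extensions.relation.classes=org.apache.spark.sql.connect.delta.DeltaRelationPlugin" \
  --conf "spark.connect.extensions.command.classes=org.apache.spark.sql.connect.delta.DeltaCommandPlugin" \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```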

Delta Connect
Connect Client
Enabling a Delta Connect session.
All it takes is *one tiny modification.

● Use the SPARK_REMOTE environment variable

● Or the .remote(“sc://….”) option on the Builder

● To tell Spark to use the Connect protocol

● *plus the connect client library and plugins…

Wow. That’s Simple!
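In PySpark, the two options above might look like this sketch. It assumes a Spark Connect server with the Delta plugins is already running at sc://localhost:15002, and that the client-side pyspark and delta-spark packages are installed; the table name is hypothetical:

```python
# Option 1: set the endpoint via an environment variable before starting Python:
#   export SPARK_REMOTE="sc://localhost:15002"
# A plain SparkSession.builder.getOrCreate() then uses the Connect protocol.

# Option 2: pass the endpoint explicitly on the builder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://localhost:15002")  # assumed host/port
    .getOrCreate()
)

# Delta Connect proxies DeltaTable operations over the same gRPC channel.
from delta.tables import DeltaTable

dt = DeltaTable.forName(spark, "my_table")  # hypothetical table name
dt.toDF().show()
```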

Delta Connect
Delta Connect: Protocol
How this all works. Hint: it has to do with planning.

Delta Connect
Delta Connect in Action
Demo time.
What you’ll need to follow along:

● Docker and Docker Compose
● The ecomm dataset from Kaggle (too big for GitHub). Instructions under datasets/ecomm_raw

Cluster Technique
Liquid Clustering
A modern feature that efficiently organizes table data for optimal query performance, replacing traditional partitioning and Z-ordering. It is more flexible, requires less compute, and automatically adapts the data layout to query patterns, without rewriting existing data files.
(Diagram: Partitioning vs. Clustering)

Cluster Technique
Liquid Clustering
Modifying a partition in Delta Lake requires rewriting the affected data, whereas liquid clustering offers a flexible, incremental approach to reorganizing data without costly rewrites.
(Diagram: Partitioning vs. Clustering)

Cluster Technique
Liquid Clustering

OPTIMIZE table_name;

Liquid clustering is incremental, meaning data is only rewritten as necessary to accommodate data that needs to be clustered. Data files already clustered on different clustering columns are not rewritten.
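A short SQL sketch of the liquid clustering workflow; the table and column names are illustrative:

```sql
-- Create a table with liquid clustering instead of partitioning.
CREATE TABLE events (
  event_id   BIGINT,
  event_date DATE,
  country    STRING
) USING DELTA
CLUSTER BY (event_date, country);

-- Change the clustering keys later; existing files are not rewritten.
ALTER TABLE events CLUSTER BY (country);

-- Incrementally cluster newly arrived data.
OPTIMIZE events;
```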

Introduction
Collations
Delta Lake 4.0 introduces collation support, allowing users to specify how string values are ordered and compared within Delta Lake tables, improving sorting and searching, especially for case sensitivity and multilingual datasets.
(Before / After comparison)

Introduction
Collations
By default, the collation for string fields is UTF8_BINARY.
Collation support enables more accurate query results under Spark 4.0+.
(Table-level and column-level examples)
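A column-level collation sketch in SQL, using the UTF8_LCASE collation from Spark 4.0; table and column names are illustrative:

```sql
-- Declare a case-insensitive collation on a single column.
CREATE TABLE users (
  name STRING COLLATE UTF8_LCASE
) USING DELTA;

-- Comparison follows the column's collation:
-- this predicate also matches 'Alice' and 'ALICE'.
SELECT * FROM users WHERE name = 'alice';

-- Columns without an explicit collation keep the default
-- UTF8_BINARY (byte-wise) ordering and comparison.
```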

Introduction
Type Widening

What is type widening?
Type widening lets the schema transition to a broader type (e.g., INT to LONG, DATE to TIMESTAMP), ensuring existing data remains readable and new data follows the updated type.

How does it work?
On type widening, the table records the new type, but past data written under the old type is not physically rewritten. Delta Lake clients read data using the widened type, handling the conversion automatically.

Delta Lake 4.0 introduces type widening, allowing table columns to change to a wider data type (such as evolving an INT column to LONG or DOUBLE) without rewriting underlying data files, making schema evolution seamless and efficient.
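In SQL, the enable-then-widen flow sketched above looks like this; the table and column names are illustrative:

```sql
-- Enable the type widening table feature.
ALTER TABLE metrics SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true');

-- Widen an INT column to BIGINT; existing data files are not rewritten,
-- readers apply the conversion on read.
ALTER TABLE metrics ALTER COLUMN hits TYPE BIGINT;
```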

Introduction
Type Widening
Enable Type Widening → Success
Disable Type Widening → Fail

Introduction
Type Widening
If you need to completely remove the type widening table feature, you can use DROP FEATURE:

ALTER TABLE <table-name> DROP FEATURE 'typeWidening' [TRUNCATE HISTORY]

Introduction
Variant
Delta Lake 4.0 introduces the Variant data type, which enables flexible and efficient storage of semi-structured data such as JSON, making it much more performant than the traditional approach of storing such data as strings.
(Before / After comparison)

Introduction
Variant
Native semi-structured support, schema evolution, and type safety.
(Before / After comparison)

Introduction
Variant
Enable Variant for existing tables:

ALTER TABLE table_name SET TBLPROPERTIES('delta.feature.variantType-preview' = 'supported')

Warning:

● Once enabled, the table will not be readable by clients that don’t support the variant type; reader-side support is required.
● Variant columns cannot hold values larger than 16 MiB.
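A minimal SQL sketch of writing and reading a VARIANT column, using Spark 4.0's parse_json and variant_get functions; table and field names are illustrative:

```sql
-- Create a table with a VARIANT column for semi-structured payloads.
CREATE TABLE events (
  id      BIGINT,
  payload VARIANT
) USING DELTA;

-- Ingest JSON directly into the variant column.
INSERT INTO events SELECT 1, PARSE_JSON('{"device": "ios", "version": 17}');

-- Extract a nested field with an explicit target type.
SELECT variant_get(payload, '$.device', 'string') FROM events;
```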

Introduction
Checkpoint Protection
In Delta Lake 4.0, Checkpoint Protection is a protocol feature that ensures safe table history management and compatibility, especially following table feature downgrades or removals.

● When a table feature is dropped (using ALTER TABLE ... DROP FEATURE ...), Delta Lake 4.0 automatically rewrites recent table history as protected checkpoints and enables the checkpointProtection feature in the table protocol.
● This protection prevents protocol downgrade points from being accidentally removed during cleanup operations, ensuring that older Delta Lake clients (with and without the dropped feature) can reliably read the table history up to and after the downgrade.
● Protected checkpoints mark the transition point: older clients can read up to the checkpoint; newer clients handle the rest.
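A sketch of the workflow that triggers checkpoint protection; the table and feature names are illustrative:

```sql
-- Dropping a table feature in Delta 4.0 rewrites recent history as
-- protected checkpoints and adds checkpointProtection to the protocol.
ALTER TABLE my_table DROP FEATURE 'typeWidening';

-- Optionally also truncate history at the downgrade point:
ALTER TABLE my_table DROP FEATURE 'typeWidening' TRUNCATE HISTORY;
```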

Thanks to the Community
Special Thanks to the Delta 4.0 Contributors

● Ada Ma, Ala Luszczak, Alexey Shishkin, Allison Portis, Ami Oka,
Amogh Jahagirdar, Andreas Chatzistergiou, Andrei Tserakhau, Andy
Lam, Anoop Johnson, Anton Erofeev, Anurag Vaibhav, Bilal Akhtar,
Carmen Kwan, Charlene Lyu, ChengJi-db, Chirag Singh, Christos
Stavrakakis, Cuong Nguyen, Dhruv Arya, Dušan Tišma, Felipe
Pessoto, FredLiu, Gene Pang, Hao Jiang, Harsh Motwani, Herman van
Hovell, Jiaheng Tang, Johan Lasperas, Juliusz Sompolski, Jun, Kaiqi
Jin, Lars Kroll, Lin Zhou, Livia Zhu, Lukas Rupprecht, Malte Sølvsten
Velin, Marko Ilić, Ming Dai, Nick Lanham, Ole Sasse, Omar Elhadidy,
Oussama Saoudi, Paddy Xu, Phil Plato, Qiyuan Dong, Rahul Shivu
Mahadev, Rajesh Parangi, Rakesh Veeramacheneni, Scott Sandre,
Slava Min, Stefan Kandic, Sumeet Varma, Thang Long Vu, Tom van
Bussel, Venkata Sai Akhil Gudesa, Venki Korukanti, Vladimir Golubev,
Wei Luo, Wenchen Fan, Xiaochong Wu, Xin Huang, Yumingxuan Guo,
Ze'ev Maor, Zhipeng Mao, Zihao Xu, Ziya Mukhtarov, chenjian2664,
emkornfield, jackierwzhang, kamcheungting-db, littlegrasscao,
mozasaur, richardc-db
● Thanks to Martin Grund for helping make Spark Connect a
reality
● Thanks to Buf for providing an OSS toolchain for the Protobuf
ecosystem

Scott and Long Vu @ Open Lakehouse Mini Summit

Q&A
Ask us Anything. Really