CoC25US - NiFi Man - “We're here – but should we have come?”

bunkertor 63 views 16 slides Sep 15, 2025

About This Presentation

Community over Code
September 12, 2025, Minneapolis

https://communityovercode.org/schedule/


Timothy Spann
NiFi Man: “We're here – but should we have come?”
The last few years, travel has been tough with diseases, air...


Slide Content

NiFi Man: “We're here – but should we have come?”
Tim Spann, Senior Solutions Engineer, Snowflake

Tim Spann

paasdev.bsky.social
@PaasDev // Blog: datainmotion.dev
Senior Solutions Engineer, Snowflake
NY/NJ/Philly - Cloud Data + AI Meetups
ex-Zilliz, ex-Pivotal, ex-Cloudera, ex-HPE,
ex-StreamNative, ex-EY, ex-Hortonworks.

https://medium.com/@tspann
https://github.com/tspannhw

This week in Apache NiFi, Apache Polaris,
Apache Flink, Apache Kafka, ML, AI,
Streamlit, Jupyter, Apache Iceberg, Python,
Java, LLM, GenAI, Snowflake, Unstructured
Data and Open Source friends.

https://bit.ly/32dAJft
AI + Streaming Weekly by Tim Spann

Circle of Friends

Over the last few years, travel has been tough: diseases, air quality problems, fires, airline
delays, wars and other events. The only way to know whether a trip is worthwhile is to measure
the conditions and then decide. ASF projects including NiFi, Iceberg, Kafka, Calcite, Polaris
and Tika let us do just that.
There are many streams of data to check when deciding whether a trip is worth it: flights,
delays, weather, air quality, local sensors, travel advisories, reviews, social media, local
transit and more.
So I looked at everything and determined that yes, Minnesota is worth the trip from sunny New
Jersey. And I'll show you how to make those decisions too.


Yes, I love "Travel Man". 1300 Nicollet Mall, Minneapolis, MN 55403

Community / Fun

Who / What / Where / Why
The talks - https://communityovercode.org/schedule/

The friends from the community and past ApacheCons (Montreal, NoLA, Halifax)

Committers from my favorite projects

Apache NiFi, Apache Kafka, Apache Calcite, Apache Iceberg, Apache Tika,
Apache Spark

New Releases, New Projects, AI Everywhere

It’s like the internet in person

My next talk @ 4:10 pm → 40 min expands on HOW
Utilizing Real-Time Transit Data for Travel Optimization

Travel
Flying - ADS-B and Plane Data Feeds

Uber - No Live Data!
Transit - https://svc.metrotransit.org/ Open Data

Conference: Hyatt Regency
44.9705° N, 93.2781° W
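
A minimal sketch of how the venue coordinates above could be used with position feeds: compute the great-circle distance from any latitude/longitude (a bus, a plane) to the Hyatt Regency. The function names and the use of the haversine formula are illustrative assumptions, not something taken from the talk.

```python
import math

# Conference venue coordinates from the slide (west longitude is negative).
HYATT_LAT, HYATT_LON = 44.9705, -93.2781

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_venue_km(lat, lon):
    """Distance from an arbitrary point (hypothetical helper) to the venue."""
    return haversine_km(lat, lon, HYATT_LAT, HYATT_LON)
```

A vehicle position from a transit or ADS-B feed could be passed to `distance_to_venue_km` to filter for anything near the conference.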



https://github.com/tspannhw/conferences/blob/main/2025/communityovercode/

Store the Data
Travel Advisories (From RSS)
ICEBERG TABLE TRAVELADVISORIES

Local Transit (From GTFS-RT protobuf or SIRI JSON)
ICEBERG TABLE VEHICLEPOSITIONS
ICEBERG TABLE SUBWAY
ICEBERG TABLE SERVICEALERTS
ICEBERG TABLE ICYMTA
ICEBERG TABLE MTABUSVEHICLEMONITORING ← TODO

Flights / Airport / ADS-B / Planes (From JSON and Antenna)
ICEBERG TABLE FLIGHT_DATA_ICEBERG
ICEBERG TABLE PLANES
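
As one illustration of the first source on this slide, here is a sketch of flattening RSS items into rows before they are written to a table such as TRAVELADVISORIES. The sample feed, the chosen columns and the function name are hypothetical; a real advisory feed will carry different fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to make the sketch runnable offline.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Travel Advisories</title>
    <item>
      <title>Exampleland - Level 1</title>
      <link>https://example.org/advisories/exampleland</link>
      <pubDate>Mon, 01 Sep 2025 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def rss_items_to_rows(rss_text):
    """Turn each <item> element into a flat dict, one dict per table row."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pub_date": item.findtext("pubDate", default=""),
        })
    return rows


if __name__ == "__main__":
    for row in rss_items_to_rows(SAMPLE_RSS):
        print(row)
```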

Store the Data
Weather (From XML)
ICEBERG TABLE NOAAWEATHER

Air Quality (From JSON)
ICEBERG TABLE AQ
ICEBERG TABLE AQFORECAST

Local Sensors (From RPI as JSON)
ICEBERG TABLE SENSORS

Social Media Bluesky (From JSON) ← TODO HASHTAG?
ICEBERG TABLE SOCIALMEDIA
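
The two "Store the Data" slides amount to a routing table from source to format to Iceberg tables. A small sketch of that mapping follows; the source formats and table names are copied from the slides, while the dictionary layout, keys and helper names are assumptions for illustration.

```python
# Formats and table names come from the slides; everything else is illustrative.
SOURCES = {
    "travel_advisories": {"format": "RSS",
                          "tables": ["TRAVELADVISORIES"]},
    "local_transit": {"format": "GTFS-RT protobuf or SIRI JSON",
                      "tables": ["VEHICLEPOSITIONS", "SUBWAY", "SERVICEALERTS",
                                 "ICYMTA", "MTABUSVEHICLEMONITORING"]},
    "flights": {"format": "JSON and antenna",
                "tables": ["FLIGHT_DATA_ICEBERG", "PLANES"]},
    "weather": {"format": "XML",
                "tables": ["NOAAWEATHER"]},
    "air_quality": {"format": "JSON",
                    "tables": ["AQ", "AQFORECAST"]},
    "local_sensors": {"format": "JSON from RPI",
                      "tables": ["SENSORS"]},
    "social_media": {"format": "JSON",
                     "tables": ["SOCIALMEDIA"]},
}


def tables_for(source):
    """Return the Iceberg table names a given source feeds."""
    return SOURCES[source]["tables"]


def all_tables():
    """Flatten every table name across all sources, in slide order."""
    return [t for spec in SOURCES.values() for t in spec["tables"]]
```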

Local Transit Data
GTFS-realtime data

Metro Transit's GTFS-realtime feeds are refreshed every 10 seconds.
https://svc.metrotransit.org/

TripUpdate feed: https://svc.metrotransit.org/mtgtfs/tripupdates.pb
VehiclePosition feed: https://svc.metrotransit.org/mtgtfs/vehiclepositions.pb
ServiceAlerts feed: https://svc.metrotransit.org/mtgtfs/alerts.pb

https://svc.metrotransit.org/mtgtfs/archive/
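
A hedged sketch of reading the VehiclePosition feed listed above outside of NiFi. It assumes the third-party `gtfs-realtime-bindings` package is installed (it provides `google.transit.gtfs_realtime_pb2`) and that the feed is reachable without an API key; the helper names and the selected columns are illustrative, not the flow used in the talk.

```python
import urllib.request

BASE_URL = "https://svc.metrotransit.org/mtgtfs"
FEEDS = {
    "trip_updates": "tripupdates.pb",
    "vehicle_positions": "vehiclepositions.pb",
    "alerts": "alerts.pb",
}
REFRESH_SECONDS = 10  # the slide says the feeds refresh every 10 seconds


def feed_url(name):
    """Build the full URL for one of the three feeds."""
    return f"{BASE_URL}/{FEEDS[name]}"


def fetch_vehicle_positions(timeout=10):
    """Download and decode the VehiclePosition feed into flat dict rows."""
    # Imported lazily so the URL helpers work without the third-party package.
    from google.transit import gtfs_realtime_pb2

    with urllib.request.urlopen(feed_url("vehicle_positions"),
                                timeout=timeout) as resp:
        payload = resp.read()

    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(payload)

    rows = []
    for entity in feed.entity:
        if entity.HasField("vehicle"):
            v = entity.vehicle
            rows.append({
                "vehicle_id": v.vehicle.id,
                "route_id": v.trip.route_id,
                "latitude": v.position.latitude,
                "longitude": v.position.longitude,
                "timestamp": v.timestamp,
            })
    return rows
```

Polling more often than `REFRESH_SECONDS` would only return the same snapshot, so a scheduler interval of 10 seconds or longer is a reasonable starting point.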

Using ASF Projects to Decide
Diagram: Data Sources → Data Integration → Data Platform → Data Consumers
Labels: Sensors, Transit Data, Traffic Data, Raw Data, AI/ML & Apps
DATA FROM THE REAL

UNSTRUCTURED DATA WITH APACHE NIFI

• Archives - tar, gzipped, zipped, …
• Images - PNG, JPG, GIF, BMP, …
• Documents - HTML, Markdown, RSS, PDF, Doc, RTF, Plain Text, …
• Videos - MP4, Clips, Mov, YouTube URL, …
• Sound - MP3, …
• Social / Chat - Slack, Discord, Twitter, REST, Email, …
• Identify Mime Types, Chunk Documents, Store to Vector Database
• Parse Documents - HTML, Markdown, PDF, Word, Excel, PowerPoint
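
The "identify MIME types, chunk documents" steps in the list above can be sketched in plain Python. This is a standalone illustration and not NiFi's processors: `mimetypes` guesses from the file extension only, which is weaker than content-based detection, and the chunk size, overlap and function names are assumptions.

```python
import mimetypes


def guess_mime(filename):
    """Guess a MIME type from the file extension, with a generic fallback."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"


def chunk_text(text, size=200, overlap=20):
    """Split text into fixed-size chunks that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk would then be embedded and written to a vector database; that step depends on the chosen store and is left out here.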

RECORD-ORIENTED DATA WITH NIFI

• Record Readers - Avro, CSV, Grok, IPFIX, JASN1, JSON, Parquet,
Scripted, Syslog5424, Syslog, WindowsEvent, XML
• Record Writers - Avro, CSV, FreeFormText, JSON, Parquet, Scripted, XML
• Record Readers and Writers can reference a schema registry to
retrieve schemas when necessary.
• Enables processors that accept any data format without having to worry about
the parsing and serialization logic.
• Allows us to keep FlowFiles larger, each consisting of multiple records, which
results in far better performance.
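
The reader/writer split described above can be illustrated outside NiFi with a small analogy: one function parses a format into records, one serializes records to another format, and the transform in between never touches parsing logic. This is a sketch of the idea, not NiFi's API; the sample data, field names and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical sensor batch: many records travel together in one payload.
SAMPLE_CSV = "sensor,pm25\nrpi-1,7.5\nrpi-2,12.0\n"


def read_csv_records(text):
    """'Record reader': parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def write_json_records(records):
    """'Record writer': serialize the whole batch as one JSON array."""
    return json.dumps(records)


def convert_pm25(records):
    """Format-agnostic transform: works on records, not on CSV or JSON."""
    return [{**r, "pm25": float(r["pm25"])} for r in records]


if __name__ == "__main__":
    batch = convert_pm25(read_csv_records(SAMPLE_CSV))
    print(write_json_records(batch))
```

Swapping the reader or writer for another format would leave `convert_pm25` unchanged, which is the point the slide makes about processors.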

RESOURCES AND WRAP-UP
https://www.linkedin.com/in/timothyspann/

Getting Started
https://quickstarts.snowflake.com/guide/analyze_pdf_invoices_snowpark_python_java/index.html#0
https://medium.com/@tspann/utilizing-multiple-vectors-and-advanced-search-data-model-design-for-city-data-705d68d8daf2
https://medium.com/cloudera-inc/real-time-in-boston-part-1-0f92d7da3496
https://medium.com/cloudera-inc/boston-wheres-my-bus-llm-streaming-to-the-rescue-586dfd019237
https://medium.com/@tspann/real-time-irish-transit-analytics-ea76164c9595
https://medium.com/cloudera-inc/streaming-street-cams-to-yolo-v8-with-python-and-nifi-to-minio-s3-3277e73723ce
https://medium.com/cloudera-inc/nyc-traffic-are-you-kidding-me-6d3fa853903b
https://medium.com/cloudera-inc/subways-and-transit-updates-in-real-time-30c104c359ef
https://medium.com/cloudera-inc/transit-in-sao-paulo-brasil-flank-style-eaec6753cc63

Deep Dives
https://medium.com/@tspann/harnessing-the-power-of-nifi-building-a-seamless-flow-to-ingest-pm2-5-90246393fcab

https://medium.com/cloudera-inc/wildfires-air-quality-time-to-fire-up-the-sensors-and-start-flanking-12ea0ba33f63

https://medium.com/@tspann/building-a-travel-advisory-app-with-apache-nifi-in-k8-969b44c84958

https://medium.com/cloudera-inc/watching-airport-traffic-in-real-time-32c522a6e386