Time Series to Vectors: Leveraging InfluxDB and Milvus for Similarity Search

chloewilliams62 189 views 52 slides Oct 17, 2024


About This Presentation

In this webinar, we’ll explain the powerful combination of time series data and vector similarity search to revolutionize urban traffic management. Learn how to transform raw sensor data from InfluxDB into meaningful vectors, enabling advanced pattern recognition and anomaly detection using Milvus...


Slide Content

© Copyright 2023, InfluxData
Time Series to Vectors: Leveraging InfluxDB and Milvus for Similarity Search
Anais Dotis-Georgiou
October 2024

Anais Dotis-Georgiou
Developer Advocate
LinkedIn

Agenda
● Introduction to InfluxDB and Time Series Databases
● TSDB vs Vector Databases: Apples to Oranges
● Projects you can try!
● Demo: Leveraging InfluxDB and Milvus for Similarity Search for Time Series
● Use Cases
● (Time Permitting) Tools for data processing and ML tasks with InfluxDB

Introduction to InfluxDB and Time Series Databases

A Critical Component of Modern Data Pipelines: Time Series Data

The age of instrumentation
• Instrumentation of the virtual world (e.g. DevOps)
• Sensors in the physical world (e.g. IoT)

Time Series Data Types
• Metrics: measurements at regular time intervals
• Events: measurements at irregular time intervals
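To make the metrics/events distinction concrete, here is a minimal Python sketch, assuming events arrive as integer Unix timestamps in seconds. It converts irregular events into a regular metric (event count per fixed interval). The function name `events_to_metric` and the 60-second interval are illustrative choices, not part of any InfluxDB API.

```python
from collections import Counter


def events_to_metric(event_times, interval, start, end):
    """Turn irregular event timestamps into a regular series of counts.

    Each output point covers [t, t + interval) for t = start, start + interval, ...
    up to (but not including) end, so every interval gets a value even when
    no event occurred in it.
    """
    # Map every event to the start of the interval that contains it
    buckets = Counter(
        start + ((t - start) // interval) * interval
        for t in event_times
        if start <= t < end
    )
    # Emit one value per interval, filling empty intervals with 0
    return [(t, buckets.get(t, 0)) for t in range(start, end, interval)]


# Irregular events (e.g. fault alerts), timestamps in seconds
events = [3, 5, 61, 62, 63, 170]
print(events_to_metric(events, interval=60, start=0, end=180))
# [(0, 2), (60, 3), (120, 1)]
```

The reverse direction is not possible in general: once events are aggregated into a regular metric, the exact event times are lost.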

Time series in every application
Infrastructure & data sources (consumer & industrial IoT, software infrastructure, real-time applications):
• Renewable & alternative energy systems
• Manufacturing & industrial platforms
• Fleet management & telematics
• Developer tools & APIs
• Kubernetes (K8s)
• DevOps monitoring
• Gaming applications
• Fintech applications
• Network monitoring
All of these sources produce time series data.

Rise of time series as a category
• Search: distributed search, logs, geo
• Document: high throughput, large documents
• Relational: orders, customers, records
• Time series: events, metrics, and time-stamped data for IoT, analytics, and cloud native
Time series is the fastest-growing data category by far (chart: time series vs. all other categories; source: DB-Engines).

Time Series Databases
Built for time series data, with:
• High write throughput
• Efficient queries over time ranges
• Scalability and performance
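A toy illustration of why keeping data ordered by time helps with queries over time ranges: if timestamps are stored sorted, a range lookup is two binary searches plus a slice instead of a full scan. This is a conceptual sketch in plain Python, not a description of InfluxDB 3.0's storage engine; the `TimeSeries` class is hypothetical.

```python
import bisect


class TimeSeries:
    """Hypothetical in-memory series that keeps points sorted by timestamp."""

    def __init__(self):
        self.times = []
        self.values = []

    def insert(self, t, value):
        # Insert at the position that keeps timestamps sorted
        i = bisect.bisect_right(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, value)

    def query_range(self, start, stop):
        """Return (time, value) pairs with start <= time < stop."""
        # Two binary searches locate the slice boundaries
        lo = bisect.bisect_left(self.times, start)
        hi = bisect.bisect_left(self.times, stop)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))


ts = TimeSeries()
for t, v in [(10, 1.0), (30, 3.0), (20, 2.0), (40, 4.0)]:
    ts.insert(t, v)
print(ts.query_range(15, 40))  # [(20, 2.0), (30, 3.0)]
```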

InfluxDB 3.0

Vector Databases

New kid on the block: Vector Databases

TSDB vs Vector Databases

TSDB
Use Cases:
• Monitoring
• IoT
• Predictive Maintenance
Advantages:
• Optimized for Time Series Data
• Time-Based Aggregations
• Fast Inserts and Queries

Vector
Use Cases:
• Similarity Search
• Machine Learning and AI
Advantages:
• Efficiency in High-Dimensional Data
• Similarity Searches
• Support for Complex Data Types
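Similarity search is the core vector database operation. Milvus typically answers such queries with approximate nearest neighbor (ANN) indexes; as a minimal exact stand-in for the idea (not Milvus's API), the sketch below ranks stored vectors by cosine similarity to a query using NumPy. `top_k_cosine` is a hypothetical helper.

```python
import numpy as np


def top_k_cosine(vectors, query, k):
    """Return (indices, scores) of the k stored vectors most similar to query."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalize so that a dot product equals cosine similarity
    v_norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = v_norm @ q_norm
    # Sort by descending similarity and keep the first k
    order = np.argsort(-scores)[:k]
    return order, scores[order]


stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
idx, sims = top_k_cosine(stored, [1.0, 0.1], k=2)
print(idx)  # [0 2]
```

This brute-force scan costs one dot product per stored vector, which is exactly the work an ANN index avoids at scale by trading a little accuracy for speed.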

TSDB
ML with TS DBs:
• Forecasting
• Time Series Classification
• Anomaly Detection
• Regression Analysis

Vector
ML with Vector DBs:
• Similarity Search
• Clustering
• Anomaly Detection
• Nearest Neighbor Classification
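Nearest neighbor classification, one of the vector DB tasks listed above, labels a new vector by majority vote among its k closest stored vectors. Below is a minimal sketch in plain Python using Euclidean distance. In a real pipeline the neighbor lookup would be delegated to the vector database; the "normal"/"fault" labels and vectors here are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(labeled_vectors, query, k=3):
    """Predict a label for query by majority vote of its k nearest neighbors.

    labeled_vectors is a list of (vector, label) pairs.
    """
    # Sort stored vectors by Euclidean distance to the query, keep the k closest
    neighbors = sorted(labeled_vectors, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


history = [
    ([0.1, 0.2], "normal"),
    ([0.2, 0.1], "normal"),
    ([0.15, 0.15], "normal"),
    ([0.9, 0.8], "fault"),
    ([0.8, 0.9], "fault"),
]
print(knn_classify(history, [0.85, 0.85]))  # fault
```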

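Querying Programmatically via Flight
# Library import
from influxdb_client_3 import InfluxDBClient3

# Connection settings (replace with your own values)
host = "eu-central-1-1.aws.cloud2.influxdata.com"
org = "6a841c0c08328fb1"
token = ""             # API token with read access to the database
database = "database"  # name of the database to query

# Initialization: the client queries InfluxDB 3.0 over Arrow Flight
client = InfluxDBClient3(
    host=host,
    token=token,
    org=org,
    database=database)

# Query: run SQL and return the result as a pandas DataFrame
# ("table" is a reserved word in SQL, so the placeholder name is quoted)
sql = '''SELECT * FROM "table"'''
df = client.query(query=sql, language='sql', mode='pandas')
print(df)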
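The talk's central step is turning time series into vectors that a vector database can index. One common approach, shown here as a hedged sketch rather than the demo's exact method, is to slide a fixed-length window over a series and z-normalize each window so that vectors compare shapes rather than absolute levels. The window length and step are assumptions to tune, and `series_to_vectors` is a hypothetical helper; its input could be one numeric column of the DataFrame returned by `client.query`.

```python
import numpy as np


def series_to_vectors(values, window, step=1):
    """Cut a 1-D series into overlapping windows and z-normalize each one.

    Returns an array of shape (n_windows, window); each row is one vector.
    """
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        raise ValueError("series is shorter than the window length")
    starts = range(0, len(values) - window + 1, step)
    windows = np.array([values[s:s + window] for s in starts])
    # Z-normalize per window so vectors capture shape, not absolute level
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    # Avoid dividing by zero: a perfectly flat window becomes all zeros
    std[std == 0] = 1.0
    return (windows - mean) / std


vecs = series_to_vectors([1, 2, 3, 4, 5, 6], window=3, step=1)
print(vecs.shape)  # (4, 3)
```

Because of the normalization, the windows [1, 2, 3] and [4, 5, 6] map to the same vector: both are a steady rise. Skip the normalization if absolute levels matter for your similarity search.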

Packing Co: Anomaly Detection
• Packing Co is having recurring issues with one of their packaging machines.
• Unexpectedly, one of the machines enters a failing state that requires a manual reset by an engineer.
• The plant manager has advised that, when running normally, all machine sensors follow similar output patterns. If a machine is at fault, these patterns fluctuate abnormally.
• How can we use HiveMQ, Hugging Face, and InfluxDB to solve this?
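The plant manager's observation suggests a simple baseline to try before any ML: if healthy machines produce similar patterns, a machine whose readings fluctuate much more than its peers' is suspect. Below is a minimal sketch, assuming each machine's recent readings for one sensor over the same time window are available as a list. The 3× median threshold and the machine IDs are illustrative assumptions, not values from the demo.

```python
import statistics


def flag_fluctuating_machines(readings, factor=3.0):
    """Return machine IDs whose reading spread exceeds factor x the fleet median.

    readings maps machine ID -> list of recent sensor values (same sensor,
    same time window for every machine).
    """
    # Spread (population standard deviation) of each machine's readings
    spread = {m: statistics.pstdev(vals) for m, vals in readings.items()}
    # The median spread across machines serves as the "normal" fluctuation level
    baseline = statistics.median(spread.values())
    return sorted(m for m, s in spread.items() if s > factor * baseline)


recent = {
    "machine_1": [20.1, 20.3, 20.2, 20.4],
    "machine_2": [19.9, 20.0, 20.2, 20.1],
    "machine_3": [20.0, 25.5, 14.8, 23.9],  # fluctuating abnormally
}
print(flag_fluctuating_machines(recent))  # ['machine_3']
```

Using the median rather than the mean keeps a single faulty machine from inflating the baseline it is compared against.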

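Artificial Neural Networks - Autoencoder
(Diagram: input → encoder → bottleneck → decoder → output)
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

timesteps, input_dim = 30, 3  # example window length and sensor count (assumed values)

# Encoder: compress each window of sensor readings into a 4-unit bottleneck
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(16, activation='relu', return_sequences=True)(inputs)
encoded = LSTM(4, activation='relu', return_sequences=False)(encoded)
# Decoder: repeat the bottleneck vector and reconstruct the input window
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(4, activation='relu', return_sequences=True)(decoded)
decoded = LSTM(16, activation='relu', return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(input_dim))(decoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')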
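Once trained on windows from normally running machines, an autoencoder reconstructs normal windows well and abnormal ones poorly. A common way to turn that into an alert, sketched here with NumPy, is to threshold the per-window mean squared error. The sketch assumes you already have model inputs and reconstructions (for example from the model's `predict` output). The mean + 3 standard deviations rule is an illustrative choice, not the demo's.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Mean squared error per window; x and x_hat have shape (n, timesteps, features)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return ((x - x_hat) ** 2).mean(axis=(1, 2))


def error_threshold(train_errors, k=3.0):
    """Threshold = mean + k * std of errors measured on normal training windows."""
    train_errors = np.asarray(train_errors, dtype=float)
    return train_errors.mean() + k * train_errors.std()


# Toy numbers: errors on normal windows are small, the second new window's is not
train_err = np.array([0.010, 0.012, 0.011, 0.009, 0.010])
threshold = error_threshold(train_err)
new_err = np.array([0.011, 0.250])
print(new_err > threshold)  # [False  True]
```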

Overview of Tasks in InfluxDB 3.0
• InfluxDB 3.0 favors interoperability with other ETL and stream processing tools instead of locking users into InfluxDB-specific task tooling.
• Users have access to a wide variety of streaming and task tools, so they can find the one that works best for them.
• Having more choices requires more decision-making up front.

Advantages of Mage
• Open source and easy to use.
• Mage features:
  • Orchestration: schedule and manage data pipelines with observability.
  • Notebook editor: interactive Python, SQL, and R editor for coding data pipelines.
  • Data integration: synchronize data from third-party sources to your internal destinations.
  • Streaming: ingest and transform real-time data.
  • dbt: build, run, and manage your dbt models with Mage.
• Clear documentation on how to deploy on AWS, Azure, DigitalOcean, and GCP with Terraform and Helm charts.

Resources for Mage and InfluxDB 3.0
• Mage.ai for Tasks with InfluxDB: a blog post highlighting how to set up a simple downsampling task with Mage and InfluxDB 3.0.
• Mage for Anomaly detection with InfluxDB and Half-space Trees: a blog post on performing anomaly detection with Mage and InfluxDB 3.0.
• ETL Made Easy: Best Practices for Using InfluxDB and Mage.ai: an on-demand webinar on best practices for using Mage as an ETL tool with InfluxDB. Includes a demo on anomaly detection with Mage and InfluxDB 3.0.
• Mage Documentation
• Mage_Demo: a containerized repo highlighting the anomaly detection use case.
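A downsampling task like the one in the first resource boils down to aggregating raw points into coarser time buckets. Below is a minimal pandas sketch, assuming a DataFrame with a datetime `time` column and numeric value columns, similar in shape to the one returned by the query example. The column names and the 1-minute mean are illustrative, and the blog post's actual pipeline may differ.

```python
import pandas as pd


def downsample(df, every="1min", time_col="time"):
    """Average every value column over fixed time buckets.

    df must have a datetime column named time_col; all other columns must be
    numeric. Returns one row per bucket.
    """
    return (
        df.set_index(time_col)
        .resample(every)
        .mean()
        .reset_index()
    )


raw = pd.DataFrame({
    "time": pd.to_datetime(["2024-10-17 00:00:10", "2024-10-17 00:00:40",
                            "2024-10-17 00:01:05", "2024-10-17 00:01:50"]),
    "temperature": [20.0, 22.0, 30.0, 32.0],
})
print(downsample(raw)["temperature"].tolist())  # [21.0, 31.0]
```

In a scheduled task, the downsampled frame would then be written back to a separate table or database with coarser retention.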

Advantages of Fargate
• Serverless Simplicity: Fargate abstracts away the underlying infrastructure, allowing developers to deploy containers without worrying about provisioning, scaling, or managing EC2 instances.
• Cost Efficiency: Fargate charges users based on the resources consumed by the containers, providing cost savings by eliminating the need to maintain idle EC2 instances.

Resources for Fargate and InfluxDB
• ricks-downsampler: a repo that contains a containerized downsampler complete with scheduling options and some monitoring.
• Saving AWS Costs by using Fargate Scheduling: a blog post that compares the costs associated with:
  1. Using Fargate to continuously run a container with built-in scheduling and some monitoring for a downsampling task with InfluxDB.
  2. Using CloudWatch to schedule the runs. This was the more expensive option for this use case, where the runs were periodic and the data load was consistent.

Bytewax
Bytewax is a framework for building data processing pipelines with a focus on streaming and stateful computations. It is particularly suited for real-time data processing tasks such as ETL (extract, transform, load) pipelines, event-driven architectures, and continuous analytics.

Advantages of Bytewax
• Stateful Computations
• Parallel and Distributed Execution
• Python Integration
• Windowing and Time-Based Operations
• Connectors and Integrations
• Event-Driven Architecture
• InfluxDB Source and Sink Connectors
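"Stateful computation" means an operator keeps state between events, for example a per-sensor running average. The plain-Python generator below illustrates only that idea; it is not Bytewax's API (in Bytewax, similar logic would live inside a stateful operator of a dataflow). The exponential moving average with `alpha=0.5` and the sensor names are arbitrary examples.

```python
def running_ema(events, alpha=0.5):
    """Yield (key, smoothed_value) for each (key, value) event.

    The last average per key persists across events, which is what makes
    this a stateful computation.
    """
    state = {}  # key -> current exponential moving average
    for key, value in events:
        prev = state.get(key)
        # The first value for a key initializes its state; later values update it
        state[key] = value if prev is None else alpha * value + (1 - alpha) * prev
        yield key, state[key]


stream = [("sensor_a", 10.0), ("sensor_b", 5.0), ("sensor_a", 20.0), ("sensor_a", 30.0)]
print(list(running_ema(stream)))
# [('sensor_a', 10.0), ('sensor_b', 5.0), ('sensor_a', 15.0), ('sensor_a', 22.5)]
```

A streaming framework adds what this sketch lacks: partitioning state across workers, recovering it after failures, and feeding it from real sources such as Kafka or InfluxDB.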

Kafka and Faust
Kafka and Faust are both tools used for building data pipelines and stream processing systems. They each have unique features and advantages that make them suitable for ETL (Extract, Transform, Load) tasks.

Advantages of Kafka and Faust
Kafka
• High Throughput and Low Latency
• Scalability
• Durability and Reliability
• Fault Tolerance
• Pub/Sub Messaging
Faust
• Pythonic API
• Stream Processing
• Ease of Use and Integration with Kafka

Join the InfluxDB Community
Sign up for free: influxdata.com/cloud, or via cloud marketplace
Learn
• InfluxDB University: https://influxdbu.com/
• Blogs: https://www.influxdata.com/blog/
• Documentation: https://docs.influxdata.com/
Community
• Slack: https://influxcommunity.slack.com/
• Forums: https://community.influxdata.com/

Get Help + Resources!
Website: https://www.influxdata.com/
Get started with 3.0: https://cloud2.influxdata.com/signup
Forums: community.influxdata.com
Docs: docs.influxdata.com
Blogs: influxdata.com/blog
InfluxDB University: influxdata.com/university
