Warp10: collect, store and manipulate sensor data - BreizhCamp - 2016 03-24

LostInBrittany 860 views 85 slides Jun 22, 2016

About This Presentation

Collecting, storing and analysing sensor data is not so simple. You often have to rebuild a complete stack if commonly used tools like OpenTSDB or InfluxData do not fit your needs.

In the end, you need to solve 3 major problems:
* Performance: sensor data can require a high level of ingestion ...


Slide Content

@FinistSeb @LostInBrittany#BzhCmp #Warp10
Warp 10:
Collect, store and manipulate sensor data
Horacio Gonzalez Sébastien Lambour

Horacio Gonzalez
@LostInBrittany

Cityzen Data
Spaniard lost in Brittany,
developer, dreamer and all-around geek

Sébastien Lambour
@FinistSeb

Cityzen Data
Runner, 2 Kids,
Geek, Handyman,
Polyglot JVM Developer

Warp 10

Introduction
Geo-Time Series™
Image: Spacetime distortions

Time Series
Image: Mike Bostock

Time series storage and analysis
Image: Hamza Fessi and ABC Bourse
Not suited for your vanilla SQL RDBMS
One simple example: moving averages...

Geo-Time Series™

Image: AIS Vessel Tracking

Geo-Time Series

Geo-Time Series and the IoT
Image: LinkedIn

IoT means talking things
How fast are they talking?

Internet of very introverted Things
Long range transmissions

Internet of introverted Things
Personal Area Network

Internet of shy Things
Local Area Network
Cellular Networks

Lots of shy things generate a huge amount of data
Image: Universal Studios

Internet of chatty Things
10 000 Hz
670 000 sensors
20 000 metrics
per second

Internet of garrulous Things
Image: Google
Millions of metrics per second

Warp 10: a software platform for IoT
Warp 10 is a software platform that
● Ingests and stores data
● Manipulates and analyzes data
● Is dedicated to data from sensors, meters, IoT and any real or virtual probe

Warp 10 General Synoptic
[Diagram labels: Storage; Architecture; Language, Functions, Algorithms; Application access; Visualization; Real time]

#collect
How do you get these metrics?

Image: Games Radar

Using our own Sensision agent

Using our own Sensision agent
With queue forwarder

Using plugins for other collecting systems

Or simply pushing data directly

Choosing an input format

XML? 139 bytes
JSON? 108 bytes

Warp 10 GTS Input Format
57 bytes

But size isn't the most important reason:
parsing time is way more important.

XML or even JSON parsing is slow and costly.
Parsing the Warp 10 GTS input format isn't.

Warp 10 GTS Input Format
timestamp (µs by default)
latitude:longitude (WGS84)
elevation (millimeters)
classname*
labels (key=value)
value* (long, double, boolean or string)
* mandatory fields
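
To make the field list concrete, here is a minimal Python sketch that serializes one data point as a single text line. The layout `TS/LAT:LON/ELEV CLASS{LABELS} VALUE`, the `T`/`F` booleans and the single-quoted strings reflect our understanding of the Warp 10 input format and should be checked against the official documentation. The function name `format_gts_line` and the sample class name and labels are hypothetical, and URL-encoding of names and values is left out.

```python
def format_gts_line(classname, value, timestamp_us=None,
                    lat=None, lon=None, elevation_mm=None, labels=None):
    """Build one text line for a single data point (illustrative sketch)."""
    # Optional fields are simply left empty when absent.
    ts = "" if timestamp_us is None else str(timestamp_us)
    loc = "" if lat is None or lon is None else f"{lat}:{lon}"
    elev = "" if elevation_mm is None else str(elevation_mm)

    # Labels are key=value pairs; sorted here only to get a stable output.
    label_str = ",".join(f"{k}={v}" for k, v in sorted((labels or {}).items()))

    # Assumed value encodings: T/F for booleans, single quotes for strings.
    if isinstance(value, bool):
        val = "T" if value else "F"
    elif isinstance(value, str):
        val = f"'{value}'"
    else:
        val = repr(value)

    return f"{ts}/{loc}/{elev} {classname}{{{label_str}}} {val}"


line = format_gts_line("fuel.price", 1.25,
                       timestamp_us=1458777600000000,
                       lat=48.1, lon=-1.6,
                       labels={"type": "diesel"})
print(line)
```

Producing or splitting such a line needs only a few string operations, which is the point the slide makes about parsing cost compared with XML or JSON.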

#store
From tiny to huge

Image: Games Radar

Warp 10 on Raspberry Pi B+
1 000 datapoints per second

Warp 10 on Raspberry Pi 2 B
3 000 datapoints per second

Warp 10 on a modern server
120 000 datapoints per second

Warp 10 on a cluster
3 million datapoints per second
(our current record on input traffic)

#analyse
From tiny to huge

Image: Amazon

Many time-series solutions
TSAR

But they are only stores...
Fetching data is only the tip of the iceberg

Analysing the data
High level analysis must be done elsewhere

Algorithms are resource hungry

Your computer is not a datacenter

Manipulating GTS
To be scalable, analysis must be done in
the Warp 10 platform, not on the user's computer

Manipulating GTS
A true GTS analysis toolbox
○ Hundreds of functions
○ Manipulation frameworks
○ Analysis workflow

GTS manipulation
Why not a simple REST API?
● One endpoint per function?
● How to chain an analysis workflow?

A REST API is not suitable for complex manipulations

GTS manipulation
Why not a SQL dialect?
● How do you do a simple moving average in SQL?
● How do you do geo-time fencing in SQL?

SQL is not adapted to (G)TS analysis!
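
For contrast, a trailing moving average is only a few lines in a general-purpose language. The Python sketch below is purely illustrative: it is neither SQL nor WarpScript, and the function name `moving_average` and its window convention (the last `window` values, fewer at the start) are assumptions made for this example.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average over the last `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # holds the values currently in the window
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]     # the oldest value is about to be evicted
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


print(moving_average([1, 2, 3, 4], 2))
```

The computation depends on the order of the points and on a window that slides over them, which is the kind of operation a time-series language can treat as a first-class primitive.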

GTS manipulation language
Our solution: a GTS manipulation language

WarpScript

A stack-based language
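
To show what "stack-based" means in practice, here is a toy Python evaluator for postfix arithmetic. It is not the WarpScript engine and handles no WarpScript syntax; the function name `run` and the four operators are assumptions chosen for the sketch. Operands are pushed onto a stack, and each operator pops its arguments and pushes its result.

```python
import operator

# Binary operators supported by this toy evaluator.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def run(program):
    """Evaluate a whitespace-separated postfix program, return the stack."""
    stack = []
    for token in program.split():
        if token in OPS:
            b = stack.pop()          # top of the stack is the right operand
            a = stack.pop()
            stack.append(OPS[token](a, b))
        else:
            stack.append(float(token))
    return stack


print(run("2 3 + 4 *"))
```

Because every step consumes and produces stack entries, chaining operations into a workflow is just writing them one after another.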

WarpScript
Non-compiled
Optimized functions, fast execution

Basic operations

Five frameworks




More than 500 functions

Time series functions

Time series functions

Geo-Time Series functions
Geo mapping (WKT)

Horizontal & vertical speed

Horizontal & vertical distance

Haversine
...
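
As an illustration of the last item, the haversine formula gives the great-circle distance between two latitude/longitude points. The Python sketch below is a generic implementation, not Warp 10 code; the function name `haversine_m` and the mean Earth radius of 6 371 000 m are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(haversine_m(0.0, 0.0, 0.0, 1.0))
```

Dividing such a distance by the time elapsed between two consecutive points of a series yields a horizontal speed.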

Quantum IDE

Enough teasing...

Fuel prices data
16 297 448 metrics
11 379 fuel stations
42 885 Geo Time Series

Basic analysis
Average diesel fuel prices in France
since 2007
Image: LEGO Ideas

First, fetch the data (SQL vs WarpScript)

FETCH gives us a GTS list
Timestamp (microseconds since epoch)
Location (latitude, longitude)
Value

Calculate the average
Using Groovy:

Calculate the average with WarpScript
1- Calculate the mean price by station

Calculate the average with WarpScript
BUCKETIZE framework
Put the data of a GTS into regularly spaced buckets
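
The idea of bucketizing can be sketched in a few lines of plain Python. This is not the BUCKETIZE implementation: the function name `bucketize_mean`, the `(timestamp, value)` tuple representation and the convention of labelling each bucket by its start time are assumptions for illustration, and Warp 10's own alignment rules and parameters may differ.

```python
from collections import defaultdict


def bucketize_mean(points, bucket_span):
    """Group (timestamp, value) points into fixed-width buckets and
    return one (bucket_start, mean_value) pair per non-empty bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        start = (ts // bucket_span) * bucket_span  # start of the bucket
        buckets[start].append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


# Irregular ticks become one value per bucket of width 10.
print(bucketize_mean([(0, 1.0), (5, 3.0), (10, 4.0)], 10))
```

After this step every series has its values on the same regular ticks, which is what makes a tick-by-tick comparison across series possible.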


Calculate the average with WarpScript
2- Reduce to get the global average

Calculate the average with WarpScript
REDUCE framework
Apply a function on a set of GTS tick by tick
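
A tick-by-tick reduction can be pictured with the following plain Python sketch. It is not the REDUCE implementation: the function name `reduce_mean` and the representation of each series as a list of `(timestamp, value)` tuples are assumptions for illustration, and the sketch assumes the series share the same ticks.

```python
def reduce_mean(series_list):
    """Combine several series into one by averaging, tick by tick,
    the values that share the same timestamp."""
    by_tick = {}
    for series in series_list:
        for ts, value in series:
            by_tick.setdefault(ts, []).append(value)
    return [(ts, sum(vs) / len(vs)) for ts, vs in sorted(by_tick.items())]


station_a = [(0, 1.0), (10, 2.0)]
station_b = [(0, 3.0), (10, 4.0)]
print(reduce_mean([station_a, station_b]))
```

With one series per fuel station as input, the output is a single series holding the average price at each tick.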

Too verbose? Write it differently

Even more concise

Basic analysis
Mean of the last available
diesel fuel prices in France
Image: LEGO Ideas

Fetching data (SQL vs WarpScript)

FETCH gives us a GTS list

Mean of those last prices
Align ticks with the BUCKETIZE framework
Compute the average with REDUCE

Geo-time analysis
Find the cheapest fuel station near here
48.115434, -1.636877

WKT: Well-known text geometry

WKT in WarpScript

Geo-filtering points of GTS

Geo-filtering points of GTS
MAPPER framework
Apply a function on values of a GTS
that fall into a sliding window
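
The sliding-window idea can be sketched in plain Python as follows. This is not the Warp 10 mapper framework: the function name `map_window`, the `before`/`after` parameters counted in number of points, and the `(timestamp, value)` tuple representation are assumptions made for this example.

```python
def map_window(points, func, before, after):
    """For each point, apply `func` to the values in a window spanning
    `before` points earlier and `after` points later (bounds clipped)."""
    out = []
    for i, (ts, _) in enumerate(points):
        lo = max(0, i - before)
        hi = min(len(points), i + after + 1)
        window = [v for _, v in points[lo:hi]]
        out.append((ts, func(window)))
    return out


prices = [(0, 1), (1, 5), (2, 3)]
# Running maximum over the current point and the one before it.
print(map_window(prices, max, before=1, after=0))
```

With a window of a single point (`before=0, after=0`), the function is applied to each value independently, which is the shape a per-point filter such as a geographic test would take.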

The stations near my position

There can only be one

And this is only the surface
Possibilities are endless

Think differently
Geo-Time Series are everywhere

Warp 10 platform and tools

Everything is on GitHub
https://github.com/cityzendata/

Thank you!