Warp10: collect, store and manipulate sensor data - BreizhCamp - 2016 03-24
LostInBrittany
Jun 22, 2016
About This Presentation
Collecting, storing and analysing sensor data is not so simple. You often have to rebuild a complete stack if commonly used tools like OpenTSDB or InfluxData do not fit your needs.
In the end you need to solve 3 major problems:
* Performance: sensor data can require a high ingestion rate (potentially millions of data points per second)
* Analysis: time series analysis implies a new paradigm that is not compatible with existing query languages like SQL
* Security: because the Internet of Things should not become the Internet of Breaches
Warp 10 was created to solve these challenges:
* Deploying a standalone or distributed backend
* Collecting your data with Web standards (HTTP, WebSocket) or with the Warp 10 Sensision collector daemon
* Analysing data with WarpScript, a language dedicated to time series manipulation
* Developing with web component tools for editing WarpScript and visualising data
Size: 8.76 MB
Language: en
Added: Jun 22, 2016
Slides: 85 pages
Slide Content
@FinistSeb @LostInBrittany#BzhCmp #Warp10
Warp 10:
Collect, store and manipulate sensor data
Horacio Gonzalez, Sébastien Lambour
Cityzen Data
Runner, 2 Kids,
Geek, Handyman,
Polyglot JVM Developer
Warp 10
Introduction
Geo-Time Series™
Image: Spacetime distortions
Time Series
Image: Mike Bostock
Time series storage and analysis
Image: Hamza Fessi and ABC Bourse
Not suited for your vanilla SQL RDBMS
One simple example: moving averages...
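To make the moving-average example concrete, here is a minimal Python sketch of a trailing moving average over time-ordered (timestamp, value) pairs. It is an illustration only: the function name and the sample data are made up, and it is not how Warp 10 computes it.

```python
from collections import deque


def moving_average(points, window):
    """Yield (timestamp, mean of the last `window` values) for time-ordered points."""
    recent = deque(maxlen=window)
    total = 0.0
    for ts, value in points:
        if len(recent) == window:
            # The oldest value is about to be evicted from the window.
            total -= recent[0]
        recent.append(value)
        total += value
        yield ts, total / len(recent)


# Hypothetical sample data: (timestamp, value)
prices = [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]
smoothed = list(moving_average(prices, window=2))
```

The whole computation is one pass with a small buffer, which is the kind of order-dependent, windowed logic that is awkward to express in plain SQL.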
Geo-Time Series™
Image: AIS Vessel Tracking
Geo-Time Series
Geo-Time Series and the IoT
Image: LinkedIn
IoT means talking things
How fast are they talking?
Internet of very introverted Things
Long range transmissions
Internet of introverted Things
Personal Area Network
Internet of shy Things
Local Area Network
Cellular Networks
Lots of shy things generate a huge amount of data
Image: Universal Studios
Internet of chatty Things
10 000 Hz
670 000 sensors
20 000 metrics
per second
Internet of garrulous Things
Image: Google
Millions of
metrics per second
Warp 10: A software platform for IoT
Warp 10 is a software platform that
● Ingests and stores data
● Manipulates and analyzes data
● Is dedicated to data from sensors, meters, IoT and any real or virtual probe
Warp 10 General Synoptic
Storage
Architecture
Language, Functions, Algorithms
Application access
Visualisation
Real Time
#collect
How do you get these metrics?
Image: Games Radar
Using our own Sensision agent
Using our own Sensision agent
With queue forwarder
Using plugins for other collecting systems
Or simply pushing data directly
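A direct push could look roughly like the sketch below, which only builds the HTTP request and does not send it. The `/api/v0/update` path and the `X-Warp10-Token` header are assumptions based on the Warp 10 documentation, not on these slides; the host, the token and the sample line are placeholders. The body carries one data point per line in the GTS input format.

```python
import urllib.request


def build_update_request(base_url, write_token, lines):
    """Build (but do not send) a POST request carrying GTS input format lines."""
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/update",  # assumed ingestion endpoint
        data=body,
        headers={
            "X-Warp10-Token": write_token,  # assumed authentication header
            "Content-Type": "text/plain",
        },
        method="POST",
    )


# Placeholder host, token and data point
req = build_update_request(
    "http://localhost:8080",
    "WRITE_TOKEN",
    ["// temperature{room=kitchen} 21.5"],
)
```

Sending it would be a call to `urllib.request.urlopen(req)` against a running instance.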
Choosing an input format
Warp 10 GTS Input Format
57 bytes
But size isn't the most important reason: parsing time matters far more.
XML or even JSON parsing is slow and costly.
Parsing the Warp 10 GTS input format isn't.
timestamp (µs by default)
latitude:longitude (WGS84)
elevation (millimeters)
classname*
labels (key=value)
value* (long, double, boolean or string)
* mandatory fields
Warp 10 GTS Input Format
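The fields above can be assembled into one text line per data point. The sketch below assumes the layout `TS/LAT:LON/ELEV NAME{LABELS} VALUE`, with optional fields left empty, strings single-quoted and percent-encoded, and booleans written as `T` or `F`. That layout and those encoding rules come from the Warp 10 documentation as remembered, not from the slides, so check them against the reference before relying on them. The function names and sample values are hypothetical.

```python
from urllib.parse import quote


def format_value(value):
    """Render a long, double, boolean or string value (assumed encoding rules)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "T" if value else "F"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + quote(str(value), safe="") + "'"


def format_datapoint(classname, value, timestamp=None, lat=None, lon=None,
                     elevation=None, labels=None):
    """Build one line; only classname and value are mandatory."""
    ts = "" if timestamp is None else str(timestamp)
    geo = "" if lat is None or lon is None else f"{lat}:{lon}"
    elev = "" if elevation is None else str(elevation)
    label_text = ",".join(
        f"{quote(key, safe='')}={quote(val, safe='')}"
        for key, val in sorted((labels or {}).items())
    )
    return f"{ts}/{geo}/{elev} {quote(classname, safe='')}{{{label_text}}} {format_value(value)}"


line = format_datapoint(
    "fuel.price", 1.219,
    timestamp=1458806400000000,       # microseconds since epoch
    lat=48.115434, lon=-1.636877,     # WGS84
    labels={"type": "diesel"},
)
minimal = format_datapoint("door.open", True)
```

A line-oriented format like this can be split on a handful of delimiter characters, which is why parsing it is cheap compared with XML or JSON.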
#store
From tiny to huge
Image: Games Radar
Warp 10 on Raspberry Pi B+
1 000 datapoints per second
Warp 10 on Raspberry Pi 2 B
3 000 datapoints per second
Warp 10 on a modern server
120 000 datapoints per second
Warp 10 on a cluster
3 million datapoints per second
(our current record on input traffic)
#analyse
From tiny to huge
Image: Amazon
Many time-series solutions
TSAR
But they are only stores...
Fetching data is only the tip of the iceberg
Analysing the data
High level analysis must be done elsewhere
Algorithms are resource hungry
Your computer is not a datacenter
Manipulating GTS
To be scalable, analysis must be done on the Warp 10 platform, not on the user's computer
A true GTS analysis toolbox
○Hundreds of functions
○Manipulation frameworks
○Analysis workflow
Manipulating GTS
GTS manipulation
Why not a simple REST API?
● One endpoint per function?
● How do you chain an analysis workflow?
A REST API is not suitable for complex manipulations
GTS manipulation
Why not a SQL dialect?
●How do you do a simple moving average in SQL?
● How do you do geo-time fencing in SQL?
SQL is not suited to (G)TS analysis!
GTS manipulation language
Our solution: a GTS manipulation language
WarpScript
A stack based language
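For readers new to the idea, a stack-based language pushes values onto a stack and lets each function pop its arguments and push its result. The toy evaluator below illustrates that principle in Python with four arithmetic operators. It is not a WarpScript interpreter, and the `run` function is purely hypothetical.

```python
import operator

# Each operator pops two operands and pushes one result.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def run(program):
    """Evaluate a whitespace-separated postfix program and return the final stack."""
    stack = []
    for token in program.split():
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack


result = run("3 4 + 2 *")  # push 3, push 4, add, push 2, multiply
```

Chaining is free in this model: the output of one step is simply left on the stack for the next one, which is what makes it a good fit for analysis workflows.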
WarpScript
Non-compiled
Optimized functions, fast execution
Fuel prices data
16 297 448 metrics
11 379 fuel stations
42 885 Geo Time Series
Basic analysis
Average diesel fuel prices in France since 2007
Image: LEGO Ideas
First, fetch the data (SQL vs WarpScript)
FETCH gives us a GTS list
FETCH gives us a GTS list
Timestamp
(microseconds since epoch)
FETCH gives us a GTS list
Location
(latitude, longitude)
FETCH gives us a GTS list
Value
Calculate the average
Using Groovy:
Calculate the average with WarpScript
1- Calculate the mean price by station
Calculate the average with WarpScript
BUCKETIZE framework
Put the data of a GTS into regularly spaced buckets
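The bucketing idea can be sketched in plain Python as follows. This is a simplified illustration, not the WarpScript implementation: it assumes integer timestamps, labels each bucket by its end tick, lets a bucket cover the interval (end - bucketspan, end], and ignores points after the last bucket. The parameter names `bucketspan` and `lastbucket` are chosen for the sketch.

```python
from collections import defaultdict


def bucketize_mean(points, bucketspan, lastbucket):
    """Average (timestamp, value) points into regularly spaced buckets."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket end tick -> [total, count]
    for ts, value in points:
        if ts > lastbucket:
            continue  # outside the bucketized range
        # End tick of the bucket this point falls into.
        end = lastbucket - ((lastbucket - ts) // bucketspan) * bucketspan
        sums[end][0] += value
        sums[end][1] += 1
    return sorted((end, total / count) for end, (total, count) in sums.items())


# Hypothetical irregular price readings for one station
points = [(1, 1.0), (4, 3.0), (7, 2.0), (10, 4.0), (12, 9.0)]
buckets = bucketize_mean(points, bucketspan=5, lastbucket=10)
```

After this step every series has ticks at the same regular positions, so series from different stations can be compared tick by tick.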
Calculate the average with WarpScript
Calculate the average with WarpScript
2- Reduce to get the global average
Calculate the average with WarpScript
REDUCE framework
Apply a function on a set of GTS tick by tick
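As a rough Python illustration of reducing tick by tick, the sketch below takes several series whose ticks are already aligned and produces one series holding the mean of all values present at each tick. The function name and the station data are hypothetical, and the real REDUCE framework offers more than this.

```python
def reduce_mean(series_list):
    """Combine several aligned series into one by averaging values tick by tick."""
    by_tick = {}
    for series in series_list:
        for ts, value in series:
            by_tick.setdefault(ts, []).append(value)
    return [(ts, sum(values) / len(values)) for ts, values in sorted(by_tick.items())]


# Hypothetical per-station mean prices, already aligned on ticks 5 and 10
station_a = [(5, 2.0), (10, 3.0)]
station_b = [(5, 4.0), (10, 5.0)]
station_c = [(10, 1.0)]  # no data in the first bucket
overall = reduce_mean([station_a, station_b, station_c])
```

A station with no value at a given tick simply does not contribute to that tick's mean.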
Too verbose? Write it differently
Even more concise
Basic analysis
Mean of the last available diesel fuel prices in France
Image: LEGO Ideas
Fetching Data (SQL vs WarpScript)
FETCH gives us a GTS list
Mean of those last prices
align ticks with BUCKETIZE framework
compute the average with REDUCE
Geo-time analysis
Find the cheapest fuel station near here
48.115434, -1.636877
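Outside Warp 10, the same question could be approximated as in the sketch below. Note the swap of technique: the talk filters points with a WKT geometry, whereas this sketch uses a plain great-circle (haversine) radius around the position. The station list, field names and radius are made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def cheapest_nearby(stations, lat, lon, radius_km):
    """Return the cheapest station within radius_km of (lat, lon), or None."""
    nearby = [
        s for s in stations
        if haversine_km(lat, lon, s["lat"], s["lon"]) <= radius_km
    ]
    return min(nearby, key=lambda s: s["price"]) if nearby else None


# Made-up stations with their last known price
stations = [
    {"name": "A", "lat": 48.12, "lon": -1.64, "price": 1.05},
    {"name": "B", "lat": 48.11, "lon": -1.63, "price": 0.99},
    {"name": "C", "lat": 47.22, "lon": -1.55, "price": 0.90},  # cheaper but far away
]
best = cheapest_nearby(stations, 48.115434, -1.636877, radius_km=5)
```

Doing this client-side means fetching every station's data first, which is the scalability problem the talk argues against.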
WKT: Well-known text geometry
…
WKT in WarpScript
Geo-filtering points of GTS
Geo-filtering points of GTS
MAPPER framework
Apply a function on values of a GTS that fall into a sliding window
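The sliding-window idea can be sketched generically in Python. Here the window is expressed as a number of points before and after the current one; the names `pre` and `post` and the sample data are assumptions of this sketch, and it does not reproduce the geo-filtering mapper used in the talk.

```python
def map_window(points, func, pre, post):
    """Apply func to the values in a window of `pre` points before and `post` after each tick."""
    result = []
    for i, (ts, _) in enumerate(points):
        start = max(0, i - pre)
        window = [value for _, value in points[start:i + post + 1]]
        result.append((ts, func(window)))
    return result


# Hypothetical series: (timestamp, value)
points = [(1, 3.0), (2, 1.0), (3, 4.0), (4, 1.0)]
running_max = map_window(points, max, pre=1, post=0)
```

With `pre=0` and `post=0` the window holds a single point, so the same mechanism also covers point-by-point transformations and filters.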
The stations near my position
There can only be one
And this is only the surface
Possibilities are endless
Think differently
Geo-Time Series are everywhere
Warp 10 platform and tools
Everything is on GitHub
https://github.com/cityzendata/
Thank you!