Model-driven Telemetry: The Foundation of Big Data Analytics

CiscoCanada 2,001 views 39 slides Jan 28, 2018

© 2016 Cisco and/or its affiliates. All rights reserved. 1
Model-Driven Telemetry
David Carmona
Solutions Architect, Cisco Global Service Provider
January 30th, 2018
Cisco Connect

Origins of Telemetry
PING: term coined during WWII for the sonar signal one ship emits
to measure its distance from another waterborne body. In computer
networks, ping (Packet Internet Groper) measures round-trip delay
and packet loss between two devices.
Telemetry: from the Greek TELE, meaning remote, and METRON,
meaning measure; literally, “measure remotely”.
Telemetry applications: Military, Medical, Manufacturing, Aerospace,
Financial, Oil and Gas, Modern Networks, etc.

Use Cases
•Network Health
•Troubleshooting / Remediation
•SLAs, Performance Tuning
•Capacity Planning
•Product Management
Trends
•Centralized / Software-defined
•Speed
•Scale
•Closed-loop automation/orchestration/provisioning
We Need More Data
Capabilities

What Automation Without Visibility Looks Like
Be sure to make the right measurements before implementing automation.
TELEMETRY PROVIDES NETWORK VISIBILITY

What is Streaming Telemetry?
“How can we get as much data off the router as fast as possible in a way that makes it easy to use?”
•Internal data structures available to the outside
•Get rid of inefficient polling mechanism
•Data needed to be normalized for Big Data tools (JSON and GPB)

Current challenges

Network Visibility with Traditional Telemetry Methods
Methods: syslog, SNMP, CLI, Netflow
•Too Slow
•Incomplete
•Network-Specific
•Hard to Operationalize
Network Visibility is Hard!!!

What Happens When You Push SNMP Too Hard
•10 second poll / push
•3 pollers / telemetry receivers
•30 minute measurement intervals
•288 100GigE interfaces (line rate)
•SNMP: IF-MIB (query by row)
[Chart label: MDT (sec), avg. time]

Source: Google @ Bay Area OpenDaylight Meetup 06/16

Telemetry fundamentals

Three Enablers for Telemetry
Push Not Pull
Analytics-Ready Data
Data-Model Driven

“OSI model” of Telemetry (telemetry end-to-end)
“Layer”       “Data”
Data store    Native (raw) data inside a router’s database
Data model    Raw data mapped to a model (YANG native, OpenConfig, etc.)
Producer      Sending requested data in model format to the “Exporter” at defined intervals
Exporter      Encoding and delivering data to the collector(s) destination(s)
Collector     Information collection for processing (e.g., data monitoring, automation, analytics)
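The five layers above can be read as a chain of small functions. The following is an illustrative toy, not router code: the function names, the in-memory `DATA_STORE` dictionary, and the interface names and counters in it are all assumptions made for this sketch, and JSON stands in for whatever encoding the exporter would really use.

```python
import json

# Data store: native (raw) data as it might sit in a router's database.
# The keys and values here are invented for illustration.
DATA_STORE = {"Gi0/0/0/0": (373156, 311583), "Gi0/0/0/1": (10, 20)}


def to_model(raw):
    """Data model: map raw tuples onto named, model-style leaves."""
    return [
        {"interface-name": name, "packets-received": rx, "packets-sent": tx}
        for name, (rx, tx) in sorted(raw.items())
    ]


def produce(store, timestamp):
    """Producer: take one sample of the requested data in model format."""
    return {"timestamp": timestamp, "rows": to_model(store)}


def export(sample):
    """Exporter: encode the sample (JSON here) for delivery."""
    return json.dumps(sample).encode("utf-8")


def collect(payload):
    """Collector: decode the payload so it can be processed."""
    return json.loads(payload.decode("utf-8"))


received = collect(export(produce(DATA_STORE, timestamp=1480547974706)))
print(received["rows"][0])
```

Each function only knows about its neighbour's output, which is the point of the layering: the encoding or the collector can be swapped without touching the data model.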

YANG Is A Modeling Language
module ietf-interfaces {            // self-contained top-level hierarchy of nodes
  import ietf-yang-types {          // import or define data types
    prefix yang;
  }
  container interfaces {            // containers group related nodes
    list interface {                // lists for sequence of entries
      key "name";
      leaf name {                   // leaf nodes for simple data
        type string;
      }
      leaf enabled {
        type boolean;
        default "true";
      }
      // ... (edited for brevity)
    }
  }
}
Other YANG Features
•RO or RW
•Optional nodes
•Choice
•Augment
•When
•Arbitrary XML
•RPC
•etc.
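As a rough way to read the model, the container, list, and leaf statements can be mirrored in ordinary Python types. This is only a hand-written sketch: the class names `Interface` and `Interfaces` and the `add` helper are hypothetical, and real tooling generates such bindings from the YANG file rather than writing them by hand.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Interface:
    name: str             # leaf name { type string; }  (the list key)
    enabled: bool = True  # leaf enabled { type boolean; default "true"; }


@dataclass
class Interfaces:
    # list interface { key "name"; } -> entries are unique per key
    interface: Dict[str, Interface] = field(default_factory=dict)

    def add(self, entry: Interface) -> None:
        """Insert a list entry, rejecting a second entry with the same key."""
        if entry.name in self.interface:
            raise ValueError(f"duplicate key: {entry.name}")
        self.interface[entry.name] = entry


interfaces = Interfaces()
interfaces.add(Interface(name="GigabitEthernet0/0/0/0"))
interfaces.add(Interface(name="MgmtEth0/RP0/CPU0/0", enabled=False))
# The first entry never set "enabled", so the model default applies.
print(interfaces.interface["GigabitEthernet0/0/0/0"].enabled)
```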

Models Are Available on GitHub
https://github.com/YangModels/yang/tree/master/vendor/cisco/xr
You Should Do This
•Telemetry only cares about operational (*-oper.yang) models.
•143 oper YANG models published for XR 6.1.1
•151 oper YANG models for XR 6.1.2
•177 oper YANG models for XR 6.2.1
•180 oper YANG models for XR 6.2.2
•198 oper YANG models for XR 6.3.1

Finding the Data You Want To Stream
$ pyang -f tree Cisco-IOS-XR-infra-statsd-oper.yang \
    --tree-path infra-statistics/interfaces/interface/latest/generic-counters
telemetry model-driven
 sensor-group SGROUP1
  sensor-path Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters
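The sensor path is the YANG module name, a colon, and then the node path that pyang printed. A small sketch of taking one apart follows; `split_sensor_path` is a hypothetical helper written for this example, not part of any Cisco tool.

```python
def split_sensor_path(sensor_path):
    """Split '<module>:<node>/<node>/...' into (module, [nodes])."""
    module, sep, path = sensor_path.partition(":")
    if not sep or not module or not path:
        raise ValueError(f"not a module-qualified sensor path: {sensor_path!r}")
    return module, path.split("/")


module, nodes = split_sensor_path(
    "Cisco-IOS-XR-infra-statsd-oper:"
    "infra-statistics/interfaces/interface/latest/generic-counters"
)
print(module + ".yang")  # the model file to look for in the GitHub repository
print(nodes)
```

The module part tells you which `*-oper.yang` file to open; the node list is the same hierarchy you pass to `--tree-path`.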

What Will Be Pushed With That Config
{
"Timestamp": 1480547974706,
"Keys": {
"interface-name": "MgmtEth0/RP0/CPU0/0"
},
"Content": {
"applique": 0,
"availability-flag": 0,
"broadcast-packets-received": 25035,
"broadcast-packets-sent": 0,
"bytes-received": 165321050,
"bytes-sent": 233917498,
"carrier-transitions": 3,
"crc-errors": 0,
"framing-errors-received": 0,
"giant-packets-received": 0,
"input-aborts": 0,
"input-drops": 62,
"input-errors": 0,
"input-ignored-packets": 0,
"input-overruns": 0,
"input-queue-drops": 0,
"last-data-time": 1480547974,
"last-discontinuity-time": 1479244159,
"multicast-packets-received": 457,
"multicast-packets-sent": 0,
"output-buffer-failures": 0,
"output-buffers-swapped-out": 0,
"output-drops": 0,
"output-errors": 104,
"output-queue-drops": 0,
"output-underruns": 0,
"packets-received": 373156,
"packets-sent": 311583,
"parity-packets-received": 0,
"resets": 0,
"runt-packets-received": 0,
"seconds-since-last-clear-counters": 0,
"seconds-since-packet-received": 0,
"seconds-since-packet-sent": 0,
"throttled-packets-received": 0,
"unknown-protocol-packets-received": 0
}
}
Repeated for all interfaces
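On the collector side, a message like the one above is straightforward to consume. The sketch below uses an abbreviated copy of the sample (only a few of the counters) and assumes that `Timestamp` is in milliseconds since the Unix epoch, which is inferred from the sample values rather than stated on the slide. The `summarize` function and its output keys are hypothetical.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the sample message; only a few counters are kept.
RAW = """
{
  "Timestamp": 1480547974706,
  "Keys": {"interface-name": "MgmtEth0/RP0/CPU0/0"},
  "Content": {
    "bytes-received": 165321050,
    "bytes-sent": 233917498,
    "crc-errors": 0,
    "input-drops": 62,
    "output-errors": 104
  }
}
"""


def summarize(raw):
    """Pick out the interface, the sample time, and any non-zero drop/error counters."""
    msg = json.loads(raw)
    content = msg["Content"]
    problems = {
        name: value
        for name, value in content.items()
        if value and name.endswith(("-drops", "-errors"))
    }
    return {
        "interface": msg["Keys"]["interface-name"],
        # Assumption: Timestamp is milliseconds since the epoch.
        "time": datetime.fromtimestamp(msg["Timestamp"] / 1000, tz=timezone.utc),
        "problems": problems,
    }


summary = summarize(RAW)
print(summary["interface"], summary["time"].isoformat(), summary["problems"])
```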

Configuring Destination
telemetry model-driven
 destination-group DGROUP1
  address family ipv4 192.168.1.1 port 2104
  ---- and/or ----
  address family ipv6 2001:db8::1 port 2104
   encoding self-describing-gpb
   protocol tcp
•address family: specify where you want to send your data
•encoding: specify what you want your data to look like
•protocol: specify how you want your data to be delivered

Basic Concept: Encoding
Encoding (or “serialization”) translates data (objects, state) into a format that
can be transmitted across the network. When the receiver decodes (“de-serializes”)
the data, it has a semantically identical copy of the original data.
DATA -> “Encode” -> network -> “Decode” -> DATA
Encodings on IOS XR platforms
•Compact GPB
•Key-Value GPB
•JSON (6.3.1)
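The encode/decode round trip can be shown in a few lines. JSON is used here only because it ships with Python; the dictionary contents are made-up sample values, and the same idea applies to the GPB encodings.

```python
import json

# Some state on the sending side (values invented for the example).
state = {
    "interface-name": "GigabitEthernet0/0/0/0",
    "input-data-rate": 8,
    "reliability": 255,
}

# "Encode": turn the in-memory object into bytes that can cross the network.
wire = json.dumps(state, sort_keys=True).encode("utf-8")

# "Decode": the receiver rebuilds an object from those bytes.
copy = json.loads(wire.decode("utf-8"))

# The copy is a different object with the same meaning.
print(type(wire).__name__, copy == state, copy is state)
```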

GPB Encoding
Google Protocol Buffers (GPB); call them “protobufs” for short.
“Protocol buffers are Google's language-neutral, platform-neutral, extensible
mechanism for serializing structured data – think XML, but smaller, faster, and simpler.”
Design Goals
•Simplicity
•Performance
•Forward/Backward Compatibility
Non-Goals
•Human-Readable
•Self-Describing
•Text-based

Telemetry Has Two GPB Encoding Options
GPB – “compact”
data_gpb {
  row {
    timestamp: 1485794640469
    keys: "\n\026GigabitEthernet0/0/0/0"
    content: "\220\003\010\230\003\001\240\003\002\250\003\000\260\003\000\270\003\000\300\003\000\310\003\000\320\003\300\204=\330\003\000\340\003\000\350\003\000\360\003\377\001"
  }
}
•2X faster
•Operationally more complex (but not relative to SNMP!)
GPB – “self-describing”
data_gpbkv {
  timestamp: 1485793813389
  fields {
    name: "keys"
    fields { name: "interface-name" string_value: "GigabitEthernet0/0/0/0" }
  }
  fields {
    name: "content"
    fields { name: "input-data-rate" uint64_value: 8 }
    fields { name: "input-packet-rate" uint64_value: 1 }
    <<< 9 lines are skipped >>>
    fields { name: "input-load" uint32_value: 0 }
    fields { name: "reliability" uint32_value: 255 }
  }
}
...
•3X larger
•Native models: still need heuristics for key names
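To see why compact GPB is “operationally more complex”, it helps to decode the `content` bytes of the compact sample by hand. The sketch below is a minimal protobuf varint reader written for this example; it handles only wire type 0 (varint), which is all this particular sample contains, and it is not a substitute for code generated from the model's .proto file. The result is a map of field numbers to values, with no names: the names live in the .proto. Some decoded values (8, 1, 255) also appear in the self-describing sample, which suggests, but does not prove, that the two samples describe the same counters.

```python
# The "content" bytes from the compact sample, written with the same octal escapes.
CONTENT = (
    b"\220\003\010\230\003\001\240\003\002\250\003\000\260\003\000"
    b"\270\003\000\300\003\000\310\003\000\320\003\300\204=\330\003\000"
    b"\340\003\000\350\003\000\360\003\377\001"
)


def read_varint(buf, pos):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_varint_fields(buf):
    """Decode a message made only of varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields[field_number], pos = read_varint(buf, pos)
    return fields


fields = decode_varint_fields(CONTENT)
print(fields)
```

Without the .proto there is no way to tell what field 58 or field 62 means, which is exactly the trade-off against the larger self-describing format.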

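Transport Options
Dial-Out
•TCP & gRPC (from 6.1.1)
•UDP (from 6.2.1)
Dial-In
•gRPC only (from 6.1.1)
[Diagram: SYN, SYN-ACK, ACK handshake followed by data to the collector, shown for each mode.
With dial-out the router opens the session to the collector; with dial-in the collector opens it.
In both cases the data flows from the router to the collector.]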

gRPC: Like REST But Different
Runs over HTTP/2
•Optimize for page load time
•Server push, header compression, multiplexing, TLS
•RFC 7540 (May 2015)
•Preserves most HTTP/1.1 syntax
Defines Services (“RPCs”)
Encodes Using Google Protocol Buffers (“GPB” or “protobufs”)
•Services and Messages
•Auto-generate code in many languages
http://www.grpc.io/docs/#hello-grpc

A Telemetry Subscription
telemetry model-driven
 subscription Sub1
  sensor-group-id SGROUP1 sample-interval 30000
  destination-id DGROUP1
*Omit Destination Group For gRPC Dial-In
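A subscription ties a sensor group to a cadence and, for dial-out, to a destination group. The sketch below renders the stanza shown above from parameters and works out how many samples a given interval produces per hour. Both helper functions are hypothetical, and the arithmetic assumes the sample interval is expressed in milliseconds.

```python
def render_subscription(name, sensor_group, sample_interval_ms, destination=None):
    """Build the subscription stanza; leave out destination-id for gRPC dial-in."""
    lines = [
        "telemetry model-driven",
        f" subscription {name}",
        f"  sensor-group-id {sensor_group} sample-interval {sample_interval_ms}",
    ]
    if destination is not None:
        lines.append(f"  destination-id {destination}")
    return "\n".join(lines)


def samples_per_hour(sample_interval_ms):
    """Number of cadence-based samples in one hour (interval assumed to be in ms)."""
    return 3_600_000 // sample_interval_ms


dial_out = render_subscription("Sub1", "SGROUP1", 30000, destination="DGROUP1")
dial_in = render_subscription("Sub1", "SGROUP1", 30000)
print(dial_out)
print(samples_per_hour(30000))
```

Under that assumption, the 30000 in the example means one sample every 30 seconds, or 120 samples per hour for each sensor path.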

Telemetry on Cisco products

Cisco XR Telemetry overview
Platform | Classic XR ASR9k | Evolved XR ASR9k | NCS5500 | NCS6k
MDT support | 6.1.1 | 6.1.1 | 6.1.1 | 6.1.3
Data models | YANG (native, OC, IETF); link for models | YANG (native, OC, IETF); link for models | YANG (native, OC, IETF); link for models | YANG (native, OC, IETF); link for models
Transport (control protocols) | TCP (dial-out), UDP (dial-out)* | gRPC (dial-in, dial-out), TCP (dial-out), UDP (dial-out)* | gRPC (dial-in, dial-out), TCP (dial-out), UDP (dial-out)* | TCP (dial-out), UDP (dial-out)*
Encoding | GPB / GPB-KV / JSON** | GPB / GPB-KV / JSON** | GPB / GPB-KV / JSON** | GPB / GPB-KV / JSON**
Collectors | Pipeline*** | Pipeline*** | Pipeline*** | Pipeline***
* UDP support from 6.2.1
** JSON support from 6.3.1
*** Open-sourced and ready to use: https://github.com/cisco/bigmuddy-network-telemetry-pipeline

Cisco NXOS/XE Telemetry high-level overview
Platform | NX-OS | IOS-XE
MDT support | 7.0(3)I6(1) | 16.6.1*
Data models | Data Management Engine, NX-API, YANG (native, OC, IETF) | YANG (native**, IETF); link for models
Transport (control protocols) | gRPC* (dial-out), UDP** (dial-out), HTTP*** (dial-out) | Netconf (for YANG), gNMI (16.8.1), gRPC (16.9.1)
Formats | GPB / JSON | XML, GPB (16.9.1)
Collectors | Pipeline | TBD
Min sample interval | 5 sec | 1 sec
Max # of dial-out destinations | 5 | TBD
NX-OS notes:
* gRPC supports GPB only
** UDP from 7.0(3)I7(1), supports both GPB and JSON
*** HTTP supports JSON only
IOS-XE notes:
* Supported on Catalyst 3650/3850/9300/9500, ASR1000, ISR4000
** Native models are different from YANG models in XR
https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/7-x/programmability/guide/b_Cisco_Nexus_9000_Series_NX-OS_Programmability_Guide_7x/b_Cisco_Nexus_9000_Series_NX-OS_Programmability_Guide_7x_chapter_011000.html
https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/prog/configuration/166/b_166_programmability_cg/model_driven_telemetry.html

Event-Driven Telemetry overview
Model-Driven Telemetry (cadence based): Router X publishes at every interval over time
100 interfaces UP / 0 interfaces DOWN
100 interfaces UP / 0 interfaces DOWN
100 interfaces UP / 0 interfaces DOWN
99 interfaces UP / 1 interface DOWN
99 interfaces UP / 1 interface DOWN
99 interfaces UP / 1 interface DOWN
Event-Driven Telemetry: Router X publishes the event on-change
100 interfaces UP / 0 interfaces DOWN
99 interfaces UP / 1 interface DOWN
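The difference between the two modes in the picture above can be reproduced with a short simulation. This is a sketch of the idea only; the function names are hypothetical and each state is represented as an (up, down) tuple.

```python
def cadence_based(states):
    """Model-driven (cadence based): publish the state at every sample interval."""
    return list(states)


def event_driven(states):
    """Event-driven: publish only when the state differs from the last one sent."""
    published = []
    previous = object()  # sentinel that never equals a real state
    for state in states:
        if state != previous:
            published.append(state)
            previous = state
    return published


# Six sample intervals: one interface goes down after the third.
observed = [(100, 0)] * 3 + [(99, 1)] * 3

print(len(cadence_based(observed)), "messages with cadence-based telemetry")
print(event_driven(observed), "with event-driven telemetry")
```

The event-driven stream carries the same information in two messages instead of six, and the second message leaves the router as soon as the change happens rather than at the next interval.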

Event-Driven Telemetry configuration
A configuration example:
telemetry model-driven
 sensor-group interface
  sensor-path Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface
 !
 subscription interface
  sensor-group-id interface sample-interval 0    ## “0” means EDT
 !
IOS XR 6.3.1 includes support for:
1. Interface changes
•Sensor path: Cisco-IOS-XR-pfi-im-cmd-oper:interfaces/interface-xr/interface
•Sensor path: Cisco-IOS-XR-ipv6-ma-oper:ipv6-network/nodes/node/interface-data/vrfs/vrf/global-briefs/global-brief (*)
2. RIB changes
•Sensor path: Cisco-IOS-XR-ip-rib-ipv4-oper:rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes/route
•Sensor path: Cisco-IOS-XR-ip-rib-ipv6-oper:ipv6-rib/vrfs/vrf/afs/af/safs/saf/ip-rib-route-table-names/ip-rib-route-table-name/routes/route
3. Syslog updates
•Sensor path: Cisco-IOS-XR-infra-syslog-oper:syslog/messages/message
More to come soon!
Please go ahead and tell us what you want to have!

Collectors and analytics platforms

Different Collection Models
Layers: Applications, Storage, Collection
•BYO: Custom
•Open Source, Customizable: Kafka, Logstash, ElasticSearch, Kibana, Panda, Prometheus / InfluxDB, Grafana
•Commercial Stack: Proprietary or OS-based

Pipeline: An Open Source Collector
•Ingest, transform, filter
•Output to file, TSDB, Kafka…
•Self-monitoring, horizontally scalable
[Diagram labels: Pipeline, Kapacitor]

TSDB: system requirements
Pipeline is a lightweight program, written in Go (with concurrency). You can also start
several instances [on a single server] if needed.
Grafana uses the TSDB for the work, so its server requirements are small
(250MB of RAM, 1 CPU; the answer from the developer: https://tinyurl.com/ybdhexqu).
Prometheus doesn’t give exact requirements, but many discussions mention 64GB of
RAM and 32 CPUs (https://tinyurl.com/yd7l38ed).
InfluxDB requirements (https://tinyurl.com/yd9f3ntj):
Server | CPU, cores | RAM, GB | IOPS
Low load [5k writes, 100k series, 5 queries per second] | 2-4 | 2-4 | 500
Moderate load [250k writes, 1m series, 25 queries per second] | 4-6 | 8-32 | 500-1000
High load [>250k writes, >1m series, >25 queries per second] | 8+ | 32+ | 1000+
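The InfluxDB sizing table can be turned into a quick lookup. The sketch below is one possible reading of it, with two assumptions that are not in the table: the write figures are taken to be per second, and a workload is placed in the highest tier that any one of its three figures reaches. The function name and return format are hypothetical.

```python
def influxdb_tier(writes_per_s, series, queries_per_s):
    """Return (tier, sizing hint) for a workload, using the table's thresholds."""
    # Assumption: exceeding any single threshold moves the workload up a tier.
    if writes_per_s > 250_000 or series > 1_000_000 or queries_per_s > 25:
        return "high", "8+ cores, 32+ GB RAM, 1000+ IOPS"
    if writes_per_s > 5_000 or series > 100_000 or queries_per_s > 5:
        return "moderate", "4-6 cores, 8-32 GB RAM, 500-1000 IOPS"
    return "low", "2-4 cores, 2-4 GB RAM, 500 IOPS"


# Hypothetical workload: 20k writes/s, 300k series, 2 dashboard queries/s.
tier, hint = influxdb_tier(writes_per_s=20_000, series=300_000, queries_per_s=2)
print(tier, "->", hint)
```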

Real life deployments
https://promcon.io/2017-munich/slides/monitoring-cloudflares-planet-scale-edge-network-with-prometheus.pdf

Key Takeaways
•Speed & Scale Require Visibility
•It’s Not Hard to Beat SNMP
•Data Models Are Your Friends
•A Big Data Platform Is In Your Future

Resources
Tutorials, Blogs, VoDs
https://xrdocs.github.io/telemetry/
http://blogs.cisco.com/sp/the-limits-of-snmp
http://blogs.cisco.com/sp/boring-is-the-new-awesome
http://blogs.cisco.com/sp/why-you-should-care-about-model-driven-telemetry
https://youtu.be/tIN8BjHwpNs (NANOG 67: 10 Lessons from Telemetry)
YANG
https://github.com/YangModels/yang/tree/master/vendor/cisco (Cisco YANG models)
http://blogs.cisco.com/getyourbuildon/yang-opensource-tools-for-data-modeling-driven-management (YANG open source tools)
https://developer.cisco.com/site/ydk/ (YDK intro)
Telemetry Tools
https://github.com/cisco/bigmuddy-network-telemetry-pipeline
https://github.com/cisco/bigmuddy-network-telemetry-stacks
Demos and Lab
https://dcloud-cms.cisco.com/demo/mdt-ios-xr-611-v1 (dCloud)
https://youtu.be/F_S9-ctNFe0 (demo on NCS 5508)

Thank you.