Monitoring and Managing Anomaly Detection on OpenShift.pdf

TosinAkinosho 212 views 23 slides Jun 13, 2024

About This Presentation

Monitoring and Managing Anomaly Detection on OpenShift

Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and ...


Slide Content

Update confidential designator here
Version number here V00000
Monitoring and Managing Anomaly Detection on OpenShift
Tosin Akinosho

Project Overview and Purpose
▸Provide a hands-on tutorial and code for implementing anomaly detection on edge devices
▸Enable real-time identification of unusual behavior or failures on resource-constrained IoT/edge devices
▸Cover the end-to-end process, from data collection and model training to edge deployment and monitoring

Key Topics Covered
▸What is Anomaly Detection?
▸What is Edge (IoT)?
▸What is ArgoCD?
▸Deployment using ArgoCD for edge devices
▸What are Apache Kafka and S3?
▸Viewing Kafka messages in the data lake
▸What is Prometheus?
▸Monitoring application metrics with Prometheus
▸What is Camel K?
▸Configuring Camel K integrations for data pipelines
▸What is a Jupyter notebook?
▸Jupyter notebooks with code examples

What is Anomaly Detection?
●Definition: The process of identifying data points, events or observations that deviate significantly from expected patterns or norms
●Purpose: To detect unusual, suspicious or rare occurrences that may indicate errors, threats, opportunities or insights
●Types of Anomalies:
○Point anomalies (single outlier data points)
○Contextual anomalies (anomalies based on context, e.g. time of year)
○Collective anomalies (anomalous sequences or sets of data)
●Techniques:
○Statistical methods (e.g. distribution tests)
○Machine learning (supervised, unsupervised, semi-supervised)
○Visualization and human analysis
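
As an illustration of the statistical approach to point anomalies, here is a minimal sketch that flags values whose z-score exceeds a threshold. The function name, the threshold of 2.0, and the sample readings are assumptions made for this example; they do not come from the workshop code.

```python
from statistics import mean, pstdev


def find_point_anomalies(values, threshold=2.0):
    """Return the indices of values whose |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one obvious outlier at index 6.
readings = [10, 11, 9, 10, 12, 10, 95, 11]
anomalies = find_point_anomalies(readings)
print(anomalies)
```

A z-score test only catches point anomalies; contextual and collective anomalies usually need features that encode the context or the sequence.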

Applications (Use Cases)
●Cybersecurity (network intrusions, fraud detection)
●Industrial (equipment failure, sensor errors)
●Business analytics (sales outliers, customer behavior)
●Scientific research (experimental outliers)

What is Edge (IoT)?
●Definition: Edge computing brings processing power and data storage closer to the sources of data generation (e.g. IoT devices, sensors)
●Key Concept: Processing data at or near the "edge" of the network, rather than sending all data to the cloud or a central data center
●Why It's Important:
○Reduced latency for time-sensitive applications
○Reduced bandwidth usage by processing data locally
○Increased security and data privacy
○Resilience against network disruptions
●Edge Devices:
○IoT sensors, cameras, industrial equipment
○Gateways to aggregate and process data
○Edge servers/appliances for analytics and control
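
To make the bandwidth point concrete, the sketch below reduces a window of raw readings on the device to a small summary, so only the summary and any out-of-range values need to travel upstream. The function name, the limits, and the readings are hypothetical and chosen only for illustration.

```python
def summarize_window(readings, low, high):
    """Reduce a window of raw readings to a compact summary for upstream."""
    if not readings:
        return {"count": 0, "mean": None, "out_of_range": []}
    out_of_range = [r for r in readings if r < low or r > high]
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "out_of_range": out_of_range,  # only these raw values are forwarded
    }


# Hypothetical temperature readings collected locally on an edge device.
window = [71.0, 70.5, 72.0, 98.5, 71.5]
summary = summarize_window(window, low=60.0, high=90.0)
print(summary)
```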

Edge (IoT) (Use Cases):
●Smart manufacturing (predictive maintenance)
●Autonomous vehicles (real-time decision making)
●Smart cities (traffic optimization, public safety)
●Remote monitoring (oil rigs, renewable energy)

What is ArgoCD?
●ArgoCD is an open-source, declarative continuous delivery tool for Kubernetes
●It follows the GitOps pattern, treating Git repositories as the single source of truth for the desired application state
●It automates the deployment of applications to Kubernetes clusters by syncing the live state with the desired target state specified in Git
●It is implemented as a Kubernetes controller that continuously monitors running applications and compares their live state against that target state
●It supports declarative application definitions using Kubernetes manifests, Helm charts, Kustomize, etc.
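
The desired-versus-live comparison can be pictured with a toy example. This is a conceptual sketch only, not ArgoCD's implementation: the function name and the resource dictionaries are hypothetical, and whether extra live resources are actually deleted depends on how pruning is configured.

```python
def diff_state(desired, live):
    """Compare desired (Git) and live (cluster) resources, keyed by name."""
    to_create = sorted(desired.keys() - live.keys())
    to_delete = sorted(live.keys() - desired.keys())
    to_update = sorted(
        name for name in desired.keys() & live.keys()
        if desired[name] != live[name]
    )
    in_sync = not (to_create or to_update or to_delete)
    return {
        "create": to_create,
        "update": to_update,
        "delete": to_delete,
        "status": "Synced" if in_sync else "OutOfSync",
    }


# Hypothetical resource specs: what Git declares vs. what the cluster runs.
desired = {"kafka": {"replicas": 3}, "camel-k": {"replicas": 1}}
live = {"kafka": {"replicas": 2}, "old-job": {"replicas": 1}}
plan = diff_state(desired, live)
print(plan)
```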

Deployment using ArgoCD for edge devices
The ArgoCD dashboard:
●Lists installed operators like amq-streams, camel-k, cluster-config-app
●Displays data foundation and CI/CD pipeline operators
●Indicates the sync status and health of each operator/application
●Allows syncing, refreshing, and deleting operators

What is Apache Kafka?
●Distributed streaming platform for handling real-time data feeds
●Open-source system developed by the Apache Software Foundation
●Written in Java and Scala
●Provides three main capabilities:
○Publish and subscribe to streams of records
○Store streams of records in the order they were generated
○Process streams of records in real time
●Based on a partitioned log model
○Data is organized into topics; each topic partition is an ordered, immutable log of records
○Topics are partitioned for scalability and replicated across brokers for fault tolerance
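
The partitioned log model can be sketched with a small in-memory stand-in. This is not a Kafka client: the class name is hypothetical, and CRC32 is used here only as a simple, deterministic substitute for Kafka's own key hashing.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: a list of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        """Append a record and return its (partition, offset)."""
        # Records with the same key always land in the same partition,
        # which preserves their relative order.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return records from the given offset onward; the log is unchanged."""
        return list(self.partitions[partition][offset:])


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.publish("device-1", "rpm=900")
p2, o2 = topic.publish("device-1", "rpm=950")
print(p1, o1, o2, topic.read(p1, 1))
```

Because reading never removes records, several consumers can read the same partition independently, each tracking its own offset.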

Viewing Kafka messages in the data lake
●Kafdrop lets us view the messages arriving in Kafka
●You can see the individual messages for the "Olympic" topic
●Messages are listed by offset number

What is S3?
●S3 stands for Amazon Simple Storage Service, a highly scalable object storage service provided by Amazon Web Services (AWS).
○Stores and retrieves any amount of data from anywhere on the internet
○Designed for 99.999999999% (11 nines) durability and 99.99% availability of objects over a given year
○Stores data as objects with unique key identifiers in buckets
○Supports various use cases like data lakes, backup and restore, content delivery, big data analytics, etc.
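
To give a feel for what the durability and availability figures mean, here is a back-of-the-envelope calculation. It treats the percentages as simple annual rates, which is an assumption made for illustration and not a service guarantee; the object count is arbitrary.

```python
from fractions import Fraction

ANNUAL_LOSS_RATE = Fraction(1, 10**11)        # 1 - 99.999999999% durability
ANNUAL_AVAILABILITY = Fraction(9999, 10000)   # 99.99% availability


def expected_objects_lost_per_year(object_count):
    """Expected number of objects lost per year at the stated durability."""
    return object_count * ANNUAL_LOSS_RATE


def allowed_downtime_minutes_per_year():
    """Minutes per year during which objects may be unavailable."""
    minutes_per_year = 365 * 24 * 60
    return float((1 - ANNUAL_AVAILABILITY) * minutes_per_year)


lost = expected_objects_lost_per_year(10_000_000)
years_per_lost_object = 1 / lost
print(years_per_lost_object, allowed_downtime_minutes_per_year())
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object every 10,000 years, and 99.99% availability leaves roughly 52.56 minutes of unavailability per year.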

What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit widely used for collecting and querying metrics from various sources.
●Designed for monitoring cloud-native and microservices applications
●Collects time-series metrics from targets (servers, databases, applications, etc.)
●Stores metrics data in a time-series database with configurable retention
●Allows setting up alerting rules based on metric expressions
●Provides a web UI for visualizing metrics and viewing alerts

Monitoring application metrics with Prometheus
●You can view the metric data in Prometheus
●The example shows the engine_fuel_consumption metric for three edge devices
●End users can view live metrics and data with this tool
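
For context, Prometheus reads metrics from a plain-text page that each target exposes. The sketch below renders engine_fuel_consumption for three devices in that text exposition format. The `device` label name, the device names, and the values are assumptions for illustration, and a real application would normally use a Prometheus client library, which also handles label escaping.

```python
def render_gauge(name, help_text, samples):
    """Render a gauge in the Prometheus text exposition format.

    samples maps a device label value to the current reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for device, value in sorted(samples.items()):
        # One sample line per device: name{label="value"} reading
        lines.append(f'{name}{{device="{device}"}} {value}')
    return "\n".join(lines) + "\n"


text = render_gauge(
    "engine_fuel_consumption",
    "Fuel consumption reported by each edge device.",
    {"edge-1": 12.5, "edge-2": 11.0, "edge-3": 13.25},  # made-up readings
)
print(text)
```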

What is Camel K?
●Camel K is a lightweight integration platform for building and running integration applications on Kubernetes
●It is based on the popular open-source Apache Camel integration framework
●Provides a streamlined development experience for Kubernetes-native integration solutions
●Supports common integration patterns like routing, transformation, orchestration
●Built on top of the Quarkus Kubernetes-native Java framework
●Part of the Red Hat Integration product portfolio
●Enables integration with external systems like Kafka, databases, APIs

Configuring Camel K integrations for data pipelines
●Logs show the Camel thread and the Aggregator route uploading data
●The pipeline moves data from Kafka to S3 for the "Olympic" dataset
●The data is stored in .txt format
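
Conceptually, an aggregator step collects individual messages and emits one text payload per batch, which is then uploaded as an object. The real pipeline is a Camel K route; the Python class below only mimics the idea, and the batch size, key naming scheme, and messages are hypothetical.

```python
class BatchAggregator:
    """Collect messages and emit one text payload per full batch."""

    def __init__(self, batch_size, prefix):
        self.batch_size = batch_size
        self.prefix = prefix
        self.pending = []
        self.batches_emitted = 0

    def add(self, message):
        """Buffer a message; return (object_key, body) when a batch completes."""
        self.pending.append(message)
        if len(self.pending) < self.batch_size:
            return None  # batch not full yet, nothing to upload
        body = "\n".join(self.pending) + "\n"
        key = f"{self.prefix}/batch-{self.batches_emitted:05d}.txt"
        self.pending = []
        self.batches_emitted += 1
        return key, body


agg = BatchAggregator(batch_size=3, prefix="olympic")
results = [agg.add(m) for m in ["a", "b", "c", "d"]]
print(results)
```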

What is a Jupyter Notebook?
●Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, visualizations, and narrative text.
●It provides an interactive computational environment for developing, documenting, and executing code.
●The notebook interface consists of cells that can contain code (in various programming languages like Python, R, Julia), markdown text, equations, or visualizations.
●Each cell can be executed independently, allowing for an iterative and exploratory workflow.
●The output of code cells (text, graphics, tables, etc.) is displayed inline below the respective cells.
●Notebooks integrate code, rich text elements, and visualizations in a single shareable document.

Jupyter notebooks with code examples
●This Jupyter notebook trains an anomaly detection model on train tonnage data
●The notebook covers data exploration, preprocessing, model training, and visualization steps
●It uses the Isolation Forest algorithm for the anomaly detection model
●The notebook generates visualizations like scatter plots, box plots, and heatmaps to analyze the data
●It demonstrates converting the trained model to ONNX format for inference
●The notebook includes code for loading the ONNX model, feature extraction, inference, and visualizing model outputs
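
To show the idea behind Isolation Forest, here is a from-scratch toy version in plain Python. It is not the notebook's code, which is not reproduced in these slides and may well use a library implementation. The data is synthetic rather than the train tonnage dataset, all function names are hypothetical, and the tree builder is simplified (it stops if the randomly chosen feature is constant).

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:  # simplification: stop instead of trying another feature
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point reaches a leaf, adjusted for the leaf's size."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    branch = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[branch], depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees on random subsamples (needs >= 2 points)."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(point, trees, size):
    """Score in (0, 1]; points that are isolated quickly score closer to 1."""
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(size))


# Synthetic 2-D cluster around the origin, plus one far-away query point.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)]
trees, size = fit_forest(normal)
inlier_score = anomaly_score((0.0, 0.0), trees, size)
outlier_score = anomaly_score((10.0, 10.0), trees, size)
print(inlier_score, outlier_score)
```

The far-away point is separated from the rest after only a few random splits, so its average path is short and its score is higher than that of a point in the middle of the cluster.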

Check the Correlations in the Data

Correlation Heatmap

Scatter Plot of Primary Suspension Stiffness vs. Train Acceleration

Links and How to get started
https://tosin2013.github.io/edge-anomaly-detection/
●Developer Deployment Instructions
●Viewing the Kafka messages under the data lake project
●Checking the Prometheus charts for the application
●Configuring Camel K Ship integration
●edge-anomaly-detection-notebooks: Jupyter notebooks with code examples and tutorials related to this workshop.
●opcua-asyncio-build-pipelines: Build pipelines for OPC UA applications using asyncio.
●Example App: A sample application demonstrating engine room monitoring using OPC UA and asyncio.

linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
Red Hat is the world’s leading provider of enterprise
open source software solutions. Award-winning
support, training, and consulting services make Red
Hat a trusted adviser to the Fortune 500.

Thank you