Monitoring and Managing Anomaly Detection on OpenShift.pdf
TosinAkinosho
23 slides
Jun 13, 2024
About This Presentation
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Size: 1.81 MB
Language: en
Added: Jun 13, 2024
Slides: 23 pages
Slide Content
Update confidential designator here
Version number here V00000
Monitoring and Managing Anomaly Detection on OpenShift
Tosin Akinosho
Project Overview and Purpose
▸Provide a hands-on tutorial and code for implementing anomaly detection on edge devices
▸Enable real-time identification of unusual behavior or failures on resource-constrained IoT/edge devices
▸Cover the end-to-end process, from data collection and model training to edge deployment and monitoring
Key Topics Covered
▸What is Anomaly Detection?
▸What is Edge (IoT)?
▸What is ArgoCD?
▸Deployment using ArgoCD for edge devices
▸What are Apache Kafka and S3?
▸Viewing Kafka messages in the data lake
▸What is Prometheus?
▸Monitoring application metrics with Prometheus
▸What is Camel K?
▸Configuring Camel K integrations for data pipelines
▸What is a Jupyter notebook?
▸Jupyter notebooks with code examples
What is Anomaly Detection?
Source:
Insert source data here
Insert source data here
●Definition: The process of identifying data points, events, or observations that deviate significantly from expected patterns or norms
●Purpose: To detect unusual, suspicious or rare occurrences that may indicate errors, threats, opportunities or insights
●Types of Anomalies:
○Point anomalies (single outlier data points)
○Contextual anomalies (anomalies based on context, e.g. time of year)
○Collective anomalies (anomalous sequences or sets of data)
●Techniques:
○Statistical methods (e.g. distribution tests)
○Machine learning (supervised, unsupervised, semi-supervised)
○Visualization and human analysis
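As a minimal sketch of the statistical approach listed above, the snippet below flags point anomalies with a z-score test. The function name, the threshold of 2.5, and the sample readings are illustrative assumptions, not taken from the workshop material.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.5):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical sensor readings: eleven normal values and one spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2,
            20.1, 19.7, 20.0, 20.4, 19.9, 35.0]

anomalies = zscore_anomalies(readings)
print(anomalies)
```

A z-score test only catches point anomalies. Contextual and collective anomalies generally need features that encode the context (for example, time of year) or a model that looks at sequences.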
Applications (Use Cases):
What is Edge (IoT)?
●Definition: Edge computing brings processing power and data storage closer to the sources of data generation (e.g. IoT devices, sensors)
●Key Concept: Processing data at or near the "edge" of the network, rather than sending all data to the cloud or a central data center
●Why It's Important:
○Reduced latency for time-sensitive applications
○Reduced bandwidth usage by processing data locally
○Increased security and data privacy
○Resilience against network disruptions
●Edge Devices:
○IoT sensors, cameras, industrial equipment
○Gateways to aggregate and process data
○Edge servers/appliances for analytics and control
Edge (IoT) (Use Cases):
What is ArgoCD?
●ArgoCD is an open-source, declarative, continuous delivery tool for Kubernetes
●It follows the GitOps pattern of using Git repositories as the source of truth for defining the desired application state
●It automates the deployment of applications to Kubernetes clusters by syncing the live state with the desired target state specified in Git
●It is implemented as a Kubernetes controller that continuously monitors applications
●It enables GitOps workflows by treating Git as the single source of truth
●It supports declarative application definitions using Kubernetes manifests, Helm charts, Kustomize, etc.
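To make the declarative model above concrete, here is a hedged sketch that builds an ArgoCD `Application` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The repository URL, path, application name, and namespaces are hypothetical placeholders; the actual manifests used in the workshop are not shown in the slides.

```python
import json


def argocd_application(name, repo_url, path, dest_namespace,
                       revision="HEAD", project="default",
                       argocd_namespace="argocd"):
    """Build an ArgoCD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": argocd_namespace},
        "spec": {
            "project": project,
            # Git is the source of truth for the desired state.
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": path,
            },
            # Deploy into the same cluster that ArgoCD runs in.
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Keep the live state in sync with Git automatically.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# All values below are placeholders for illustration only.
app = argocd_application(
    name="edge-anomaly-detection",
    repo_url="https://example.com/your-org/your-gitops-repo.git",
    path="deploy/edge",
    dest_namespace="edge-anomaly-detection",
)
print(json.dumps(app, indent=2))
```

With `automated` sync enabled, ArgoCD applies changes as soon as they land in Git; leaving `syncPolicy` out would instead require a manual sync from the dashboard or CLI.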
Deployment using ArgoCD for edge devices
●The ArgoCD dashboard lists installed operators such as amq-streams, camel-k, and cluster-config-app
●It displays the data foundation and CI/CD pipeline operators
●It indicates the sync status and health of each operator/application
●It allows syncing, refreshing, and deleting operators from the dashboard
What is Apache Kafka?
●Distributed streaming platform for handling real-time data feeds
●Open-source system, originally developed at LinkedIn and now maintained by the Apache Software Foundation
●Written in Java and Scala
●Provides three main capabilities:
○Publish and subscribe to streams of records
○Store streams of records in the order they were generated
○Process streams of records in real time
●Based on a partitioned log model
○Data is organized into topics; each topic partition is an ordered, immutable, append-only log
○Topics are partitioned and replicated across brokers for scalability and fault tolerance
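The partitioned log model above can be illustrated with a toy in-memory model. This is not Kafka client code: the `Topic` class is hypothetical, and `zlib.crc32` stands in for the hash that Kafka's own partitioner applies to record keys.

```python
import zlib


class Topic:
    """Toy model of a topic: a fixed set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a record and return its (partition, offset).

        Records with the same key always land in the same partition,
        so their relative order is preserved.
        """
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Return all records in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("Olympic", num_partitions=3)
p_a, o_a = topic.append("device-1", "rpm=1200")
p_b, o_b = topic.append("device-2", "rpm=900")
p_c, o_c = topic.append("device-1", "rpm=1250")

print("device-1 records:", (p_a, o_a), (p_c, o_c))
print("replay from latest device-1 offset:", topic.read(p_c, o_c))
```

Offsets are per partition, not per topic, which is why consumers track one position for each partition they read.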
Viewing Kafka messages in the data lake
●Kafdrop lets us view the messages arriving in Kafka
●You can see the different messages for the topic Olympic
●Each message is identified by its offset number
What is S3?
●S3 stands for Amazon Simple Storage Service, a highly scalable object storage service provided by Amazon Web Services (AWS).
○Object storage service for storing and retrieving any amount of data from anywhere on the internet
○Designed for 99.999999999% durability and 99.99% availability of objects over a given year (S3 Standard storage class)
○Stores data as objects with unique key identifiers in buckets
○Supports various use cases like data lakes, backup and restore, content delivery, big data analytics, etc.
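To put the durability and availability figures above in perspective, the following back-of-the-envelope arithmetic treats them as annual averages. The figure of 10,000,000 stored objects is an illustrative assumption, and the results are design targets rather than guarantees.

```python
from fractions import Fraction

# Design targets quoted above, kept exact with Fraction to avoid float error.
durability = Fraction("0.99999999999")   # eleven nines
availability = Fraction("0.9999")        # four nines

objects_stored = 10_000_000              # assumed bucket size for illustration

# Expected number of objects lost per year at the stated durability.
expected_losses_per_year = objects_stored * (1 - durability)
years_per_lost_object = 1 / expected_losses_per_year

# Unavailability budget per year at the stated availability.
minutes_per_year = 365 * 24 * 60
downtime_minutes = float((1 - availability) * minutes_per_year)

print(f"Expected losses per year: {float(expected_losses_per_year)}")
print(f"About one lost object every {int(years_per_lost_object)} years")
print(f"Unavailability budget: {downtime_minutes} minutes per year")
```

Under these assumptions, storing ten million objects corresponds to losing roughly one object every 10,000 years, while 99.99% availability allows for a little under an hour of unavailability per year.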
What is Prometheus?
Prometheus is an open-source monitoring and alerting system widely used for collecting and querying metrics from various sources.
●Open-source monitoring and alerting toolkit
●Designed for monitoring cloud-native and microservices applications
●Collects time-series data as metrics by scraping targets (servers, databases, applications, etc.) over HTTP
●Stores metrics data in a time-series database with configurable retention
●Allows setting up alerting rules based on metric expressions
●Provides a web UI for querying and graphing metrics and viewing alerts
Monitoring application metrics with Prometheus
●You can view the metric data in Prometheus
●The example shows the engine_fuel_consumption metric for three edge devices
●End users can view live metrics and data with this tool
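For context on how such a metric reaches Prometheus, the sketch below renders a gauge in the Prometheus text exposition format, which is what a scrape of an application's metrics endpoint returns. Only the metric name `engine_fuel_consumption` comes from the slide; the `device` label, the device names, the values, and the help text are assumptions.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings from three edge devices.
samples = [
    ({"device": "edge-1"}, 12.4),
    ({"device": "edge-2"}, 11.9),
    ({"device": "edge-3"}, 13.1),
]

exposition = render_gauge(
    "engine_fuel_consumption",
    "Current engine fuel consumption per edge device.",
    samples,
)
print(exposition)
```

In practice an application would normally use an official Prometheus client library rather than formatting the text by hand; the point here is only to show what one labeled time series per device looks like.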
What is Camel K?
●Camel K is a lightweight integration platform for building and running integration applications on Kubernetes
●It is based on the popular open-source Apache Camel integration framework
●Provides a streamlined development experience for Kubernetes-native integration solutions
●Supports common integration patterns like routing, transformation, and orchestration
●Built on top of the Quarkus Kubernetes-native Java framework
●Part of the Red Hat Integration product portfolio
●Enables integration with external systems like Kafka, databases, and APIs
Configuring Camel K integrations for data pipelines
●The logs show the Camel thread and the Aggregator route uploading data
●The pipeline moves data from Kafka to S3 for the "Olympic" dataset
●The data is stored in S3 as .txt files
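The aggregation step described above can be sketched in plain Python. This is not Camel K code and does not reflect the actual route definition, which the slides do not show; the batch size, the object key layout, and the message payloads are assumptions made for illustration.

```python
def batch_messages(messages, batch_size):
    """Group (offset, payload) records into batches of at most batch_size."""
    batch = []
    for record in messages:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def to_text_object(topic, batch):
    """Turn one batch into a (key, body) pair for a .txt object."""
    first_offset, last_offset = batch[0][0], batch[-1][0]
    # Hypothetical key layout: <topic>/<first offset>-<last offset>.txt
    key = f"{topic}/{first_offset:08d}-{last_offset:08d}.txt"
    body = "\n".join(payload for _, payload in batch) + "\n"
    return key, body


# Hypothetical messages read from the "Olympic" topic.
messages = [(offset, f"record-{offset}") for offset in range(5)]

objects = [
    to_text_object("Olympic", batch)
    for batch in batch_messages(messages, batch_size=2)
]
for key, body in objects:
    print(key, repr(body))
```

Batching trades a little latency for far fewer, larger objects in the bucket, which is usually easier to browse and cheaper to store than one object per message.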
What is a Jupyter Notebook?
●Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, visualizations, and narrative text.
●It provides an interactive computational environment for developing, documenting, and executing code.
●The notebook interface consists of cells that can contain code (in various programming languages like Python, R, Julia), markdown text, equations, or visualizations.
●Each cell can be executed independently, allowing for an iterative and exploratory workflow.
●The output of code cells (text, graphics, tables, etc.) is displayed inline below the respective cells.
●Notebooks integrate code, rich text elements, and visualizations in a single shareable document.
Jupyter notebooks with code examples
●This Jupyter notebook trains an anomaly detection model on train tonnage data
●The notebook covers data exploration, preprocessing, model training, and visualization steps
●It uses an Isolation Forest algorithm for the anomaly detection model
●The notebook generates visualizations like scatter plots, box plots, and heatmaps to analyze the data
●It demonstrates converting the trained model to ONNX format for inference
●The notebook includes code for loading the ONNX model, feature extraction, inference, and visualizing model outputs
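To show the idea behind Isolation Forest without reproducing the notebook, here is a small pure-Python sketch of the algorithm. It is a teaching toy, not the notebook's code: the function names, the parameters, and the (tonnage, acceleration) values are all made up for illustration, and the ONNX conversion step is not covered.

```python
import math
import random


def avg_path_length(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(point, tree, depth=0):
    """Depth at which a point is isolated, adjusted for unsplit leaf size."""
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])
    _, feature, split, left, right = tree
    branch = left if point[feature] < split else right
    return path_length(point, branch, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per point; higher means more anomalous."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Hypothetical (tonnage, acceleration) pairs: 60 similar rows plus one outlier.
points = [(100.0 + i % 7, 1.0 + (i % 5) * 0.05) for i in range(60)]
points.append((160.0, 2.5))

scores = anomaly_scores(points)
most_anomalous = scores.index(max(scores))
print(most_anomalous, points[most_anomalous])
```

The intuition is that outliers are separated from the rest of the data after very few random splits, so their average path length is short and their score is close to 1. For real workloads a maintained library implementation is the safer choice.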
Check the Correlations in the Data
Correlation Heatmap
Scatter Plot of Primary Suspension Stiffness vs. Train Acceleration
Links and How to get started
https://tosin2013.github.io/edge-anomaly-detection/
●Developer Deployment Instructions
●Viewing the Kafka messages under the data lake project
●Checking the Prometheus charts for the application
●Configuring Camel K Ship integration
●edge-anomaly-detection-notebooks: Jupyter notebooks with code examples and tutorials related to this workshop.
●opcua-asyncio-build-pipelines: Build pipelines for OPC UA applications using asyncio.
●Example App: A sample application demonstrating engine room monitoring using OPC UA and asyncio.
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.