AIRLINE SATISFACTION – Data Science Solution on Azure

SanelaNikodinoska1 · 34 slides · Jul 04, 2024

About This Presentation

Airline Satisfaction Project using Azure

This presentation was created as a foundation for understanding and comparing data science/machine learning solutions built in Python notebooks locally and on the Azure cloud, as part of the course DP-100 – Designing and Implementing a Data Science Solution on Azure.


Slide Content

AIRLINE SATISFACTION DATA SCIENCE SOLUTION ON AZURE – Sanela Nikodinoska, July 2024

Agenda: Introduction, Automated ML, Designer, Notebooks – Python SDK, Closing

Introduction – For DP-100 – Designing and Implementing a Data Science Solution on Azure, the last and most significant course of the Data Science Institute held by Semos Education, the Airline Satisfaction dataset was given as the basis for designing and implementing a data science solution on Azure. This presentation is an overview of the solutions implemented with Azure Machine Learning Studio. Because the Azure subscription was created for learning purposes only and has since been cancelled, the presentation is built from screenshots of the most important steps in developing, training and deploying the ML models.

Automated ML – Screenshots from Azure. Let’s dive in.

First steps – Started with an Azure free trial, created a resource group from the UI, and created an Azure Machine Learning service.

Data – created a data asset by uploading the local Airline Satisfaction dataset to Azure (no screenshot for that).

Automated ML – created two experiments with different primary metrics and featurization settings.
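Automated ML ranks every candidate pipeline by the configured primary metric, so changing the primary metric can change which model is reported as best. The stdlib-only sketch below illustrates that selection step. It is not the Azure ML SDK: the candidate names and scores are made-up toy values rather than results from this project, and the metric names `accuracy` and `AUC_weighted` are assumptions, since the slides do not say which primary metrics were used.

```python
from typing import Dict

# Hypothetical leaderboard: candidate pipeline name -> {metric name: score}.
Leaderboard = Dict[str, Dict[str, float]]


def best_candidate(leaderboard: Leaderboard, primary_metric: str) -> str:
    """Return the candidate with the highest score on the chosen primary metric."""
    # Assumes the metric is "higher is better", which holds for accuracy and AUC.
    return max(leaderboard, key=lambda name: leaderboard[name][primary_metric])


if __name__ == "__main__":
    # Toy values for illustration only, not results from the Airline Satisfaction runs.
    toy_leaderboard: Leaderboard = {
        "candidate_a": {"accuracy": 0.95, "AUC_weighted": 0.97},
        "candidate_b": {"accuracy": 0.94, "AUC_weighted": 0.98},
    }
    # The same leaderboard yields a different "best" model per primary metric.
    print(best_candidate(toy_leaderboard, "accuracy"))
    print(best_candidate(toy_leaderboard, "AUC_weighted"))
```

This is why two experiments over the same data can legitimately report different winners.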

Automated ML – the best model in both experiments was MaxMinScaler, LightGBM; both experiments stopped early because of the early-stopping policy based on the level of the primary metric.
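Stopping "based on the level of the primary metric" means the run ended once a trial reached a target score instead of exhausting every trial. A minimal sketch of that kind of metric-threshold exit criterion follows, assuming trials report their primary-metric score one after another; the function name and threshold are hypothetical and are not Automated ML settings.

```python
from typing import Iterable, List, Tuple


def run_until_exit_score(
    trial_scores: Iterable[float], exit_score: float
) -> Tuple[List[float], bool]:
    """Consume trial scores in order and stop as soon as one reaches exit_score.

    Returns the scores actually consumed and whether the threshold ended the run.
    """
    completed: List[float] = []
    for score in trial_scores:
        completed.append(score)
        # Exit criterion: the primary metric reached the target level.
        if score >= exit_score:
            return completed, True
    # Ran out of trials without reaching the target.
    return completed, False


if __name__ == "__main__":
    # Hypothetical scores and threshold; the fourth trial is never run.
    print(run_until_exit_score([0.90, 0.93, 0.96, 0.97], exit_score=0.95))
```

Passing a generator instead of a list makes the saving concrete: trials after the threshold is hit are never produced at all.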

Automated ML – metrics
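Automated ML reports a set of classification metrics for each model. For a binary target such as satisfaction, the basic ones follow directly from the confusion-matrix counts. The sketch below assumes labels encoded as 0/1 with 1 as the positive class, and it covers only a subset of what Azure reports (for example, it does not compute AUC).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels and predictions for illustration only.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```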

Designer – Screenshots from Azure. Let’s dive in.

DESIGNER

Authoring – Using components from the Authoring -> Designer tab, created two pipelines with two estimators each and one pipeline with a single estimator and a feature importance component: Two-Class Logistic Regression and Two-Class Decision Forest; Two-Class Support Vector Machine and Two-Class Neural Network (a deep-learning model); and Two-Class Boosted Decision Tree with the Cross-Validate Model component.

Jobs / Metrics – After the pipelines and environment images were configured and submitted, a job was created. The job overview, together with its outputs, logs, child jobs and metrics, is presented in the following snapshots.

Registered Models – The best-performing models from Automated ML and Designer (Neural Network and LightGBM) were registered as custom and MLflow models.

Real-time Endpoint – A blue/green deployment of the two best models was made, and the blue deployment was tested for inference by invoking the endpoint.

Compute targets – A compute instance for profiling the data asset and compute clusters for training models and running pipeline sweep jobs were created.

Environments – Custom and curated environments were used for deploying and testing.
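The Cross-Validate Model component evaluates a model with k-fold cross-validation: the data is split into k folds, and each fold is held out once for scoring while the model is trained on the remaining folds. A minimal sketch of that loop follows, using contiguous (unshuffled) folds and a hypothetical majority-class baseline in place of a real Designer estimator.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Model]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds and return (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra row so every row is used exactly once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validate(
    X: Sequence[Sequence[float]], y: Sequence[int], trainer: Trainer, k: int
) -> List[float]:
    """Train on k-1 folds, score accuracy on the held-out fold; one score per fold."""
    scores: List[float] = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        model = trainer([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(1 for i in test_idx if model(X[i]) == y[i])
        scores.append(correct / len(test_idx))
    return scores


def majority_class_trainer(X_train: List[Sequence[float]], y_train: List[int]) -> Model:
    """Hypothetical baseline estimator: always predicts the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda features: majority


if __name__ == "__main__":
    # Hypothetical toy data: one feature, six rows.
    X = [[0.0]] * 6
    y = [1, 1, 1, 1, 1, 0]
    print(cross_validate(X, y, majority_class_trainer, k=3))
```

The spread of the per-fold scores, not just their mean, is what cross-validation adds over a single train/test split.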

DESIGNER – Two-Class Logistic Regression and Two-Class Decision Forest pipeline

DESIGNER – Two-Class Support Vector Machine and Two-Class Neural Network pipeline

DESIGNER – Two-Class Decision Tree with Feature Importance component and Cross-Validate Model component

Designer - Feature importance
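Permutation importance is one common way to score features: shuffle one feature column at a time and measure how much the model's accuracy drops. Designer offers a Permutation Feature Importance component, but the slides do not say exactly which feature importance method was used, so treat this stdlib sketch (with a hypothetical toy model) as an illustration of the technique only.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of rows where the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(
    model: Model, X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> List[float]:
    """Accuracy drop when each feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value.
        permuted = [
            list(row[:j]) + [value] + list(row[j + 1:]) for row, value in zip(X, column)
        ]
        importances.append(baseline - accuracy(model, permuted, y))
    return importances


if __name__ == "__main__":
    # Hypothetical model that only looks at feature 0.
    toy_model: Model = lambda row: 1 if row[0] > 0.5 else 0
    X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
    y = [1, 1, 0, 0, 1, 0]
    print(permutation_importance(toy_model, X, y))
```

A feature the model ignores always scores exactly zero, which makes this a handy sanity check on trained models too.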

JOBS – list of all experiments

JOBS – overview of Designer pipeline metrics – Random Forest Classifier, the best-performing model. Note: no snapshots of the successful pipeline runs.

JOBS – overview of Designer pipeline metrics – Two-Class Neural Network model. Note: no snapshots of the successful pipeline runs.

JOBS – overview of Designer pipeline metrics – Two-Class Logistic Regression, the worst-performing model. Note: no snapshots of the successful pipeline runs.

Registered models

Deployment of models

Real-time Endpoint
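In a blue/green deployment, both deployments sit behind one endpoint, and the endpoint's traffic setting splits requests between them by percentage; keeping green at 0% lets blue be tested first. The sketch below only simulates that weighted routing with the standard library. The percentages are hypothetical, and this is not the Azure ML API.

```python
import random
from collections import Counter
from typing import Dict


def validate_traffic(traffic: Dict[str, int]) -> None:
    """Reject splits that are negative or do not add up to 100 percent."""
    if any(pct < 0 for pct in traffic.values()) or sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must be non-negative and sum to 100")


def route(traffic: Dict[str, int], rng: random.Random) -> str:
    """Choose a deployment for a single request, weighted by its traffic share."""
    validate_traffic(traffic)
    names = list(traffic)
    return rng.choices(names, weights=[traffic[name] for name in names], k=1)[0]


def simulate(traffic: Dict[str, int], requests: int, seed: int = 0) -> Counter:
    """Count how many of `requests` simulated calls land on each deployment."""
    rng = random.Random(seed)
    return Counter(route(traffic, rng) for _ in range(requests))


if __name__ == "__main__":
    # Hypothetical phase 1: all traffic to blue while it is being tested.
    print(simulate({"blue": 100, "green": 0}, 1000))
    # Hypothetical phase 2: shift a small share of traffic to green.
    print(simulate({"blue": 90, "green": 10}, 1000))
```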

Custom environment for the model deployed to the endpoint

Predicting / Invoking the Endpoint
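Invoking a real-time endpoint is an authenticated HTTPS POST to its scoring URI. The sketch below only builds the request with `urllib.request` and does not send it. The URI and key are placeholders, the `input_data`/`columns`/`data` payload shape is an assumption that must match what the deployed model expects, and the optional `azureml-model-deployment` header (used here to target the blue deployment directly instead of following the traffic split) should be checked against the Azure ML documentation.

```python
import json
import urllib.request
from typing import Any, List, Optional


def build_scoring_request(
    scoring_uri: str,
    api_key: str,
    rows: List[List[Any]],
    columns: List[str],
    deployment: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a real-time scoring endpoint."""
    # Assumed payload shape; adjust to the deployed model's expected input schema.
    payload = {"input_data": {"columns": columns, "data": rows}}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment:
        # Assumed header for routing the call to one named deployment.
        headers["azureml-model-deployment"] = deployment
    return urllib.request.Request(
        scoring_uri,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URI, key and hypothetical feature names.
    req = build_scoring_request(
        "https://example.invalid/score", "<api-key>", [[1, 2]], ["feature_a", "feature_b"], "blue"
    )
    print(req.get_method(), req.full_url)
    # Sending would be: urllib.request.urlopen(req), which returns the model's response.
```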

Environments – custom and curated environments for deploying and testing

Compute targets

Notebooks – Screenshots from Azure. Let’s dive in.

Notebooks – created a pipeline for training and scoring a RandomForestClassifier model and tuning its hyperparameters with a sweep job

Notebooks – running hyperparameter tuning with the sweep job
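A sweep job trains one child job per sampled hyperparameter combination and keeps the best one according to the primary metric. The sketch below shows random sampling over a search space using only the standard library. The `n_estimators`/`max_depth` values and `toy_objective` are hypothetical stand-ins for training and scoring the RandomForestClassifier, since the actual search space is not shown in the slides.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, int]


def random_search(
    space: Dict[str, List[int]],
    objective: Callable[[Params], float],
    max_trials: int,
    seed: int = 0,
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Sample max_trials configurations at random; return the best plus all trials."""
    if max_trials < 1:
        raise ValueError("max_trials must be at least 1")
    rng = random.Random(seed)
    trials: List[Tuple[Params, float]] = []
    for _ in range(max_trials):
        # Each sampled configuration plays the role of one child job.
        params = {name: rng.choice(values) for name, values in space.items()}
        trials.append((params, objective(params)))
    best_params, best_score = max(trials, key=lambda trial: trial[1])
    return best_params, best_score, trials


def toy_objective(params: Params) -> float:
    """Hypothetical stand-in for 'train the model and return its primary metric'."""
    return 1.0 - abs(params["n_estimators"] - 200) / 1000 - abs(params["max_depth"] - 10) / 100


if __name__ == "__main__":
    space = {"n_estimators": [50, 100, 200], "max_depth": [5, 10, 20]}
    best, score, trials = random_search(space, toy_objective, max_trials=8, seed=1)
    print(best, round(score, 3), len(trials))
```

Random sampling is only one option; a sweep can also use grid or Bayesian sampling, which would replace the `rng.choice` step.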

Notebooks – results from pipeline – child jobs

Notebooks – results from pipeline – best model

Notebooks – results from pipeline – model metrics

Notebooks – results from pipeline - predicting

Summary
No-code or programmatic – whichever suits you best
Great business solution – all resources in one place
New subscription – for future projects
Getting work done – finished my course project
So many services – yet to discover: Azure Databricks, Azure Synapse Analytics (for data ingestion), Azure AI Services, Azure Data Factory, etc.
Recommend – definitely!

Closing – Thank you for your time. I hope to get some of your feedback for improvement. Sanela Nikodinoska [email protected]