week-02.pdf.Cloud computing.AWS Component

singbling 14 views 36 slides May 01, 2024

About This Presentation

Aws components


Slide Content

Lecture # 1.2
Life Cycle of a Data Science Project
Dr. Muhammad Nadeem Majeed
[email protected]

•Review of previous lecture
•How to Do Data Science?
•Languages, Tools and Techniques
•Life Cycle of a Data Science Project
•Industry Job Roles in Data Science
2
Today’s Agenda
Instructor: Engr. Muhammad Nadeem Majeed, Ph.D.

3
Recap of Previous Lecture

4
Structured Data
Characteristics: Pre-defined data model, text-based, easy to search
Resides in: Database, Data Mart
Examples: Social Security number, credit card number, transaction information, customer name, phone numbers, date
Applications: Inventory control, airline reservation systems

5
Semi-structured Data
Characteristics: Loosely organized
Resides in: Files with tagged-text format
Examples: Server logs, tweets organized by hashtags, email sorted by folders (inbox, sent, draft)
Applications: Sensor output, server logs
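
As a rough illustration of the tagged-text idea, the sketch below parses a few JSON server-log records whose fields vary from record to record. The log lines and field names are invented for this example; real logs may use a different format.

```python
import json

# Hypothetical server-log lines: each record is tagged text (JSON),
# but the records do not all share the same fields.
RAW_LOG_LINES = [
    '{"ts": "2024-05-01T10:00:00", "level": "INFO", "msg": "login", "user": "alice"}',
    '{"ts": "2024-05-01T10:00:05", "level": "ERROR", "msg": "timeout", "latency_ms": 5021}',
    '{"ts": "2024-05-01T10:00:09", "level": "INFO", "msg": "logout", "user": "alice"}',
]


def parse_logs(lines):
    """Parse each JSON line into a dict; the tags (keys) carry the structure."""
    return [json.loads(line) for line in lines]


def field_names(records):
    """Collect every field seen across records, since there is no fixed schema."""
    names = set()
    for record in records:
        names.update(record)
    return sorted(names)


records = parse_logs(RAW_LOG_LINES)
errors = [r for r in records if r["level"] == "ERROR"]

print(field_names(records))
print(len(errors), "error record(s)")
```

Unlike a database table, no record is required to carry every field, so code reading such data usually has to check which tags are present.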

6
Unstructured Data
Characteristics: Documents, images, audio, video
Resides in: Data Lake, MS Azure
Examples: Surveillance imagery, reports, audio files, video files, email messages
Applications: Presentation software, viewing and editing tools, email clients

7
What is Data Science?
Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data.

8
Applications of Data Science
01 Social Media: Recommendation systems, email filtering, ad placement, sentiment analysis
02 Banking: Anti-money laundering, credit scoring, fraud detection, price optimization
03 E-Commerce: Recommendation systems, upselling, cross-selling, discount price optimization, business forecasting
04 Search Engines: Search algorithm, ad placement, personalized search results

9
Applications of Data Science (cont…)
05 Travel: Predict flight delay, dynamic pricing
06 Healthcare: Disease prediction, medical imaging
07 Automation: Robots, self-driving cars
08 Credit & Insurance: Fraud & risk detection, claims prediction
Seeing AI
Best Route Selection

10
How to Do Data Science?
Languages, Tools and Technologies

11
Who is a Data Scientist?
Mathematics, Business, Technology
1 Data Scientist
2 Skill Set
3 Programming Languages
4 Tools
5 Techniques
A data scientist is a professional responsible for collecting, analyzing and interpreting extremely large amounts of structured and unstructured data in order to gain useful insights to grow the business.

12
Skill Sets of a Data Scientist
•Statistics
•Programming languages
•Data extraction & processing
•Data wrangling & exploration
•Machine learning
•Big data processing framework
•Data visualization

13
Programming Languages for Data Science
Python
R
Julia

14
Tools for Handling this Big Data (3Vs)
Tools are software used to apply DS techniques to perform a task.
VOLUME
VARIETY
VELOCITY
Python Libraries for Data Science Tasks

15
Techniques for Data Science
Techniques are sets of procedures that are followed to perform a task. Tools and techniques together help in data collection, data storage, data preparation, data analysis, data modeling and data visualization.
•Descriptive Statistics
•Inferential Statistics
•Classification Techniques: Decision Tree, Random Forest, Logistic Regression

16
Techniques for Data Science
•Descriptive Statistics
•Inferential Statistics
•Regression Techniques
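
To make the idea of a regression technique concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The data (hours studied versus score) is made up for the example, and a real project would more likely use a library than hand-written formulas.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data that follows y = 2x + 1 exactly.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for an unseen input x = 6

print(f"slope={slope:.2f} intercept={intercept:.2f} prediction={predicted:.2f}")
```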

17
Techniques for Data Science
•Descriptive Statistics
•Inferential Statistics
•Clustering Techniques: K-Means Clustering, Hierarchical Clustering, DBSCAN
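
As one example of a clustering technique, the following is a small sketch of k-means on 2-D points. It assumes a fixed number of iterations and takes the first k points as starting centroids (a simplification; libraries usually pick starting points more carefully). The data points are invented.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points, starting from the first k points as centroids."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments


# Made-up data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)

print(labels)
print(centroids)
```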

18
Why is Data Science so Complicated?

19
Data Science Life Cycle

20
Overview of Data Science Life Cycle
1 Business Problem
2 Data Acquisition
3 Data Processing
4 EDA & Visualization
5 ML Model Creation, Training & Evaluation
6 Deployment & Monitoring
Feature Engineering

22
Understanding Business Problem
•Most critical phase of a Data Science Life Cycle; if conducted properly, it will save a lot of time, money and resources.
•Understand the problem by talking to the stakeholders & domain experts to get a clear understanding of the problem, and document all the requirements.
•Identify the key business variables that need to be predicted
•Define the success criteria and success-measuring metrics (KPIs & SLAs)

23
Data Acquisition
•What data do we need for our project?
•What are the data sources and data format?
•Where is the data located?
•How can we obtain the data?
•What is the most efficient way to store and access all of it for later processing?

24
Data Processing
•Extract: Acquire data from single or multiple sources
•Transform
  •Data Wrangling/Munging: Transform collected data into desired format for later analysis
  •Data Cleansing: Handling missing data, duplicate values, null values, misspelled attributes, inconsistent data types and outliers
•Load: The transformed data is loaded into the target data source or data warehouse
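
The Extract, Transform and Load steps can be sketched with the Python standard library alone. This is a toy pipeline under stated assumptions: the CSV content, the column names and the `sales` table are invented, an in-memory SQLite database stands in for the data warehouse, and rows with a missing amount are simply dropped (imputing a value is another common choice).

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with typical problems: a duplicate row,
# a missing value, stray whitespace and inconsistent capitalisation.
RAW_CSV = """customer,city,amount
Alice, lahore ,120.50
Bob,Karachi,
Alice, lahore ,120.50
carol,ISLAMABAD,80
"""


def extract(text):
    """Extract: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalise text, drop duplicates and rows with no amount."""
    seen = set()
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # missing value: dropped in this sketch
        record = (row["customer"].strip().title(), row["city"].strip().title(), float(amount))
        if record in seen:
            continue  # duplicate row
        seen.add(record)
        clean.append(record)
    return clean


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE sales (customer TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT customer, city, amount FROM sales ORDER BY customer").fetchall()
print(rows)
```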

25
Exploratory Data Analysis & Visualization
•EDA involves understanding your data and identifying patterns. It involves identifying relationships and correlations between variables using visual as well as statistical techniques
•These patterns are not evident when you are looking at data in tables. A correct visualization tool can help you quickly gain a deeper understanding of your data
•Data Analyst’s Job Ends Here
•Finally, EDA involves Feature Engineering, which performs feature creation, transformation, extraction and selection before creation of the ML model
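
One statistical way to look for relationships between variables during EDA is the Pearson correlation coefficient. The sketch below computes it by hand for every pair of columns in a small, made-up dataset; the column names and values are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up dataset: each key is a column (variable).
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "sales": [25, 41, 62, 79, 103],
    "returns": [9, 7, 8, 6, 8],
}

# Correlate every pair of columns once.
columns = list(data)
correlations = {}
for i, a in enumerate(columns):
    for b in columns[i + 1:]:
        correlations[(a, b)] = pearson(data[a], data[b])

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Values close to +1 or -1 suggest a strong linear relationship worth plotting; values near 0 suggest little linear relationship.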

26
ML Model: Creation-Training-Evaluation
ML is an application of AI that gives computers the ability to learn without being explicitly programmed. [Arthur Samuel]
Traditional Programming: Data + Program → Output
Machine Learning: Data + Output → Model
Phase 1 (Training): Training Data + Learning Algorithm → Model
Phase 2 (Testing): Test Data + Model → Accuracy
Apply different but appropriate machine learning algorithms, such as Decision Tree, Linear Regression and K-Nearest Neighbour, to the data to identify the model that best fits the business requirements
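
The two phases, training and testing, can be sketched with a hand-written K-Nearest Neighbour classifier. The labelled points are invented, and k = 3 is an arbitrary choice; for K-Nearest Neighbour the "training" phase is just storing the training data, and the work happens at prediction time.

```python
import math
from collections import Counter


def knn_predict(train, point, k=3):
    """Predict the majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(1 for features, label in test if knn_predict(train, features, k) == label)
    return correct / len(test)


# Phase 1 (training): made-up labelled data as (features, label) pairs.
train_data = [
    ((1.0, 1.2), "low"), ((0.8, 0.9), "low"), ((1.3, 1.0), "low"), ((1.1, 0.7), "low"),
    ((3.0, 3.2), "high"), ((3.4, 2.9), "high"), ((2.8, 3.5), "high"), ((3.1, 3.0), "high"),
]

# Phase 2 (testing): held-out points the model has not seen.
test_data = [
    ((0.9, 1.1), "low"), ((3.2, 3.1), "high"), ((1.2, 0.8), "low"), ((2.9, 3.3), "high"),
]

score = accuracy(train_data, test_data)
print(f"test accuracy: {score:.2f}")
```

Comparing this score across several candidate algorithms on the same held-out data is one simple way to choose between them.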

27
Model Deployment and Monitoring
•After a model is trained, tuned and tested, you can deploy the model into production and make inferences (predictions)
•Check the deployment environment for dependency issues
•Deploy the model first in the test and then in the production environment
•Cloud Deployment
•Most of the time, live real-world data differs from the data that was used to train the model, making the model less accurate. To handle this, build a model monitor that detects deviations such as data drift and alerts you to take remedial actions
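
A very simple model monitor could compare a feature's live values against its training values. The sketch below measures how far the live mean has moved, in units of the training standard deviation, and raises a flag past a threshold. The threshold of 2.0 and the sample values are assumptions chosen for illustration; production monitors typically use more robust statistical tests.

```python
import statistics


def drift_score(training_values, live_values):
    """Shift of the live mean from the training mean, in training standard deviations."""
    mean = statistics.mean(training_values)
    stdev = statistics.stdev(training_values)
    return abs(statistics.mean(live_values) - mean) / stdev


def check_drift(training_values, live_values, threshold=2.0):
    """Return (drift_detected, score); the threshold is an assumed tuning knob."""
    score = drift_score(training_values, live_values)
    return score > threshold, score


# Made-up values of one input feature.
training = [50, 52, 48, 51, 49, 50, 53, 47]
live_ok = [49, 51, 50, 52]
live_shifted = [58, 60, 57, 61]

for name, batch in [("live_ok", live_ok), ("live_shifted", live_shifted)]:
    drifted, score = check_drift(training, batch)
    print(name, "drift detected" if drifted else "no drift", round(score, 2))
```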

28
Industry Job Roles in Data Science

29
Industry Job Roles: Data Scientist
•Most senior member of the team; takes inputs from the rest to formulate actionable insights for the business
•Makes use of the latest tools and technologies in finding solutions and reaching conclusions that are crucial for an organization’s growth and development
1 Data Scientist
2 Data Engineer
3 Data Analyst
4 Database Administrator
5 ML Engineer

30
Industry Job Roles: Data Engineer/Architect
•Scrape data and store it in warehouses using ETL
•Handle databases and create data warehouses
•Design, build, and manage the big data infrastructure
•Build data pipelines for easy access to data
•Big Data Tools (Apache Spark, Apache Hive, Hadoop)
•Cloud Platforms (AWS, Google Cloud Platform)

31
Industry Job Roles: Data Analyst
•A Data Analyst is an entry-level member of the data analytics team
•Needs to have good technical skills and know the basics of statistics, data munging, data utilization, and exploratory data analysis
•Generates reports after analyzing the data
•Can move to the role of Data Engineer and Data Scientist with more experience

32
Industry Job Roles: Database Administrator
•Responsible for administering the collected data by installing, configuring, monitoring, operating, and maintaining databases
•Ensures that all databases are available to all relevant users and are securely protected from any malicious activity

33
Industry Job Roles: Machine Learning Engineer
•A Machine Learning Engineer works as part of a large data science team
•Responsible for designing and creating algorithms capable of learning and making predictions
•Expected to perform A/B testing, build data pipelines, and implement algorithms for classification, clustering, regression, anomaly detection, etc.

34
History: Data Science Salary Trends
Source: https://towardsdatascience.com/why-learn-data-science-in-2020-d3f54123b2e4

35
History: Job trends
Source: https://www.tecla.io/blog/the-high-demand-for-data-scientists-and-how-to-hire-for-them/

36
Things To Do
Coming to office hours does NOT mean you are academically weak!
•Visit all the hyperlinked tools and technologies in today's lecture slides. You should be able to give a single-line description of each.
•Have a very clear understanding of the Data Science Life Cycle, and the tools & technologies used in each phase.
•Think of a few use cases where you can apply Data Science, Machine Learning and Deep Learning technologies, and make a list of the skill set you need to develop/learn to implement and deploy such projects.