Data Scientist Introduction: brief overview of concepts
rahulgulab12
25 slides
Jun 12, 2024
About This Presentation
Data Science Introduction
Slide Content
Data Scientist April 2018
Agenda
- Data science and its applications
- Stages of data science, project roles
- Classification, decision tree, random forest
- Demo using R technology
Data Science Introduction - Concepts
Data science is managing the process that can transform hypotheses and data into actionable predictions.
Data Science Introduction - Applications
- Amazon’s product recommendation systems
- LinkedIn’s contact recommendation system
- Retail business: buying patterns, segmentation
- Twitter’s trending topics
- Google’s advertisement valuation systems
- Walmart’s consumer demand projection systems
Data Science Domains
Data Science Introduction - Project Roles
Data Science Introduction - Processes in a Data Science Project
Data Science – Modelling Methods
Data Science – Modelling Method: Classification and Regression Trees
Example: finding bad loan applications
- Input variables: loan amount, duration, age, salary, any other loan, address, income, education, background data, location, etc.
- Of 1,000 existing applications, 300 have defaulted
- A decision tree is used to identify potential defaulters
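A decision tree chooses splits that make the resulting groups purer. As a rough illustration, the sketch below computes the Gini impurity (a criterion commonly used by CART) for the 1,000 applications in the example, 300 of which defaulted. The split and its counts are hypothetical, invented only to show the arithmetic, and Python is used here for brevity even though the deck's demo is in R.

```python
def gini(counts):
    """Gini impurity of a node, given the class counts inside it."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Average impurity of the child nodes, weighted by node size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


# Root node from the example: 700 good and 300 defaulted applications.
root = [700, 300]

# Hypothetical split (not from the slides); counts are [good, bad].
long_duration = [100, 200]   # e.g. duration above some threshold
short_duration = [600, 100]  # all remaining applications

root_impurity = gini(root)
split_impurity = weighted_gini([long_duration, short_duration])
print(f"root impurity: {root_impurity:.3f}")
print(f"after split:   {split_impurity:.3f}")
print(f"impurity drop: {root_impurity - split_impurity:.3f}")
```

A tree-growing algorithm would evaluate many candidate splits this way and keep the one with the largest impurity drop.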
Classification and Regression Trees
[Figure: decision tree with split conditions Duration > 50, Amount > 4 million, Amount > 1 million, Amount < 5 million and Duration > 120, and leaf nodes Bad (0.68), Good (0.75), Good (0.56), Bad (0.25), Good (0.61), Bad (0.88)]
Data Science – Modelling Method: K-Nearest Neighbors (KNN)
Example: male/female distribution
[Figure: scatter plot of hair length (cm, 0 to 60) against height (cm, 140 to 200)]
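The idea behind k-nearest neighbors can be sketched in a few lines: a new point receives the label held by most of its k closest training points. The data points and the choice k = 3 below are hypothetical, picked to resemble the height and hair-length example; they are not taken from the slides.

```python
import math
from collections import Counter

# Hypothetical training points: (height in cm, hair length in cm, label).
TRAINING = [
    (182, 5, "male"), (175, 10, "male"), (190, 3, "male"), (178, 20, "male"),
    (160, 40, "female"), (165, 35, "female"), (158, 50, "female"), (170, 30, "female"),
]


def knn_predict(height, hair, k=3):
    """Label a new point by majority vote among its k nearest neighbours."""
    by_distance = sorted(
        TRAINING,
        key=lambda row: math.dist((height, hair), (row[0], row[1])),
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict(185, 8))
print(knn_predict(162, 45))
```

With two classes, an odd k avoids tied votes. In practice the features are usually rescaled first so that no single feature dominates the distance.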
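Data Science – Modelling Method: Random Forest (RF)
[Figure: three decision trees labelled Tree 1, Tree 2 and Tree 3]
Data Science – Modelling Method: Random Forest (RF)
[Figure: an input is passed to all trees; the predictions of Tree 1, Tree 2 and Tree 3 are combined into the random forest's prediction]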
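The voting step can be sketched as follows. The three "trees" here are hand-written, hypothetical rules used only to show how the individual predictions are combined by majority vote; a real random forest learns each tree from a random sample of the training data.

```python
from collections import Counter


# Three hypothetical "trees", each a hand-written rule on a loan application.
def tree_1(app):
    return "Bad" if app["duration_months"] > 50 else "Good"


def tree_2(app):
    return "Bad" if app["amount"] > 4_000_000 else "Good"


def tree_3(app):
    return "Bad" if app["age"] < 25 else "Good"


def forest_predict(app, trees=(tree_1, tree_2, tree_3)):
    """Collect one vote per tree and return the majority class with the tally."""
    votes = Counter(tree(app) for tree in trees)
    return votes.most_common(1)[0][0], dict(votes)


# Hypothetical application used as input to all trees.
application = {"duration_months": 60, "amount": 1_000_000, "age": 40}
prediction, votes = forest_predict(application)
print(votes)
print("Random forest predicts:", prediction)
```

Here one tree votes "Bad" and two vote "Good", so the forest predicts "Good" even though a single tree disagrees.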
Data Science – Modelling Method: Random Forest (RF)
Applications where the random forest algorithm is widely used:
- Banking: identifying loyal customers and fraudulent customers
- Medicine: identifying diseases from patients’ medical records
- Stock market: stock behavior, profit and loss
- E-commerce: finding similar customers, segmentation
Data Science – Modelling Method: Support Vector
Example: male/female distribution
[Figure: scatter plot of hair length (cm, 0 to 60) against height (cm, 140 to 200)]
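A linear support vector classifier looks for a separating line with the widest possible margin between the two groups. The sketch below is one simple way to approximate that: batch subgradient descent on the regularised hinge loss. The sample points, the centring and scaling, and the parameters `lam`, `lr` and `epochs` are all assumptions made for illustration; library implementations use more refined solvers.

```python
# Hypothetical samples: (height in cm, hair length in cm, label),
# with +1 standing for "male" and -1 for "female".
SAMPLES = [
    (180, 20, +1), (190, 20, +1), (180, 10, +1),
    (160, 40, -1), (150, 40, -1), (160, 50, -1),
]

# Centre each feature on its mean and divide by 10 so that both
# features are on a similar, small scale.
MEAN_HEIGHT = sum(s[0] for s in SAMPLES) / len(SAMPLES)
MEAN_HAIR = sum(s[1] for s in SAMPLES) / len(SAMPLES)


def features(height, hair):
    return ((height - MEAN_HEIGHT) / 10, (hair - MEAN_HAIR) / 10)


def train_linear_svm(samples, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    w, b = [0.0, 0.0], 0.0
    data = [(features(h, hair), y) for h, hair, y in samples]
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [lam * w[0], lam * w[1]], 0.0
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:  # point is inside the margin or misclassified
                grad_w[0] -= y * x[0] / n
                grad_w[1] -= y * x[1] / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, height, hair):
    x = features(height, hair)
    return "male" if w[0] * x[0] + w[1] * x[1] + b >= 0 else "female"


w, b = train_linear_svm(SAMPLES)
print(predict(w, b, 185, 15))
print(predict(w, b, 155, 45))
```

Only the points that end up on or inside the margin influence the updates; these play the role of the support vectors.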
Data Science – Model Evaluation Process
Training, test and validation
[Figure: flow diagram with the boxes Data, Test/Train Split, Training Data, Test Data, Training Process, Model and Predictions]
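A minimal sketch of this evaluation flow, assuming a random 80/20 split and accuracy as the metric. The data, the seed, and the threshold "model" are hypothetical stand-ins; the point is only that the model is fitted on the training rows and scored on rows it has not seen.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)


def fit_threshold(training_rows):
    """'Training process': put a cut halfway between the two class means."""
    high = [f[0] for f, label in training_rows if label == "high"]
    low = [f[0] for f, label in training_rows if label == "low"]
    cut = (sum(high) / len(high) + sum(low) / len(low)) / 2
    return lambda features: "high" if features[0] > cut else "low"


# Hypothetical data: one numeric feature; the label is "high" above 50.
rows = [((value,), "high" if value > 50 else "low") for value in range(100)]

train, test = train_test_split(rows)
model = fit_threshold(train)          # fitted on training data only
test_accuracy = accuracy(model, test)  # scored on unseen test data
print(len(train), len(test))
print(f"test accuracy: {test_accuracy:.2f}")
```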
Data Science Demo Example
Demo Explanation
Data: 3 species
Demo Explanation
1. Load the caret package and load the data
2. Split the data into two parts: 80% is kept as the dataset and 20% is held out for validation
3. Feed the dataset to 4 algorithms (CART, KNN, SVM, RF)
4. Select the best algorithm
5. Feed the validation data to the best algorithm
6. Check the output
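The steps above can be sketched end to end. The demo itself runs in R with caret and compares CART, KNN, SVM and RF; this Python sketch only mirrors the workflow, with generated three-class data and three simple stand-in candidates. The inner holdout used to pick the best candidate is an assumption, since the slides do not say how the algorithms are compared.

```python
import math
import random
from collections import Counter

# Hypothetical stand-in data: three well-separated clusters, one per class.
rng = random.Random(7)
CENTRES = {"species_a": (1.5, 0.3), "species_b": (4.3, 1.3), "species_c": (5.6, 2.0)}
rows = [
    ((cx + rng.gauss(0, 0.2), cy + rng.gauss(0, 0.1)), name)
    for name, (cx, cy) in CENTRES.items()
    for _ in range(50)
]


def split(rows, fraction, rng):
    """Shuffle a copy and return (kept, held_out); held_out gets `fraction`."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[cut:], shuffled[:cut]


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    top = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda point: top


def fit_knn(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(train):
        def predict(point):
            nearest = sorted(train, key=lambda row: math.dist(point, row[0]))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


candidates = {"baseline": fit_majority, "1-NN": fit_knn(1), "5-NN": fit_knn(5)}

dataset, validation = split(rows, 0.2, rng)       # step 2: 80% / 20%
fit_part, check_part = split(dataset, 0.25, rng)  # inner holdout for selection

# Steps 3 and 4: fit every candidate, then keep the most accurate one.
scores = {name: accuracy(fit(fit_part), check_part) for name, fit in candidates.items()}
best = max(scores, key=scores.get)

# Steps 5 and 6: refit the winner on the full 80% and check it on validation.
final_model = candidates[best](dataset)
validation_accuracy = accuracy(final_model, validation)
print(scores)
print(best, validation_accuracy)
```

Keeping the validation rows out of the selection step means the final accuracy is measured on data that played no part in choosing the algorithm.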
Data Science Demo
1. Installing the R platform
2. Loading the dataset
3. Summarizing the dataset
4. Visualizing the dataset
5. Evaluating some algorithms
6. Making some predictions
Other Practical Examples of Data Science
- https://towardsdatascience.com/examples-of-data-science-with-r-789c6996435
- Customer analysis and predictive analysis
- Association rules (medical diagnosis, bio-medical, census data, fraud detection, CRM)
- HR analytics: finding valuable employees and retaining them
Data Science Resources
- Practical Data Science with R
- Demo commands
- R and RStudio installation files
Resources are kept at the location below:
\\gb-pb-dbm-v01\Data_Science_Resources