Data scientist roadmap

SonuKumar893 1,389 views 30 slides Dec 11, 2019

About This Presentation

Data Scientist ROADMAP to kick start your career


Slide Content

How to Become a Data Scientist from Scratch by SONU KUMAR

What is Data Science? Data Science is a multi-disciplinary subject that uses mathematics, statistics, and computer science to study and evaluate data. Its key objective is to extract valuable information for use in strategic decision making, product development, trend analysis, and forecasting. A data scientist is something of a "jack-of-all-trades" for data crunching. The three main skills a data scientist needs are mathematics/statistics, computer programming literacy, and knowledge of a particular business domain.

Data Science is a Broader Field

Comparison between Different Roles in 2018

How to Become a Data Scientist?

Mathematics. Linear Algebra: matrices, eigenvalues and eigenvectors, tensors, etc. Calculus: differentiation and integration. Probability: Bayes' theorem, optimization, etc. Statistics: inferential statistics, descriptive statistics, chi-squared tests, random variables, and the Gaussian (normal) distribution. [Best Resources: Khan Academy and Machine Learning Mystery Mathematics Course]

Programming Languages. Python: It is the Bible. It is easy to understand (it reads like plain English), needs no semicolons, is simple, and has tons of libraries available. Talking about packages: data visualization and data tidying, e.g. with ggplot2 (an R package) and tidy, are extremely important. [Best Resource: Sentdex YouTube channel]

Libraries

Data Wrangling and Management: data mining, data cleaning, data management. Relevant skills: MySQL (RDBMS); NoSQL (MongoDB, Cassandra, etc.); JOIN.
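As a rough illustration of the JOIN skill listed above, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table names, column names, and sample rows are made up for the example; the same SELECT ... JOIN statement would look nearly identical in MySQL.

```python
import sqlite3


def customer_order_totals():
    """Return (customer name, total order amount) pairs using an inner JOIN."""
    # In-memory database so the example leaves no files behind.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical schema: customers and their orders.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Asha"), (2, "Ravi"), (3, "Meera")],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5)],
    )

    # Inner JOIN: customers without any order are dropped from the result.
    cur.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows
```

Calling customer_order_totals() returns one row per customer who has at least one order; switching JOIN to LEFT JOIN would also keep customers with no orders.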

Data Analysis and Visualization. Plotting libraries in programming languages, e.g., plotly, matplotlib, seaborn → Python; ggplot2 → R. Tableau is booming now. [Pandas and NumPy for data analysis]

Machine Learning and Deep Learning. Domain knowledge? Healthcare, business, finance, sports, etc. Types of learning: supervised, unsupervised, reinforcement.

Machine Learning Algorithms. Topics: regression, decision tree, random forest, Naïve Bayes, ensemble learning, AdaBoost, hierarchical clustering, association, k-means clustering, SVM, KNN, gradient descent, cross validation, entropy, accuracy, precision, collaborative filtering, PCA, Markov model, Boltzmann theorem, etc. Testing, evaluation, and validation of models.
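Two of the evaluation topics above, accuracy and precision, can be sketched in a few lines of plain Python. This is a minimal illustration for binary labels, not a replacement for a library such as scikit-learn; the function names and the choice of 1 as the positive class are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

Accuracy alone can mislead on imbalanced data, which is one reason precision is usually reported alongside it.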

Deep Learning Algorithms: neural networks, feed-forward NN, fuzzy logic, sequence models, LSTM, RNN, CNN, CapsNet, time series, etc.

Big Data: MapReduce, Hadoop, Apache Spark, Hive, Pig, Mahout, YARN.
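The MapReduce idea named above can be illustrated on a single machine. The sketch below is a toy word count that mimics the map, shuffle, and reduce phases in plain Python; it does not use Hadoop or Spark, and the function names are chosen only for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))
```

In a real cluster the map and reduce calls run in parallel on different machines, and the framework performs the shuffle over the network.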

Additional Skills: NLP (natural language processing), CV (computer vision).

Course Contents and Projects. Introduction. Data Mining: introduction of data mining; stages of the data mining process; data mining goals; information and knowledge; advantages in data mining; related technologies (machine learning, DBMS, OLAP, statistics); data mining techniques; role of data mining in various fields like artificial intelligence and the Internet of Things; future scope of data mining.

Data Warehouse and OLAP / Data Preprocessing: data cleaning; data transformation; data reduction; data warehouse and DBMS; multidimensional data model; OLAP operations. Machine Learning Algorithms & Concepts: supervised and unsupervised techniques; regression analysis; linear regression and logistic regression; classification; prediction.

Bayesian classification models; association rules; ensemble learning; neural networks; perceptron; MLP; SVM. Python/Anaconda: introduction to Python and Anaconda; conditional statements; looping and control statements; lists, tuples, dictionaries; string manipulation; functions; installing packages.
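Several of the Python basics listed above (functions, loops, conditionals, tuples, dictionaries, and string manipulation) fit in one small sketch. The record format and the pass mark of 50 are invented for illustration.

```python
def summarize_scores(records, pass_mark=50):
    """Map each cleaned-up name to "pass" or "fail".

    records is a list of (name, score) tuples.
    """
    grades = {}
    for name, score in records:  # loop with tuple unpacking
        clean_name = name.strip().title()  # string manipulation
        if score >= pass_mark:  # conditional statement
            grades[clean_name] = "pass"
        else:
            grades[clean_name] = "fail"
    return grades
```

For example, summarize_scores([("  asha ", 72), ("RAVI", 41)]) normalizes both names before storing the results in a dictionary.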

Introduction of Various Tools: introduction of Anaconda; working on various Python libraries; installing libraries and packages for machine learning and data science; Matplotlib; SciPy and NumPy; Pandas; IPython toolkit; scikit-learn; TensorFlow, Keras, and other deep learning libraries. Data Structures in Python: intro to NumPy arrays; creating ndarrays; indexing.

Data processing using arrays; file input and output; sorting and summarizing; descriptive statistics; combining and merging data. Data Analysis Using Pandas: introduction to Pandas; data types of Pandas; creating a DataFrame using Pandas; importing and exporting databases; working with complex data; data mining using Pandas.
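The "sorting and summarizing" and "descriptive statistics" items above can be previewed without installing anything. The sketch below uses Python's built-in statistics module instead of NumPy or Pandas; the describe function name and the returned keys are assumptions for this example, loosely modelled on the kind of summary a DataFrame gives.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)  # sorting makes min/max explicit
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
    }
```

statistics.stdev divides by n - 1 (sample standard deviation); statistics.pstdev is the population version.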

Hands-on / Mini Projects on Data Sets: modeling using regression; creating a clustering model; loan prediction problem; working on the Iris data set; Titanic data; Boston Housing data set; predicting stock prices; classifying MNIST digits using logistic regression; Intrusion Detection using Decision; CIFAR data set; ImageNet data set; credit risk analytics using SVM in Python.

Learning Outcomes: Build artificial neural networks with TensorFlow and Keras. Build deep learning networks to classify images with convolutional neural networks. Implement machine learning, clustering, and search using TF/IDF at massive scale with Apache Spark's MLLib. Implement sentiment analysis with recurrent neural networks. Understand reinforcement learning, and how to build a Pac-Man bot.

Make predictions using linear regression, polynomial regression, and multivariate regression. Classify medical test results with a wide variety of supervised machine learning classification techniques. Cluster data using K-Means clustering, and classify data using Support Vector Machines (SVM).
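To make the first outcome concrete, here is a minimal sketch of simple (one-variable) linear regression fitted by ordinary least squares in plain Python. The function names are chosen for this example; in practice a library such as scikit-learn would usually be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept
```

The fit assumes at least two distinct x values; otherwise sxx is zero and the slope is undefined.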

Build a spam classifier using Naive Bayes. Use decision trees to predict hiring decisions. Apply dimensionality reduction with Principal Component Analysis (PCA) to classify flowers. Predict classifications using K-Nearest-Neighbor (KNN). Develop using IPython notebooks. Understand statistical measures such as standard deviation. Visualize data distributions, probability mass functions, and probability density functions. Visualize data with matplotlib.
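The K-Nearest-Neighbor outcome above can be sketched in a few lines. This is a minimal illustration using Euclidean distance and a majority vote, assuming Python 3.8+ for math.dist; the function name and the default k=3 are choices made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns [(label, count)] for the winning label.
    return Counter(nearest_labels).most_common(1)[0][0]
```

An odd k avoids tied votes in two-class problems, and features should generally be scaled first so that no single feature dominates the distance.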

Use covariance and correlation metrics. Apply conditional probability for finding correlated features. Use Bayes' Theorem to identify false positives. Understand complex multi-level models. Use train/test and K-Fold cross validation to choose the right model. Build a movie recommender system using item-based and user-based collaborative filtering. Clean your input data to remove outliers. Design and evaluate A/B tests using T-Tests and P-Values.
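The outcome "use Bayes' Theorem to identify false positives" can be shown with a short calculation. The sketch below computes the probability that a positive test result is a true positive; the parameter names and the example numbers (1% prevalence, 99% sensitivity, 5% false positive rate) are illustrative assumptions, not figures from the slides.

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior: P(condition)
    sensitivity: P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    # Total probability of a positive result (true plus false positives).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive
```

With a rare condition, posterior_positive(0.01, 0.99, 0.05) is well below one half, so most positive results are false positives even though the test looks accurate.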

Best Resources (Online Videos), in sequential order from the top: Learn Python for Data Science by Microsoft → edX; Statistics and Probability by Khan Academy; Introduction to Computing for Data Analysis → edX; Machine Learning for Data Science and Analytics → edX; Introduction to NoSQL Databases Solution → edX; Intro to Hadoop and MapReduce → Coursera.

Best Blogs and Open Source Communities: Medium AI community; freeCodeCamp; Analytics Vidhya; official documentation; GitHub and Stack Overflow; Kaggle (spend 5 hours a day here); cheat sheets from Amazon AWS.

Best Books: for Machine/Deep Learning; Data Science beginners' book; Statistics.

Overview of Data Science Tools and Packages

Thank You