Data Science Roadmap by Swapnil Microsoft

geekism12 | 24 slides | Jun 07, 2024

About This Presentation

DS


Slide Content

Roadmap to Data Science by SWAPNIL NARAYAN Microsoft | IIT | Hacker Cup

About the Instructor
Hey there, I'm Swapnil Narayan, a Computer Science graduate of IIT (ISM) Dhanbad. I'm a Software Engineer at Microsoft India and have also received Software Engineering offers from Amazon and Oracle. I'm a passionate Competitive Programming instructor with decent experience teaching at various popular edtech platforms, and I have taken sessions at IITs, NITs, and other engineering colleges. I will be your mentor for this session and will walk you through the topics in the following slides.

What is Data Science?
Data Science is a multidisciplinary field that uses mathematics, statistics, and computer science to study and evaluate data. Its key objective is to extract valuable information for strategic decision making, product development, trend analysis, and forecasting. A data scientist is a sort of "jack-of-all-trades" for data crunching. The three main skills a data scientist needs are mathematics/statistics, programming literacy, and knowledge of the particular business domain.

Data Science is a Broader Field

Comparison between Different Roles in 2018

How to Become a Data Scientist?
Math, Programming Languages, Data Wrangling and Management, Data Analysis and Visualization, Machine Learning, Deep Learning

Mathematics
Linear Algebra: matrices, eigenvalues and eigenvectors, tensors, etc.
Calculus: differentiation and integration.
Probability: Bayes' Theorem, optimization, etc.
Statistics: inferential statistics, descriptive statistics, chi-squared tests, random variables, the Gaussian (normal) distribution.
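To make the chi-squared test item concrete, here is a minimal plain-Python sketch of Pearson's goodness-of-fit statistic, chi2 = sum over categories of (O - E)^2 / E. The die counts are hypothetical and `chi_squared_statistic` is an illustrative helper; in practice `scipy.stats.chisquare` computes the same statistic along with a p-value.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die we suspect is loaded toward six.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6  # a fair die gives 60 / 6 = 10 rolls per face

stat = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # categories minus one
print(f"chi2 = {stat:.1f}, df = {degrees_of_freedom}")  # chi2 = 13.4, df = 5
```

With 5 degrees of freedom the 5% critical value is about 11.07, so a statistic of 13.4 would lead to rejecting the "fair die" hypothesis at that level.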

Programming Languages
Python: it is the bible of data science. It is easy to understand (it reads almost like plain English), needs no semicolons, is simple, and has tons of libraries and packages available.
R: data visualization with ggplot2 and the tidy-data packages (tidyverse) is extremely important.

Libraries

Data Wrangling and Management
Data mining, data cleaning, data management.
Relevant skills: MySQL (RDBMS); NoSQL (MongoDB, Cassandra, etc.); SQL JOINs.
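JOINs are the core operation for combining related tables. A minimal sketch using Python's built-in `sqlite3` module, with hypothetical `customers` and `orders` tables, contrasts an INNER JOIN with a LEFT JOIN; the SQL itself should work largely unchanged in MySQL.

```python
import sqlite3

# In-memory database; the tables and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi"), (3, "Meera")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5)])

# INNER JOIN keeps only customers that have at least one matching order.
inner = cur.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# LEFT JOIN keeps every customer; with no orders, SUM is NULL (None in Python).
left = cur.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(inner)  # [('Asha', 350.0), ('Ravi', 75.5)]
print(left)   # [('Asha', 350.0), ('Meera', None), ('Ravi', 75.5)]
conn.close()
```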

Data Analysis and Visualization
Plotting libraries in programming languages, e.g., plotly, matplotlib, and seaborn for Python; ggplot2 for R.
Tableau and Power BI are booming now.
pandas and NumPy for data analysis.
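A typical first analysis step is "split, apply, combine": group rows by a key and aggregate each group. The sketch below does this in plain Python on hypothetical sales records, to show the computation that a pandas call such as `df.groupby("region")["amount"].mean()` performs; it is an illustration, not how pandas implements it.

```python
from collections import defaultdict

# Hypothetical sales records as (region, amount) pairs.
records = [("North", 120), ("South", 80), ("North", 100), ("South", 60), ("East", 90)]

# Split: collect the amounts belonging to each region.
groups = defaultdict(list)
for region, amount in records:
    groups[region].append(amount)

# Apply + combine: average each group, ordered by region name.
avg_by_region = {region: sum(v) / len(v) for region, v in sorted(groups.items())}
print(avg_by_region)  # {'East': 90.0, 'North': 110.0, 'South': 70.0}
```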

Machine Learning and Deep Learning
Domain knowledge matters: healthcare, business, finance, sports, etc.
Types of learning: supervised, unsupervised, reinforcement.

Machine Learning Algorithms
Topics: regression, decision trees, random forests, Naive Bayes, ensemble learning, AdaBoost, hierarchical clustering, association rules, k-means clustering, SVM, KNN, gradient descent, cross-validation, entropy, accuracy, precision, collaborative filtering, PCA, Markov models, Boltzmann machines, etc.
Testing, evaluation, and validation of models.
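Gradient descent, listed above, is the optimizer behind many of these models. A minimal sketch fits a line y = w*x + b by batch gradient descent on mean squared error; the data, learning rate, and epoch count are illustrative assumptions rather than tuned values.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Take a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical, noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # converges close to w = 2, b = 1
```

The learning rate matters: too large and the updates overshoot and diverge, too small and convergence needs many more epochs.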

Deep Learning Algorithms
Neural networks, feed-forward NNs, fuzzy logic, sequence models, LSTM, RNN, CNN, CapsNet, time series, etc.
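As a minimal sketch of a feed-forward network, here is a two-layer network with hand-picked (not trained) weights that computes XOR, a function no single-layer perceptron can represent. A step activation is used for readability; real networks use differentiable activations such as sigmoid or ReLU and learn their weights with backpropagation.

```python
def step(z):
    """Heaviside step activation: 1 if z > 0 else 0."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron outputs step(w . x + b)."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b) for ws, b in zip(weights, biases)]

# Hand-picked weights: hidden neuron 1 acts as OR, hidden neuron 2 as AND.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]
# Output neuron fires for "OR and not AND", which is XOR.
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [-0.5]

def xor_net(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```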

Big Data
MapReduce, Hadoop, Apache Spark, Hive, Pig, Mahout, YARN
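The MapReduce model behind Hadoop is usually introduced with word count. This single-process sketch only mimics the map, shuffle, and reduce phases with plain Python functions (the names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical); a real Hadoop or Spark job distributes these phases across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the counts for each word."""
    return {key: sum(values) for key, values in sorted(grouped.items())}

# Hypothetical input splits, one per mapper.
splits = ["Spark and Hadoop", "Hadoop uses MapReduce", "spark is fast"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```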

Additional Skills
NLP (natural language processing), CV (computer vision)

Learning Outcomes
Build artificial neural networks with TensorFlow and Keras; build deep learning networks to classify images with Convolutional Neural Networks; implement machine learning, clustering, and search using TF-IDF at massive scale with Apache Spark's MLlib; implement sentiment analysis with Recurrent Neural Networks; understand reinforcement learning and how to build a Pac-Man bot.
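TF-IDF weights a term by how often it appears in a document and how rare it is across the collection. A minimal plain-Python sketch using tf = count / document length and idf = log(N / df); this is one common variant, and libraries such as scikit-learn's `TfidfVectorizer` and Spark MLlib's `IDF` use smoothed formulas by default, so their numbers will differ. The documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the movie was great", "the movie was awful", "great acting"]
weights = tf_idf(docs)
for w in weights:
    print({t: round(v, 3) for t, v in sorted(w.items())})
```

Words shared by several documents, such as "the", get low weights, while "awful" and "acting", which each appear in only one document, get the highest weights in their documents.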

Make predictions using linear regression, polynomial regression, and multivariate regression; classify medical test results with a wide variety of supervised machine learning classification techniques; cluster data using K-Means clustering, and classify data with Support Vector Machines (SVM).
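K-Means clustering can be sketched with Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop changing. The points and `seed` below are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the initial centroids, practical implementations usually run several random starts or use smarter seeding such as k-means++.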

Build a spam classifier using Naive Bayes; use decision trees to predict hiring decisions; apply dimensionality reduction with Principal Component Analysis (PCA) to classify flowers; predict classifications using K-Nearest Neighbors (KNN); develop using IPython notebooks; understand statistical measures such as standard deviation; visualize data distributions, probability mass functions, and probability density functions; visualize data with matplotlib.
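K-Nearest Neighbors is one of the simplest classifiers: label a new point by majority vote among the k closest training points. A minimal sketch with Euclidean distance; the two-feature flower measurements are hypothetical values chosen for illustration, and an odd `k` is used to avoid ties in a two-class problem.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical flower measurements: (petal length, petal width) in cm.
train = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
]
print(knn_predict(train, (1.6, 0.4)))  # setosa
print(knn_predict(train, (4.6, 1.3)))  # versicolor
```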

Use covariance and correlation metrics; apply conditional probability to find correlated features; use Bayes' Theorem to identify false positives; understand complex multi-level models; use train/test splits and K-fold cross-validation to choose the right model; build a movie recommender system using item-based and user-based collaborative filtering; clean your input data to remove outliers; design and evaluate A/B tests using t-tests and p-values.
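Using Bayes' Theorem to identify false positives is easiest to see with numbers. Assuming a hypothetical test with 1% prevalence, 99% sensitivity, and a 5% false-positive rate, the sketch computes the probability that someone who tests positive actually has the condition (`posterior_positive` is an illustrative helper):

```python
def posterior_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    P(C | +) = P(+ | C) P(C) / [P(+ | C) P(C) + P(+ | not C) P(not C)]
    """
    true_pos = sensitivity * prior                  # P(+ and C)
    false_pos = false_positive_rate * (1 - prior)   # P(+ and not C)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p = posterior_positive(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # P(condition | positive) = 0.167
```

Even with an accurate test, only about 1 in 6 positives is a true positive here (0.0099 / 0.0594), because the condition is rare and false positives from the much larger healthy group dominate.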

Best Blogs and Open Source Communities
Medium, AI communities, official documentation, GitHub and Stack Overflow, Kaggle (spend 5 hours a day here), cheat sheets from Amazon AWS.

Best Books
Machine/Deep Learning, Data Science for beginners, Statistics

Overview of Data Science Tools and Packages

Thank You