Roadmap to Data Science by SWAPNIL NARAYAN Microsoft | IIT | Hacker Cup
About the Instructor
Hey there, I’m Swapnil Narayan, a Computer Science graduate from IIT (ISM) Dhanbad. I’m a Software Engineer at Microsoft India, and I have also received offers from Amazon and Oracle for Software Engineering roles. I’m a passionate Competitive Programming instructor with experience at several popular edtech platforms, and I have taken sessions at IITs, NITs, and other engineering colleges. I will be your mentor for this session and will walk you through the topics in the following slides.
What is Data Science?
Data Science is a multi-disciplinary subject that uses mathematics, statistics, and computer science to study and evaluate data. Its key objective is to extract valuable information for use in strategic decision making, product development, trend analysis, and forecasting. A data scientist is a sort of 'jack-of-all-trades' for data crunching. The 3 main skills a data scientist needs are mathematics/statistics, computer programming literacy, and knowledge of the particular business domain.
Data Science is a Broader Field
Comparison between Different Roles in 2018
How to become a Data Scientist?
  Math
  Programming Languages
  Data Wrangling and Management
  Data Analysis and Visualization
  Machine Learning
  Deep Learning
Mathematics
  Linear Algebra: Matrices, Eigenvalues and Eigenvectors, Tensors etc.
  Calculus: Differentiation and Integration.
  Probability: Bayes' Theorem, Optimization etc.
  Statistics: Inferential Statistics, Descriptive Statistics, Chi-squared Tests, Random Variables, Gaussian (Normal) Distribution.
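As a small illustration of the descriptive statistics and Gaussian (normal) distribution topics listed above, here is a minimal sketch using only Python's standard `statistics` module. The exam scores are made-up sample data, and modelling them as a normal distribution is an assumption made purely for the example.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: exam scores invented for illustration.
scores = [62, 70, 74, 75, 78, 81, 85, 91]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(scores)
median = statistics.median(scores)
stdev = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# Assumption: treat the scores as roughly normal with the same mean and stdev.
model = NormalDist(mu=mean, sigma=stdev)

# Estimated share of scores above 85 under that assumed model.
share_above_85 = 1 - model.cdf(85)

print(mean, median, round(stdev, 2), round(share_above_85, 3))
```

Real data is often not normal, so in practice the fit would be checked before relying on an estimate like `share_above_85`.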
Programming Languages
  Python: It is the Bible.
  Easy to understand, i.e., plain English
  No semicolons
  Simple, with tons of libraries available
  Talk about packages: in R, ggplot2 (data visualization) and tidy are extremely important
Libraries
Data Wrangling and Management
  Data Mining
  Data Cleaning
  Data Management
  Relevant Skills:
    MySQL: RDBMS
    NoSQL: MongoDB, Cassandra etc.
    JOIN
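To make the JOIN skill concrete, here is a minimal sketch that combines two tables and aggregates the result. It uses Python's built-in `sqlite3` module with an in-memory database rather than MySQL, only so that it runs without a server. The `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a real RDBMS such as MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5);
""")

# LEFT JOIN keeps every customer, including those with no orders.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
conn.close()

for name, order_count, total in rows:
    print(name, order_count, total)
```

An INNER JOIN would drop the customer with no orders. Choosing between the two is a common data-cleaning decision.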
Data Analysis and Visualization
  Plotting libraries in programming languages, e.g., plotly, matplotlib, seaborn → Python; ggplot2 → R
  Tableau and Power BI are booming now.
  [Pandas and NumPy for Data Analysis]
Machine Learning and Deep Learning
  Domain Knowledge? HEALTHCARE, BUSINESS, FINANCE, SPORTS etc.
  Supervised
  Unsupervised
  Reinforcement
Deep Learning Algorithms
  Neural Networks, Feed-Forward NN, Fuzzy Logic, Sequence Models, LSTM, RNN, CNN, CapsNet, Time Series etc.
Big Data
  MapReduce
  Hadoop
  Apache Spark
  Hive
  Pig
  Mahout
  YARN
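The MapReduce idea behind several of these tools can be sketched on a single machine. The example below is a toy word count in plain Python, not Hadoop or Spark code. The function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and only mirror the three conceptual stages.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: combine the values of each key, here by summing."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input: each string stands for one document or file split.
documents = ["big data is big", "data science needs data"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on different machines, and the framework performs the shuffle over the network.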
Additional Skills
  NLP
  CV
Learning Outcomes
  Build artificial neural networks with TensorFlow and Keras
  Build Deep Learning networks to classify images with Convolutional Neural Networks
  Implement machine learning, clustering, and search using TF-IDF at massive scale with Apache Spark's MLlib
  Implement Sentiment Analysis with Recurrent Neural Networks
  Understand reinforcement learning, and how to build a Pac-Man bot
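To give a feel for the TF-IDF weighting mentioned above, here is a minimal single-machine sketch in plain Python. It is not Spark MLlib code, and it assumes one common textbook variant (term frequency times the natural log of the inverse document frequency, without smoothing). The function name `tf_idf` and the three sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus.
docs = ["spark runs on clusters", "spark scales search", "search ranks documents"]
scores = tf_idf(docs)

# Terms that appear in fewer documents get a higher weight.
print(scores[0])
```

Library implementations usually add smoothing and normalization, so their exact numbers will differ from this sketch.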
  Make predictions using linear regression, polynomial regression, and multivariate regression
  Classify medical test results with a wide variety of supervised machine learning classification techniques
  Cluster data using K-Means clustering, and classify data using Support Vector Machines (SVM)
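As an illustration of making predictions with linear regression, the sketch below fits a straight line by ordinary least squares in plain Python. The helper name `fit_line` and the data points are hypothetical. The data is deliberately noise-free so the fitted line is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data generated from y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for an unseen x = 6
print(slope, intercept, prediction)
```

Polynomial and multivariate regression extend the same least-squares idea to more terms or features, and are usually solved with a library rather than by hand.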
  Build a spam classifier using Naive Bayes
  Use decision trees to predict hiring decisions
  Apply dimensionality reduction with Principal Component Analysis (PCA) to classify flowers
  Predict classifications using K-Nearest-Neighbor (KNN)
  Develop using IPython notebooks
  Understand statistical measures such as standard deviation
  Visualize data distributions, probability mass functions, and probability density functions
  Visualize data with matplotlib
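K-Nearest-Neighbor classification is simple enough to sketch in a few lines. The example below assumes Euclidean distance and a plain majority vote, which is one common setup among several. The function name `knn_predict`, the labels, and the training points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs. Distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two well-separated groups.
train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.5), "large"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (6.4, 6.8)))
```

The choice of `k` and of the distance measure both affect the result, so they are normally tuned on held-out data.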
  Use covariance and correlation metrics
  Apply conditional probability for finding correlated features
  Use Bayes' Theorem to identify false positives
  Understand complex multi-level models
  Use train/test and K-Fold cross validation to choose the right model
  Build a movie recommender system using item-based and user-based collaborative filtering
  Clean your input data to remove outliers
  Design and evaluate A/B tests using T-Tests and P-Values
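The false-positive use of Bayes' Theorem can be shown with a short worked calculation. The sketch below computes the probability that someone who tests positive actually has the condition. The function name `posterior_positive` and the prevalence, sensitivity, and specificity values are hypothetical numbers chosen for illustration.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(condition | positive test) via Bayes' Theorem."""
    true_pos = sensitivity * prevalence                # P(positive and condition)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(positive and no condition)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 1% prevalence, 99% sensitivity, 95% specificity.
p = posterior_positive(prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 4))
```

With these assumed numbers, only about one positive result in six is a true positive, because the condition is rare and false positives from the healthy majority dominate.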
Best Blogs and Open Source Community
  Medium AI Community
  Official Documentation
  GitHub and Stack Overflow
  Kaggle: spend 5 hours a day here
  Cheat sheets from Amazon AWS
Best Books
  For Machine/Deep Learning
  Data Science Beginners Book
  Statistics