Introduction to data visualization tools like Tableau and Power BI and Excel

LipikaSharmaShrivast 81 views 18 slides Aug 28, 2024

About This Presentation

Overview of Data Science Tools and Technologies


Slide Content

An Introduction to Popular Tools, Machine Learning, and Visualization

Replay: Introduction to Data Science Tools (R, SQL); SQL commands and basic hands-on with the command-line interface; R installation and hands-on

Session 6: Machine Learning
Agenda:
What is Machine Learning
Categories of Machine Learning
Common Algorithms: Linear Regression, Naïve Bayes, SVM, Decision Tree, KNN, Random Forest, K-Means Clustering
Machine Learning Process
Real-world use case

What is Machine Learning
Machine learning is a sub-field of Artificial Intelligence in which computers make predictions based on patterns learned directly from data, without being explicitly programmed to do so.

Categories of Machine Learning

What is an Algorithm?
Algorithms in machine learning are mathematical procedures and techniques that allow computers to learn from data, identify patterns, make predictions, or perform tasks without explicit programming.

Common Algorithms – Linear Regression
Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. It estimates how the value of the dependent variable changes as the value of the independent variable changes, and uses that fitted line to predict y for new values of x.
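As a rough illustration of the idea above, the sketch below fits a straight line by ordinary least squares for a single independent variable. The function names (`fit_line`, `predict_y`) and the hours/scores data are made up for this example and are not from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_y(x, slope, intercept):
    """Prediction is just a point on the fitted line."""
    return slope * x + intercept


# Hypothetical data: hours studied (x) vs. exam score (y), chosen so that y = 2x + 50.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = predict_y(6, slope, intercept)
```

With more than one independent variable the same idea applies, but the fit is usually computed with matrix methods rather than this one-variable formula.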

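Common Algorithms – Naïve Bayes
Naïve Bayes algorithms use Bayes' theorem to calculate the probability that an event will occur, based on the occurrence of related events. They are called "naïve" because they assume the input features are independent of one another given the class.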
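One way to make this concrete is a tiny word-count classifier. The sketch below is a simplified multinomial Naïve Bayes with add-one smoothing; the function names and the spam/ham messages are invented for illustration only.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(samples):
    """samples: list of (list_of_words, label). Returns the counts used for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for words, label in samples:
        label_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def predict_nb(words, model):
    """Pick the label with the highest log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        n_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so unseen words do not give probability zero.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
messages = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
nb_model = train_naive_bayes(messages)
```

Calling `predict_nb(["free", "money", "now"], nb_model)` scores each label and returns the more probable one.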

Common Algorithms – SVM
The goal of the SVM algorithm is to find the best line or decision boundary that separates n-dimensional space into classes, so that a new data point can be placed in the correct category. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help define the hyperplane. These extreme cases are called support vectors, which is why the algorithm is called a Support Vector Machine.
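The sketch below does not train an SVM. It assumes the hyperplane is already known (the weights `w` and bias `b` are hand-picked for a made-up toy dataset) and only illustrates how the boundary classifies points and which training points act as support vectors.

```python
import math

# Toy 2-D dataset (made up): label -1 on the left, +1 on the right.
points = [((1, 0), -1), ((1, 2), -1), ((0, 1), -1),
          ((3, 0), +1), ((3, 2), +1), ((4, 1), +1)]

# Assumed hyperplane w.x + b = 0, hand-picked here as the vertical line x = 2.
w = (1.0, 0.0)
b = -2.0


def decision(x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return w[0] * x[0] + w[1] * x[1] + b


def classify(x):
    return 1 if decision(x) >= 0 else -1


def distance_to_hyperplane(x):
    return abs(decision(x)) / math.hypot(w[0], w[1])


# For this boundary, the support vectors are the training points closest to it.
min_dist = min(distance_to_hyperplane(x) for x, _ in points)
support_vectors = [x for x, _ in points
                   if math.isclose(distance_to_hyperplane(x), min_dist)]
```

A real SVM would search for the `w` and `b` that maximize the distance between the boundary and these closest points; here that search is skipped.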

Common Algorithms – Decision Tree
A decision tree is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome. The decisions or tests are performed on the basis of the features of the given dataset. It is a graphical representation of all the possible solutions to a problem or decision under given conditions.
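The structure described above can be sketched as a nested dictionary. The tree below is written by hand, not learned from data, and its "play outside?" rules are purely hypothetical.

```python
# Internal nodes name a feature, branches map feature values to subtrees,
# and leaves are plain strings holding the outcome.
play_tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {"feature": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"feature": "windy",
                  "branches": {True: "no", False: "yes"}},
    },
}


def predict_tree(node, sample):
    """Walk from the root to a leaf, following the branch that matches each feature value."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node


sunny_day = {"outlook": "sunny", "humidity": "normal", "windy": False}
rainy_day = {"outlook": "rainy", "humidity": "high", "windy": True}
```

A learning algorithm would choose the features and split rules automatically; only the prediction walk is shown here.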

Common Algorithms – KNN
The K-NN algorithm assumes that similar cases belong to the same category, and places a new case in the category of the existing cases it is most similar to. It stores all the available data and classifies a new data point based on similarity, so new data can be assigned to a well-suited category as it appears. It is also called a lazy learner algorithm because it does not learn from the training set immediately. Instead, it stores the dataset and does its work at classification time.
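A minimal sketch of this idea, assuming Euclidean distance as the similarity measure and a simple majority vote among the k nearest points. The function name and the two-group dataset are illustrative only.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """train: list of (point, label). Classify `query` by majority vote of its k nearest points."""
    # "Lazy" learning: nothing is fitted in advance, the stored data is scanned at query time.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labelled points forming two well-separated groups.
training_points = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
                   ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
```

For example, `knn_predict(training_points, (2, 2))` looks up the three closest stored points and returns their most common label.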

Common Algorithms – Random Forest
Random Forest is a classifier that builds a number of decision trees on various subsets of the given dataset and combines their predictions (a majority vote for classification, an average for regression) to improve predictive accuracy. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and improve the performance of the model.
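The ensemble idea can be sketched with a heavily simplified forest: each "tree" is a one-split stump trained on a bootstrap sample of a single numeric feature, and the forest takes a majority vote. Full random forests also grow deeper trees and pick random feature subsets at each split, which is left out here. All names and data are assumptions for illustration.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(sample):
    """Depth-1 tree on one numeric feature: choose the threshold with the fewest errors."""
    values = sorted({x for x, _ in sample})
    best = None  # (errors, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # every x in the sample is identical: predict a constant
        only = majority([y for _, y in sample])
        return lambda x: only
    _, t, l_lab, r_lab = best
    return lambda x: l_lab if x <= t else r_lab


def train_forest(data, n_trees=15, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    trees = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        trees.append(train_stump(bootstrap))
    return trees


def forest_predict(trees, x):
    """Each tree votes; the most common answer wins."""
    return majority([tree(x) for tree in trees])


# Hypothetical 1-D data: small values are class 0, large values are class 1.
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(11, 16)]
forest = train_forest(data)
```

Because every stump sees a slightly different sample, individual thresholds vary, but the vote across stumps is more stable than any single one.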

Common Algorithms – K-Means Clustering
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K is the number of clusters, chosen in advance: if K=2 there will be two clusters, if K=3 there will be three clusters, and so on. It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of squared distances between the data points and the centroids of their clusters.
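The centroid-based procedure is commonly implemented as an alternation between assigning points and moving centroids. The sketch below assumes that approach, and for repeatability it simply uses the first K points as the starting centroids, which is a simplification rather than a recommended initialization. The data is made up.

```python
import math


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as initial centroids
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                       for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, assignments


# Hypothetical unlabeled points forming two obvious groups.
unlabeled = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, assignments = kmeans(unlabeled, k=2)
```

Each update step can only lower (or keep) the sum of squared distances, which is why the loop settles on stable clusters.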

Machine Learning Process
Step 1: Collect and prepare the data
Step 2: Train the model
Step 3: Validate the model
Step 4: Interpret the results
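The four steps can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the height records, the hold-out split, and the very simple "nearest class mean" model stand in for whatever data and algorithm a real project would use.

```python
# Step 1: collect and prepare the data.
# Hypothetical (height_cm, label) records; None marks a missing value.
raw = [(150, "child"), (None, "adult"), (180, "adult"), (120, "child"),
       (175, "adult"), (130, "child"), (170, "adult"), (140, "child")]
clean = [(x, y) for x, y in raw if x is not None]   # drop incomplete rows
train_rows, validation_rows = clean[:5], clean[5:]  # simple hold-out split


# Step 2: train the model (here: the mean height of each class).
def train_model(rows):
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_label(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


model = train_model(train_rows)

# Step 3: validate the model on rows it has not seen during training.
correct = sum(predict_label(model, x) == y for x, y in validation_rows)
accuracy = correct / len(validation_rows)

# Step 4: interpret the results, e.g. by inspecting what the model learned.
summary = {label: round(mean, 1) for label, mean in model.items()}
```

In practice the validation set would be much larger and chosen at random, and a poor validation score would send the process back to Step 1 or Step 2.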

Real-World Use Cases by Category

Q: What are the main differences between supervised and unsupervised learning?

Q: How does a linear regression model make predictions?

Q: In what scenarios would you prefer using a Decision Tree over a Random Forest?