Introduction to data visualization tools like Tableau, Power BI and Excel
LipikaSharmaShrivast
Aug 28, 2024
Overview of Data Science Tools and Technologies
An Introduction to Popular Tools, Machine Learning, and Visualization
Recap: Introduction to Data Science Tools (R, SQL): SQL commands and basic hands-on practice with the command-line interface; R installation and hands-on practice
Session 6: Machine Learning
Agenda:
What is Machine Learning
Categories of Machine Learning
Common Algorithms: Linear Regression, Naïve Bayes, SVM, Decision Tree, KNN, Random Forest, K-Means Clustering
Machine Learning Process
Real-world use cases
What is Machine Learning
Machine learning is a subfield of artificial intelligence in which computers make predictions based on patterns learned directly from data, without being explicitly programmed to do so.
Categories of Machine Learning
What is an Algorithm?
Algorithms in machine learning are mathematical procedures and techniques that allow computers to learn from data, identify patterns, make predictions, or perform tasks without explicit programming.
Common Algorithms – Linear Regression
Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. Because the relationship is linear, the model describes how the value of the dependent variable changes as the value of an independent variable changes. With a single independent variable the model is the straight line y = b0 + b1 × x, where b0 is the intercept and b1 is the slope.
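A minimal sketch of fitting the one-variable case by ordinary least squares, the usual way the line is fitted. The function names and the hours-vs-score data are hypothetical and only for illustration; library implementations handle many more details (multiple variables, numerical stability).

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares (one independent variable)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """A prediction is just a point on the fitted line."""
    return b0 + b1 * x


# Hypothetical data: hours studied (x) vs. exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
b0, b1 = fit_simple_linear_regression(hours, scores)
print(f"score = {b0:.2f} + {b1:.2f} * hours")  # → score = 47.70 + 4.10 * hours
print(f"{predict(b0, b1, 6):.1f}")             # → 72.3
```

This also answers how the model makes a prediction: once b0 and b1 are learned, a new x is simply plugged into the line.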
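Common Algorithms – Naïve Bayes
Naïve Bayes algorithms use Bayes' theorem to calculate the probability that an event will occur (for example, that a data point belongs to a class) given the occurrence of related events (the observed features). They are called "naïve" because they assume the features are independent of one another within each class.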
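A minimal sketch of a categorical Naïve Bayes classifier built from counts. It multiplies the class prior by one likelihood per feature (the naïve independence assumption) and normalizes. The weather dataset, the function names and the Laplace smoothing parameter `alpha` are illustrative assumptions, not part of the slide.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_proba(model, row, alpha=1.0):
    """Return P(class | row), assuming features are independent given the class."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Likelihood P(feature_i = value | class) with Laplace smoothing
            k = len(feature_values[i])
            score *= (feature_counts[label][i][value] + alpha) / (count + alpha * k)
        scores[label] = score
    # Normalize so the posteriors sum to 1 (the denominator of Bayes' theorem)
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical data: (outlook, windy) -> play outside?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("overcast", "no"), ("overcast", "yes")]
labels = ["yes", "no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
probs = predict_proba(model, ("sunny", "yes"))
print(max(probs, key=probs.get))  # → no
```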
Common Algorithms – SVM
The goal of the Support Vector Machine (SVM) algorithm is to find the best line or decision boundary that separates n-dimensional space into classes, so that new data points can be placed in the correct category in the future. This best decision boundary is called a hyperplane; "best" means it has the largest margin, that is, the greatest distance to the nearest points of each class. SVM chooses the extreme points (vectors) closest to the boundary to define the hyperplane. These extreme cases are called support vectors, hence the name Support Vector Machine.
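A minimal sketch of a linear SVM. To keep it short it swaps in a simple training method, full-batch subgradient descent on the regularized hinge loss, instead of the quadratic-programming solvers libraries typically use. The toy points and the hyperparameters `lam`, `lr` and `epochs` are hypothetical choices.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent.

    Labels must be +1 or -1. Returns the hyperplane (w, b) with w.x + b = 0.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss active: point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane the point falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable 2-D data
points = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -3)))  # → 1 -1
```

Only points with a margin below 1 change the boundary. On this toy data, after the first step those should be the points closest to it, (2, 2) and (-2, -2), which play the role of support vectors.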
Common Algorithms – Decision Tree
A decision tree is a tree-structured classifier in which internal nodes represent tests on the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome. The tests are performed on the features of the given dataset. It is a graphical representation of all the possible solutions to a problem or decision based on given conditions.
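A minimal sketch of growing and using a decision tree on numeric features. It assumes a CART-style approach: threshold tests chosen by Gini impurity, one common choice (ID3, for example, uses entropy instead). The data, `max_depth` and the dictionary node format are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, threshold) pair; return the one with the lowest weighted Gini."""
    best = None  # (impurity, feature_index, threshold)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Internal nodes test one feature; leaves hold the majority class (the outcome)."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature separates these rows
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    # Follow the decision rules (branches) until a leaf is reached
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical data: (age, income in thousands) -> buys the product?
rows = [(25, 30), (35, 60), (45, 80), (20, 20), (50, 40), (30, 70)]
labels = ["no", "yes", "yes", "no", "no", "yes"]
tree = build_tree(rows, labels)
print(tree)  # → {'feature': 1, 'threshold': 40, 'left': 'no', 'right': 'yes'}
```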
Common Algorithms – KNN
The K-Nearest Neighbors (K-NN) algorithm assumes that a new case is similar to the available cases closest to it, and puts the new case into the category it is most similar to. K-NN stores all the available data and classifies a new data point by its similarity (typically its distance) to the stored points: it finds the K nearest points and assigns the most common category among them. This means new data can easily be classified into a well-suited category. It is also called a lazy learner because it does not learn from the training set immediately; instead, it stores the dataset and does the work at classification time.
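A minimal sketch of K-NN classification, assuming Euclidean distance and a simple majority vote (other distance measures and weighted votes are common too). The points, labels and `k=3` are hypothetical. Note that there is no training step: everything happens when a query arrives, which is what makes it a lazy learner.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: the stored data is only scanned now, at query time
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D data with two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_classify(points, labels, (2, 2)))  # → red
```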
Common Algorithms – Random Forest
Random Forest is a classifier that builds a number of decision trees on various random subsets of the given dataset and combines their predictions (a majority vote for classification, an average for regression) to improve predictive accuracy. It is based on the concept of ensemble learning: combining multiple models to solve a complex problem and improve the performance of the overall model.
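A simplified sketch of the random-forest idea. Each "tree" is trained on a bootstrap sample (rows drawn with replacement) using a random subset of features, and the trees vote. To stay short, each tree here is a depth-1 decision stump rather than a full tree. That simplification, the data, and the names `n_trees`, `n_features` and `seed` are all assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best single-threshold split on the given features (a depth-1 decision tree)."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no usable split: predict the majority label everywhere
        label = Counter(labels).most_common(1)[0][0]
        return (None, None, label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None:
        return left_label
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feats = rng.sample(range(d), n_features)    # random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    # Classification: majority vote across trees (regression would average instead)
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: two groups that either feature can separate
rows = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 8), (9, 7), (7, 9), (8, 9)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))  # → A B
```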
Common Algorithms – K-Means Clustering
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into clusters. K is the number of clusters to create, chosen in advance: if K=2 there will be two clusters, if K=3 there will be three clusters, and so on. It is a centroid-based algorithm, where each cluster is associated with a centroid (the mean of the points in the cluster). The aim of the algorithm is to minimize the sum of squared distances between each data point and the centroid of its cluster.
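A minimal sketch of K-Means using Lloyd's algorithm, the standard iterative procedure. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Initializing with the first k points is a simplifying assumption (libraries typically use smarter schemes such as k-means++). The data is hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Returns (centroids, assignment, within-cluster sum of squares)."""
    # Simplistic deterministic initialization (an assumption): the first k points
    centroids = [tuple(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    # The quantity K-Means tries to minimize
    inertia = sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))
    return centroids, assignment, inertia


# Hypothetical unlabeled data with two visible groups, K = 2
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, assignment, inertia = kmeans(points, k=2)
print(assignment)  # → [0, 1, 0, 0, 1, 1]
```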
Machine Learning Process
Step 1: Collect and prepare the data
Step 2: Train the model
Step 3: Validate the model
Step 4: Interpret the results
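The four steps can be sketched end to end on a small problem. The sketch assumes synthetic advertising-spend vs. sales data, a 75/25 train/validation split, a one-variable linear regression as the model, and mean absolute error as the validation metric. All of these are illustrative choices, not prescribed by the slide.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


# Step 1: collect and prepare the data (synthetic: sales = 3 * spend + 5 + noise)
rng = random.Random(42)
data = [(x, 3.0 * x + 5.0 + rng.uniform(-1, 1)) for x in range(20)]
rng.shuffle(data)
train, valid = data[:15], data[15:]  # hold out 25% of the rows for validation

# Step 2: train the model on the training split only
b0, b1 = fit_line([x for x, _ in train], [y for _, y in train])

# Step 3: validate on rows the model has not seen (mean absolute error)
mae = sum(abs(y - (b0 + b1 * x)) for x, y in valid) / len(valid)

# Step 4: interpret the results
print(f"Each extra unit of spend adds about {b1:.2f} units of sales")
print(f"Validation MAE: {mae:.2f}")
```

Keeping the validation rows out of training is the key design choice: the error measured in Step 3 then estimates how the model would do on new data rather than how well it memorized the old data.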
Real-world use cases by category
Q: What are the main differences between supervised and unsupervised learning?
Q: How does a linear regression model make predictions?
Q: In what scenarios would you prefer using a Decision Tree over a Random Forest?