Machine learning for complete beginners.ppt

hyliuqd 12 views 11 slides Sep 10, 2024

About This Presentation

Machine learning slides


Slide Content

Introduction to Machine Learning for Complete Beginners
pythonforengineers.com

Steps to machine learning

Gather Data -> Clean Data -> Visualise Data -> Prepare input for ML ->
Machine Learning Algorithm -> ML model -> Test model on real data

Gathering Data

Depending on the use case, this might be the
hardest part!

Data may have to be scraped from websites, or
manually collected (by doing surveys, or taking
measurements in a lab).

Data may be spread over hundreds of files, in a
haphazard format
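
As a rough sketch of pulling scattered files together, the snippet below reads every CSV file in one folder into a single list of rows. It assumes the files are CSV with a header row and all sit in one folder; the function name `load_all_rows` and the extra `source_file` field are made up for this illustration.

```python
import csv
from pathlib import Path


def load_all_rows(folder):
    """Read every .csv file in `folder` and return all rows as one list of dicts."""
    rows = []
    # sorted() keeps the file order the same from run to run
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name  # remember where each row came from
                rows.append(row)
    return rows
```

Keeping the file name on each row makes it easier to trace odd values back to the file they came from during cleaning.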

Clean the data

Even when you gather the data, it may not be
easily usable

Missing fields, data in different formats or units
(inches vs centimeters)

I have seen the same file have dates in 3
different formats: dd-mm-yy, mm-dd-yy and yy-mm-dd

The data has to be made consistent and clear
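
A minimal sketch of making such data consistent, using only the Python standard library. It assumes we already know which unit and which date layout each value uses; the helper names `to_centimeters` and `to_iso_date` are hypothetical.

```python
from datetime import datetime

CM_PER_INCH = 2.54

# Maps the layout names from the slide to strptime patterns.
DATE_LAYOUTS = {
    "dd-mm-yy": "%d-%m-%y",
    "mm-dd-yy": "%m-%d-%y",
    "yy-mm-dd": "%y-%m-%d",
}


def to_centimeters(value, unit):
    """Convert a length to centimeters. `unit` must be "cm" or "in"."""
    if unit == "cm":
        return float(value)
    if unit == "in":
        return float(value) * CM_PER_INCH
    raise ValueError(f"unknown unit: {unit!r}")


def to_iso_date(text, layout):
    """Rewrite a date written in a known layout as YYYY-MM-DD."""
    parsed = datetime.strptime(text, DATE_LAYOUTS[layout])
    return parsed.date().isoformat()


# The same day written three ways ends up in one consistent form.
print(to_iso_date("10-09-24", "dd-mm-yy"))
print(to_iso_date("09-10-24", "mm-dd-yy"))
print(to_iso_date("24-09-10", "yy-mm-dd"))
```

Note that a value like 03-04-21 could be valid in all three layouts, so the layout cannot be guessed from the text alone. It has to come from knowing where each value came from.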

Visualise Data

You do NOT always need machine learning algorithms!

Sometimes, just visualising the data will show
you insights

Made up example:

Why did account cancellation jump in January?
What did we change in the service in that time?
[Bar chart: Cancellations of Accounts. Series: Num Cancel. Months: November, December, January, Feb. Vertical axis: 0 to 10]
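
A quick look at the numbers needs very little code. The sketch below draws a text bar chart and finds the month with the largest rise. The counts are invented purely for illustration (the values of the original chart are not reproduced here), and the function names are hypothetical.

```python
def text_bar_chart(counts):
    """Return one line per label, with a bar of '#' as long as the count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {'#' * value} {value}" for label, value in counts.items()]


def biggest_jump(counts):
    """Return the label whose count rose the most compared with the previous label."""
    labels = list(counts)
    changes = {
        labels[i]: counts[labels[i]] - counts[labels[i - 1]]
        for i in range(1, len(labels))
    }
    return max(changes, key=changes.get)


# Invented numbers, for illustration only.
cancellations = {"November": 2, "December": 3, "January": 9, "Feb": 4}

for line in text_bar_chart(cancellations):
    print(line)
print("Largest rise:", biggest_jump(cancellations))
```

With a plotting library the same data would become a real bar chart, but even this plain-text version makes a sudden jump easy to spot.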

Preparing for machine learning
We need to choose which inputs we will use for
our learning, and what the expected output is
Inputs + Expected output -> Machine Learning Algorithm -> model

Example

The Titanic dataset contains: name, age, address,
etc.

Are all these fields useful?

What are the inputs?

What is the expected output?
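
Once the fields are chosen, separating inputs from the expected output is mechanical. The sketch below shows one way to do it. The records, the field names and the choice of `age` as input and `survived` as output are placeholders invented for this example, not answers to the questions above.

```python
def split_inputs_and_output(records, input_fields, output_field):
    """Split each record into a list of input values and one expected output."""
    inputs = [[record[name] for name in input_fields] for record in records]
    expected = [record[output_field] for record in records]
    return inputs, expected


# Hypothetical records: names and values are invented for illustration.
passengers = [
    {"name": "Passenger A", "age": 29, "survived": 1},
    {"name": "Passenger B", "age": 41, "survived": 0},
]

inputs, expected = split_inputs_and_output(passengers, ["age"], "survived")
print(inputs)
print(expected)
```

Fields left out of `input_fields` (here `name`) are simply never shown to the algorithm.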

Problems we will face

Overfitting
The algorithm does an excellent job of prediction,
but only on the data it was trained on

The algorithm has only learnt how to predict
with our exact data

Like Astrologers!!
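
To make the idea concrete, here is a deliberately bad "model" that only memorises its training examples. It is a toy built for this illustration, not a real learning algorithm: it scores perfectly on data it has seen and does no better than a fixed guess on anything new.

```python
class MemorisingModel:
    """Stores every training example in a lookup table and learns nothing else."""

    def fit(self, inputs, outputs):
        self.table = dict(zip(inputs, outputs))
        self.fallback = outputs[0]  # fixed guess for anything it has never seen

    def predict(self, value):
        return self.table.get(value, self.fallback)


def accuracy(model, inputs, outputs):
    """Fraction of predictions that match the expected outputs."""
    correct = sum(model.predict(x) == y for x, y in zip(inputs, outputs))
    return correct / len(outputs)


def label(n):
    """The real rule we hope a model would learn."""
    return "even" if n % 2 == 0 else "odd"


seen = [1, 2, 3, 4, 5, 6]
unseen = [7, 8, 9, 10]

model = MemorisingModel()
model.fit(seen, [label(n) for n in seen])

seen_score = accuracy(model, seen, [label(n) for n in seen])        # 1.0
unseen_score = accuracy(model, unseen, [label(n) for n in unseen])  # 0.5
print(seen_score, unseen_score)
```

The perfect score on the seen numbers says nothing about whether the rule was learnt, which is why the score on unseen data is the one that matters.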

Solutions

The available data is divided into a training set
and a test set

Only the training set is used to train the
algorithm

The test set is then used to check if the model
works for unseen data (as we know what the
expected output is for the test data)

Problem: The amount of data the algorithm can
learn from is reduced

Engineering is about compromises
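
The split described above can be sketched in a few lines. This is a minimal version under assumed defaults (a 25% test fraction and a fixed random seed); the function name `split_train_test` is made up here, and libraries such as scikit-learn provide their own `train_test_split`.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows`, then cut it into a training part and a test part."""
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))
train, test = split_train_test(rows)
print(len(train), "rows for training,", len(test), "rows held back for testing")
```

A larger `test_fraction` gives a more trustworthy check but leaves less data to learn from, which is the compromise the slide refers to.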

Your assignment

Look at the dataset

Which fields will you be choosing?