Data Preprocessing Introduction for Machine Learning

sonalisonavane, 40 slides, Jul 18, 2024

About This Presentation

What is data preprocessing in ML


Slide Content

Data Preprocessing (SY BTech, Sem III)

What is Data Preprocessing? Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first, and a crucial, step when creating a machine learning model.

Why do we need Data Preprocessing? Real-world data generally contains noise, missing values, and values in formats that a model cannot use directly. Data preprocessing covers the tasks of cleaning the data and making it suitable for a machine learning model, which helps increase the model's accuracy and efficiency.

Steps in Data Preprocessing
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding missing data
5. Encoding categorical data
6. Splitting the dataset into training and test sets
7. Feature scaling
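Steps 4 to 7 can be sketched end to end with NumPy alone. This is a minimal illustration under stated assumptions, not code from the slides: the toy data, the column meanings, mean imputation, one-hot encoding, an 80/20 shuffled split and standardization are all choices made for the example, and the dataset is hard-coded instead of being loaded from a file (steps 1 to 3).

```python
import numpy as np

# Hypothetical toy dataset: columns are [age, salary]; np.nan marks missing values
X = np.array([
    [25.0, 50000.0],
    [np.nan, 62000.0],
    [38.0, np.nan],
    [45.0, 80000.0],
    [29.0, 58000.0],
    [52.0, 91000.0],
])
country = np.array(["FR", "DE", "FR", "ES", "DE", "ES"])  # hypothetical categorical feature
y = np.array([0, 1, 0, 1, 0, 1])                          # hypothetical labels

# Step 4, missing data: replace each NaN with the mean of its column
col_means = np.nanmean(X, axis=0)
nan_rows, nan_cols = np.where(np.isnan(X))
X[nan_rows, nan_cols] = col_means[nan_cols]

# Step 5, categorical data: one-hot encode the country column
categories = np.unique(country)                            # sorted unique labels: DE, ES, FR
one_hot = (country[:, None] == categories).astype(float)   # one 0/1 column per category
X = np.hstack((X, one_hot))

# Step 6, train/test split: shuffle row indices with a fixed seed, keep 80% for training
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = indices[:split], indices[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Step 7, feature scaling: standardize the two numeric columns with training statistics only
mean = X_train[:, :2].mean(axis=0)
std = X_train[:, :2].std(axis=0)
X_train[:, :2] = (X_train[:, :2] - mean) / std
X_test[:, :2] = (X_test[:, :2] - mean) / std

print(X_train.round(2))
print(X_test.round(2))
```

The scaling statistics come from the training rows only, so no information from the test set leaks into the transformation applied to it.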

Python Libraries for Data Preprocessing: NumPy, Pandas, Matplotlib

NumPy (Numerical Python): NumPy is used for working with arrays. It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and can be used freely.

Import NumPy

```python
import numpy

arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
```

NumPy is usually imported under the alias np:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)
```

Create a NumPy ndarray Object

The array object in NumPy is called ndarray. We can create a NumPy ndarray object by using the array() function.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))  # <class 'numpy.ndarray'>
```

Dimensions in Arrays

```python
import numpy as np

# 0-D array (a single value)
arr = np.array(42)
print(arr)

# 1-D array
arr = np.array([1, 2, 3, 4, 5])
print(arr)
```

Arrays (cont.)

```python
import numpy as np

# 2-D array: an array of 1-D arrays
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

# 3-D array: an array of 2-D arrays
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
```

Check Number of Dimensions

NumPy arrays provide the ndim attribute, which returns an integer telling us how many dimensions the array has.

```python
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)  # 0
print(b.ndim)  # 1
print(c.ndim)  # 2
print(d.ndim)  # 3
```

NumPy Array Indexing

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr[0])           # 1 (indexing starts at 0)
print(arr[2] + arr[3])  # 7
```

NumPy Array Indexing (cont.): 2-D arrays

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('2nd element on 1st row: ', arr[0, 1])  # 2
print('5th element on 2nd row: ', arr[1, 4])  # 10
```

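NumPy Array Indexing (cont.): 3-D arrays and negative indexing

```python
import numpy as np

# 3-D: first array, its second row, third element
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])  # 6

# Negative indices count from the end
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('Last element from 2nd dim: ', arr[1, -1])  # 10
```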

Arrays, creation: np.ones, np.zeros, np.arange, np.concatenate, ndarray.astype, np.zeros_like, np.ones_like, np.random.random
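The creation functions listed above can be illustrated with one typical call each. This is a sketch, not the slides' original examples (which did not survive extraction); the shapes and argument values are chosen for illustration only.

```python
import numpy as np

ones = np.ones((2, 3))                          # 2x3 array filled with 1.0
zeros = np.zeros(4)                             # 4 zeros: [0. 0. 0. 0.]
steps = np.arange(0, 10, 2)                     # [0 2 4 6 8]; the end value 10 is excluded
joined = np.concatenate((steps, np.arange(3)))  # [0 2 4 6 8 0 1 2]
as_float = steps.astype(float)                  # same values converted to float64
z_like = np.zeros_like(ones)                    # zeros with the same shape and dtype as `ones`
o_like = np.ones_like(steps)                    # ones with the same shape and dtype as `steps`
rand = np.random.random((2, 2))                 # uniform floats in [0.0, 1.0)

for name, value in [("ones", ones), ("zeros", zeros), ("steps", steps),
                    ("joined", joined), ("as_float", as_float),
                    ("z_like", z_like), ("o_like", o_like), ("rand", rand)]:
    print(name, value, sep="\n")
```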

Arrays, danger zone
- Arrays must be dense: there can be no holes.
- All elements must be of one type.
- Arrays of incompatible shapes cannot be combined.
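Each rule in the danger zone can be seen in a few lines. This is a minimal sketch with made-up values; it also shows that NumPy's broadcasting lets some differently shaped arrays combine (for example (2, 3) with (3,)), which is why the rule is about incompatible shapes.

```python
import numpy as np

# One type: mixing ints and floats upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Mixing numbers and strings silently turns every element into a string
as_text = np.array([1, "a"])
print(as_text)  # ['1' 'a']

# No holes: a missing value must be stored explicitly, e.g. as np.nan in a float array
with_gap = np.array([1.0, np.nan, 3.0])
print(np.isnan(with_gap))  # [False  True False]

# Shapes: compatible shapes broadcast, e.g. (2, 3) with (3,) ...
broadcast = np.ones((2, 3)) + np.arange(3)
print(broadcast.shape)  # (2, 3)

# ... but incompatible shapes such as (3,) and (4,) raise ValueError
shape_error = None
try:
    np.ones(3) + np.ones(4)
except ValueError as err:
    shape_error = err
print("ValueError:", shape_error)
```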

Slicing Arrays

Slicing means taking elements from one given index to another given index, written as [start:end] or [start:end:step]. The element at the end index is not included.
- If start is not passed, it is considered 0.
- If end is not passed, it is considered the length of the array in that dimension.
- If step is not passed, it is considered 1.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])    # [2 3 4 5]  index 1 up to, not including, index 5
print(arr[4:])     # [5 6 7]    from index 4 to the end
print(arr[:4])     # [1 2 3 4]  from the start up to index 4
print(arr[-3:-1])  # [5 6]      negative indices count from the end
print(arr[1:5:2])  # [2 4]      every second element from index 1 to index 5

# 2-D slicing: row index (or slice) first, then a slice of columns
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[1, 1:4])    # [7 8 9]
print(arr[0:2, 1:4])  # rows [2 3 4] and [7 8 9]
```
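The default values for start, end and step can be checked directly, and a step can be negative. This short sketch reuses the 1-D array from the examples above; note that with a negative step the defaults flip, so an omitted start means the last element and an omitted end means past the first element.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Omitting start, end and step is the same as writing 0, len(arr) and 1
print(np.array_equal(arr[:], arr[0:len(arr):1]))  # True

# Every second element, starting at index 0
print(arr[::2])   # [1 3 5 7]

# A negative step walks backwards; with no start/end given, this reverses the array
print(arr[::-1])  # [7 6 5 4 3 2 1]
```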

Data Types in NumPy
- strings: used to represent text data; the text is given inside quote marks, e.g. "ABCD"
- integer: used to represent integer numbers, e.g. -1, -2, -3
- float: used to represent real numbers, e.g. 1.2, 42.42
- boolean: used to represent True or False
- complex: used to represent complex numbers, e.g. 1.0 + 2.0j, 1.5 + 2.5j

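Data Types in NumPy (cont.)

```python
import numpy as np

# Create an array with an explicit data type: 'i4' is a 4-byte (32-bit) integer
arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr)        # [1 2 3 4]
print(arr.dtype)  # int32

# Convert an existing array with astype(); floats are truncated to integers
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype(int)
print(newarr)  # [1 2 3]
print(newarr.dtype)
```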
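astype() is also useful on raw data, where numbers often arrive as text. This sketch uses made-up values; it shows a string-to-integer conversion, a conversion to boolean, and the ValueError raised when a value cannot be parsed.

```python
import numpy as np

# Strings that hold numbers can be converted to integers
text = np.array(["1", "2", "3"])
numbers = text.astype(int)
print(numbers)  # [1 2 3]

# Converting to bool: 0 becomes False, any other value becomes True
flags = np.array([1, 0, 3]).astype(bool)
print(flags)  # [ True False  True]

# A value that cannot be parsed raises ValueError instead of producing bad data
conversion_error = None
try:
    np.array(["1", "two"]).astype(int)
except ValueError as err:
    conversion_error = err
print("ValueError:", conversion_error)
```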

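NumPy Array Shape/Reshape

```python
import numpy as np

# shape gives the number of elements in each dimension
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)  # (2, 4)

# reshape() rearranges the 12 elements into 2 blocks of 3 rows x 2 columns
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
```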
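Two reshape details are handy in practice: one dimension can be given as -1 so NumPy infers it, and the total number of elements must stay the same. This is a small sketch on a 12-element array chosen for illustration.

```python
import numpy as np

arr = np.arange(1, 13)  # [1 2 ... 12], 12 elements

# -1 lets NumPy infer that dimension: 12 elements / 3 columns = 4 rows
grid = arr.reshape(-1, 3)
print(grid.shape)  # (4, 3)

# reshape(-1) flattens any array back to 1-D
flat = grid.reshape(-1)
print(flat.shape)  # (12,)

# 5 x 3 = 15 does not match 12 elements, so reshape raises ValueError
reshape_error = None
try:
    arr.reshape(5, 3)
except ValueError as err:
    reshape_error = err
print("ValueError:", reshape_error)
```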

NumPy Array Iterating

```python
import numpy as np

# Iterating over a 2-D array yields its rows
arr = np.array([[1, 2, 3], [4, 5, 6]])
for x in arr:
    print(x)

# Reaching the single values of a 3-D array takes one loop per dimension
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    for y in x:
        for z in y:
            print(z)
```

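Iterating Arrays Using nditer() and ndenumerate()

```python
import numpy as np

# nditer() visits every element, whatever the number of dimensions
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
for x in np.nditer(arr):
    print(x)

# ndenumerate() also yields the index of each element
arr = np.array([1, 2, 3])
for idx, x in np.ndenumerate(arr):
    print(idx, x)  # (0,) 1, then (1,) 2, then (2,) 3
```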

Joining NumPy Arrays

```python
import numpy as np

# Join two 1-D arrays end to end
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)  # [1 2 3 4 5 6]

# Join two 2-D arrays along the columns (axis=1)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
print(arr)  # rows [1 2 5 6] and [3 4 7 8]
```

Joining Arrays Using Stack Functions

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# stack() joins along a new axis; axis=1 pairs up matching elements
arr = np.stack((arr1, arr2), axis=1)
print(arr)  # rows [1 4], [2 5], [3 6]
```

Stacking Along Rows

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.hstack((arr1, arr2))
print(arr)  # [1 2 3 4 5 6]
```

Stacking Along Columns

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.vstack((arr1, arr2))
print(arr)  # rows [1 2 3] and [4 5 6]
```

Stacking Along Height (depth)

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.dstack((arr1, arr2))
print(arr)  # shape (1, 3, 2): pairs [1 4], [2 5], [3 6]
```
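The joining functions above differ mainly in the shape of the result. This sketch applies each one to the same two 1-D arrays from the slides and prints only the resulting shapes, which makes the comparison easier to see.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

results = {
    "concatenate": np.concatenate((arr1, arr2)),     # (6,): joined end to end
    "stack axis=0": np.stack((arr1, arr2), axis=0),  # (2, 3): each input becomes a row
    "stack axis=1": np.stack((arr1, arr2), axis=1),  # (3, 2): each input becomes a column
    "hstack": np.hstack((arr1, arr2)),               # (6,): same as concatenate for 1-D inputs
    "vstack": np.vstack((arr1, arr2)),               # (2, 3): same as stack axis=0 here
    "dstack": np.dstack((arr1, arr2)),               # (1, 3, 2): pairs along a new third axis
}
for name, value in results.items():
    print(f"{name:13} shape={value.shape}")
```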

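Splitting NumPy Arrays

```python
import numpy as np

# array_split() breaks one array into the given number of parts
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)  # three arrays: [1, 2], [3, 4], [5, 6]
```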
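When the array length is not a multiple of the number of parts, array_split() still works by making the first parts one element longer, while np.split() refuses. This sketch reuses the 6-element array above with 4 parts.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# array_split tolerates uneven splits: 6 elements into 4 parts gives sizes 2, 2, 1, 1
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])  # [[1, 2], [3, 4], [5], [6]]

# np.split requires an exact division and raises ValueError otherwise
split_error = None
try:
    np.split(arr, 4)
except ValueError as err:
    split_error = err
print("ValueError:", split_error)
```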

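NumPy Searching Arrays

```python
import numpy as np

# where() returns the indices at which the condition is True
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)  # indices 3, 5 and 6

# Indices of the even values
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr % 2 == 0)
print(x)  # indices 1, 3, 5 and 7
```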
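The result of np.where() with a single condition is a tuple with one index array per dimension, which can be used directly to index the array. With three arguments it instead builds a new array element by element, a common way to replace values during cleaning. This sketch reuses the first array above; replacing 4 with 0 is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# Unpack the single entry of the 1-D result to get a plain index array
(idx,) = np.where(arr == 4)
print(idx)       # [3 5 6]
print(arr[idx])  # [4 4 4]

# Three arguments: take 0 where the condition holds, otherwise keep the original value
replaced = np.where(arr == 4, 0, arr)
print(replaced)  # [1 2 3 0 5 0 0]
```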

Sorting Arrays

```python
import numpy as np

arr = np.array([3, 2, 0, 1])
print(np.sort(arr))  # [0 1 2 3]

# Strings are sorted alphabetically
arr = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr))  # ['apple' 'banana' 'cherry']
```
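For 2-D arrays, np.sort() works along one axis at a time and returns a sorted copy. This sketch uses a small made-up 2x3 array to show the default row-wise sort, a column-wise sort with axis=0, and that the original array is left unchanged.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

# By default (axis=-1) each row is sorted independently
sorted_rows = np.sort(arr)
print(sorted_rows.tolist())  # [[2, 3, 4], [0, 1, 5]]

# axis=0 sorts each column instead
sorted_cols = np.sort(arr, axis=0)
print(sorted_cols.tolist())  # [[3, 0, 1], [5, 2, 4]]

# np.sort returns a copy; the original array keeps its order
print(arr.tolist())  # [[3, 2, 4], [5, 0, 1]]
```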

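Random Numbers in NumPy

What is a Random Number? Random means something that cannot be predicted logically. (NumPy's functions actually generate pseudo-random numbers, produced by an algorithm.)

Generate a Random Number

```python
from numpy import random

# Random integer from 0 to 99 (the upper bound 100 is excluded)
x = random.randint(100)
print(x)
```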

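Generate a Random Float

```python
from numpy import random

# Random float between 0 and 1
x = random.rand()
print(x)
```

Generate a Random Array

```python
from numpy import random

# 1-D array of 5 random integers from 0 to 99
x = random.randint(100, size=5)
print(x)

# 2-D array (3 rows, 5 columns) of random integers from 0 to 99
x = random.randint(100, size=(3, 5))
print(x)

# 3x5 array of random floats between 0 and 1
x = random.rand(3, 5)
print(x)

# One value picked at random from the given list
x = random.choice([3, 5, 7, 9])
print(x)
```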
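The slides use the legacy numpy.random functions. NumPy's documentation points new code toward the Generator API, and a fixed seed makes results such as train/test splits reproducible. This sketch shows the same operations through np.random.default_rng(); the seed 42 is an arbitrary choice.

```python
import numpy as np

# A Generator with a fixed seed produces the same sequence on every run
rng = np.random.default_rng(seed=42)

x = rng.integers(100)                 # one integer from 0 to 99
arr = rng.integers(100, size=(3, 5))  # 3x5 array of integers from 0 to 99
floats = rng.random((3, 5))           # 3x5 array of floats in [0.0, 1.0)
pick = rng.choice([3, 5, 7, 9])       # one value drawn from the list

print(x, arr, floats, pick, sep="\n")

# Re-creating the generator with the same seed reproduces the first value
rng_again = np.random.default_rng(seed=42)
print(rng_again.integers(100) == x)  # True
```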