Data Preprocessing Introduction for Machine Learning
sonalisonavane
40 slides
Jul 18, 2024
About This Presentation
What is data preprocessing in ML
Slide Content
Data Preprocessing (SY BTech, Sem III)

What is Data Preprocessing?
Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first, crucial step in building a machine learning model.

Why do we need Data Preprocessing?
Real-world data generally contains noise and missing values, and may be in an unusable format. Preprocessing covers the tasks needed to clean the data and make it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.

Steps in Data Preprocessing
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding missing data
5. Encoding categorical data
6. Splitting the dataset into training and test sets
7. Feature scaling
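
The later steps in this list (missing data, encoding, splitting, scaling) can be sketched on a tiny made-up dataset. This is a minimal sketch using only NumPy, not a definitive pipeline: the data values, the helper name fill_missing_with_mean, the mean-imputation strategy, and the one-third test split are all assumptions chosen for illustration. In practice, pandas and scikit-learn are commonly used for these steps.

```python
import numpy as np

# Hypothetical toy dataset (values invented for illustration).
ages = np.array([44.0, 27.0, np.nan, 38.0, 35.0, 50.0])
salaries = np.array([72000.0, 48000.0, 54000.0, np.nan, 58000.0, 83000.0])
countries = np.array(["India", "France", "India", "Spain", "France", "Spain"])


def fill_missing_with_mean(column):
    """Return a copy of the column with NaN replaced by the mean of the known values."""
    filled = column.copy()
    filled[np.isnan(filled)] = np.nanmean(column)
    return filled


# Finding missing data: NaN marks a missing value; fill it with the column mean.
ages = fill_missing_with_mean(ages)
salaries = fill_missing_with_mean(salaries)

# Encoding categorical data: map each distinct label to an integer code.
labels, country_codes = np.unique(countries, return_inverse=True)

# Assemble the feature matrix, one row per sample.
features = np.column_stack([ages, salaries, country_codes])

# Splitting into training and test sets: shuffle row indices, hold out one third.
rng = np.random.default_rng(seed=0)
order = rng.permutation(len(features))
test_size = len(features) // 3
test_idx, train_idx = order[:test_size], order[test_size:]
train, test = features[train_idx], features[test_idx]

# Feature scaling: standardize the numeric columns using training-set statistics only.
mean = train[:, :2].mean(axis=0)
std = train[:, :2].std(axis=0)
train_scaled = (train[:, :2] - mean) / std
test_scaled = (test[:, :2] - mean) / std
```

Computing the mean and standard deviation from the training rows only, and reusing them for the test rows, keeps information about the test set out of the scaling step.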

Python Libraries for Data Preprocessing
- NumPy
- Pandas
- Matplotlib

NumPy: Numerical Python
- NumPy is used for working with arrays.
- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
- NumPy was created in 2005 by Travis Oliphant.
- It is an open-source project and we can use it freely.
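
As a rough illustration of the linear algebra, matrix, and Fourier transform functions mentioned above, here is a short sketch. The matrix A, the vector b, and the constant signal are made-up inputs chosen only to keep the results easy to check.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b for x.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Matrices: the @ operator performs a matrix product.
product = A @ x  # reproduces b up to floating-point rounding

# Fourier transform: a constant signal has only a zero-frequency component.
signal = np.ones(4)
spectrum = np.fft.fft(signal)

print(x)
print(spectrum)
```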

Create a NumPy ndarray Object
The array object in NumPy is called ndarray. We can create a NumPy ndarray object by using the array() function.

    import numpy as np

    arr = np.array([1, 2, 3, 4, 5])
    print(arr)        # prints [1 2 3 4 5]
    print(type(arr))  # prints <class 'numpy.ndarray'>

Check Number of Dimensions
NumPy arrays provide the ndim attribute, an integer that tells us how many dimensions the array has.

    import numpy as np

    a = np.array(42)                       # 0-D array (a scalar)
    b = np.array([1, 2, 3, 4, 5])          # 1-D array
    c = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array
    d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])  # 3-D array

    print(a.ndim)  # 0
    print(b.ndim)  # 1
    print(c.ndim)  # 2
    print(d.ndim)  # 3

Arrays, danger zone
- Must be dense, no holes.
- Must be one type.
- Cannot combine arrays of different shape (unless the shapes can be broadcast together).
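
Each of these restrictions can be seen in a short experiment. The sketch below is illustrative only: the variable names and sample values are assumptions, and the try/except blocks simply record whether NumPy accepted the operation.

```python
import numpy as np

# One type per array: mixing integers with a float upcasts every element to float.
mixed = np.array([1, 2, 3.5])
mixed_kind = mixed.dtype.kind  # 'f' means floating point

# Dense, no holes: rows of unequal length cannot form a regular 2-D numeric array.
try:
    np.array([[1, 2, 3], [4, 5]], dtype=float)
    ragged_ok = True
except ValueError:
    ragged_ok = False

# Shapes must be compatible: adding a length-3 array to a length-4 array fails.
try:
    np.array([1, 2, 3]) + np.array([1, 2, 3, 4])
    add_ok = True
except ValueError:
    add_ok = False

print(mixed_kind, ragged_ok, add_ok)
```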

Slicing arrays
Slicing means taking elements from one given index to another given index.
- [start:end]
- [start:end:step]
- If we don't pass start, it is considered 0.
- If we don't pass end, it is considered the length of the array in that dimension.
- If we don't pass step, it is considered 1.
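
The slicing rules above can be tried on a small array. This is a minimal sketch with made-up values. Note that the element at the end index itself is not included in the result.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

middle = arr[1:5]        # indices 1, 2, 3, 4 (the end index is exclusive)
from_start = arr[:3]     # start omitted, so the slice begins at index 0
to_end = arr[4:]         # end omitted, so the slice runs to the last element
every_second = arr[::2]  # step of 2 over the whole array

# On a 2-D array, each dimension gets its own slice: rows 0 to 1, columns 1 to 2.
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
block = grid[0:2, 1:3]

print(middle, from_start, to_end, every_second)
print(block)
```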

Data Types in NumPy
- strings: used to represent text data; the text is given in quotation marks, e.g. "ABCD"
- integer: used to represent integer numbers, e.g. -1, -2, -3
- float: used to represent real numbers, e.g. 1.2, 42.42
- boolean: used to represent True or False
- complex: used to represent complex numbers, e.g. 1.0 + 2.0j, 1.5 + 2.5j
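
To see which data type NumPy picks for each kind of value listed above, one can inspect the dtype attribute of an array. The sketch below is illustrative; the dictionary name examples and the sample values are assumptions. It uses dtype.kind, a one-letter code, because the exact dtype (for example the integer width) can differ between platforms.

```python
import numpy as np

examples = {
    "string": np.array(["ABCD", "EF"]),
    "integer": np.array([-1, -2, -3]),
    "float": np.array([1.2, 42.42]),
    "boolean": np.array([True, False]),
    "complex": np.array([1.0 + 2.0j, 1.5 + 2.5j]),
}

# dtype.kind is a one-letter code for the general kind of data in the array.
kinds = {name: arr.dtype.kind for name, arr in examples.items()}
for name, arr in examples.items():
    print(name, arr.dtype, kinds[name])

# astype() converts to another type; float-to-integer conversion drops the fraction.
as_int = np.array([1.2, 42.42]).astype(int)
```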
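
NumPy Array Iterating

    import numpy as np

    # Iterating over a 2-D array yields one row at a time.
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    for x in arr:
        print(x)

    import numpy as np

    # Nested loops, one per dimension, reach each scalar of a 3-D array.
    arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
    for x in arr:
        for y in x:
            for z in y:
                print(z)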
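
Iterating Arrays Using nditer()

    import numpy as np

    # nditer() visits every scalar element without nested loops.
    arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
    for x in np.nditer(arr):
        print(x)

Enumerated Iteration Using ndenumerate()

    import numpy as np

    # ndenumerate() yields each element together with its index tuple.
    arr = np.array([1, 2, 3])
    for idx, x in np.ndenumerate(arr):
        print(idx, x)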

Random Numbers in NumPy
What is a Random Number? Random means something that cannot be predicted logically.

Generate Random Number

    from numpy import random

    x = random.randint(100)  # random integer from 0 to 99
    print(x)
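
Generate Random Float

    from numpy import random

    x = random.rand()  # random float between 0 and 1
    print(x)

Generate Random Array

    from numpy import random

    x = random.randint(100, size=5)  # 1-D array of 5 random integers
    print(x)

    x = random.randint(100, size=(3, 5))  # 2-D array of integers, 3 rows by 5 columns
    x = random.rand(3, 5)                 # 2-D array of floats, 3 rows by 5 columns
    x = random.choice([3, 5, 7, 9])       # one value picked from the given list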
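
Because the values differ on every run, it can help to check the shape and range of what these functions return instead of the values themselves. The sketch below does that with the same numpy.random functions used in the slides. The variable names are assumptions chosen for readability.

```python
from numpy import random

single_int = random.randint(100)              # integer from 0 to 99
single_float = random.rand()                  # float in the range [0, 1)
int_row = random.randint(100, size=5)         # 1-D array with 5 integers
int_grid = random.randint(100, size=(3, 5))   # 2-D array, 3 rows by 5 columns
float_grid = random.rand(3, 5)                # 2-D array of floats, 3 rows by 5 columns
picked = random.choice([3, 5, 7, 9])          # one value drawn from the list

print(int_row.shape, int_grid.shape, float_grid.shape)
```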