Introduction to Machine Learning by MARK

MRKUsafzai0607, 26 slides, Apr 29, 2024

Introduction to ML
Presenter: Muhammad Rizwan Khan Usafzai

NumPy Library
NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays.
Key Features:
- Array creation and manipulation
- Mathematical operations on arrays
- Linear algebra operations
- Fourier transforms
- Random number generation
Applications:
- Scientific computing
- Data analysis and manipulation
- Machine learning
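A minimal sketch touching each of the five NumPy features listed above. It assumes a recent NumPy (the random part uses the np.random.default_rng Generator API); the arrays, the small linear system and the 3-cycle sine signal are made-up illustration data.

```python
import numpy as np

# Array creation and manipulation
a = np.arange(6).reshape(2, 3)       # values [[0, 1, 2], [3, 4, 5]]
transposed = a.T                     # shape (3, 2)

# Element-wise mathematical operations
scaled = a * 2 + 1                   # values [[1, 3, 5], [7, 9, 11]]

# Linear algebra: solve 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)     # approximately [1.0, 3.0]

# Fourier transform: a sine wave with exactly 3 cycles over 64 samples
t = np.arange(64) / 64
signal = np.sin(2 * np.pi * 3 * t)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))  # the strongest frequency bin is 3

# Random number generation with a seeded, reproducible generator
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(transposed.shape, scaled.tolist(), solution, peak_bin, samples.mean())
```

Seeding the generator makes the "random" samples identical on every run, which is useful when results must be reproducible.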

How to install NumPy in Jupyter?
Open a Jupyter notebook and run the following in a cell:
    !pip install numpy
Then run:
    import numpy as np

    n = np.array((1, 2, 3))
    print(n)          # [1 2 3]
    print(type(n))    # type of the object: <class 'numpy.ndarray'>

OpenCV Library
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It provides a wide range of functionality for real-time computer vision, including image and video processing, object detection, face recognition, and more.
Key Features:
- Image and video I/O
- Image processing algorithms
- Object detection and tracking
- Machine learning algorithms for computer vision tasks
Applications:
- Robotics
- Augmented reality
- Surveillance systems
- Medical image analysis
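cv2.imread returns an image as a NumPy array with its colour channels in BGR order, whereas Matplotlib and most other libraries expect RGB. Below is a NumPy-only sketch of that conversion on a made-up 2x2 image; inside OpenCV the usual call for the same job is cv2.cvtColor(img, cv2.COLOR_BGR2RGB).

```python
import numpy as np

# A tiny 2x2 colour image in OpenCV's layout: height x width x 3 channels, uint8, BGR order
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[0, 0] = [255, 0, 0]   # pure blue, written as (B, G, R)
bgr[1, 1] = [0, 0, 255]   # pure red, written as (B, G, R)

# Reversing the channel axis turns BGR into RGB (same result as cv2.COLOR_BGR2RGB)
rgb = bgr[..., ::-1]

# The reversed array is a view with a negative stride; make a contiguous copy
# before handing it to functions that expect contiguous memory
rgb = np.ascontiguousarray(rgb)

print(rgb[0, 0].tolist(), rgb[1, 1].tolist())   # [0, 0, 255] [255, 0, 0]
```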

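How to install OpenCV in Jupyter?
Open a Jupyter notebook and run the following in a cell:
    !pip install opencv-python
Then run:
    import cv2

    img = cv2.imread("img1.png")      # returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError("img1.png could not be read")
    cv2.imshow("MRK", img)            # opens a separate window (needs a local desktop session)
    cv2.waitKey(10000)                # wait up to 10,000 ms (10 s) or until a key is pressed
    cv2.destroyAllWindows()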

Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It provides a MATLAB-like interface and supports a wide variety of plots and graphs.
Key Features:
- Line plots, scatter plots, and histograms
- 2D and 3D plotting
- Customization of plots
- Integration with NumPy arrays
Applications:
- Data visualization
- Scientific plotting
- Statistical analysis
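A sketch of three of the plot types listed above (line, scatter, histogram) drawn directly from NumPy arrays. It assumes a plain script run: the non-interactive Agg backend is selected and the figure is written to an in-memory PNG instead of being shown. In a notebook you would drop the matplotlib.use line and call plt.show(). The data is randomly generated.

```python
import io

import matplotlib
matplotlib.use("Agg")                 # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
x = np.linspace(0, 2 * np.pi, 100)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

fig, (ax_line, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(12, 3.5))

# Line plot of a NumPy array
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.legend()
ax_line.set_title("Line plot")

# Scatter plot: each sample against the next one
ax_scatter.scatter(samples[:-1], samples[1:], s=8)
ax_scatter.set_title("Scatter plot")

# Histogram with 20 bins
ax_hist.hist(samples, bins=20)
ax_hist.set_title("Histogram")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")     # write the figure as PNG bytes
plt.close(fig)
```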

How to install Matplotlib in Jupyter?
Open a Jupyter notebook and run the following in a cell:
    !pip install matplotlib
Then run:
    import matplotlib.pyplot as plt   # "as" gives the module an alias (short name)
    import numpy as np

    xpts = np.array([0, 4])
    ypts = np.array([0, 6])
    plt.plot(xpts, ypts)   # draws a straight line from (0, 0) to (4, 6)
    plt.show()

scikit-image (skimage)
scikit-image, commonly abbreviated as skimage, is an open-source image processing library for Python. It provides a collection of algorithms for image segmentation, feature extraction, image filtering, and other image processing tasks.
- Image Processing
- Integration: It integrates with other scientific Python libraries such as NumPy, SciPy, and Matplotlib, allowing for efficient image manipulation and analysis.
- User-Friendly API
- Community Support: skimage benefits from an active community of developers and users.

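Installing the scikit-image library:
    pip install scikit-image
Example:
    import matplotlib.pyplot as plt
    from skimage import io

    # Load an image from a file (returned as a NumPy array)
    image = io.imread("example_image.jpg")

    # Display the image with Matplotlib (works across scikit-image versions)
    plt.imshow(image)
    plt.axis("off")
    plt.show()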

Pillow
Pillow is a fork of the Python Imaging Library (PIL) that adds extensive image processing capabilities to Python. It supports opening, manipulating, and saving many different image file formats.
- Image Manipulation: Pillow offers a wide range of image-handling functions such as resizing, cropping, rotating, filtering, and enhancing images.
- Image File Support: It supports many image file formats, including JPEG, PNG, and GIF, making it suitable for handling varied image data.
- Integration: Pillow integrates with other Python libraries such as NumPy and Matplotlib, enabling easy interoperability with scientific computing and data visualization tools.
- Ease of Use: Pillow provides a simple and intuitive API for working with images, making it accessible to users with varying levels of programming experience.
- Active Maintenance: Pillow is actively maintained and updated, ensuring compatibility with the latest Python versions and continued support for new features and improvements.
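A sketch of the manipulation operations listed above (cropping, rotating, filtering, enhancing) followed by saving in a different format. To stay self-contained it assumes a solid-colour image created with Image.new rather than a file on disk; the size and colour are arbitrary.

```python
import io

from PIL import Image, ImageEnhance, ImageFilter

# Create a 200x100 reddish test image in memory instead of opening a file
img = Image.new("RGB", (200, 100), color=(200, 30, 30))

cropped = img.crop((50, 25, 150, 75))          # box = (left, upper, right, lower) -> 100x50
rotated = cropped.rotate(90, expand=True)      # expand=True resizes the canvas -> 50x100
blurred = rotated.filter(ImageFilter.GaussianBlur(radius=2))
brighter = ImageEnhance.Brightness(blurred).enhance(1.5)   # factor > 1 brightens

# Save as PNG into memory, then reopen to confirm the format and size
buffer = io.BytesIO()
brighter.save(buffer, format="PNG")
buffer.seek(0)
reloaded = Image.open(buffer)
print(reloaded.format, reloaded.size)          # PNG (50, 100)
```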

Installing the Pillow library:
    pip install pillow
Example:
    from PIL import Image

    # Open an image file
    original_image = Image.open("example.jpg")

    # Display basic information about the image
    print("Original Image Format:", original_image.format)
    print("Original Image Size:", original_image.size)

    # Resize the image to half its width and height
    new_size = (original_image.size[0] // 2, original_image.size[1] // 2)
    resized_image = original_image.resize(new_size)

    # Display the new size
    print("Resized Image Size:", resized_image.size)

    # Save the resized image under a new name
    resized_image.save("resized_example.jpg")

    # Close the original and resized images
    original_image.close()
    resized_image.close()
    print("Resized image saved successfully!")

pandas
Pandas is a powerful Python library for data manipulation and analysis. It offers data structures and functions for working efficiently with structured data such as time series, tabular, and heterogeneous data.
- Data Structures: Pandas provides two main data structures, Series (a 1D labeled array) and DataFrame (a 2D labeled data structure), which offer powerful data manipulation capabilities.
- Data Handling: It can read and write data in various formats such as CSV, Excel, and SQL databases.
- Data Analysis: Pandas supports data analysis tasks including data cleaning, filtering, grouping, merging, and reshaping, making it indispensable for exploratory data analysis.
- Integration: It integrates with other Python libraries such as NumPy, Matplotlib, and scikit-learn, enhancing its usefulness in scientific computing and machine learning tasks.
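A minimal sketch of the two data structures and three of the analysis tasks named above (filtering, grouping, merging). The temperatures, sales and store tables are invented for illustration.

```python
import pandas as pd

# Series: a 1D labeled array
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"])

# DataFrames: 2D labeled tables (hypothetical sales and store data)
sales = pd.DataFrame({"store": ["A", "B", "A", "B", "A"], "units": [10, 4, 7, 12, 3]})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Boston", "Denver"]})

# Filtering: keep rows where more than 5 units were sold
big_orders = sales[sales["units"] > 5]

# Grouping: total units per store
totals = sales.groupby("store")["units"].sum()

# Merging: attach each store's city to its total
report = totals.reset_index().merge(stores, on="store")

print(temps["Tue"])   # label-based access on a Series
print(big_orders)
print(report)
```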

Installing the Pandas library:
    pip install pandas
If pip reports that a newer version of pip is available, upgrade it with:
    python -m pip install --upgrade pip
Example:
    import pandas as pd

    # Read a CSV file into a DataFrame
    df = pd.read_csv("example.csv")

    # Display the first few rows of the DataFrame
    print("First few rows of the DataFrame:")
    print(df.head())

    # Display summary information (info() prints directly and returns None)
    print("\nSummary information:")
    df.info()

    # Display basic statistics of numerical columns
    print("\nBasic statistics:")
    print(df.describe())

scikit-learn
Definition: scikit-learn is a versatile machine learning library for Python. It offers simple and efficient tools for data mining and data analysis, implementing a wide range of machine learning algorithms.
- Machine Learning Algorithms: scikit-learn provides implementations of algorithms for classification, regression, clustering, and dimensionality reduction, along with tools for model selection.
- Model Evaluation: It offers tools for model evaluation, cross-validation, and hyperparameter tuning, facilitating the development of robust and accurate machine learning models.
- Integration: scikit-learn integrates with other Python libraries such as NumPy, SciPy, and Pandas, enabling easy preprocessing, training, and evaluation of machine learning models.
- Scalability: It is designed to be efficient and works well on datasets that fit in memory.
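The cross-validation and hyperparameter-tuning tools mentioned above can be sketched on the built-in Iris data. The choice of a k-nearest-neighbours classifier and of the candidate k values is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Cross-validation: score one configuration on 5 train/validation splits
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print("CV accuracy per fold:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))

# Hyperparameter tuning: every candidate k is scored with the same 5-fold CV
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
    cv=5,
)
search.fit(X, y)
print("Best k:", search.best_params_["n_neighbors"])
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

By default GridSearchCV refits the best configuration on all the data passed to fit, so search.predict can be used directly; for an unbiased final estimate, keep a separate test set that the search never sees.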

Installing the scikit-learn library:
    pip install scikit-learn
Example:
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report

    # Load the Iris dataset
    iris = load_iris()
    X = iris.data     # Features
    y = iris.target   # Target variable

    # Split the dataset into training (80%) and testing (20%) sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize the Random Forest classifier
    rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

    # Train the classifier
    rf_classifier.fit(X_train, y_train)

    # Predict on the test set
    y_pred = rf_classifier.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy:", accuracy)

    # Display the classification report
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=iris.target_names))

Seaborn
Seaborn is a Python library for creating attractive statistical graphics.
- Statistical Visualization: Seaborn excels at generating plots such as scatter plots, bar charts, and heatmaps for effective data exploration.
- Integration with Pandas: It works directly with Pandas DataFrames, making data visualization straightforward.
- Customization: Users can easily customize plot aesthetics to suit their preferences.
- Statistical Analysis: Seaborn offers tools for visualizing relationships between variables and conducting statistical analysis.
- Community and Documentation: It is supported by an active community and comprehensive documentation, which make it easy to learn.

Installing the seaborn library:
    pip install seaborn
Example:
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Load the Iris dataset as a DataFrame (sns.load_dataset downloads it, so it needs internet access)
    iris_df = sns.load_dataset("iris")

    # Create a pairplot, colouring the points by species
    sns.pairplot(iris_df, hue="species", palette="Set1")

    # Add a title slightly above the grid of plots
    plt.suptitle("Pairplot of Iris Dataset", y=1.02)

    # Show the plot
    plt.show()
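sns.load_dataset("iris") fetches the data over the network. When working offline, a DataFrame with the same column names and species labels can be built from scikit-learn's bundled copy. The helper name load_iris_df below is hypothetical, and individual measurements may differ slightly between the two sources.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_df() -> pd.DataFrame:
    """Hypothetical helper: Iris as a DataFrame using seaborn's column names."""
    iris = load_iris()
    df = pd.DataFrame(
        iris.data,
        columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    )
    # Map the integer targets (0, 1, 2) to their species names
    df["species"] = iris.target_names[iris.target]
    return df


iris_df = load_iris_df()
print(iris_df.shape)
print(iris_df["species"].value_counts())
```

The result can be passed to sns.pairplot(iris_df, hue="species") in place of the downloaded dataset.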

Plotly
Plotly is a Python library for creating interactive, publication-quality graphs.
- Interactive Visualization: Plotly allows users to explore data interactively by zooming and hovering over data points.
- Online Platform: It offers an online platform for hosting and sharing interactive plots.
- Chart Types: It supports a wide range of chart types, including scatter plots, line plots, and 3D surface plots.
- Integration: It integrates easily with other Python libraries for data manipulation and visualization.
- Customization: It provides extensive options for customizing plot appearance.

Installing the plotly library:
    pip install plotly
Example:
    import plotly.graph_objects as go

    # Sample data
    x_values = [1, 2, 3, 4, 5]
    y_values = [2, 3, 5, 7, 11]

    # Create a line plot
    fig = go.Figure(data=go.Scatter(x=x_values, y=y_values, mode="lines"))

    # Add a title and axis labels
    fig.update_layout(title="Simple Line Plot", xaxis_title="X-axis", yaxis_title="Y-axis")

    # Show the plot
    fig.show()

Data Preprocessing
Data preprocessing is a critical step in machine learning pipelines. It is defined as the techniques and procedures used to prepare raw data for analysis. It involves several tasks such as importing and exporting data, cleaning and formatting data, handling missing values, and feature scaling.

Importing and Exporting Data
Importing data involves loading datasets into the machine learning environment. This can be done with libraries like Pandas in Python, using functions such as read_csv() for CSV files and read_excel() for Excel files.
    import pandas as pd

    df = pd.read_csv("ML.csv")
    print(df.shape)        # (number of rows, number of columns)
    print(df.describe())   # count, mean, std (SD), min, quartiles and max of numeric columns
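A self-contained variant of the import step. It assumes the CSV arrives as text: io.StringIO stands in for the file ML.csv, whose contents are not given, and the rows below are invented. Note that the "std" row of describe() is the sample standard deviation (divisor n - 1).

```python
import io

import pandas as pd

# Invented CSV text standing in for the contents of a file such as ML.csv
csv_text = """height,weight,label
150,50,a
160,60,b
170,70,a
180,80,b
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (4, 3): 4 rows, 3 columns
stats = df.describe()    # by default only the numeric columns (height, weight)
print(stats.loc[["mean", "std"]])
```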

Exporting the Data
    import pandas as pd

    # Example DataFrame
    data = {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["New York", "Los Angeles", "Chicago"],
    }
    df = pd.DataFrame(data)

    # Export the DataFrame to CSV (index=False leaves out the row index column)
    df.to_csv("output.csv", index=False)
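What index=False changes can be seen with a small in-memory round trip: calling to_csv without a path returns the CSV text as a string instead of writing a file. The two-row DataFrame is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["John", "Alice"], "Age": [25, 30]})

with_index = df.to_csv()               # no path given: returns the CSV text
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])      # ,Name,Age  (leading empty header for the index)
print(without_index.splitlines()[0])   # Name,Age

# Reading back the version that kept the index produces an extra "Unnamed: 0" column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
```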

Cleaning and Formatting Data
Cleaning data involves identifying and handling anomalies, inconsistencies, and errors in the dataset. This may include removing duplicates, correcting data types, dealing with outliers, etc.
Formatting data involves ensuring that data is in the appropriate format for analysis, for example converting categorical variables into numerical representations or standardizing date formats.
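A minimal sketch of four of the steps named above: removing duplicates, correcting a data type, standardizing dates, and converting a categorical column into numbers. The records are invented, and one-hot encoding with pd.get_dummies is just one of several possible categorical encodings.

```python
import pandas as pd

# Hypothetical raw records: row 2 duplicates row 0, Age is stored as text,
# and dates are written day/month/year
raw = pd.DataFrame({
    "City": ["Chicago", "Seattle", "Chicago", "Boston"],
    "Joined": ["05/01/2024", "10/02/2024", "05/01/2024", "20/03/2024"],
    "Age": ["25", "31", "25", "40"],
})

# Cleaning: drop exact duplicate rows and correct the Age data type
clean = raw.drop_duplicates().reset_index(drop=True)
clean["Age"] = clean["Age"].astype(int)

# Formatting: parse dates with an explicit format into datetime64 values
clean["Joined"] = pd.to_datetime(clean["Joined"], format="%d/%m/%Y")

# Formatting: one-hot encode the categorical City column as 0/1 integer columns
encoded = pd.get_dummies(clean, columns=["City"], dtype=int)
print(encoded)
```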

    import pandas as pd

    # Create a sample dataset with missing, empty and inconsistent values
    data = {
        "Name": ["John", "Alice", "Bob", "Anna", "Mike", "Emily"],
        "Age": [25, 30, None, 35, 40, ""],
        "City": ["New York", "Los Angeles", "Chicago", "San Francisco", "", "Seattle"],
        "Gender": ["Male", "Female", "Male", "", "Male", "Female"],
        "Salary": ["$50000", "$60000", "$70000", "$80000", "90000", "$100000"],
    }
    df = pd.DataFrame(data)

    # Display the original DataFrame
    print("Original DataFrame:")
    print(df)
    print()

    # Clean and format the data
    # 1. Convert Age to numeric (empty strings become NaN) and fill missing values with the median age
    df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
    median_age = df["Age"].median()
    df["Age"] = df["Age"].fillna(median_age)

    # 2. Remove rows with missing or empty values in the City and Gender columns
    mask = (df["City"].notna() & (df["City"] != "")
            & df["Gender"].notna() & (df["Gender"] != ""))
    df = df[mask].copy()

    # 3. Remove dollar signs and commas from Salary, then convert it to numeric
    df["Salary"] = df["Salary"].replace(r"[$,]", "", regex=True).astype(float)

    # Display the cleaned and formatted DataFrame
    print("Cleaned and Formatted DataFrame:")
    print(df)

Handling Missing Values
Missing values are common in datasets and can significantly affect the performance of machine learning models if not handled properly. Techniques for handling missing values include:
- Imputation: Replacing missing values with a calculated or estimated value (e.g., mean, median, mode).
- Deletion: Removing rows or columns with missing values.
- Advanced techniques, such as predictive modeling, that estimate missing values from other features.
The previous example already covers the first two techniques: it fills missing ages with the median (imputation) and drops incomplete rows (deletion).
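The three techniques above side by side on a tiny invented table. For the model-based option this sketch uses scikit-learn's KNNImputer, a simple nearest-neighbour estimate; it is one form of predictive imputation, not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Illustrative data: Age is missing for the second person
df = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 60.0],
    "Experience": [2.0, 5.0, 8.0, 35.0],
})

# Deletion: drop every row that contains a missing value
dropped = df.dropna()

# Imputation: replace each missing value with its column's median (25, 35, 60 -> 35)
median_imputed = SimpleImputer(strategy="median").fit_transform(df)

# Nearest-neighbour estimate: average Age of the 2 rows with the most similar
# Experience (rows with 2 and 8 years -> Ages 25 and 35 -> 30)
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(df)

print(len(dropped), median_imputed[1, 0], knn_imputed[1, 0])
```

scikit-learn also offers IterativeImputer, which models each feature from the others; it is experimental and has to be enabled with from sklearn.experimental import enable_iterative_imputer before it can be imported.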

Feature Scaling
Feature scaling is the process of standardizing or normalizing the range of the independent variables (features) in the dataset. It is essential for algorithms that are sensitive to the scale of the input features, such as gradient descent-based algorithms (e.g., linear regression, logistic regression) or distance-based algorithms (e.g., k-nearest neighbors, support vector machines). Common techniques for feature scaling include:
- Min-Max Scaling: Scaling features to a fixed range, usually [0, 1].
- Standardization (Z-score normalization): Scaling features so that each has a mean of 0 and a standard deviation of 1. This shifts and rescales the values but does not change the shape of their distribution, so the result is not necessarily normally distributed.
- Robust Scaling: Scaling features using statistics that are robust to outliers, such as the median and interquartile range.
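Written per feature, the three techniques are as follows, where μ and σ are the feature's mean and standard deviation and Q1, Q3 its 25th and 75th percentiles. These match the defaults of scikit-learn's MinMaxScaler, StandardScaler (which uses the population standard deviation) and RobustScaler (quantile range 25 to 75).

```latex
\begin{aligned}
x'_{\text{min-max}} &= \frac{x - \min(x)}{\max(x) - \min(x)} \\
z &= \frac{x - \mu}{\sigma} \\
x'_{\text{robust}} &= \frac{x - \operatorname{median}(x)}{Q_3 - Q_1}
\end{aligned}
```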

Feature Scaling: Example
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Sample dataset with two features (one row per sample)
    data = np.array([[10, 0.5], [20, 0.7], [30, 0.9]])

    # Min-Max Scaling
    scaler_minmax = MinMaxScaler()                        # Initialize MinMaxScaler (default range [0, 1])
    data_minmax = scaler_minmax.fit_transform(data)       # Perform Min-Max Scaling
    print("Min-Max Scaled Data:")
    print(data_minmax)
    print()

    # Standardization (Z-score normalization)
    scaler_standard = StandardScaler()                    # Initialize StandardScaler
    data_standard = scaler_standard.fit_transform(data)   # Perform Standardization
    print("Standardized Data:")
    print(data_standard)
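Robust scaling, the third technique in the list, is not covered by the example above. A minimal sketch comparing RobustScaler with StandardScaler on a made-up feature containing one outlier (1000), and checking RobustScaler against the median/IQR formula computed with np.percentile:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# One feature (one column) with an outlier in the last row
x = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

robust = RobustScaler().fit_transform(x)
standard = StandardScaler().fit_transform(x)

# Manual check of the robust formula: (x - median) / (Q3 - Q1)
q1, median, q3 = np.percentile(x, [25, 50, 75])
manual = (x - median) / (q3 - q1)

print(np.allclose(robust, manual))
print("Robust:  ", robust.ravel())
print("Standard:", standard.ravel().round(3))
```

With the outlier present, StandardScaler squeezes the four ordinary values into a narrow band (roughly -0.54 to -0.46), whereas RobustScaler keeps them spread from -1 to 0.5 and leaves the outlier far out at 48.5.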