Iris - Most loved dataset

DrAsmitaTitre 4,596 views 29 slides Jul 09, 2018

About This Presentation

Iris dataset


Slide Content

Iris Dr Asmita Titre

About the Iris Dataset: Perhaps the best-known dataset in the pattern recognition literature. The Iris flower data set (donated 1988-07-01) is also known as Fisher's Iris data set, and as Anderson's Iris data set because Edgar Anderson collected the data. It is a multivariate data set (several variables are measured on each sample) from a study of three related species of Iris flowers. The data set contains 50 samples of each species (Iris setosa, Iris virginica, Iris versicolor).

Features or attributes (4 numeric): sepal length in cm, sepal width in cm, petal length in cm, petal width in cm.

More on the data set: one class is linearly separable from the other 2; the latter are NOT linearly separable from each other. Missing attribute values: none.

Summary Statistics: class distribution is 33.3% for each of the 3 classes.

Iris virginica, Iris versicolor, Iris setosa

Objective: classify a new flower as belonging to one of the 3 classes, given the 4 features.

Let's Be Sherlock Holmes

What is the data saying? (Exploratory data analysis.) We will try to answer the following questions with the help of all the available assets.

1. Descriptive statistics: SD, min, max, etc.
2. Class distribution (are the species counts balanced or imbalanced?): balanced.
3. Univariate plots: understand each attribute better.
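The first two checks can be sketched in a few lines of Python. This is a minimal sketch, assuming pandas and scikit-learn are installed and using the copy of Iris bundled with scikit-learn (`load_iris`). The slides do not say which loader or column names were used, so the `species` column here is an illustrative assumption.

```python
from sklearn.datasets import load_iris

# Load the copy of Iris bundled with scikit-learn as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Replace the integer target with species names (assumed column name: "species").
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))
df = df.drop(columns="target")

# 1. Descriptive statistics: count, mean, SD, min, quartiles, max per attribute.
summary = df.describe()
print(summary)

# 2. Class distribution: number of samples per species.
counts = df["species"].value_counts()
print(counts)
```

Equal counts for the three species correspond to the "balanced" answer on the slide.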

3.1 Box plot (box-and-whisker plot): shows the distribution of each attribute through its quartiles and helps find outliers; gives an idea of the distribution of the input attributes.
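A box plot of the four attributes might be drawn as follows. This is a sketch, assuming matplotlib and scikit-learn are available; the plotting library used for the slides is not stated. The second half counts outliers with the 1.5 x IQR rule, which is matplotlib's default whisker rule and an assumption about how "outlier" is meant here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).data  # 150 rows x 4 numeric columns (cm)

# One box per attribute: the box spans Q1..Q3, the line inside is the median,
# and points beyond the whiskers are drawn as individual outliers.
fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot([features[c].to_numpy() for c in features.columns])
ax.set_xticklabels(features.columns)
ax.set_ylabel("cm")
# fig.savefig("iris_boxplot.png")  # or plt.show() in an interactive session

# The same rule applied numerically: count values outside Q1/Q3 -/+ 1.5 * IQR.
q1 = features.quantile(0.25)
q3 = features.quantile(0.75)
iqr = q3 - q1
outside = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outlier_counts = outside.sum()
print(outlier_counts)
```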

3.2 Histogram: shows the distribution of each attribute across bins, so we can see whether an attribute follows a Gaussian or some other distribution.
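A histogram per attribute can be sketched like this, again assuming matplotlib and the scikit-learn copy of the data. The choice of 10 bins is an assumption; the slides do not give a bin count.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).data

# One panel per attribute; keep the bin counts so they can be inspected too.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
bin_counts = {}
for ax, column in zip(axes.ravel(), features.columns):
    counts, edges, _ = ax.hist(features[column], bins=10)  # bins=10 is an assumption
    ax.set_title(column)
    bin_counts[column] = counts
fig.tight_layout()
# fig.savefig("iris_histograms.png")  # or plt.show() in an interactive session
```

A roughly bell-shaped panel suggests a Gaussian-like attribute, while a panel with two separate humps suggests the attribute differs strongly between species.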

4. Multivariate plots: understand the relationships between attributes and species better (which attributes contribute most to classifying the species).

4.0 Scatter plot: sepal length vs. sepal width, by species.

Observations:
1. Using the Sepal_Length and Sepal_Width features, we can only distinguish Setosa flowers from the others.
2. Separating Versicolor and Virginica is much harder, as they overlap considerably.
3. Hence, the Sepal_Length and Sepal_Width features only work well for Setosa.

4.1 Scatter plot: petal length vs. petal width, by species.

Observations:
1. Using the Petal_Length and Petal_Width features, we can distinguish Setosa, Versicolor and Virginica fairly well.
2. There is a slight overlap between Versicolor and Virginica.
3. The graph shows that the petal features (length and width) contribute more to separating the Iris species than the sepal features (length and width).
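The two scatter plots behind these observations can be reproduced side by side. This is a sketch, assuming matplotlib and the scikit-learn copy of Iris, whose column names are used below. The numeric check at the end is an added illustration of the Setosa observation, not something taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

pairs = [
    ("sepal length (cm)", "sepal width (cm)"),
    ("petal length (cm)", "petal width (cm)"),
]

# Left panel: sepal features; right panel: petal features; one colour per species.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (x_col, y_col) in zip(axes, pairs):
    for code, name in enumerate(iris.target_names):
        mask = y == code
        ax.scatter(X.loc[mask, x_col], X.loc[mask, y_col], label=name, s=15)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
# fig.savefig("iris_scatter.png")  # or plt.show() in an interactive session

# Numeric check: the longest Setosa petal is shorter than the shortest petal
# of the other two species, so one threshold on petal length isolates Setosa.
setosa_max = X.loc[y == 0, "petal length (cm)"].max()
others_min = X.loc[y != 0, "petal length (cm)"].min()
print(setosa_max, others_min)
```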

4.2 Scatter Plot of all the attributes

4.3 3D Plot

4.4 Violin plot: density of the length and width within each species

Machine Learning (Here comes the beauty of machine)

Steps to implement the ML:
1. Import the libraries
2. Create the correlation matrix
3. Split the data set
But keep in mind:

While splitting the data set:
3.1 Take all the features
3.2 Take only the sepal features (length and width)
3.3 Take only the petal features (length and width)
3.4 Take all relevant features from the correlation matrix
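Steps 2 and 3 might look like the sketch below. Several details are assumptions, because the slides do not state them: the 30% test share, the `random_state`, the stratified split, and the exact column lists for option 3.4. Those column lists mirror the feature combinations named in the final evaluation table.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Step 2: Pearson correlation matrix of the four attributes.
corr = X.corr()
print(corr.round(2))

# Step 3: hold out part of the data for testing (30% is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y
)

# Options 3.1 to 3.4: the same split, restricted to different feature subsets.
sepal = ["sepal length (cm)", "sepal width (cm)"]
petal = ["petal length (cm)", "petal width (cm)"]
feature_sets = {
    "All features": list(X.columns),
    "Sepal only": sepal,
    "Petal only": petal,
    "PetalWidth + Sepal": ["petal width (cm)"] + sepal,
    "PetalLen + Sepal": ["petal length (cm)"] + sepal,
}
for name, cols in feature_sets.items():
    print(name, X_train[cols].shape, X_test[cols].shape)
```

Petal length and petal width are strongly correlated with each other, which is one plausible reason to keep only one of them in option 3.4.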

Algorithms used
4. Evaluate using 6 different algorithms (cross-validation):
4.1 Logistic Regression (LR)
4.2 Linear Discriminant Analysis (LDA)
4.3 K-Nearest Neighbour (KNN)
4.4 Classification and Regression Tree (CART)
4.5 Gaussian Naive Bayes (NB)
4.6 Support Vector Machine (SVM)
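The six algorithms can be compared by cross-validation on the training split, for example as below. This is a sketch using scikit-learn; the number of folds (10), the random seed, the 30% hold-out and all hyperparameters are assumptions, since the slides do not list them.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y
)

# The six classifiers named on the slide, with assumed settings.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(gamma="auto"),
}

# Stratified 10-fold cross-validation on the training data only.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f} sd={scores.std():.4f}")
```

The test split stays untouched during this comparison, so it can still be used for the final evaluation.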

5. Final evaluation (compare all models according to feature selection and accuracy)
6. Deep learning

Final Evaluation. Note: the testing dataset (validation set) is small.

Final Evaluation (cont.)
Case | Features used                | Best model | Train accuracy | Test accuracy | Misclassified
1    | All features                 | SVM        | .9899          | .9555         | 2 classes
2    | Sepal only                   | SVM        | .8472          | .7111         | 12
3    | Petal only                   | SVM        | .9899          | .9333         | 3
4    | PetalWidth, Sepal (Len, Wid) | SVM/LDA    | .9809          | .9111         | 4
5    | PetalLen, Sepal (Len, Wid)   | SVM        | .9700          | .9111         | 4
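A comparison of this kind can be sketched by fitting an SVM on each feature subset and scoring it on the held-out data. The split (30% test, `random_state=7`, stratified) and the `SVC` settings are assumptions, so the numbers printed will not match the table exactly.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y
)

sepal = ["sepal length (cm)", "sepal width (cm)"]
petal = ["petal length (cm)", "petal width (cm)"]
feature_sets = {
    "All features": list(X.columns),
    "Sepal only": sepal,
    "Petal only": petal,
    "PetalWidth + Sepal": ["petal width (cm)"] + sepal,
    "PetalLen + Sepal": ["petal length (cm)"] + sepal,
}

# Fit one SVM per feature subset and record train/test accuracy and errors.
report = {}
for name, cols in feature_sets.items():
    model = SVC(gamma="auto").fit(X_train[cols], y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train[cols]))
    test_pred = model.predict(X_test[cols])
    test_acc = accuracy_score(y_test, test_pred)
    missed = int((test_pred != y_test).sum())  # misclassified test samples
    report[name] = {"train_acc": train_acc, "test_acc": test_acc, "misclassified": missed}
    print(f"{name:<20} train={train_acc:.4f} test={test_acc:.4f} misclassified={missed}")
```

With a test set of only 45 flowers, each misclassified sample changes the test accuracy by about 2.2 percentage points, which is why the note about the small validation set matters.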

Thank you…