Orange Data Mining & Data Visualization Tool

About This Presentation

A short presentation on the "Orange Data Mining & Data Visualization Tool"

Size: 3.13 MB

Language: en

Added: May 15, 2021

Slides: 61 pages

Slide Content

Dr Mithileysh Sathiyanarayanan

Orange Tool: Let's Learn Orange Data Mining and Data Visualization Tool

What is Data Mining? Data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining helps analysts recognize significant information, facts, relationships, trends, patterns, exceptions, and anomalies that might otherwise go unnoticed.

Major Data Mining Tasks
1) Classification: predicting an item class
2) Clustering: descriptive, finding groups of items
3) Deviation detection: predictive, finding changes
4) Forecasting: predicting a parameter value
5) Description: describing a group
6) Link analysis: finding relationships and associations

Major Industries Using Data Mining: retail, finance, education, healthcare, agriculture, manufacturing, transportation, aerospace.

Why Orange? Open source; component based; no programming required; data visualization; platform-independent software; allows clustering and classification; data mining through visual programming and Python scripting.

Introduction: Orange is component-based visual programming software for data mining, machine learning, and data analysis. It supports communication between data scientists and domain experts. You can get the Orange software from this link: https://orange.biolab.si/getting-started/

Getting Started With ORANGE!!

Dataset: Heart Disease. Attributes: narrowing diameter, cholesterol, chest pain, rest ECG, fasting blood sugar, max HR, age, gender, and more. The dataset has 303 instances and 13 attributes, with a categorical class that takes 2 values (0, 1). It is in .csv format. Source: preloaded datasets of Orange.

Dataset: How do the following factors cause heart disease? Age: the risk of heart disease increases with age, especially above 65. Fatty deposits called plaques collect along the artery walls and slow the blood flow from the heart, causing coronary heart disease. Gender: heart disease is a leading cause of death for both men and women.

Angina: chest pain or discomfort caused when the heart muscle doesn't get enough oxygen-rich blood. Cholesterol: when there is too much cholesterol in the blood, it builds up in the walls of the arteries, causing a process called atherosclerosis (a form of heart disease). Diameter narrowing: heart disease is caused by the narrowing or blockage of the coronary arteries. Target attribute: (0, 1).

Loading data file into data table:

EDA (Exploratory Data Analysis): Distributions

Distributions

Selected Algorithms: KNN, Naïve Bayes, Decision Tree, Neural Network, Random Forest, Logistic Regression.

Experimental Setup: This is how we drag and drop the widgets and implement our algorithms.

KNN (k-nearest neighbors): KNN is a non-parametric method used for classification and regression. To classify an unknown record it requires three things: the set of stored records; a distance metric to compute the distance between records; and the value of k, the number of nearest neighbors to retrieve. Math equation: $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
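The three ingredients listed on the KNN slide can be sketched in a few lines of plain Python. This is only an illustration, not how Orange implements its kNN widget: the function names `euclidean` and `knn_predict` and the toy records are assumptions made up for this example, and ties are resolved by whichever label `Counter` returns first.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(records, labels, unknown, k=3):
    """Label an unknown record by majority vote among its k nearest stored records."""
    # Rank the stored records by their distance to the unknown record.
    ranked = sorted(zip(records, labels),
                    key=lambda pair: euclidean(pair[0], unknown))
    # Count the class labels of the k closest records.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric attributes and a binary class (0, 1).
records = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
labels = [0, 0, 1, 1, 1]

prediction = knn_predict(records, labels, (5.2, 5.1), k=3)
print(prediction)
```

The unknown record (5.2, 5.1) sits among the three records labeled 1, so those are its nearest neighbors and decide the vote.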

Decision tree: Used to visually and explicitly represent decisions and decision making. It is one of the predictive modelling approaches used in statistics, data mining, and machine learning. Math equation: $\mathrm{Entropy}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
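The entropy formula on the decision tree slide can be computed directly from a list of class labels, where p_i is the fraction of records in class i. The sketch below is a minimal illustration; the function name `entropy` and the sample label lists are assumptions for this example only.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(D) = -sum_i p_i * log2(p_i), with p_i the share of class i in D."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical label lists with a binary class (0, 1).
mixed = [0, 1, 0, 1]      # evenly split classes: maximum uncertainty
pure = [1, 1, 1, 1]       # a single class: no uncertainty

print(entropy(mixed))
print(entropy(pure))
```

A decision tree learner prefers splits whose child nodes have lower entropy than the parent, which is why an evenly mixed node scores highest and a pure node scores zero.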

Naïve Bayes: Also known as Naive Bayes classifiers. Attributes are assumed to be statistically independent of one another. In real data there will usually be some correlation between features for a given class, but unlike other classifiers, Naive Bayes explicitly models the features as conditionally independent given the class. Math equation: $P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}$
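A small worked example may help connect Bayes' rule to the conditional independence assumption. In the sketch below, the likelihood P(X|H) is taken as the product of per-feature likelihoods, which is exactly what "conditionally independent given the class" means. The function names and all probabilities are hypothetical values chosen for illustration, not figures from the heart disease dataset.

```python
import math

def naive_likelihood(feature_likelihoods):
    """P(X|H) under the naive assumption: the product of the P(x_i|H) terms."""
    return math.prod(feature_likelihoods)

def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' rule: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x

# Hypothetical numbers for a binary class H (disease) versus not-H.
prior_h = 0.5
prior_not_h = 1.0 - prior_h
lik_h = naive_likelihood([0.8, 0.5])        # P(x1|H) * P(x2|H)
lik_not_h = naive_likelihood([0.2, 0.5])    # P(x1|not H) * P(x2|not H)

# P(X) via the law of total probability over the two classes.
evidence = lik_h * prior_h + lik_not_h * prior_not_h

p_h_given_x = posterior(prior_h, lik_h, evidence)
print(round(p_h_given_x, 3))
```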

Random Forest: It is a flexible and simple algorithm that helps avoid the overfitting problem. It is used for identifying the most important features in the training dataset. It can be used for both classification and regression tasks.

Logistic Regression: Used to assign observations to a discrete set of classes. Logistic regression can be binomial, ordinal, or multinomial: binary (Pass/Fail), multi (Cats, Dogs, Sheep), ordinal (Low, Medium, High). You can view the probability scores underlying the model's classifications.
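For the binary case, the probability score behind a classification comes from passing a linear score through the logistic (sigmoid) function. The sketch below assumes already-fitted weights; the function names, the weights, and the 0.5 threshold are hypothetical choices for illustration and are not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive class for one observation x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict_class(weights, bias, x, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical fitted parameters and two observations.
weights, bias = [1.0, -1.0], 0.0
print(predict_proba(weights, bias, [2.0, 0.5]), predict_class(weights, bias, [2.0, 0.5]))
print(predict_proba(weights, bias, [0.5, 2.0]), predict_class(weights, bias, [0.5, 2.0]))
```

Inspecting `predict_proba` rather than only `predict_class` shows how confident the model is in each assignment.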

Neural Network: Neural networks are learning algorithms. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. They consist of different layers for analyzing and learning from data. Math equation: $f(X) = b + \sum_{i} w_i x_i$
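The weighted-sum equation on the neural network slide describes a single neuron before any activation is applied. A minimal sketch is shown below; stacking several such neurons gives one layer. The ReLU activation and all weights here are assumptions added for illustration, since the slide does not name an activation function.

```python
def neuron_output(weights, inputs, bias):
    """f(X) = b + sum_i w_i * x_i for a single neuron."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def relu(value):
    """Assumed activation for this sketch: pass positives, clamp negatives to 0."""
    return max(0.0, value)

def dense_layer(weight_rows, biases, inputs):
    """One layer: apply every neuron to the same inputs, then the activation."""
    return [relu(neuron_output(w, inputs, b))
            for w, b in zip(weight_rows, biases)]

# Hypothetical weights for a layer with two neurons and two inputs.
weight_rows = [[0.5, -1.0], [1.0, 1.0]]
biases = [0.25, -5.0]
inputs = [2.0, 1.0]

layer_out = dense_layer(weight_rows, biases, inputs)
print(layer_out)
```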

Concluding Results

Table to compare data

Model                 Recall   Precision   F-Measure
Neural Network        0.813    0.814       0.814
Logistic Regression   0.848    0.848       0.848
Random Forest         0.807    0.807       0.807
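For readers unfamiliar with the three scores in the table, the sketch below shows their standard single-class definitions computed from confusion-matrix counts. The slides do not say how Orange averaged the scores across the two classes, so this is not a reproduction of the reported values; the function name and the counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the positive class.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because the F-measure is a harmonic mean, it always lies between precision and recall and equals them when the two agree, as in the Logistic Regression and Random Forest rows.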

Projects: Traffic Communication Data Analysis; Job Scam Data Analysis; Email Communication Data Analysis; Social Media Data Analysis; Healthcare Data Analysis.

EMAIL COMMUNICATION DATA ANALYSIS

References:
https://www.youtube.com/watch?v=pYXOF0jziGM&index=6&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
https://www.youtube.com/watch?v=bp0VtVS3LN4&index=9&list=PLmNPvQr9Tf-ZSDLwOzxpvY-HrE0yv-8Fy
https://orange.biolab.si/getting-started/
https://en.wikipedia.org/wiki/Random_forest
https://en.wikipedia.org/wiki/Decision_tree_learning
http://orange.biolab.si/docs/latest/
http://en.wikipedia.org/wiki/Data_mining
http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/index.html
http://eprints.fri.uni-lj.si/1150/1/DataMining-Kyoto.pdf
