WEKA Tutorial and Introduction to Data Mining

coolscools1231, 87 slides, Aug 23, 2024

About This Presentation

Weka Introduction


Slide Content

An Introduction to WEKA

Content
What is WEKA?
The Explorer Application: Preprocess, Classify, Cluster, Associate, Select Attributes, Visualize
Weka on Trestles
References and Resources
8/22/2024

What is WEKA? Weka is a bird found only in New Zealand. WEKA is also the Waikato Environment for Knowledge Analysis, a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. Weka is open-source software written in Java and issued under the GNU General Public License.

Download and Install WEKA
Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
Supports multiple platforms (written in Java): Windows, Mac OS X and Linux.
Datasets (iris.arff, weather.arff):
Available on Trestles at: /home/diag/opt/weka/data
Available with the download: …../weka/data/

Main Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search algorithms for feature selection

Main GUI
Three graphical user interfaces:
“The Explorer” (exploratory data analysis): pre-process data, build “classifiers”, cluster data, find associations, attribute selection, data visualization.
“The Experimenter” (experimental environment): used to compare the performance of different learning schemes.
“The KnowledgeFlow” (new process-model-inspired interface): a Java-Beans-based interface for setting up and running machine learning experiments.
In addition, a command-line interface (“Simple CLI”).
More at: http://www.cs.waikato.ac.nz/ml/weka/index_documentation.html
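
The Experimenter compares learning schemes by evaluating each one on the same train/test splits, typically via cross-validation. The Java sketch below (CrossValidationSplit is a hypothetical helper, not part of Weka) deals instance indices into k folds with a fixed random seed, so every scheme is tested on identical folds. It assumes plain shuffling; Weka's own cross-validation also stratifies the folds by class when the class is nominal, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch (hypothetical helper): split instance indices into k folds, without stratification. */
class CrossValidationSplit {

    /** Returns k lists of instance indices; fold sizes differ by at most one. */
    static List<List<Integer>> folds(int numInstances, int k, long seed) {
        if (k < 2 || k > numInstances) {
            throw new IllegalArgumentException("k must be in [2, numInstances]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // Shuffle once with a fixed seed so every learning scheme sees the same folds.
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin into the k folds.
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        // 14 instances (the size of the weather data) split into 10 folds.
        List<List<Integer>> folds = folds(14, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set; all other folds together form the training set.
            System.out.println("fold " + f + " test indices: " + folds.get(f));
        }
    }
}
```

Each fold serves once as the test set while the remaining folds form the training set, and the per-fold results are averaged.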

9 University of Waikato 8/22/2024

WEKA:: Explorer: Preprocess
Data format: WEKA uses flat text files to describe the data.
Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary.
Data can also be read from a URL or from an SQL database (using JDBC).
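
As a rough illustration of what importing a CSV file involves, the sketch below turns CSV text into ARFF text. CsvToArff is a hypothetical class, not Weka's CSV loader. It assumes a header row, no quoted fields or embedded commas, nominal values without spaces, and ? for missing values; a column is declared numeric if every non-missing value parses as a number, otherwise nominal with the values seen in the data.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch (hypothetical class): convert simple CSV text (header row, no quoting) to ARFF text. */
class CsvToArff {

    static String convert(String relation, List<String> csvLines) {
        String[] names = csvLines.get(0).split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            rows.add(csvLines.get(i).split(",", -1));
        }

        StringBuilder out = new StringBuilder("@relation " + relation + "\n\n");
        for (int c = 0; c < names.length; c++) {
            boolean numeric = true;
            Set<String> values = new LinkedHashSet<>(); // keeps values in first-seen order
            for (String[] row : rows) {
                String v = row[c].trim();
                if (v.equals("?")) {
                    continue; // missing value: ignored for type inference
                }
                values.add(v);
                try {
                    Double.parseDouble(v);
                } catch (NumberFormatException e) {
                    numeric = false; // one non-number makes the column nominal
                }
            }
            String type = numeric ? "numeric" : "{" + String.join(",", values) + "}";
            out.append("@attribute ").append(names[c].trim()).append(' ').append(type).append('\n');
        }

        // Data rows are copied unchanged; CSV and ARFF rows share the same layout here.
        out.append("\n@data\n");
        for (int i = 1; i < csvLines.size(); i++) {
            out.append(csvLines.get(i)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "age,sex,cholesterol,class",
                "63,male,233,not_present",
                "38,female,?,not_present");
        System.out.print(convert("heart-disease-sample", csv));
    }
}
```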

WEKA:: ARFF file format
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
Annotations: age and cholesterol are numeric attributes; sex, chest_pain_type, exercise_induced_angina and class are nominal attributes, whose allowed values are listed in braces. A ? marks a missing value.
A more thorough description is available here: http://www.cs.waikato.ac.nz/~ml/weka/arff.html
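
A minimal sketch of reading the ARFF layout shown above. SimpleArffReader is a hypothetical class, not Weka's ARFF loader; it assumes one declaration per line, no quoted names or values, no string, date or sparse data, and comments only on lines starting with %. Missing values (?) become null, and a data row with the wrong number of values is rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical class): parse a simplified ARFF file into attribute declarations and rows. */
class SimpleArffReader {

    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>(); // "numeric" or "{a,b,...}"
    final List<String[]> rows = new ArrayList<>();          // null entries mark "?" (missing)

    SimpleArffReader(List<String> lines) {
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": split into keyword, name and the rest (the type).
                String[] parts = line.split("\\s+", 3);
                attributeNames.add(parts[1]);
                attributeTypes.add(parts[2].replace(" ", ""));
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] values = line.split(",", -1);
                if (values.length != attributeNames.size()) {
                    throw new IllegalArgumentException("Wrong number of values: " + line);
                }
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                    if (values[i].equals("?")) {
                        values[i] = null; // missing value
                    }
                }
                rows.add(values);
            }
            // @relation lines are accepted but not stored in this sketch.
        }
    }

    public static void main(String[] args) {
        SimpleArffReader r = new SimpleArffReader(List.of(
                "@relation heart-disease-simplified",
                "@attribute age numeric",
                "@attribute sex { female, male}",
                "@attribute cholesterol numeric",
                "@data",
                "63,male,233",
                "38,female,?"));
        System.out.println(r.attributeNames + " " + r.attributeTypes);
        System.out.println("rows: " + r.rows.size() + ", cholesterol missing in row 2: " + (r.rows.get(1)[2] == null));
    }
}
```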

WEKA:: Explorer: Preprocess
Filters are used to transform the data. WEKA contains filters for discretization, normalization, resampling, attribute selection, and transforming and combining attributes, among others.
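
Two of the filter types listed above, normalization and discretization, sketched for a single numeric attribute without missing values. This illustrates the ideas only and is not the code of Weka's filters: normalization here is min-max scaling to [0, 1], and discretization uses equal-width bins over the observed range. The class name NumericFilters is hypothetical.

```java
import java.util.Arrays;

/** Sketch (hypothetical class) of two unsupervised filters applied to one numeric attribute. */
class NumericFilters {

    /** Min-max normalization: maps the smallest value to 0 and the largest to 1. */
    static double[] normalize(double[] x) {
        double min = Arrays.stream(x).min().orElseThrow();
        double max = Arrays.stream(x).max().orElseThrow();
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // A constant attribute has no range; map it to 0 instead of dividing by zero.
            out[i] = range == 0 ? 0 : (x[i] - min) / range;
        }
        return out;
    }

    /** Equal-width discretization: returns a bin index in [0, bins - 1] for each value. */
    static int[] discretize(double[] x, int bins) {
        double min = Arrays.stream(x).min().orElseThrow();
        double max = Arrays.stream(x).max().orElseThrow();
        double width = (max - min) / bins;
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            int bin = width == 0 ? 0 : (int) ((x[i] - min) / width);
            out[i] = Math.min(bin, bins - 1); // the maximum value belongs to the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[] age = {63, 67, 67, 38}; // ages from the ARFF example
        System.out.println(Arrays.toString(normalize(age)));
        System.out.println(Arrays.toString(discretize(age, 3)));
    }
}
```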

WEKA:: Explorer: building “classifiers”
Classifiers in WEKA are models for predicting nominal or numeric quantities.
Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
“Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
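
The simplest possible classifier makes the nominal/numeric distinction concrete. Weka's ZeroR baseline ignores every attribute and predicts the most frequent class for a nominal class, or the mean for a numeric one. The sketch below reproduces that idea in plain Java; ZeroRSketch is a hypothetical name and its tie-breaking rule (first label seen wins) is an assumption. A useful learner should beat this baseline.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical class) of a ZeroR-style baseline: ignores all attributes, predicts one constant. */
class ZeroRSketch {

    /** Nominal class: predict the most frequent label (ties go to the label seen first). */
    static String predictNominal(List<String> trainingLabels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : trainingLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // strict ">" keeps the earlier label on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Numeric class: predict the mean of the training targets. */
    static double predictNumeric(double[] trainingTargets) {
        double sum = 0;
        for (double t : trainingTargets) {
            sum += t;
        }
        return sum / trainingTargets.length;
    }

    public static void main(String[] args) {
        // Class values of the four heart-disease rows shown in the ARFF example.
        System.out.println(predictNominal(List.of("not_present", "present", "present", "not_present")));
        // Cholesterol treated as a numeric target (the row with a missing value left out).
        System.out.println(predictNumeric(new double[] {233, 286, 229}));
    }
}
```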

Decision Tree Induction: Training Dataset
This follows an example of Quinlan’s ID3.

Output: A Decision Tree for “buys_computer”
age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
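
Read top-down, the tree is a set of nested tests. The sketch below hard-codes it as a Java method, assuming the structure of the standard buys_computer example: for age <=30 the tree tests student, the 31..40 branch predicts yes directly, and for age >40 it tests credit rating (excellent: no, fair: yes). The class and method names are illustrative only.

```java
/** Sketch (illustrative names): the buys_computer decision tree written out as nested tests. */
class BuysComputerTree {

    /**
     * @param age          one of "<=30", "31..40", ">40"
     * @param student      "yes" or "no"
     * @param creditRating "fair" or "excellent"
     * @return predicted value of buys_computer ("yes" or "no")
     */
    static String predict(String age, String student, String creditRating) {
        switch (age) {
            case "<=30":
                // Inner node: the decision depends on whether the customer is a student.
                return student.equals("yes") ? "yes" : "no";
            case "31..40":
                // Leaf: this branch predicts yes directly.
                return "yes";
            case ">40":
                // Inner node: the decision depends on the credit rating.
                return creditRating.equals("fair") ? "yes" : "no";
            default:
                throw new IllegalArgumentException("Unknown age range: " + age);
        }
    }

    public static void main(String[] args) {
        System.out.println(predict("<=30", "no", "fair"));
        System.out.println(predict(">40", "no", "excellent"));
    }
}
```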

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm):
The tree is constructed in a top-down, recursive, divide-and-conquer manner.
At the start, all the training examples are at the root.
Attributes are categorical (if continuous-valued, they are discretized in advance).
Examples are partitioned recursively based on the selected attributes.
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
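
The information-gain heuristic can be made concrete. For a set D with class proportions p_i, Info(D) = -Σ p_i log2 p_i, and splitting on attribute A with values v gives Gain(A) = Info(D) - Σ_v (|D_v|/|D|) Info(D_v); ID3 tests the attribute with the highest gain first. The sketch below computes both for nominal attributes and applies them to the eight weather rows listed later in the Hands On with Weka section (not the full 14-instance data set, and not the buys_computer training data, which is not reproduced in this text). The class name InfoGain is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical class): entropy and information gain for nominal attributes. */
class InfoGain {

    /** Info(D) = -sum_i p_i * log2(p_i), where p_i is the fraction of D with class i. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Gain(A) = Info(D) - sum_v (|D_v| / |D|) * Info(D_v), over the values v of attribute A. */
    static double gain(List<String> attributeValues, List<String> labels) {
        // Partition the class labels by the value the attribute takes.
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            partitions.computeIfAbsent(attributeValues.get(i), v -> new ArrayList<>()).add(labels.get(i));
        }
        double remainder = 0;
        for (List<String> part : partitions.values()) {
            remainder += (double) part.size() / labels.size() * entropy(part);
        }
        return entropy(labels) - remainder;
    }

    public static void main(String[] args) {
        // The eight weather rows from the Hands On section (day column omitted; last column = playTennis).
        String[] names = {"outlook", "temperature", "humidity", "wind"};
        String[][] rows = {
            {"Sunny", "Hot", "High", "Weak", "No"},
            {"Sunny", "Hot", "High", "Strong", "No"},
            {"Overcast", "Hot", "High", "Weak", "Yes"},
            {"Rain", "Mild", "High", "Weak", "Yes"},
            {"Rain", "Cool", "Normal", "Weak", "Yes"},
            {"Rain", "Cool", "Normal", "Strong", "No"},
            {"Overcast", "Cool", "Normal", "Strong", "Yes"},
            {"Sunny", "Mild", "High", "Weak", "No"},
        };
        List<String> labels = new ArrayList<>();
        for (String[] row : rows) {
            labels.add(row[4]);
        }
        for (int a = 0; a < names.length; a++) {
            List<String> column = new ArrayList<>();
            for (String[] row : rows) {
                column.add(row[a]);
            }
            System.out.printf("Gain(%s) = %.3f%n", names[a], gain(column, labels));
        }
    }
}
```

On these eight rows outlook scores highest (about 0.66 bits, against at most about 0.06 for the other attributes), so it would be chosen as the root test.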

Explorer: Select Attributes
A panel that can be used to investigate which (subsets of) attributes are the most predictive.
Attribute selection methods consist of two parts:
A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking.
An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
Very flexible: WEKA allows (almost) arbitrary combinations of these two.
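
Because the search method and the evaluation method are separate, either can be swapped independently. The sketch below pairs a "ranking" search with any single-attribute evaluator passed in as a function. As a stand-in evaluator it uses the accuracy of a one-attribute majority-class rule (in the spirit of OneR); the class name, the evaluator and the data (the eight weather rows from the Hands On section) are illustrative assumptions, not Weka's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Sketch (hypothetical class): a "ranking" search that works with any single-attribute evaluator. */
class AttributeRanker {

    /** Scores each attribute independently and sorts attribute indices by score, best first. */
    static List<Integer> rank(int numAttributes, IntToDoubleFunction evaluator) {
        double[] scores = new double[numAttributes];
        List<Integer> order = new ArrayList<>();
        for (int a = 0; a < numAttributes; a++) {
            scores[a] = evaluator.applyAsDouble(a);
            order.add(a);
        }
        // List.sort is stable, so attributes with equal scores keep their original order.
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return order;
    }

    /** Stand-in evaluator: accuracy of a rule predicting the majority class for each attribute value. */
    static double oneRAccuracy(String[][] rows, int attribute, int classIndex) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[attribute], v -> new HashMap<>())
                  .merge(row[classIndex], 1, Integer::sum);
        }
        int correct = 0;
        for (Map<String, Integer> byClass : counts.values()) {
            correct += byClass.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        }
        return (double) correct / rows.length;
    }

    public static void main(String[] args) {
        String[] names = {"outlook", "temperature", "humidity", "wind"};
        String[][] rows = {
            {"Sunny", "Hot", "High", "Weak", "No"},
            {"Sunny", "Hot", "High", "Strong", "No"},
            {"Overcast", "Hot", "High", "Weak", "Yes"},
            {"Rain", "Mild", "High", "Weak", "Yes"},
            {"Rain", "Cool", "Normal", "Weak", "Yes"},
            {"Rain", "Cool", "Normal", "Strong", "No"},
            {"Overcast", "Cool", "Normal", "Strong", "Yes"},
            {"Sunny", "Mild", "High", "Weak", "No"},
        };
        List<Integer> ranking = rank(names.length, a -> oneRAccuracy(rows, a, 4));
        for (int a : ranking) {
            System.out.printf("%-12s %.3f%n", names[a], oneRAccuracy(rows, a, 4));
        }
    }
}
```

Replacing oneRAccuracy with an information-gain score would change only the lambda passed to rank; the search itself stays the same.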

Explorer: Visualize
Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem.
WEKA can visualize single attributes (1-D) and pairs of attributes (2-D).
To do: rotating 3-D visualizations (XGobi-style).
Color-coded class values.
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points).
“Zoom-in” function.
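
Jitter addresses overplotting: with nominal attributes many instances share exactly the same plot position and hide behind one another. A minimal sketch, assuming nominal values are plotted at integer positions and the noise is uniform in [-amount, amount); it moves only the drawn position, not the data. This is not Weka's plotting code, and the class name Jitter is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch (hypothetical class) of plot jitter: spread points that share a coordinate. */
class Jitter {

    /**
     * Adds uniform noise in [-amount, amount) to each plotted coordinate.
     * The underlying data is not changed; only the position used for drawing moves.
     */
    static double[] jitter(double[] coords, double amount, long seed) {
        Random rnd = new Random(seed);
        double[] out = new double[coords.length];
        for (int i = 0; i < coords.length; i++) {
            // nextDouble() is in [0, 1), so (2 * u - 1) is in [-1, 1).
            out[i] = coords[i] + (rnd.nextDouble() * 2 - 1) * amount;
        }
        return out;
    }

    public static void main(String[] args) {
        // "outlook" coded as Sunny=0, Overcast=1, Rain=2 for the eight weather rows.
        double[] outlook = {0, 0, 1, 2, 2, 2, 1, 0};
        System.out.println(Arrays.toString(jitter(outlook, 0.2, 42L)));
    }
}
```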

Using Weka On Trestles

Using Weka on Trestles
Shared resources.
Batch and interactive use.
GUI and command line: use the GUI on the login nodes to create a command line, then use the command line to run interactive or batch jobs on the production nodes.

Weka GUI
To launch the Weka GUI from a Windows machine: running software with a GUI on a remote machine requires a secure shell with X forwarding enabled to establish the remote connection, and an X server to handle the local display. Suggested software: PuTTY and Xming.
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
http://www.straightrunning.com/XmingNotes/
Linux and Mac OS X support X forwarding. Mac users need to run Applications > Utilities > Xterm, then:
ssh -Y [email protected]
Load the weka module. The Weka installation is available at: /home/diag/opt/weka
At the command prompt: > weka

PBS Script

Output file

Hands On with Weka

The Weather Data Set (.arff file)
The weather.arff file is available on Trestles at /home/diag/opt/weka/data, online at http://www.hakank.org/weka/, and with the Weka download.
Data set:
@relation PlayTennis
@attribute day numeric
@attribute outlook {Sunny, Overcast, Rain}
@attribute temperature {Hot, Mild, Cool}
@attribute humidity {High, Normal}
@attribute wind {Weak, Strong}
@attribute playTennis {Yes, No}
@data
1,Sunny,Hot,High,Weak,No
2,Sunny,Hot,High,Strong,No
3,Overcast,Hot,High,Weak,Yes
4,Rain,Mild,High,Weak,Yes
5,Rain,Cool,Normal,Weak,Yes
6,Rain,Cool,Normal,Strong,No
7,Overcast,Cool,Normal,Strong,Yes
8,Sunny,Mild,High,Weak,No
...

The Problem
Each instance describes the conditions of the day and the action of the observed person (played or did not play).
The data set: 14 instances, 6 attributes (day, outlook, temperature, humidity, wind, playTennis).
Based on these records we can assess which factors affected the person's decision about playing tennis.

The Question
Use the J48 decision tree learner to build a model for the class attribute playTennis.
Make predictions for “play”.
Make predictions for the ‘temperature’ attribute.
Do you need to do any additional data preparation?

Result

References and Resources
WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
WEKA tutorials: Machine Learning with WEKA, a presentation demonstrating all graphical user interfaces (GUIs) in Weka; a presentation explaining how to use Weka for exploratory data mining.
WEKA data mining book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition).
WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page