Lect8 Classification & prediction

hktripathy 692 views 13 slides Feb 28, 2019

About This Presentation

classification & prediction

Size: 413.54 KB

Language: en

Slide Content

Data Mining: Classification

Classification and Prediction
  What is classification? What is prediction?
  Issues regarding classification and prediction
  Classification by decision tree induction

Classification vs. Prediction
  Classification:
    predicts categorical class labels
    classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
  Prediction:
    models continuous-valued functions, i.e., predicts unknown or missing values
  Typical applications:
    credit approval
    target marketing
    medical diagnosis
    treatment effectiveness analysis
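The difference between the two tasks can be sketched in a few lines of Python. The data, the income threshold, and the function names below are made up for illustration: the classifier returns a categorical label, while the predictor fits a least-squares line and returns a continuous value.

```python
# Hypothetical toy data: (income in thousands, credit limit in thousands).
training = [(20, 2.0), (40, 4.0), (60, 6.0), (80, 8.0)]

def classify_income(income, threshold=50):
    """Classification: return a categorical class label."""
    return "high" if income >= threshold else "low"

def fit_line(points):
    """Prediction: fit y = a*x + b by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line(training)

def predict_limit(income):
    """Return a continuous value for an unseen income."""
    return a * income + b

print(classify_income(70))   # a class label
print(predict_limit(50))     # a real number
```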

Classification Learning: Definition
  Given a collection of records (training set)
    Each record contains a set of attributes; one of the attributes is the class
  Find a model for the class attribute as a function of the values of the other attributes
  Goal: previously unseen records should be assigned a class as accurately as possible
    Use a test set to estimate the accuracy of the model
    Often, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it
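Dividing a data set into training and test sets might look like the following sketch. The 70/30 ratio, the fixed seed, and the toy records are assumptions chosen for illustration, not something the slides specify.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (attributes, class label).
records = [({"id": i}, "yes" if i % 2 == 0 else "no") for i in range(10)]

train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

Shuffling before splitting avoids a test set that only reflects the order in which the records happened to be stored.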

Illustrating Classification Learning

Examples of Classification Tasks
  Predicting tumor cells as benign or malignant
  Classifying credit card transactions as legitimate or fraudulent
  Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
  Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification - A Two-Step Process
  Model construction: describing a set of predetermined classes (building the classifier or model)
    Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
    The set of tuples used for model construction is the training set
    The model is represented as classification rules, decision trees, or mathematical formulae
  Model usage: classifying future or unknown objects (using the classifier for classification)
    Estimate the accuracy of the model
      The known label of each test sample is compared with the classified result from the model
      Accuracy rate is the percentage of test set samples that are correctly classified by the model
      The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected
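The accuracy rate described above can be computed as in this minimal sketch. The classifier `toy_model` and the four labelled test samples are hypothetical stand-ins for a real model and a real test set.

```python
def accuracy_rate(model, test_set):
    """Percentage of test samples whose known label matches the model's result."""
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return 100.0 * correct / len(test_set)

# Hypothetical classifier: calls a cell malignant when its size exceeds 3.
def toy_model(attributes):
    return "malignant" if attributes["size"] > 3 else "benign"

# Hypothetical test samples with known labels, kept separate from any training data.
test_set = [
    ({"size": 1}, "benign"),
    ({"size": 5}, "malignant"),
    ({"size": 4}, "benign"),      # toy_model answers "malignant": an error
    ({"size": 2}, "benign"),
]

print(accuracy_rate(toy_model, test_set))  # 75.0
```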

Classification Process (1): Model Construction
  Example: loan application
  The data classification process, step 1 (learning): training data are analyzed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules.

Classification Process (2): Use the Model in Prediction
  The data classification process, step 2 (classification): test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
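The loan example can be sketched as follows. The figure with the slide's actual rules did not survive in the text, so the rules, the attribute values, the test tuples, and the acceptance threshold below are all hypothetical.

```python
# Hypothetical classification rules for the class label attribute loan_decision.
def loan_decision(applicant):
    if applicant["age"] == "youth":
        return "risky"
    if applicant["income"] == "high":
        return "safe"
    if applicant["age"] == "middle_aged" and applicant["income"] == "low":
        return "risky"
    return "safe"  # assumed default when no rule fires

# Hypothetical test tuples with known loan decisions.
test_data = [
    ({"age": "youth", "income": "low"}, "risky"),
    ({"age": "senior", "income": "high"}, "safe"),
    ({"age": "middle_aged", "income": "low"}, "risky"),
    ({"age": "senior", "income": "low"}, "risky"),  # rules answer "safe": an error
]

# Step 2a: estimate the accuracy of the rules on the test data.
correct = sum(1 for applicant, label in test_data if loan_decision(applicant) == label)
accuracy = correct / len(test_data)

# Step 2b: apply the rules to new tuples only if the accuracy is acceptable.
ACCEPTABLE_ACCURACY = 0.7  # assumed threshold
new_applicant = {"age": "middle_aged", "income": "high"}
decision = loan_decision(new_applicant) if accuracy >= ACCEPTABLE_ACCURACY else None
print(accuracy, decision)
```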

Supervised vs. Unsupervised Learning
  Supervised learning (classification)
    Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
    New data are classified based on the training set
  Unsupervised learning (clustering)
    The class labels of the training data are unknown
    Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
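A rough sketch of the contrast, using one-dimensional toy data: the supervised function relies on labels attached to the training values, while the unsupervised one receives no labels and only groups the values. Nearest-neighbour classification and two-cluster k-means are illustrative choices here, not methods named in the slides, and `two_means` assumes the input contains at least two distinct values.

```python
def nearest_neighbor_label(labelled, x):
    """Supervised: return the label of the closest labelled training value."""
    return min(labelled, key=lambda pair: abs(pair[0] - x))[1]

def two_means(values, iterations=10):
    """Unsupervised: group unlabelled 1-D values into two clusters (k-means, k=2)."""
    c0, c1 = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        cluster0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        cluster1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0 = sum(cluster0) / len(cluster0)
        c1 = sum(cluster1) / len(cluster1)
    return sorted(cluster0), sorted(cluster1)

# Supervised: training values come with class labels.
labelled = [(1.0, "small"), (1.2, "small"), (5.0, "large"), (5.5, "large")]
print(nearest_neighbor_label(labelled, 4.6))

# Unsupervised: the same kind of measurements, but without labels.
unlabelled = [1.0, 1.2, 0.8, 5.0, 5.5, 5.2]
print(two_means(unlabelled))
```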

Issues (1): Data Preparation
  Data cleaning
    Preprocess data in order to reduce noise and handle missing values
  Relevance analysis (feature selection)
    Remove irrelevant or redundant attributes
  Data transformation
    Generalize and/or normalize data
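Each preparation step can be illustrated with a small helper. Mean imputation and min-max scaling are only one possible choice for cleaning and normalization, and the attribute names are invented for the example; `min_max_normalize` assumes the values are not all identical.

```python
def fill_missing_with_mean(values):
    """Data cleaning: replace missing values (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def drop_attributes(records, names):
    """Relevance analysis: remove attributes judged irrelevant or redundant."""
    return [{k: v for k, v in r.items() if k not in names} for r in records]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20, None, 40, 60]            # hypothetical attribute with a gap
cleaned = fill_missing_with_mean(incomes)
normalized = min_max_normalize(cleaned)

records = [{"customer_id": 7, "income": 20}, {"customer_id": 8, "income": 60}]
reduced = drop_attributes(records, {"customer_id"})
print(cleaned, normalized, reduced)
```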

Issues (2): Evaluating Classification Methods
  Predictive accuracy
  Speed
    time to construct the model
    time to use the model
  Robustness
    handling noise and missing values
  Scalability
    efficiency in disk-resident databases
  Interpretability
    understanding and insight provided by the model
  Goodness of rules
    decision tree size
    compactness of classification rules

The problem
  Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.
    Classification
    Prediction
  Use a decision tree to predict categories for new events.
  Use training data to build the decision tree.
  [Diagram labels: Training Events and Categories, Decision Tree, New Events, Category]
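Building a decision tree from training events and using it on new events can be sketched as below. The slides do not say which induction algorithm is used; this sketch assumes an ID3-style procedure that splits on the attribute with the highest information gain, and the weather-like training events are invented.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf: a category
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [l for _, l in pairs], remaining)
    return {"attribute": best, "branches": branches, "default": majority}

def predict(tree, event):
    """Walk the tree until a leaf is reached; unseen values fall back to the majority."""
    while isinstance(tree, dict):
        value = event.get(tree["attribute"])
        if value not in tree["branches"]:
            return tree["default"]
        tree = tree["branches"][value]
    return tree

# Hypothetical training events and their categories.
rows = [
    {"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"}, {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"}, {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay_in", "stay_in"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))  # a new event
```

On this toy data the root test is on `outlook`, and only the rainy branch needs a second test on `windy`.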
