Lazy Learners and Other Classification Methods in Data Mining
M. Rajshree, M.Sc. (IT), Nadar Saraswathi College of Arts & Science
Lazy Learners
Lazy learning is a learning method in which generalization of the training data is, in principle, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries. Lazy learners do less work when the training data is presented and more work when a test tuple must be classified.
The classification methods discussed so far in this chapter—decision tree induction, Bayesian classification, rule-based classification, classification by backpropagation, support vector machines, and classification based on association rule mining—are all examples of eager learners. A lazy learner simply stores the training data and begins generalization only when it sees a test tuple, classifying the tuple based on its similarity to the stored training tuples.
Classification involves two steps: building a model from a given set of training data, and applying the model to a given set of testing data. Eager learners such as Bayesian classification, rule-based classification, and support vector machines construct a classification model from the training tuples before receiving any new tuple.
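To make the contrast concrete, here is a minimal sketch using scikit-learn (an assumed library choice, not part of the original text): the eager learner does its generalization inside fit(), while the lazy learner's fit() essentially just stores the training tuples, deferring the real work to predict().

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier      # eager: generalizes at fit time
from sklearn.neighbors import KNeighborsClassifier   # lazy: generalizes at query time

X, y = load_iris(return_X_y=True)

eager = DecisionTreeClassifier().fit(X, y)            # the tree is induced here, before any query
lazy = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # the training tuples are merely stored here

# Both classify the same test tuple, but the neighbor search happens only now.
print(eager.predict(X[:1]), lazy.predict(X[:1]))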
k-Nearest-Neighbor Classifiers
The k-nearest-neighbor method was first described in the early 1950s. Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes, so each tuple represents a point in an n-dimensional space.
In this way, all of the training tuples are stored in an n-dimensional pattern space. When given a test tuple, a k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to it; these k training tuples are the k "nearest neighbors" of the test tuple. Closeness is defined in terms of a distance metric, such as the Euclidean distance between two points or tuples, say X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n):

dist(X1, X2) = sqrt( (x11 - x21)^2 + (x12 - x22)^2 + ... + (x1n - x2n)^2 )
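The following from-scratch sketch illustrates the rule just described; the helper names (dist, classify, train, labels) and the toy data are illustrative assumptions, not from the original text.

import math
from collections import Counter

def dist(x1, x2):
    # Euclidean distance: square root of the sum of squared attribute differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def classify(test_tuple, train, labels, k=3):
    # Rank all stored training tuples by distance to the test tuple
    neighbors = sorted(zip(train, labels), key=lambda tl: dist(test_tuple, tl[0]))[:k]
    # The k nearest neighbors vote; the most common class label wins
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [(1.0, 1.1), (1.2, 0.9), (6.0, 6.2), (5.8, 6.1)]
labels = ["A", "A", "B", "B"]
print(classify((1.1, 1.0), train, labels, k=3))  # -> "A"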
Case-Based Reasoning
Case-based reasoning is the process of solving new problems based on the solutions of similar past problems. These classifiers use a database of problem solutions to solve new problems. The case-based reasoner tries to combine the solutions of the neighboring training cases in order to propose a solution for the new case.
Case-based reasoning (CBR) classifiers use a database of problem solutions to solve new problems. Unlike nearest-neighbor classifiers, which store training tuples as points in Euclidean space, CBR stores the tuples or "cases" for problem solving as complex symbolic descriptions. Business applications of CBR include problem resolution for customer service help desks, where cases describe product-related diagnostic problems.
CBR has also been applied to areas such as engineering and law, where cases are technical designs or legal rulings, respectively. Medical education is another area for CBR, where patient case histories and treatments are used to help diagnose and treat new patients. The case-based reasoner may employ background knowledge and problem-solving strategies in order to propose a feasible combined solution.
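As a rough illustration of the retrieve-and-combine idea, here is a highly simplified sketch. The case base, the similarity measure, and the combination rule are all illustrative assumptions; real CBR systems operate on much richer symbolic case descriptions.

from collections import Counter

# Each case pairs a problem description with the solution that worked for it
case_base = [
    ({"symptom": "no power", "device": "router"}, "replace power adapter"),
    ({"symptom": "no signal", "device": "router"}, "reset to factory settings"),
    ({"symptom": "no power", "device": "modem"}, "replace power adapter"),
]

def similarity(problem, case):
    # Count matching attribute-value pairs between the new problem and a stored case
    return sum(problem.get(k) == v for k, v in case.items())

def propose_solution(problem, k=2):
    # Retrieve the k most similar past cases ...
    nearest = sorted(case_base, key=lambda c: similarity(problem, c[0]), reverse=True)[:k]
    # ... and combine their solutions (here: take the most common one) for the new case
    return Counter(sol for _, sol in nearest).most_common(1)[0][0]

print(propose_solution({"symptom": "no power", "device": "router"}))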
Other Classification Methods
Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression, and summarization. Classification is a major technique in data mining, widely used in various fields; it categorizes data into a given number of classes.
Binary classification: a classification task with two possible outcomes, e.g., gender classification (male/female).
Multi-class classification: classification with more than two classes, where each sample is assigned to one and only one target label, e.g., an animal can be a cat or a dog but not both at the same time.
Multi-label classification: a classification task where each sample is mapped to a set of target labels (more than one class), e.g., a news article can be about sports, a person, and a location at the same time.
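The three task types differ in the shape of their labels: binary and multi-class tasks carry one label per sample, while multi-label tasks carry a set of labels per sample, commonly encoded as a 0/1 indicator matrix. A small sketch, using scikit-learn's MultiLabelBinarizer for the multi-label case (the example labels are illustrative):

from sklearn.preprocessing import MultiLabelBinarizer

binary_labels = ["male", "female", "female"]   # one label per sample, two possible classes
multiclass_labels = ["cat", "dog", "bird"]     # one label per sample, one of many classes

# Each news article can carry several topics at once
multilabel_sets = [{"sports", "person"}, {"location"}, {"sports"}]
mlb = MultiLabelBinarizer()
print(mlb.fit_transform(multilabel_sets))  # 0/1 indicator matrix, one column per topic
print(mlb.classes_)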
Naïve Bayes
The naïve Bayes algorithm is based on Bayes' theorem with the assumption of independence between every pair of features. Naïve Bayes classifiers work well in many real-world situations such as document classification and spam filtering. The algorithm requires only a small amount of training data to estimate the necessary parameters, and naïve Bayes classifiers are extremely fast compared to more sophisticated methods.
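A minimal document-classification sketch, assuming scikit-learn's MultinomialNB over bag-of-words counts; the tiny corpus below is illustrative only.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win a free prize now", "meeting schedule for monday",
        "free offer claim now", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(docs)              # word-count features for each document
clf = MultinomialNB().fit(X, labels)     # estimates class-conditional word probabilities

print(clf.predict(vec.transform(["claim your free prize"])))  # -> ['spam']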
Fuzzy Set Approaches
Fuzzy set theory is also called possibility theory. It was proposed by Lotfi Zadeh in 1965 as an alternative to two-value logic and probability theory. The theory allows us to work at a high level of abstraction and provides a means for dealing with imprecise measurements of data. In the fuzzy set approach, an important consideration is the treatment of data from a linguistic viewpoint; from this has developed an approach that uses linguistically quantified propositions to summarize the content of a database by providing a general characterization of the analyzed data.
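A minimal sketch of the fuzzy-set idea: each value receives a degree of membership in [0, 1] rather than a crisp yes/no. The triangular membership function and the linguistic term "warm" below are illustrative assumptions, not from the original text.

def triangular(x, a, b, c):
    # Membership rises linearly from a to a peak of 1.0 at b, then falls to 0 at c
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Degree to which various temperatures (degrees C) count as "warm"
for t in (10, 18, 22, 26, 34):
    print(t, round(triangular(t, 15, 22, 30), 2))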