Machine Learning
Do machines really learn? If so, how do they learn?
Types of ML
1. Supervised Learning: Uses labeled inputs (i.e. each input has a corresponding output label) to train models and learn outputs.
(Illustration: input images labelled cat, turtle, dog.)
Some examples of supervised learning:
- Predicting the results of a game
- Predicting whether a tumor is malignant or benign
- Predicting prices in domains like real estate, stocks, etc.
- Classifying texts, e.g. classifying a set of emails as spam or non-spam
Supervised Learning
- Classification: when we are trying to predict a categorical or nominal variable
- Regression: when we are trying to predict a real-valued variable
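As a minimal sketch (with made-up data), the only thing that changes between the two tasks is the kind of target variable being predicted:

```python
# Classification: the target is a categorical / nominal label.
emails = ["win a free prize now", "meeting at 10 am"]   # inputs (hypothetical)
email_labels = ["spam", "non-spam"]                      # categorical outputs

# Regression: the target is a real-valued variable.
house_features = [[120, 3], [85, 2]]                     # e.g. area in m^2, no. of bedrooms
house_prices = [250000.0, 175000.0]                      # real-valued outputs
```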
Note: Supervised machine learning is only as good as the data used to train it. If the training data is of poor quality, the predictions will also be far from precise.
Classification
A number of popular machine learning algorithms help in solving classification problems. To name a few, k-Nearest Neighbour (kNN), Naïve Bayes, Logistic Regression, Decision Tree, and SVM are adopted by many machine learning practitioners.
Some typical classification problems include:
- Image classification
- Prediction of disease
- Win–loss prediction of games
- Prediction of natural calamities like earthquakes, floods, etc.
- Recognition of handwriting
K-Nearest Neighbours
kNN works on a simple idea: people having a similar background or mindset tend to stay close to each other; in other words, neighbours in a locality have a similar background. In the same way, in the kNN algorithm, the unknown and unlabelled data element that comes in for prediction is judged on the basis of the training data set elements which are most similar to it.
1. What is the basis of this similarity, i.e. when can we say that two data elements are similar?
Ans: Euclidean distance. Considering a very simple data set having two features (say f1 and f2), the Euclidean distance between two data elements d1 and d2 can be measured by

distance(d1, d2) = sqrt((f11 - f12)^2 + (f21 - f22)^2)

where
f11 = value of feature f1 for data element d1
f12 = value of feature f1 for data element d2
f21 = value of feature f2 for data element d1
f22 = value of feature f2 for data element d2
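A small Python sketch of this distance measure, assuming each data element is given as a list of its feature values (the numbers below are arbitrary):

```python
from math import sqrt

def euclidean_distance(d1, d2):
    """Euclidean distance between two data elements given as lists of feature values."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))

d1 = [2.0, 4.0]   # [f11, f21] -> values of f1 and f2 for element d1
d2 = [5.0, 8.0]   # [f12, f22] -> values of f1 and f2 for element d2
print(euclidean_distance(d1, d2))   # sqrt((2-5)^2 + (4-8)^2) = 5.0
```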
2. How many similar elements should be considered for deciding the class label of a test data element?
Ans: The answer lies in the value of 'k', which is a user-defined parameter given as an input to the algorithm.
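Putting the two ideas together, here is a minimal from-scratch sketch of the kNN prediction step: compute the distance from the unknown element to every training element, take the k nearest, and assign the majority label (the data and labels are made up for illustration):

```python
from collections import Counter
from math import sqrt

def knn_predict(training_data, training_labels, unknown, k):
    """Predict the class label of an unknown element by majority vote
    among its k nearest labelled neighbours (Euclidean distance)."""
    distances = [
        (sqrt(sum((a - b) ** 2 for a, b in zip(element, unknown))), label)
        for element, label in zip(training_data, training_labels)
    ]
    distances.sort(key=lambda pair: pair[0])          # nearest first
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training set
X = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, unknown=[1.2, 1.5], k=3))     # -> "A"
```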
Homework
1. What is the best value of k? What happens if it is very large or very small?
2. Find strategies to arrive at a value for k.
3. kNN is called a lazy learner. State the reason behind it.
4. Find the strengths and weaknesses of the kNN algorithm.
Regression
2. Unsupervised Learning: Uses unlabeled inputs to learn about patterns in the data.
3. Reinforcement Learning: Learns by trial and error, using rewards and penalties received as feedback for its actions.
The ML Process: Data Input → Abstraction → Generalization
(Diagram: inputs 1 to n form a feature vector, which is fed to the model to produce the output, i.e. the prediction.)
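A hedged sketch of this flow using scikit-learn's KNeighborsClassifier (the feature vectors and labels here are invented, and any classifier could stand in for the model):

```python
from sklearn.neighbors import KNeighborsClassifier

# Each row is a feature vector (input 1 .. input n) for one data element
X_train = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y_train = ["A", "A", "B", "B"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                   # build the model from the input data

new_feature_vector = [[1.2, 1.5]]             # unseen input
print(model.predict(new_feature_vector))      # output (prediction) -> ['A']
```

In terms of the process above, fit() corresponds to the abstraction step (building a model from the data input), while calling predict() on unseen feature vectors is the generalization step that produces the prediction.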