Industry-wise benefits
Manufacturing – uncovering variations between purchase orders
Mail Order – promotions to targeted users
Supermarket – market basket analysis
Airlines – increasing sales by giving promotions or discounts to frequent flyers
Department store – anticipating demand for products
Insurance – detecting fraudulent claims
Banks – growing business through direct marketing campaigns
Straight line equation
y = mx + c
y = dependent variable
x = independent variable
m = slope of the line
c = intercept (the value of y when x = 0)
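As a rough illustration of the straight line equation, the sketch below computes y from x for a given slope and intercept, and also estimates m and c from a few points. The least-squares fit is an assumption added here for illustration (the slide only states the equation), and the function names and sample points are hypothetical.

```python
def predict(x, m, c):
    """Return y on the line y = m*x + c for a given x."""
    return m * x + c


def fit_line(xs, ys):
    """Estimate slope m and intercept c by ordinary least squares.

    Assumption: least squares is used here only as one common way
    to obtain m and c from data; the source does not prescribe it.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Hypothetical sample points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

m, c = fit_line(xs, ys)
y_new = predict(5, m, c)  # value of the dependent variable at x = 5
```

Here x is the input that is chosen or observed (independent variable) and y is the value that follows from it (dependent variable).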
Data Preprocessing
Data Cleaning: It cleans the data by filling in missing values, smoothing noisy data, resolving inconsistencies, and removing outliers.
Data from different sources can be inconsistent in: names, locations, dates, numbers, currencies, languages
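One way to picture resolving inconsistency across sources is with dates, which different systems often write in different formats. The sketch below is a minimal example under that assumption: the list of formats, the function name, and the sample values are all hypothetical, and a real pipeline would need to know which formats each source actually uses.

```python
from datetime import datetime

# Assumed formats used by the (hypothetical) source systems
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalize_date(text):
    """Return the date as an ISO string (YYYY-MM-DD), or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None  # unresolved values are left for manual review


# The same day as written by three sources, plus one unparseable value
raw_dates = ["2021-03-15", "15/03/2021", "15.03.2021", "March 15"]
cleaned_dates = [normalize_date(d) for d in raw_dates]
```

Values that cannot be parsed come back as None, so they can then be treated like any other missing value.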
Ways to handle missing data during cleaning
Manual entry of missing data
Using the attribute mean
Using the most probable value, predicted with a decision tree or regression
Using a global constant, such as NA or "unknown"
Ignoring the tuple/observation
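Three of the options above (attribute mean, global constant, ignoring the tuple) can be sketched in a few lines of plain Python. This is only an illustration: the records, field names, and helper functions are hypothetical, and missing values are assumed to be represented as None.

```python
from statistics import mean

# Hypothetical observations; None marks a missing value
records = [
    {"name": "A", "age": 30, "city": "Pune"},
    {"name": "B", "age": None, "city": "Delhi"},
    {"name": "C", "age": 40, "city": None},
]


def fill_with_mean(rows, key):
    """Replace missing values of a numeric attribute with the attribute mean."""
    observed = [r[key] for r in rows if r[key] is not None]
    avg = mean(observed)
    return [{**r, key: avg if r[key] is None else r[key]} for r in rows]


def fill_with_constant(rows, key, constant="unknown"):
    """Replace missing values of an attribute with a global constant."""
    return [{**r, key: constant if r[key] is None else r[key]} for r in rows]


def drop_incomplete(rows):
    """Ignore any tuple/observation that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


by_mean = fill_with_mean(records, "age")
by_constant = fill_with_constant(records, "city")
complete_only = drop_incomplete(records)
```

Ignoring tuples is the simplest option but discards data: here only one of the three records survives, while the two filling strategies keep all of them.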