VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI
Rao Bahadur Mahabaleshwarappa College of Engineering
Department of Computer Science & Engineering
Big Data Mining with Machine Learning Techniques for Knowledge Discovery and Analysis
Mr. Vikas G. Bhowate, USN: 3VC18PCS03
PhD Dissertation Defense
Supervisor: Dr. T. Hanumantha Reddy, Principal, RYMEC, Ballari
2/13/2025
Presentation Outline
Introduction
Research Gaps
Problem Statement
Research Objectives
Methodology
Result & Discussion
Conclusion
Future Scope
Publications
Bibliography
Introduction
Big Data is characterized by high volume, rapid data flow, and a wide diversity of data types, and it needs high-performance processing.
One of the biggest challenges in Big Data is imbalanced classification: traditional classifiers fail to account for skewed class distributions, resulting in biased and inaccurate outcomes.
Despite substantial research effort, unbalanced data classification remains a difficult issue in data mining and machine learning, particularly for multimedia data.
Analyzing human attributes is a difficult task in computer vision, and the major challenges arise from data that is distributed unevenly.
Without data preprocessing techniques that balance the dataset, developing an effective model with conventional data mining and machine learning methods is considerably harder.
To resolve these issues, this research applies several techniques to balance the imbalanced Big Data before classification.
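As an illustration of what balancing a dataset in preprocessing can look like, the following is a minimal sketch of random oversampling. It is an assumed, generic technique chosen for illustration, not necessarily one of the methods used in this research; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (hypothetical): 8 majority samples (0) and 2 minority samples (1)
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes in `balanced_counts` have the same number of samples. Oversampling only repeats existing minority records, so it balances the class counts without adding new information.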
Problem Statement
Big Data cannot be efficiently managed, processed, or analyzed using conventional data processing techniques.
Big Data with multiple skewed classes produces biased classification results.
Numerous studies on Big Data that attempted to validate data and compile data from various sources, while taking into account quality, storage, and the shortage of data science professionals, failed.
As a result, the following problems are encountered when classifying unbalanced data:
Classifying large, unbalanced data with a skewed class distribution is difficult because it results in subpar learning performance.
Imbalanced data remains a persistent challenge in anomaly detection tasks, and multi-class imbalance presents significant complexities that extend beyond two-class skewed problems.
Extracting insights from extensive datasets remains a formidable obstacle for conventional ML methods.
To address these problems, this research studies Big Data mining with machine learning techniques for knowledge discovery and analysis.
Research Gaps
Data skewness, where positive samples (often the class of interest) are greatly outnumbered by negative ones, is a frequent problem in real-world applications.
Because data is currently growing exponentially, standard algorithms are unable to handle the problems posed by large amounts of data.
Classification is difficult because both balanced and imbalanced data must be taken into consideration.
For the analysis of social and economic data, linking records from the same user (or entity) across several data sources is a significant challenge.
ML and data analysis face unique challenges with Big Data analytics, including diverse raw data formats, fast-moving streaming data, data analysis validity, scattered input sources, noisy and poor-quality data, high dimensionality, algorithm scalability, imbalanced input data, and limited supervised/labelled data.
Categorizing big, unbalanced data with a skewed class distribution is extremely difficult because it results in subpar learning.
Research Objectives
Design classifiers that classify raw data and learn more accurate patterns in Big Data analytics.
Apply machine learning techniques to analyse imbalanced learning problems in the presence of underrepresented data.
Develop knowledge graph-based techniques for learning in the presence of redundant patterns in Big Data mining.
Contributions of the work
Objective: Design of classifiers for classification from raw data and for learning more accurate patterns in Big Data analytics.
Contribution: Incremental learning-based ensemble classifier SSPCS
Objective: Machine learning techniques to analyse imbalanced learning problems in the presence of underrepresented data.
Contribution: Lupusbug optimization-based deep CNN classifier
Objective: Knowledge graph-based techniques for learning in the presence of redundant patterns in Big Data mining.
Contribution: PCF-based deep ensemble classifier algorithm
Incremental learning-based ensemble classifier SSPCS
The ANN, K-NN, SVM, DT, and NB classifiers are used; the output of each classifier is fused to produce the ensemble output, from which the majority and minority classes are determined.
Figure: The ensemble classifier for classification
In the position update, the terms denote: the foraging spider's position in the (s+1)-th iteration; lb, the lower bound of the search space in the P-th dimension; and h, a random floating-point number between 0 and 1.
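The fusion of the five classifier outputs can be sketched as follows. The slide does not state the fusion rule, so majority voting is an assumption here, and the function name `majority_vote` and the per-classifier predictions are hypothetical toy values rather than results from this work.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-classifier label lists (all the same length) by
    taking the most common label for each sample."""
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Hypothetical binary predictions from the five base classifiers on 4 samples
base_outputs = {
    "ANN": [1, 0, 1, 0],
    "K-NN": [1, 0, 0, 0],
    "SVM": [0, 0, 1, 1],
    "DT": [1, 1, 1, 0],
    "NB": [0, 0, 1, 0],
}
fused_labels = majority_vote(list(base_outputs.values()))

# The majority and minority classes follow from the fused label counts
class_counts = Counter(fused_labels)
```

With an odd number of base classifiers and two classes, a vote can never tie, which is one reason majority voting is a common default for this kind of ensemble.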
where the terms denote: the tracing cat's position at the s-th iteration; the tracing cat's position at the (s+1)-th iteration; the tracing cat's velocity; the position possessing the best fitness value; J, a constant that ranges from -1 to 1; the cat's velocity at the iteration; and a random value between 0 and 1.
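The terms listed above match the shape of the tracing-mode update in standard cat swarm optimization, where the velocity is pulled toward the best position and then added to the current position. The sketch below assumes that standard form; it is not confirmed to be the exact equation used in this work, and the function name `tracing_update` and all numeric values are hypothetical.

```python
import random


def tracing_update(position, velocity, best_position, J, rng):
    """One tracing-mode step (assumed standard cat swarm form):
    v(s+1) = v(s) + r * J * (x_best - x(s)),  x(s+1) = x(s) + v(s+1)."""
    new_position, new_velocity = [], []
    for x, v, b in zip(position, velocity, best_position):
        r = rng.random()  # random value in [0, 1)
        v_next = v + r * J * (b - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


rng = random.Random(42)
pos, vel = tracing_update(
    position=[0.0, 4.0],
    velocity=[0.0, 0.0],
    best_position=[1.0, 2.0],
    J=1.0,
    rng=rng,
)
```

Starting from zero velocity with J = 1, each coordinate moves part of the way from its current value toward the best position, so the cat stays between the two.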
Liver disorder dataset: records of 416 liver patients and 167 non-liver patients (441 male, 142 female), sampled from the north-east of Andhra Pradesh, India.
Figure: Specificity analysis with varying optimization population for the liver disorder dataset
Figure: Sensitivity analysis with varying optimization population for the liver disorder dataset
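For reference, sensitivity and specificity can be computed from true and predicted labels as in the sketch below. The function name `sensitivity_specificity` and the toy labels are hypothetical and are not results from this work; only the class counts (416 and 167) come from the dataset description above.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy labels: 1 = liver patient, 0 = non-liver patient
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)

# Class imbalance of the liver disorder dataset, from the counts above
imbalance_ratio = 416 / 167
```

Reporting both metrics matters on a skewed dataset like this one, because a classifier that always predicts the larger class would reach perfect sensitivity for that class while its specificity drops to zero.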