Data Science research methodology & processes

IPMCBIT 14 views 17 slides Mar 12, 2025

About This Presentation

Data Science Method


Slide Content

Assessing Predictive Models’ Accuracy for Predicting Future Outcomes: Methodology
By: Ogechi Onyekwere and Moises Granados
Faculty Mentor: Dr. David Anyiwo
Graduate Mentor: Jerry Godwin Diabor

Introduction

In academic inquiry, research methods shape both the foundation and the outcomes of a study. They provide the framework that lets researchers collect, analyze, and interpret data, draw meaningful conclusions, and contribute to existing knowledge in their fields. This research methodology section gives an overview of the systematic approach taken in this study and the strategies used to address the research questions and achieve the study's objectives. The research design serves as the blueprint for the study, outlining the overall structure, scope, and sequence of activities involved. It encompasses decisions regarding the research approach and the selection of appropriate instruments or tools to collect data. We will explain the rationale behind our chosen research design, highlighting how it aligns with the research questions and contributes to our understanding of the subject matter.

Research Purpose

The purpose of this research is to investigate and compare the performance of decision tree, neural network, and regression models. The research is motivated by the need to pick the most appropriate prediction model in the rapidly evolving field of machine learning. An appropriate methodology for exploring the effectiveness of different predictive models involves conducting experiments or simulations, collecting the accuracy, precision, and other relevant metrics of each model, and analyzing the results to determine which model is most effective. More generally, predictive analytics uses statistical models and machine learning algorithms to analyze historical data and make predictions about future outcomes.
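As a minimal sketch of the metrics mentioned above, accuracy and precision for a binary classifier can be computed from the counts of true and false predictions. The function name and the label lists below are hypothetical illustrations, not data or code from this study.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
```

Computing the same metrics for every candidate model on the same held-out data is what makes the comparison fair.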

Research Design: Mixed Methods Approach
Qualitative
Quantitative
Linear Algebra: using vectors and matrices to represent numerical data
Statistics: central tendencies, dispersion, and correlation to collect and analyze numerical data
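The quantitative side of the design can be illustrated with a short sketch: numerical data arranged as a matrix (rows as observations, columns as features), then summarized by central tendency, dispersion, and correlation. The column names and values are hypothetical, and the `pearson` helper is written out here for illustration rather than taken from the study.

```python
import math
import statistics

# Hypothetical numeric columns (not taken from the study's datasets).
age = [25, 32, 47, 51, 38, 29]
balance = [1200.0, 1800.0, 3100.0, 3500.0, 2400.0, 1500.0]

# Linear algebra view: one row vector per observation, one column per feature.
data_matrix = [list(row) for row in zip(age, balance)]

# Central tendency and dispersion of a single feature.
central = {"mean": statistics.mean(age), "median": statistics.median(age)}
dispersion = {"stdev": statistics.stdev(age), "range": max(age) - min(age)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(age, balance)
print(central, dispersion, round(r, 3))
```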

Data Collection Method 

Tools Used for Methodology

Model Structure
Figure: Input (Datasets) -> Regression Model, Decision Tree Model, Neural Network (Computing) -> Output Result
The model structure consists of feeding the datasets (Health, Credit Card, Loan Defaulters) into Python/Jupyter, where they are used to train the models and generate a separate outcome for the Regression, Decision Tree, and Artificial Neural Network models, as shown in the figure.
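The structure described above can be sketched as follows. This is an illustration under several assumptions, not the study's code: it uses scikit-learn (the deck names only Python/Jupyter), substitutes a synthetic dataset for the Health, Credit Card, and Loan Defaulters data, assumes the regression model is a logistic regression for a binary outcome, and picks arbitrary hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Input: a synthetic stand-in for the real datasets (assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Computing: the three model families named in the deck.
models = {
    "RM": LogisticRegression(max_iter=1000),
    "DTM": DecisionTreeClassifier(max_depth=4, random_state=0),
    "ANNM": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

# Output: each model is trained on the same input and yields its own
# predicted probability of the positive class for every test row.
outputs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    outputs[name] = model.predict_proba(X_test)[:, 1]

for name, probs in outputs.items():
    print(name, probs[:3])
```

Training all three models on the same split keeps their outputs directly comparable in the assessment step.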

Model Comparison
Figure: Output results of RM, DTM, and ANNM -> ROC/AUC model assessment (SPSS/Python) -> most accurate model
The respective outputs of the Regression (RM), Decision Tree (DTM), and Artificial Neural Network (ANNM) models, as shown in the figure, are then assessed using the ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the Curve) to identify the most accurate model.
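A minimal sketch of the AUC comparison follows. It computes AUC directly from its probabilistic meaning (the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half). The `roc_auc` helper and all scores are hypothetical and say nothing about which model performed best in the study.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
model_scores = {
    "RM": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9],
    "DTM": [0.3, 0.3, 0.3, 0.9, 0.3, 0.9, 0.9, 0.9],
    "ANNM": [0.05, 0.2, 0.6, 0.9, 0.1, 0.8, 0.3, 0.95],
}

auc_by_model = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, "best:", best_model)
```

The pairwise form is quadratic in the number of cases, which is fine for a small illustration; library implementations use a sorted, rank-based computation instead.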

Limitations
The study’s conclusions may be influenced by the choice of dataset used for evaluation. If the dataset lacks diversity in domain and size, the conclusions may not be applicable to other scenarios. Generalization of the findings may be limited to specific domains. Because the performance of each prediction model can vary across problems and domains, the findings should be interpreted with caution and validated further in other contexts.

Discussion  Challenges  Strategies 

Accomplishments
Week 6
Worked on the methodology
Worked on the code that will be deployed for the models and made necessary changes
Week 5

Future Goals
Work on the code that will be deployed for the models and make necessary changes
Finalize the results of the research
Present the entire research and the results at the symposium

Conclusion
We have looked at the methodology and frameworks surrounding the research: the purpose, the tools used, the mixed methods approach, the limitations, and our plans for the upcoming weeks.


Questions