Prediction for breast cancer using various machine learning algorithms

vishnuisahumanbeing, 23 slides, May 08, 2024

About This Presentation

8th-semester engineering project presentation.


Slide Content

Prediction for Breast Cancer using Various Machine Learning Algorithms
Project batch details: LUCKY SHETTY [1KN20CS015], NAVEENA C K [1KN20CS026], VISHNU BABU B [1KN20CS050]
Project guide: Prof. Kusum Rajput, Dept. of CSE

Abstract: Female breast cancer has overtaken lung cancer as the most commonly diagnosed cancer worldwide. A combined over- and under-sampling method (SMOTE-ENN) is used to address class imbalance, and the features are standardized (Z-score) to improve their separability. The final results of each model are obtained with 10-fold cross-validation.
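The 10-fold cross-validation mentioned in the abstract can be sketched in plain Python. This is a minimal illustration, not the project's code: the helper names (stratified_kfold_indices, cross_val_accuracy, fit_majority, predict_majority) are hypothetical, the labels only mimic WDBC's class sizes (357 benign, 212 malignant), and the folds are stratified so that each keeps roughly the overall class ratio. In practice scikit-learn's StratifiedKFold together with cross_val_score does the same job.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    Indices of each class are shuffled, then dealt round-robin into k folds,
    so every fold keeps roughly the overall class ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    folds = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):
        members = by_class[label][:]  # copy before shuffling
        rng.shuffle(members)
        for i in members:
            folds[slot % k].append(i)
            slot += 1

    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        yield train_idx, sorted(test_idx)


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Mean test accuracy over k stratified folds; fit/predict are supplied by the caller."""
    scores = []
    for train_idx, test_idx in stratified_kfold_indices(y, k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Usage: a majority-class baseline on labels with WDBC's class sizes
# (357 benign = 0, 212 malignant = 1); the single dummy feature is ignored.
y = [0] * 357 + [1] * 212
X = [[0.0] for _ in y]


def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


baseline = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
```

The baseline lands close to 357/569 ≈ 0.627, which is the accuracy any real model has to beat. If SMOTE-ENN resampling is used, applying it only to each fold's training indices keeps synthetic samples out of the test folds.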

Introduction: Breast cancer, one of the most common malignant tumors in women, has become a focus of public health attention around the world. Machine learning, an important branch of artificial intelligence, can extract features, discover patterns and build predictive models from large volumes of medical data. It has been widely applied to breast cancer diagnosis, with strong reported results.

Literature Survey (Reference | Datasets Used | Machine Learning Algorithms | Key Findings):

1. R. L. Siegel, K. D. Miller, N. S. Wagle, and A. Jemal, "Cancer statistics, 2023," CA, Cancer J. Clinicians, vol. 73, no. 1, pp. 17–48, Jan. 2023.
Datasets used: Wisconsin Breast Cancer dataset. Algorithms: Logistic Regression, SVM, Decision Trees. Key findings: Demonstrated the efficacy of SVM in classifying breast cancer from clinical data, achieving high accuracy and sensitivity.

2. M. S. Iqbal, W. Ahmad, R. Alizadehsani, S. Hussain, and R. Rehman, "Breast cancer dataset, classification and detection using deep learning," Healthcare, vol. 10, no. 12, p. 2395, Nov. 2022.
Datasets used: multi-modal data integration. Algorithms: feature selection, PCA, t-SNE. Key findings: Investigated the impact of integrating multi-modal data and highlighted the importance of feature selection for model interpretability.

3. Z. Cai, R. C. Poulos, J. Liu, and Q. Zhong, "Machine learning for multiomics data integration in cancer," iScience, vol. 25, no. 2, Feb. 2022.
Datasets used: clinical, imaging and genetic data. Algorithms: XGBoost, Decision Trees. Key findings: Developed a hybrid model combining clinical, imaging and genetic data, showing promising results in predicting breast cancer risk.

4. D. K. Rakesh and P. K. Jana, "A general framework for class label specific mutual information feature selection method," IEEE Trans. Inf. Theory, vol. 68, no. 12, pp. 7996–8014, Dec. 2022.
Datasets used: standardized datasets. Algorithms: Random Forest, SVM. Key findings: Advocated standardized datasets to ensure consistent model evaluation and compared the performance of different algorithms.

5. N. Al Mudawi and A. Alazeb, "A model for predicting cervical cancer using machine learning algorithms," Sensors, vol. 22, no. 11, p. 4132, May 2022.
Datasets used: TCGA, clinical data. Algorithms: Logistic Regression, Ensemble Methods, SVM. Key findings: Explored model interpretability and discussed the trade-offs between accuracy and interpretability in the context of breast cancer prediction.

6. W. Xing and Y. Bei, "Medical health big data classification based on KNN classification algorithm," IEEE Access, vol. 8, pp. 28808–28819, 2020.
Datasets used: imaging data. Algorithms: CNNs, feature extraction (t-SNE). Key findings: Focused on the role of deep learning in analyzing mammographic images, highlighting the significance of feature extraction methods such as t-SNE.

System Architecture (flowchart; stages as labelled on the slide): datasets, data preprocessing, feature selection, model training, threshold ≥ 90% (yes/no decision), grid search method, cross-validation, best model (yes/no decision), contrast analysis.
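One way to read the tuning loop in this architecture, as a hedged sketch: every candidate hyperparameter is scored with cross-validation, the best one is kept, and its score is compared with the 90% threshold. The slide does not say which metric the threshold applies to; this sketch assumes accuracy. The classifier (k-nearest neighbours over a grid of k values), the toy two-blob data and all function names are illustrative assumptions; scikit-learn's GridSearchCV is the usual library route.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k_neighbors, folds=10):
    """Mean accuracy over `folds` interleaved folds (sample i goes to fold i % folds)."""
    scores = []
    for f in range(folds):
        test = [i for i in range(len(X)) if i % folds == f]
        train = [i for i in range(len(X)) if i % folds != f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        hits = sum(knn_predict(tX, ty, X[i], k_neighbors) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def grid_search(X, y, grid, threshold=0.90):
    """Score every value in `grid`; return (best_k, best_score, passed_threshold)."""
    results = {k: cv_accuracy(X, y, k) for k in grid}
    best_k = max(results, key=results.get)
    # The slide's "No" branch is not specified; this sketch only reports it.
    return best_k, results[best_k], results[best_k] >= threshold


# Toy data: two well-separated 2-D Gaussian blobs (stand-in for real features).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)] + \
    [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(60)]
y = [0] * 60 + [1] * 60

best_k, best_score, accepted = grid_search(X, y, grid=[1, 3, 5, 7, 9])
print(best_k, round(best_score, 3), accepted)
```

Keeping the threshold check separate from the search makes it easy to plug in a different metric (for example recall on the malignant class) without touching the tuning code.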

Data Flow Diagram: the data is split into training data and testing data; each set is processed and passed through feature extraction, and classification on WDBC (Wisconsin Diagnostic Breast Cancer dataset) produces the result.
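The split into training data and testing data can be illustrated with a stratified hold-out split. Assumptions: a 20% test fraction, a fixed seed, and the hypothetical helper stratified_train_test_split; the toy labels only reproduce WDBC's class counts. scikit-learn's train_test_split with the stratify=y argument provides the same behaviour.

```python
import random
from collections import defaultdict


def stratified_train_test_split(X, y, test_fraction=0.2, seed=0):
    """Hold out `test_fraction` of each class as testing data; the rest is training data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for members in by_class.values():
        members = members[:]  # copy before shuffling
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])

    def pick(idx, seq):
        # Sorting keeps X and y rows aligned and in their original order.
        return [seq[i] for i in sorted(idx)]

    return pick(train_idx, X), pick(test_idx, X), pick(train_idx, y), pick(test_idx, y)


# Toy example sized like WDBC: 569 samples, 357 benign (0) and 212 malignant (1).
y = [0] * 357 + [1] * 212
X = [[float(i)] for i in range(len(y))]
X_train, X_test, y_train, y_test = stratified_train_test_split(X, y, test_fraction=0.2)
```

With these settings the test set holds 113 samples, 42 of them malignant, so both sets keep the original benign/malignant ratio.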

Sequence Diagram (User and System):
1. Import modules
2. Load dataset
3. Display dataset
4. Explore data
5. SVM
6. Random forest
7. Decision tree
8. Logistic regression
9. Classify
10. Result

Use Case Diagram: data preprocessing, data preparation, feature projection, feature selection, feature scaling, model selection, prediction, result.
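Feature scaling, listed among the use cases and carried out in the proposed system as Z-score standardization, maps each feature to z = (x − μ)/σ. The minimal sketch below learns μ and σ on the training rows only and reuses them for the test rows, so no test information leaks into preprocessing. The function names and toy values are assumptions; scikit-learn's StandardScaler (fit on training data, transform on both sets) is the standard equivalent.

```python
import statistics


def fit_zscore(rows):
    """Learn per-feature mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant features
    return means, stds


def transform_zscore(rows, params):
    """Apply z = (x - mean) / std using parameters learned on the training data."""
    means, stds = params
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy rows: two features on very different scales (area-like and smoothness-like values).
train = [[500.0, 0.08], [700.0, 0.10], [900.0, 0.12]]
test = [[800.0, 0.11]]

params = fit_zscore(train)
train_z = transform_zscore(train, params)
test_z = transform_zscore(test, params)
```

After scaling, both features are on the same unit-free scale, which matters for distance-based models such as KNN and for SVMs.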

Hardware and Software Requirements:
Hardware requirements:
Processor (CPU): Intel (e.g., Core i7, Xeon) or AMD (e.g., Ryzen, EPYC).
Graphics Processing Unit (GPU): NVIDIA GPUs (e.g., GeForce, Quadro, Tesla).
Random Access Memory (RAM): at least 16 GB recommended.
Storage: SSDs preferred over HDDs for faster data access.
Internet connection: a stable connection for downloading datasets, libraries and updates during development.

Software requirements:
Operating system: Linux, Windows or macOS.
Programming language: Python.
Integrated development environment (IDE): Jupyter Notebook, VS Code, PyCharm or similar.
Machine learning libraries and frameworks: scikit-learn, TensorFlow, PyTorch and Keras.
Data manipulation and analysis: pandas (widely used).
Data visualization: Matplotlib and Seaborn.

Proposed System (flowchart):
Data preprocessing: raw data, SMOTE-ENN combined sampling, Z-score standardization.
Feature selection: mutual information, SHAP feature explanation.
Model training: KNN (k-nearest neighbours), SVM, RF (random forest), LR (logistic regression), tuned with the grid search method and cross-validation.
Best model (yes/no decision), followed by contrast analysis.
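The SMOTE-ENN step in this pipeline can be sketched as two stages. SMOTE creates synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours; Edited Nearest Neighbours (ENN) then removes samples whose label disagrees with the majority of their 3 nearest neighbours. This simplified two-class version is an illustration only (the library implementation has more options, including different ENN selection rules), and the helper names and toy data are hypothetical. The maintained implementation is SMOTEENN in the imbalanced-learn package (imblearn.combine).

```python
import math
import random
from collections import Counter


def nearest(points, x, k, exclude=None):
    """Indices of the k points closest to x (Euclidean), skipping index `exclude`."""
    order = sorted((i for i in range(len(points)) if i != exclude),
                   key=lambda i: math.dist(points[i], x))
    return order[:k]


def smote(X, y, minority, k=5, seed=0):
    """Two-class SMOTE: add interpolated minority samples until the classes are equal.

    Requires more than k minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = sum(label != minority for label in y) - len(minority_pts)
    new_X = []
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        neighbour = minority_pts[rng.choice(nearest(minority_pts, base, k, exclude=i))]
        u = rng.random()  # position along the segment base -> neighbour
        new_X.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return X + new_X, y + [minority] * len(new_X)


def enn(X, y, k=3):
    """Edited nearest neighbours (majority-vote rule): drop samples whose label
    disagrees with the most common label among their k nearest neighbours."""
    keep = []
    for i, x in enumerate(X):
        votes = Counter(y[j] for j in nearest(X, x, k, exclude=i))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, seed=0):
    """Oversample with SMOTE, then clean class overlap with ENN."""
    X_res, y_res = smote(X, y, minority, seed=seed)
    return enn(X_res, y_res)


# Toy imbalanced data: 40 majority samples (0) and 10 minority samples (1).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10
X_bal, y_bal = smote_enn(X, y, minority=1)
print(Counter(y), Counter(y_bal))
```

Because both stages look only at the data they are given, resampling should be fitted on the training split (or each training fold), never on the test data.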

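Logistic regression: Linear classification model that applies the logistic (sigmoid) function to a weighted sum of the features to estimate the probability of the positive class. Used for binary classification, e.g. predicting breast cancer risk from multiple features. Decision trees: Non-linear model that classifies a sample by following a tree of feature tests from the root to a leaf. Can handle both categorical and continuous features.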
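A minimal logistic regression makes the description above concrete: the model outputs p = σ(w·x + b), and batch gradient descent on the mean log-loss moves w and b against the average of (p − y)·x and (p − y). The learning rate, epoch count, function names and toy one-feature data are assumptions for illustration; sklearn.linear_model.LogisticRegression is what a real project would normally use.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target  # p - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Estimated probability that x belongs to the positive class."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy one-feature data: larger (standardized) values -> class 1 (malignant).
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(X, y)
labels = [int(predict_proba(model, x) >= 0.5) for x in X]
```

The 0.5 cut-off is only a default; in a screening setting it could be lowered to catch more malignant cases at the cost of more false alarms.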

Random forests: Ensemble learning method that combines many decision trees, each trained on a bootstrap sample of the data; combining their votes reduces overfitting and usually improves accuracy. Support vector machines: Separate the classes with the hyperplane that has the largest margin. Effective in high-dimensional feature spaces.
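The hyperplane idea behind SVMs can be sketched with a linear soft-margin SVM trained by Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This is an illustrative substitute, not necessarily how the project trained its SVM (scikit-learn's SVC is built on libsvm and supports kernels). The regularization strength lam, the epoch count, the appended constant feature used as a bias, the function names and the toy data are all assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.

    Labels must be -1/+1. A constant 1.0 is appended to each sample so the last
    weight acts as the bias (it is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1 - eta * lam) * wj for wj in w]  # gradient step for (lam/2)*||w||^2
            if margin < 1:  # hinge loss active: push w toward classifying x correctly
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data: two separable clouds labelled -1 and +1.
rng = random.Random(7)
X = [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(20)] + \
    [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(20)]
y = [-1] * 20 + [1] * 20
w = train_linear_svm(X, y)
predictions = [svm_predict(w, x) for x in X]
```

The learned weight vector is the normal of the separating hyperplane; only samples inside the margin (margin < 1) ever move it, which is the "support vector" idea in miniature.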

Advantages of Proposed System: early detection; risk assessment; personalized treatment plans; improved accuracy and consistency; resource optimization.

Existing System (flowchart):
Data preprocessing: raw data, SMOTE-ENN combined sampling, Z-score standardization.
Feature selection: mutual information, recursive feature elimination, SHAP feature explanation.
Model training: KNN, SVM, RF, LR, tuned with the grid search method and cross-validation.
Best model (yes/no decision), contrast analysis, XGBoost.
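Mutual information, used for feature selection in both pipelines, measures how much knowing a feature reduces uncertainty about the class: I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. The sketch below estimates it after binning each feature into equal-width intervals, which is a simplification (scikit-learn's mutual_info_classif uses a nearest-neighbour estimator for continuous features). The bin count, helper names and toy features are assumptions.

```python
import math
from collections import Counter


def discretize(values, bins=10):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard constant features
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(feature_bins, labels):
    """I(X;Y) = sum p(x,y) * log(p(x,y) / (p(x) p(y))), in nats."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_features(X, y, bins=10):
    """Return (feature indices from most to least informative, raw MI scores)."""
    scores = [mutual_information(discretize(list(col), bins), y) for col in zip(*X)]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: feature 0 tracks the label, feature 1 is independent of it.
y = [0] * 50 + [1] * 50
X = [[label + 0.1 * (i % 5), (i * 7) % 10] for i, label in enumerate(y)]
order, scores = rank_features(X, y)
```

Here feature 0 separates the classes perfectly, so its score equals the class entropy ln 2 ≈ 0.693 nats, while feature 1 scores 0 and would be dropped first.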

XGBoost: XGBoost (eXtreme Gradient Boosting) is a scalable, optimized implementation of gradient-boosted decision trees and is widely used for building predictive models. Logistic regression: Linear classification model that applies the logistic (sigmoid) function to a weighted sum of the features to estimate the probability of the positive class. Used for binary classification, e.g. predicting breast cancer risk from multiple features.
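XGBoost belongs to the gradient-boosting family: trees are added one at a time, each fitted to the negative gradient of the loss for the current model. The sketch below shows plain gradient boosting with depth-1 regression trees (stumps) on the log-loss. It is not XGBoost, which additionally uses second-order (Hessian) information, regularized leaf weights and many engineering optimizations. The number of rounds, the learning rate, the helper names and the toy data are assumptions; the real library exposes xgboost.XGBClassifier.

```python
import math


def fit_stump(X, residuals):
    """Regression stump: the (feature, threshold) split with the lowest squared error.

    Leaf values are the mean residual on each side. Assumes some feature has
    at least two distinct values.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # largest value would empty the right side
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature, threshold, left_value, right_value)


def stump_value(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_boosting(X, y, rounds=20, lr=0.3):
    """Gradient boosting on log-loss with 0/1 labels.

    Each round fits a stump to the negative gradient (y - p) and adds it,
    scaled by the learning rate, to the current log-odds score.
    """
    positives = sum(y)
    base = math.log(positives / (len(y) - positives))  # initial log-odds
    stumps = []
    for _ in range(rounds):
        scores = [base + lr * sum(stump_value(s, x) for s in stumps) for x in X]
        residuals = [label - 1 / (1 + math.exp(-s)) for label, s in zip(y, scores)]
        stumps.append(fit_stump(X, residuals))
    return base, lr, stumps


def boosting_proba(model, x):
    base, lr, stumps = model
    score = base + lr * sum(stump_value(s, x) for s in stumps)
    return 1 / (1 + math.exp(-score))


# Toy data: feature 0 separates the classes at 9.5; feature 1 is noise.
X = [[float(i), float((i * 3) % 7)] for i in range(20)]
y = [0] * 10 + [1] * 10
model = fit_boosting(X, y)
labels = [int(boosting_proba(model, x) >= 0.5) for x in X]
```

The learning rate shrinks each tree's contribution, which trades more rounds for better generalization; XGBoost exposes the same idea as its learning_rate (eta) parameter.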

Decision trees: Non-linear model that classifies a sample by following a tree of feature tests from the root to a leaf. Can handle both categorical and continuous features. Random forests: Ensemble learning method that combines many decision trees, each trained on a bootstrap sample of the data; combining their votes reduces overfitting and usually improves accuracy. Support vector machines: Separate the classes with the hyperplane that has the largest margin. Effective in high-dimensional feature spaces.
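The core step of growing a decision tree is choosing the split that makes the child nodes purest. This minimal sketch scores every feature/threshold pair by weighted Gini impurity, 1 − Σ p_k²; that criterion matches the default of scikit-learn's DecisionTreeClassifier, but the helper names and toy data are assumptions. A full tree applies this search recursively to each child node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Search every feature/threshold for the split with the lowest weighted Gini.

    Returns (feature_index, threshold, weighted_impurity); samples with
    x[feature] <= threshold go left. (None, None, gini(y)) means no split helps.
    """
    n = len(y)
    best = (None, None, gini(y))  # a split must beat the unsplit node
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:  # largest value would empty the right side
            left = [label for x, label in zip(X, y) if x[j] <= t]
            right = [label for x, label in zip(X, y) if x[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Toy data: feature 0 separates the classes; feature 1 does not.
X = [[2.0, 1.0], [3.0, 5.0], [10.0, 2.0], [11.0, 6.0]]
y = [0, 0, 1, 1]
split = best_split(X, y)  # feature 0, threshold 3.0, impurity 0.0
```

A random forest repeats this search inside many trees, each grown on a bootstrap sample and restricted to a random subset of features at each split, and predicts by majority vote.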

Drawbacks:
Limited generalizability: high accuracy on a specific training dataset does not guarantee similar performance on other datasets or in other clinical settings.
Lack of contextual understanding: machine learning algorithms may struggle with the contextual nuances of medical reports, including sarcasm, idiomatic expressions or ambiguous language.
Inadequate handling of medical jargon: medical reports often contain complex terminology and abbreviations that models may misinterpret.

Limited adaptability to varied data sources: healthcare data comes in diverse formats, including text, images and numerical data, and a model built for one format does not directly handle the others.
Sensitivity to preprocessing techniques: the accuracy of machine learning algorithms can depend heavily on the preprocessing applied to the input data.

Conclusion: The breast cancer prediction model shows promising results in predicting breast cancer.
Future work: further improve the model's performance by fine-tuning its hyperparameters and optimizing the feature selection process.


Thank you