a final year project on ml which uses various algorithms in the field of computer science and engineering
Size: 471.11 KB
Language: en
Added: May 25, 2024
Slides: 13 pages
Slide Content
Prediction for breast cancer using Natural Language Processing Algorithms Project Batch Details: Batch Information: LUCKY SHETTY [1KN20CS015] PROJECT GUIDE : NAVEENA C K [1KN20CS026] Prof. Kusum Rajput Dept. of CSE VISHNU BABU B [1KN20CS050]
Abstract: Breast cancer has replaced lung cancer as the number one cancer among women worldwide. The combined sampling method SMOTE-ENN is used to solve the problem of sample imbalance, and the data are standardized to make the data have better separability . The final results of each model are derived using a 10-fold cross-validation method.
Introduction: Breast cancer, as one of the common malignant tumors in women, has become a focus of public health attention around the world. Machine learning, as an important artificial intelligence technology, has the ability to extract features, discover patterns and build predictive models from a large amount of medical data. For breast cancer diagnosis, the application of machine learning has revolutionized the field and achieved remarkable results.
Literature survey: Logistic regression: Linear regression model used for binary classification. Suitable for predicting breast cancer risk based on multiple features. Decision Trees: Non-linear model that uses a tree-like structure for classification. Can handle both categorical and continuous features.
Random Forests: Ensemble learning method that combines multiple decision trees. Reduces overfitting and improves accuracy. Support Vector Machines: Uses hyperplanes to separate data into different classes. Effective for high-dimensional feature spaces.
Existing system: Hybrid strategy SMOTE-ENN XGBoost algorithm RANDOM FOREST SUPPORT VECTOR MACHINE K-NEAREST NEIGHBOR (KNN) LOGISTIC REGRESSION (LR)
Drawbacks: Limited Generalizability : A high accuracy rate on a specific training dataset does not guarantee similar performance on different datasets or in diverse clinical settings. Lack of Contextual Understanding: NLP algorithms might struggle with understanding the contextual nuances of medical reports, including sarcasm, idiomatic expressions, or ambiguous language. Inadequate Handling of Medical Jargon: Medical reports often contain complex terminology and abbreviations.
Limited Adaptability to Varied Data Sources: Healthcare data comes in diverse formats, including text, images, and numerical data. Sensitivity to Preprocessing Techniques: The accuracy of NLP algorithms can heavily depend on the preprocessing techniques applied to the text data.
Proposed system:
Hardware and software requirements: Hardware requirements: Requires a multi-core CPU, ideally with 16GB RAM. Utilizes a dedicated GPU, preferably NVIDIA GeForce RTX series or higher, for efficient deep learning model training. Benefits from high-speed storage, such as SSDs, for quick data retrieval and model loading.
Software Requirements: Utilizes Python for coding and algorithm implementation. Employs TensorFlow, PyTorch , NLTK, and spaCy for advanced natural language processing. Utilizes Matplotlib, Seaborn for data visualization, and Git/GitHub for version control and collaboration.
Conclusion : The breast cancer prediction model demonstrates promising results in accurately predicting breast cancer. Future Work: Further improve the model's performance by fine-tuning the parameters and optimizing the feature selection process.