Gastric Cancer detection New(Prasun new file).pptx

prasunrajeevkumar, 26 slides, Sep 30, 2024

About This Presentation

Gastric cancer detector using ML and AI


Slide Content

Gastric Cancer Detection
Netaji Subhas University of Technology
Abhinav Kumar - 2021UEC2503
Prasun Kumar - 2021UEC2544
Anushka - 2021UEC2547
Amrit Tharani - 2021UEC2553

Gastric cancer detection using machine learning: the application of machine learning algorithms and techniques to identify, classify, or predict the presence of gastric (stomach) cancer in patients from medical data. Machine learning can improve the early detection and diagnosis of gastric cancer by finding patterns in clinical, demographic, and imaging data that may not be easily identifiable by traditional methods.
Why Use Machine Learning for Gastric Cancer Detection?
Early Diagnosis, Automated Screening, Precision Medicine, Improved Accuracy

Mathematical Model
Data Collection → Data Preprocessing → Parameter Tuning Using AVOA → Feature Extraction (NetB5 Model) → Classification Using ResNet101 → Object Detection and Instance Segmentation Using Mask R-CNN → Testing and Evaluation of the Model

Data Understanding
Data understanding is a critical step in any machine learning project, including gastric cancer detection. It involves exploring, analyzing, and interpreting the data to gain insights and to confirm its relevance and quality before building models. For gastric cancer detection, this phase checks that the collected data is comprehensive, consistent, and structured so that it can effectively train a predictive model.

Data Pre-processing

Handling Missing Values
Missing values are common in medical data: patient records may be incomplete because of unrecorded test results, incomplete follow-ups, or technical issues during data collection. Handling them carefully lets the machine learning models learn from reliable, complete data. The steps are:
Identify Missing Data: Find which rows and columns contain missing values, for example with pandas' isnull().
Assess Missing Data Impact: Evaluate how much data is missing in each feature.
Decide on the Strategy: If only a small percentage of the data is missing, simple techniques such as removing the affected rows may work; if a significant portion is missing, imputation is needed.
Impute Missing Data: If removing data would discard too much, use techniques such as mean/median imputation, KNN imputation, or more advanced methods.
K-Nearest Neighbors (KNN) Imputation: KNN imputation finds the records closest to one with a missing value (its neighbors) and fills the gap from the neighbors' values. It is useful for gastric cancer detection when features are related to one another. For example, tumor size or CEA levels might be related to patient age, so a missing value can be inferred from similar patients.
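A minimal sketch of these steps, assuming the clinical data sits in a pandas DataFrame and using scikit-learn's KNNImputer; the column names and values are hypothetical, for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy patient records; columns and values are illustrative only.
records = pd.DataFrame({
    "age": [45, 47, 70, 72, 46],
    "tumor_size_cm": [2.0, 2.2, 5.0, 5.4, 2.1],
    "cea_ng_ml": [3.0, np.nan, 12.0, 14.0, 3.4],
})

# Identify and assess: count and fraction of missing values per column.
missing_counts = records.isnull().sum()
missing_fraction = records.isnull().mean()

# Impute: each missing value becomes the mean of that feature over the
# 2 nearest rows, with distances computed on the features both rows have.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(records), columns=records.columns)
```

Here the missing CEA value of the 47-year-old patient is filled with 3.2, the mean of the two most similar patients (3.0 and 3.4). In practice, features should be scaled before KNN imputation; otherwise the feature with the largest range (here, age) dominates the distance.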

Standard Scaler
In gastric cancer detection from image data, such as medical images (CT, MRI, or endoscopic images), preprocessing is somewhat more complex because of the nature of image data.
Why Is Scaling Important in Image-Based Gastric Cancer Detection?
Machine learning models, especially deep learning models such as Convolutional Neural Networks (CNNs), benefit from standardized input data for the following reasons:
Improved Convergence: Neural networks are sensitive to the scale of the input data. Standardizing pixel values can improve the speed and stability of convergence during training.
Handling Variability: Medical images can differ in brightness, contrast, and resolution. Standardizing the pixel values evens out brightness and contrast differences (resolution differences are handled by resizing), helping the model focus on important features, such as tumors or lesions, rather than irrelevant variations between images.
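For reference, the standardization that StandardScaler performs can be written as follows, where x_ij is the value of feature j (for flattened images, pixel position j) in training sample i, and the mean and standard deviation are estimated on the training set only (scikit-learn uses the population standard deviation):

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```

For example, if one pixel position takes the values 50, 100, and 150 across three training images, then the mean is 100, the standard deviation is about 40.8, and the standardized values are approximately -1.22, 0, and 1.22.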

Steps for Using StandardScaler in Gastric Cancer Detection with Image Data
Load the Image Data: Load the images into a matrix or tensor in which each pixel is represented by its intensity value.
Reshape the Data: Images are 2D or 3D arrays, so each image is first flattened into a 1D vector before StandardScaler is applied. For example, a 256x256 grayscale image has 65,536 pixel values, which become a single vector.
Apply StandardScaler: StandardScaler computes the mean and standard deviation from the training set only (one pair per pixel position, when each flattened pixel is treated as a feature) and uses those statistics to transform both the training and test sets.
Reshape the Data Back: After scaling, reshape the data back into the original image format (e.g., from a 65,536-element vector back to a 256x256 matrix).
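A minimal sketch of these four steps for grayscale images using scikit-learn's StandardScaler (standardize_images is a hypothetical helper). Treating each flattened pixel position as a feature means the scaler learns one mean and one standard deviation per pixel position:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def standardize_images(train_images, test_images):
    """Standardize grayscale images of shape (n_images, height, width).

    Statistics are learned from the training images only and then applied
    unchanged to the test images.
    """
    n_train, h, w = train_images.shape
    n_test = test_images.shape[0]

    # Reshape: flatten each image into a vector of h * w pixel values.
    train_flat = train_images.reshape(n_train, h * w).astype(np.float64)
    test_flat = test_images.reshape(n_test, h * w).astype(np.float64)

    # Apply StandardScaler: fit on training data, transform both sets.
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(train_flat)
    test_scaled = scaler.transform(test_flat)

    # Reshape back into image format.
    return (train_scaled.reshape(n_train, h, w),
            test_scaled.reshape(n_test, h, w),
            scaler)
```

A common alternative is a single global (or per-channel) mean and standard deviation for all pixels, which avoids tying the statistics to fixed pixel positions; which works better depends on how consistently the images are aligned.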

FEATURE EXTRACTION (Using NetB5 Model)
In machine learning, feature extraction transforms raw data into a form better suited to the model. In this gastric cancer detection project, feature extraction identifies and isolates the patterns in endoscopic images that help the model classify them as cancerous or non-cancerous. This section details feature extraction with the NETB5 model and its role in our project.

1. Role of Feature Extraction
Feature extraction is necessary because raw endoscopic images contain an overwhelming amount of information, much of which is not useful for cancer detection. Instead of using raw pixel values directly, we extract high-level features, such as textures, edges, shapes, and color gradients, which often indicate abnormalities like cancerous lesions.
2. Why NETB5 for Feature Extraction?
In this project, the NETB5 model is used for feature extraction. EfficientNet-B5 (NETB5) is a pretrained deep learning model from the EfficientNet family, which is known for its balance between accuracy and efficiency. The key reasons for choosing NETB5 are:
Pretrained Knowledge: NETB5 has been pretrained on a large dataset (ImageNet), so it has already learned to identify general features such as edges, textures, and shapes, which can be transferred to the task of identifying cancerous tissue.
Efficient Architecture: NETB5 uses a compound scaling method to balance network depth, width, and input resolution, which lets it extract informative features with fewer parameters and lower computational cost than conventional CNNs of comparable accuracy.
Feature Transfer: Through transfer learning, we can fine-tune NETB5 to adapt its knowledge from general image recognition to the specific task of detecting gastric cancer in endoscopic images.
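The compound scaling mentioned above comes from the EfficientNet paper (Tan and Le, 2019): a single coefficient phi scales depth d, width w, and input resolution r together, with constants alpha, beta, and gamma found by a small grid search on the baseline EfficientNet-B0:

```latex
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi},
\qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
```

The paper reports alpha = 1.2, beta = 1.1, and gamma = 1.15 (so alpha * beta^2 * gamma^2 is about 1.92). Because a network's FLOPs grow roughly in proportion to d * w^2 * r^2, each unit increase in phi roughly doubles the computation; larger family members such as B5 sit further along this curve than B0.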

3. Step-by-Step Process of Feature Extraction Using NETB5
3.1 Preprocessing for Feature Extraction
Before feature extraction can begin, the images are preprocessed. Typical steps:
Resizing: Each image is resized to a fixed input size. EfficientNet-B5 was designed for 456x456 inputs; smaller sizes such as 224x224 also work but give up part of the resolution the model was scaled for.
Normalization: Pixel values are normalized (e.g., to the range 0 to 1) to make training more stable and improve the model's convergence. If the Keras implementation of EfficientNet is used, note that it includes its own rescaling layer and expects raw 0-255 pixel values, so the normalization step must match the implementation.
Augmentation: Techniques such as rotation, zoom, and flipping artificially expand the dataset and help NETB5 generalize better.
3.2 Feature Extraction Workflow
Once preprocessing is complete, feature extraction proceeds as follows:
Input Image Feeding: The preprocessed endoscopic images are fed into NETB5 through its input layer.
Layer-wise Transformation: As an image passes through the layers of NETB5 (convolutional layers, pooling layers, and activation functions), the model extracts hierarchical features. Lower layers capture simple features such as edges and corners; middle layers detect more complex structures such as textures, patterns, and basic shapes; higher layers respond to advanced features such as tumor shapes, irregular growth patterns, or distinctive color variations indicative of gastric cancer.
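A minimal NumPy sketch of the normalization and augmentation steps in 3.1 (function names are hypothetical). It covers only horizontal flips and right-angle rotations, which rearrange pixels without interpolation; arbitrary-angle rotation, zoom, and resizing to the model's input size would normally use an image library such as PIL (Image.resize) or tf.image.resize and are left out here:

```python
import numpy as np

def normalize_to_unit_range(image):
    """Scale 8-bit pixel values (0-255) to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

def augment(image, rng):
    """Randomly flip and rotate an (H, W, C) image by a multiple of 90 degrees."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)      # horizontal flip
    k = int(rng.integers(0, 4))             # 0, 90, 180 or 270 degrees
    return np.rot90(image, k=k, axes=(0, 1))

def make_augmented_copies(image, n_copies, seed=0):
    """Normalize one image and return n_copies randomly augmented versions."""
    rng = np.random.default_rng(seed)
    base = normalize_to_unit_range(image)
    return [augment(base, rng) for _ in range(n_copies)]
```

Because flips and right-angle rotations only move pixels around, every augmented copy keeps exactly the same set of intensity values; on square images they also keep the shape unchanged.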

3.3 Feature Map Generation: After the final convolutional layers, NETB5 outputs a feature map, a multi-dimensional representation of the image that highlights regions relevant to the task, such as cancerous tissue. Feature maps are compressed representations of the original image that retain the information most relevant for classification.
3.4 Feature Transfer via Pretrained Weights: Because NETB5 has been pretrained on ImageNet, it can transfer the knowledge it has gained to this specific problem. During training, the model's weights are fine-tuned on the gastric cancer dataset so that NETB5 focuses on cancer-specific features, while the pretrained layers still provide a strong starting point by recognizing general visual patterns.
3.5 Feature Vector Construction: Once the feature map has been generated, it is typically flattened (or pooled) into a feature vector, a one-dimensional array containing the extracted features. This vector serves as the input to the classification stage (ResNet101 in this project) or other downstream models. Reducing the raw image to a compact feature vector keeps the most informative elements of the image while discarding irrelevant details.
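A minimal sketch of sections 3.3 to 3.5, assuming TensorFlow/Keras (the slides do not name a framework) and hypothetical helper names build_feature_extractor and extract_features. It loads the Keras EfficientNetB5 application without its classification head and, instead of flattening the feature map, uses global average pooling (pooling="avg"), a common alternative that yields one 2048-value vector per image:

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 456  # native input resolution of EfficientNet-B5

def build_feature_extractor(weights="imagenet", trainable=False):
    """EfficientNet-B5 backbone with the ImageNet classifier head removed.

    include_top=False drops the 1000-class classifier; pooling="avg" applies
    global average pooling to the final feature map.
    """
    backbone = tf.keras.applications.EfficientNetB5(
        include_top=False,
        weights=weights,
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
        pooling="avg",
    )
    # Frozen = pure feature extraction; set trainable=True (often for the
    # top blocks only) to fine-tune on gastric cancer images.
    backbone.trainable = trainable
    return backbone

def extract_features(backbone, images):
    """images: float32 array (n, 456, 456, 3) of raw 0-255 RGB values.

    Keras EfficientNet models rescale inputs internally, so the images are
    not divided by 255 beforehand.
    """
    return backbone.predict(images, verbose=0)
```

Typical use would be features = extract_features(build_feature_extractor(), batch), which downloads the ImageNet weights on first use and returns an (n, 2048) array for a downstream classifier. Global average pooling keeps the vector length independent of the feature map's spatial size, whereas flattening would multiply it by the number of spatial positions.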

5. Advantages of Using NETB5 for Feature Extraction
Computational Efficiency: NETB5 balances accuracy and computational cost by scaling network width, depth, and resolution together.
Transfer Learning: Pretrained weights reduce the amount of gastric-cancer-specific data needed, speeding up training and improving accuracy even with limited medical data.
Feature Hierarchy: NETB5 captures a multi-scale feature hierarchy, which is important in medical image analysis because cancerous tissue can vary widely in size, shape, and texture.
6. Using Extracted Features for Classification
After feature extraction, the feature vector generated by NETB5 is passed to the classification stage, which in this project uses ResNet101, to determine whether the image shows cancerous or non-cancerous tissue. Additionally, Mask R-CNN can be employed for object detection and segmentation, using the features extracted by NETB5 to delineate the exact boundaries of cancerous regions in the endoscopic images.

Conclusion
Feature extraction using NETB5 is an effective way to capture the essential content of endoscopic images. It lets the models focus on high-level, cancer-specific features that improve classification accuracy. Combined with classification by ResNet101 and segmentation by Mask R-CNN, this process forms a robust framework for the early detection of gastric cancer. Leveraging the strengths of NETB5 helps ensure that the model extracts relevant information from images efficiently, leading to better performance in cancer detection tasks.

RESNET
In our project, detecting cancerous tissue with high accuracy requires powerful deep learning models that can handle the complexity of medical images. One of the models we use is ResNet, which extracts deep features from the images and is an integral part of our detection system.

RESNET
ResNet, short for Residual Network, is a deep Convolutional Neural Network (CNN) architecture introduced to make very deep networks trainable. Without it, adding layers to a plain network eventually makes even its training accuracy worse (the degradation problem), an optimization difficulty often discussed together with the vanishing gradient problem. ResNet is available in various depths, such as ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, where the number is the count of weighted (convolutional and fully connected) layers in the network.
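The layer counts in these names can be checked with a little arithmetic. The sketch below uses the residual-block counts per stage from the original ResNet design (basic blocks with two 3x3 convolutions for ResNet-18/34, three-layer bottleneck blocks for ResNet-50/101/152) and adds the first convolution and the final fully connected layer; the dictionary and function names are illustrative:

```python
# Residual blocks per stage (conv2_x ... conv5_x) and layers per block.
BLOCKS_PER_STAGE = {
    "ResNet-18": ([2, 2, 2, 2], 2),    # basic blocks: two 3x3 convs each
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),    # bottleneck blocks: 1x1, 3x3, 1x1 convs
    "ResNet-101": ([3, 4, 23, 3], 3),
    "ResNet-152": ([3, 8, 36, 3], 3),
}

def count_weighted_layers(blocks, layers_per_block):
    """Conv layers inside residual blocks, plus the 7x7 stem convolution and
    the final fully connected layer. 1x1 projection shortcuts are not counted,
    following the usual naming convention."""
    return sum(blocks) * layers_per_block + 2

for name, (blocks, per_block) in BLOCKS_PER_STAGE.items():
    print(name, count_weighted_layers(blocks, per_block))
```

For ResNet-101, for instance, 3 + 4 + 23 + 3 = 33 bottleneck blocks of 3 layers give 99 layers, and the stem convolution and classifier bring the total to 101.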

SKIP CONNECTION
In a regular (plain) neural network, the output of each layer is passed only to the next layer. In ResNet, a shortcut (skip) connection carries the input of a block of layers around that block and adds it to the block's output. The stacked layers therefore only have to learn the residual (difference) between the desired output and the input, which is often easier for deep models. Mathematically:
Output = F(x) + x
where:
x is the input to the block.
F(x) is the residual mapping, i.e., the transformation the block's layers apply to x.
F(x) + x is the block's output, formed by adding the input directly to the transformed output.
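A minimal NumPy sketch of a residual block, assuming fully connected layers in place of ResNet's convolutions and batch normalization to keep it short (residual_block is a hypothetical name). It makes the point above concrete: if the block's layers learn nothing (F(x) = 0), the block simply passes its input through, so learning only has to model the difference from the identity:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Two-layer residual block on a batch of vectors.

    x: (n, d) inputs, w1: (d, h), w2: (h, d).
    F(x) = relu(x @ w1) @ w2 is the residual mapping; the skip connection
    adds x back before the final ReLU, as in the original ResNet blocks.
    """
    f = relu(x @ w1) @ w2   # residual mapping F(x)
    return relu(f + x)      # Output = F(x) + x, followed by ReLU
```

With w2 set to zeros, residual_block(x, w1, w2) returns x unchanged for non-negative x, which is the kind of input a preceding ReLU produces.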

BENEFITS
Mitigates the Vanishing Gradient Problem: Gradients can flow backward through the skip connections, so deep models learn effectively without the signal reaching early layers fading away.
Easier to Train Deep Networks: Models can have more than a hundred layers (such as ResNet-101 and ResNet-152) and still perform well, because each block does not have to learn an entirely new representation; it only needs to learn an incremental change to the previous block's output.
Improved Accuracy: Skip connections improve convergence speed (how fast the model learns) and final accuracy by helping the model avoid getting stuck during training.
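The first benefit can be seen with the chain rule. Writing y for the block output, y = F(x) + x, and ignoring the ReLU that follows the addition, the gradient of a loss L with respect to the block's input is:

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial y}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F(x)}{\partial x} + I\right)
  = \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F(x)}{\partial x}
    + \frac{\partial \mathcal{L}}{\partial y}
```

The second term carries the gradient from the block's output straight back to its input through the shortcut, so the signal reaching early layers does not depend solely on the (possibly small) derivative of F.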

Why Mask R-CNN? Instance Segmentation
The primary reason to choose Mask R-CNN over simpler models is its ability to perform instance segmentation. In cancer detection, we need to know not only whether cancer is present but also exactly where it is located, and Mask R-CNN provides this through pixel-wise segmentation masks.

COMPARISON
CNN (Convolutional Neural Network)
No Object Localization: A classification CNN predicts a label for the whole image (e.g., whether it contains cancer) but does not indicate where the cancerous tissue is within the image.
RCNN (Region-based Convolutional Neural Network)
No Segmentation: R-CNN provides bounding boxes but does not perform segmentation, i.e., it cannot give the exact pixel-level boundaries of cancerous tissue.

COMPARISON
Faster R-CNN is a significant improvement over R-CNN. It introduces a Region Proposal Network (RPN) that generates region proposals from the same convolutional feature maps used for detection, replacing the slow external proposal step (selective search) used by R-CNN. This makes detection much faster than R-CNN.
No Segmentation: Faster R-CNN only performs object detection (bounding boxes), not segmentation. Bounding boxes give a rough localization of cancerous areas but lack the pixel-wise precision needed in medical diagnostics.
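A small illustration of why bounding boxes lack pixel-wise precision, using a synthetic circular "lesion" mask (hypothetical data; for real lesions the share depends on their shape). The helper names are illustrative; the sketch measures how much of the tightest box actually belongs to the object:

```python
import numpy as np

def disk_mask(size, cy, cx, radius):
    """Boolean mask of a filled circle, standing in for a segmented lesion."""
    yy, xx = np.mgrid[0:size, 0:size]
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def bounding_box(mask):
    """Tightest axis-aligned box (y0, x0, y1, x1), inclusive, around a mask."""
    ys, xs = np.nonzero(mask)
    return ys.min(), xs.min(), ys.max(), xs.max()

def box_fill_ratio(mask):
    """Fraction of the bounding box's pixels that belong to the object."""
    y0, x0, y1, x1 = bounding_box(mask)
    box_area = (y1 - y0 + 1) * (x1 - x0 + 1)
    return mask.sum() / box_area

lesion = disk_mask(200, 100, 100, 60)
print(box_fill_ratio(lesion))
```

For a circle the ratio is close to pi/4, so roughly a fifth of the box consists of pixels outside the lesion; a segmentation mask does not carry that error.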

MASK RCNN
Mask R-CNN (Mask Region-Based Convolutional Neural Network) is an advanced deep learning model used for object detection and instance segmentation. While traditional object detection models only identify bounding boxes around objects, Mask R-CNN goes further by predicting a precise pixel-by-pixel mask (segmentation) for each detected object.

STEPS INVOLVED
Input: Medical images (e.g., endoscopy images, CT scans) are fed into the Mask R-CNN model.
Feature Extraction: The backbone network (e.g., ResNet) extracts important features from the medical images, capturing patterns that could signify cancerous tissue.
Region Proposal: The Region Proposal Network suggests regions where cancer is likely present, as bounding boxes around possible tumors.
RoI Align: Each proposed region is mapped onto the feature map and resampled to a fixed-size grid using bilinear interpolation instead of coarse rounding, so the extracted features stay precisely aligned with the region.
Classification: Each region is classified as containing cancer or not, and its bounding box is refined.
Segmentation Mask: For regions classified as cancerous, Mask R-CNN generates a pixel-wise segmentation mask that highlights the exact boundaries of the tumor or cancerous tissue.
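A simplified NumPy sketch of the RoI Align step on a single-channel feature map, under stated assumptions: box coordinates are already in feature-map units, the box lies inside the map, and one bilinearly interpolated sample is taken at each output bin's center (production implementations such as torchvision.ops.roi_align typically average several samples per bin and handle a half-pixel offset). Function names are hypothetical:

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x) location."""
    h, w = fmap.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * fmap[y0, x0] + dx * fmap[y0, x1]
    bottom = (1 - dx) * fmap[y1, x0] + dx * fmap[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(fmap, box, output_size=(7, 7)):
    """Resample one region of a 2D feature map onto a fixed output grid.

    box = (x1, y1, x2, y2) in feature-map coordinates. Fractional coordinates
    are kept as-is (no rounding), the key difference from RoI Pooling.
    """
    x1, y1, x2, y2 = box
    out_h, out_w = output_size
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    out = np.empty(output_size, dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            cy = y1 + (i + 0.5) * bin_h   # center of output bin (i, j)
            cx = x1 + (j + 0.5) * bin_w
            out[i, j] = bilinear_sample(fmap, cy, cx)
    return out
```

Every region, whatever its size, comes out as the same fixed grid (7x7 by default here), which is what lets the classification and mask heads process all proposals uniformly.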

TESTING AND EVALUATION
After training, we test the model on unseen data (the test set) to evaluate how well it generalizes to new examples, feeding it medical images it has not seen before and measuring its performance with the following metrics, where TP, FP, and FN are the numbers of true positives, false positives, and false negatives:
Accuracy: The proportion of correct predictions.
Precision: TP / (TP + FP), the fraction of positive predictions that are actually correct (important for avoiding false positives).
Recall (Sensitivity): TP / (TP + FN), the fraction of actual positive cases that are correctly identified (important for avoiding missed cancers, i.e., false negatives).
IoU (Intersection over Union): Particularly useful for Mask R-CNN; it measures the overlap between the predicted segmentation mask and the ground truth as the area of their intersection divided by the area of their union.
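A minimal NumPy sketch of these metrics (function names are hypothetical), treating label 1 as cancer and 0 as no cancer, and computing IoU on boolean segmentation masks:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = cancer)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)     # cancer correctly flagged
    fp = np.sum(~y_true & y_pred)    # non-cancer flagged as cancer
    fn = np.sum(y_true & ~y_pred)    # cancer missed
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(accuracy), float(precision), float(recall)

def mask_iou(pred_mask, true_mask):
    """Intersection over Union of two boolean segmentation masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred_mask, true_mask).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(np.logical_and(pred_mask, true_mask).sum() / union)
```

For example, with y_true = [1, 1, 1, 0, 0, 0, 0, 1] and y_pred = [1, 1, 0, 0, 1, 1, 0, 1], there are 3 true positives, 2 false positives, and 1 false negative, giving accuracy 0.625, precision 0.6, and recall 0.75.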


Thank you