Presentation from a webinar hosted by the Petroleum Engineers Association (PEA) on 28 July 2024. The topic of the webinar is computer vision for petroleum geoscience.
Size: 70.93 MB
Language: en
Added: Jul 28, 2024
Slides: 47 pages
Slide Content
Computer Vision and GenAI in Geoscience YOHANES NUWARA PETROLEUM ENGINEERS ASSOCIATION (PEA) Trondheim, 28.07.2024
Yohanes Nuwara
Career:
- Data scientist at Prores AS, Norway (2024-): computer vision for porosity and permeability prediction from core images
- Lead data analyst at APP Sinarmas, Indonesia (2022-2023): sustainability dashboard for management
- Expert data scientist at APP Sinarmas, Indonesia (2022-2023): LiDAR and computer vision for UAV remote sensing
- Research engineer at OYO Corporation, Japan (2020-2022): Distributed Acoustic Sensing (DAS) for earthquake seismology
Education:
- Politecnico di Milano, Italy (2023-): Master's in Business Analytics and Big Data
- Bandung Institute of Technology, Indonesia (2015-2019): Bachelor's in Geophysical Engineering
Outline What is computer vision? Computer vision methods and models Use case 1: Automatic rock typing using segmentation model Use case 2: Boulder detection for seabed mapping Challenges in computer vision What is Generative AI? Generative vision models Conclusion
Computer vision
What is computer vision? Computer vision is a scientific field whose task is to understand images from their pixel information, using both traditional image analysis and artificial intelligence.
Why is computer vision growing so fast?
- Rapid growth of computers and hardware chips
- Bigger, more modern, and more secure data storage
- Rapid evolution of AI computer vision models
- Increasingly advanced optics and camera technologies
Computer vision in geoscience: seismic interpretation, petrophysics, geology, remote sensing
Methods of computer vision
How do computers see images? Computers see an image as pixels, the unit of an image with 3 color channels: Red, Green, Blue. Color is represented as the intensity value of each channel, from 0 to 255. An image is therefore a 3-D array: (pixel height, pixel width, channels). In remote sensing, an image can be composed of more than 3 color channels. [Figures: illustration of pixel representation on an LCD screen; color channels of an image]
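To make the pixel representation concrete, here is a minimal sketch (assuming Pillow and NumPy are installed; the file name core_photo.jpg is hypothetical) that loads an image and inspects its 3-D array:

```python
import numpy as np
from PIL import Image

# Load an image and view it as a NumPy array of pixel intensities.
img = np.asarray(Image.open("core_photo.jpg"))  # hypothetical file name
print(img.shape)   # (pixel height, pixel width, 3): rows, columns, RGB channels
print(img.dtype)   # uint8: each channel intensity is an integer from 0 to 255
print(img[0, 0])   # the (R, G, B) intensities of the top-left pixel
```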
Types of computer vision tasks
- Image classification: classify objects into two or more classes (e.g., classify malignant versus benign tumors from images)
- Image regression: predict a numeric value from an image (e.g., predict the price of a car from images)
- Object detection: locate an object on an image with a bounding box (e.g., locate ripe fruits on a tree from images)
- Object segmentation: segment the boundary of an object (e.g., segment cracks on roads from images)
- Keypoint (or pose) detection: identify the components of an object (e.g., identify the parts of an animal's body from images)
Convolutional Neural Network (CNN): a CNN is a type of neural network that processes images (N-dimensional arrays), using hierarchical layers of interconnected neurons and convolution operations to automatically learn and extract features from images and perform identification tasks. Its building blocks:
- Conv layer: learns features from the image
- Pooling layer: reduces the feature maps and spatial dimensions
- Flatten layer: converts the N-dimensional output to 1-dimensional
- Fully connected layer: a dense neural network that makes the prediction based on the features
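As an illustration of the four building blocks above, a minimal Keras sketch (layer counts and sizes are illustrative assumptions, not the slide's model):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),          # RGB image input
    layers.Conv2D(16, 3, activation="relu"),   # conv layer: learn features
    layers.MaxPooling2D(),                     # pooling: shrink feature maps
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),                          # flatten: N-D output -> 1-D
    layers.Dense(64, activation="relu"),       # fully connected layer
    layers.Dense(8, activation="softmax"),     # predict one of 8 classes
])
model.summary()
```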
Transfer Learning: transfer learning is a concept in neural networks that allows us to re-use available models, train them on our use case, and fine-tune them. Transfer learning models come with pretrained weights. Examples: Residual Net (He et al., 2015), Inception Net (Szegedy et al., 2016), VGG-16 (Simonyan and Zisserman, 2014), MobileNet (Howard et al., 2017).
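A hedged sketch of the idea, assuming Keras is available: load ResNet50 with ImageNet-pretrained weights, freeze the backbone, and fine-tune only a new classification head (the 8-class head is an assumption matching the rock-typing use case later in the talk):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Re-use a pretrained backbone and fine-tune a new head on our data.
backbone = keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
backbone.trainable = False                   # freeze the pretrained weights

model = keras.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(8, activation="softmax"),   # new head for our own classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```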
State-of-the-Art (SOTA) models: computer vision SOTA models are combinations of convolutional networks. Generally, SOTA models have 3 components: Backbone, Neck, Head. Popular SOTA models: Faster R-CNN (2015), Mask R-CNN (2017), and the YOLO family (2015-2024).
- Backbone: processes the image input, learns through deep CNNs, and produces feature maps
- Neck: generates region proposals
- Head: localizes the object (bounding boxes, segments)
Segmentation models: U-Net (Ronneberger et al., 2015), Mask R-CNN (He et al., 2017), Detectron2 (Facebook/Meta, 2019), Segment Anything Model (Facebook/Meta, 2023)
Detection models: Template Matching (Graf and Zisserman, 1988), Faster R-CNN (Ren et al., 2015), Mask R-CNN (He et al., 2017), YOLO models (2015-)
Keypoint detection: objects consist of a predefined set of keypoints and the connections between them ("objects as keypoints"). Very popular in human movement analysis (pose detection). A popular model is YOLOv8-Pose, the pose variant of the 8th YOLO version. [Figure: human movement analysis (source: OpenVINO)]
Use case 1: Automatic rock typing from core
Core image interpretation: a drilling core provides geological evidence used to find oil in the rocks below. The core is brought to the lab to be analysed, where the lithology description is done by petrophysicists. It is a very lengthy process! [Figures: drilling activity; core sample; lithology description by petrophysicists]
Automatic rock typing from core images? Instead of humans conducting the lithology description, why not teach a neural network to describe the lithology (later supervised by humans)?
Labelling and annotation: 500 images are carefully segmented into different lithology classes, namely: bioturbated mudstone/sandstone, massive mudstone/sandstone, parallel-laminated mudstone/sandstone, cross-bedded/graded-bedded sandstone, current-rippled sandstone, conglomerate, fissile shale, and heterolithic.
Distribution of lithology classes: some classes have too few samples. Imbalance between the number of instances per class can severely degrade the performance of a computer vision model, because high accuracy becomes biased toward the classes with more instances than the others.
Data augmentation: one strategy for imbalanced classes is data augmentation, which manipulates images by rotation, flipping, and color-space shifting. Augmentation can also improve model generalization by training the model on varied image conditions. Note: only some augmentations are useful for a particular use case. In core facies segmentation, where color carries lithological information, the color-space augmentations (red box in the figure) cannot be used: shifting colors would destroy the very cue the model must learn.
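One possible pipeline, sketched with the albumentations library (an assumption; the slides do not name a tool): geometric augmentations are kept, while the color-space transforms are deliberately left out:

```python
import albumentations as A

# Geometry-only augmentation: rotation and flips preserve lithology colors.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    # Color-space shifts such as A.HueSaturationValue or A.RGBShift are
    # omitted: they would distort the color cues used for rock typing.
])
augmented = augment(image=image)["image"]  # `image` is an H x W x 3 array
```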
Model training: to form the training data, the annotated polygons (segments) must be converted to a numerical representation, the YOLO format: the class index followed by the normalized vertex coordinates. For example, a polygon with vertices (x1,y1) ... (x9,y9) annotating class 3 (parallel-laminated sandstone) becomes the line "3 x1 y1 x2 y2 x3 y3 ...". Train, validation, and test split: 75%-15%-10%.
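A minimal sketch of the polygon-to-YOLO conversion described above (the function name and example vertices are illustrative):

```python
def polygon_to_yolo_seg(class_id, vertices, img_w, img_h):
    """Flatten a polygon into a YOLO segmentation label line:
    class index followed by vertex coordinates normalized to 0..1."""
    coords = []
    for x, y in vertices:
        coords += [x / img_w, y / img_h]
    return f"{class_id} " + " ".join(f"{c:.6f}" for c in coords)

# Class 3 (parallel-laminated sandstone) with an illustrative polygon:
print(polygon_to_yolo_seg(3, [(120, 40), (300, 42), (298, 180), (118, 175)],
                          img_w=640, img_h=480))
```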
Model evaluation: how good and accurate is our model? Important metrics for instance segmentation: classification metrics (precision, recall, F1-score); losses (Dice loss, IoU loss); accuracy (mean Average Precision, mAP50); and a confusion matrix to show the false positives and false negatives of the model result.
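As a concrete example of the overlap measure behind both the IoU loss and the mAP50 threshold, a small NumPy sketch:

```python
import numpy as np

def iou(mask_pred, mask_true):
    """Intersection-over-Union of two boolean H x W masks for one class."""
    intersection = np.logical_and(mask_pred, mask_true).sum()
    union = np.logical_or(mask_pred, mask_true).sum()
    return intersection / union if union else 0.0

# Under mAP50, a predicted segment counts as a true positive when its
# IoU with a ground-truth segment of the same class is at least 0.5.
```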
Segmentation result: once the best model is obtained, it is used to predict lithologies from core images. Inference is very fast (milliseconds per core image), so it can serve as a "quick QC" for petrophysicists to review the automatic rock typing result. [Figures: original core image; segmented core]
Use case 2: Boulder detection for seabed mapping
Seabed mapping for offshore oil infrastructure: offshore oil rigs and infrastructure need careful planning for structural stability. A side-scan sonar survey is used to map the structure of the seabed and identify obstacles such as boulders. [Figure: side-scan sonar capturing the Port (P) and Starboard (S) sides of the seabed]
Keypoint model for seabed boulders: boulders are large rocks deposited on the seabed, with a Length, Width, and Height. Length and Width are measured directly as the object's length (oL) and width (oW). How tall is the boulder? Its height is calculated from the acoustic shadow it casts (oS). Keypoint representation: the boulder center is the head of the keypoint, and the shadow center is the tail (Savini, 2010).
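A hedged sketch of the classical side-scan sonar shadow relation (height ≈ shadow length × sensor altitude / slant range to the far end of the shadow); the slides do not give the exact formula used, and the numbers below are illustrative:

```python
def boulder_height(shadow_len_m, sensor_altitude_m, slant_range_m):
    """Estimate boulder height from the acoustic shadow it casts."""
    return shadow_len_m * sensor_altitude_m / slant_range_m

# Example: a 2.5 m shadow, towfish 20 m above the seabed, 50 m slant range
print(boulder_height(2.5, 20.0, 50.0))  # -> 1.0 m tall boulder
```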
Boulder keypoint annotations [Figures: Port (P) position; Starboard (S) position]
Model training: YOLOv8-Pose. Training took 50 minutes on an NVIDIA T4 GPU (200 epochs). Accuracy was improved in two ways:
- Augmentation: cropping and zooming, horizontal flip
- Hyperparameter evolution: 50 iterations of hyperparameter search, seeking the hyperparameters with the best accuracy and minimizing the loss curve; multiple experiments were run during the search
[Figure: YOLOv8-Pose architecture (Wang et al., 2024)]
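A sketch of such a training run with the Ultralytics API (assuming the ultralytics package; the dataset config boulders.yaml and all settings other than the 200 epochs are illustrative):

```python
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")        # pretrained pose weights
model.train(
    data="boulders.yaml",              # hypothetical dataset config
    epochs=200,                        # as reported on the slide
    imgsz=640,
    fliplr=0.5,                        # horizontal flip augmentation
)
results = model("sidescan_tile.png")   # inference on a sonar tile (hypothetical)
```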
Boulder detection result [Figures: P position; S position]
Generative Computer Vision
Generative adversarial networks (GAN): generating synthetic images that have never existed, based on an image input. Pioneered by Ian Goodfellow (2014). Image to Image.
GAN architecture (Image to Image): a GAN consists of a Generator and a Discriminator. The Generator generates fake images that resemble the input images; the Discriminator judges whether a generated image is fake or real. Training continues until the discriminator can no longer distinguish fake from real.
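A compact PyTorch sketch of this generator/discriminator game (the network sizes are illustrative assumptions, not the slide's architecture):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())          # generator
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())         # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                  # real: a batch of flattened images
    n = real.size(0)
    fake = G(torch.randn(n, 64))
    # Discriminator: judge real images as real (1), generated ones as fake (0)
    loss_d = bce(D(real), torch.ones(n, 1)) + bce(D(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: try to fool the discriminator into judging fakes as real
    loss_g = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```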
Applications of GAN (Image to Image): reconstruction of a 3D model of tight sandstone (Zhao et al., 2021); outcrop-to-seismic generation.
Diffusion models: generating synthetic images from text input written by a human. Examples: DALL-E by OpenAI, Imagen by Google. Text to Image.
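As an illustration (an assumption, since the slides show no code), text-to-image generation with the Hugging Face diffusers library looks like this:

```python
from diffusers import StableDiffusionPipeline

# Generate a synthetic image from a text prompt (prompt is illustrative).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a cross-bedded sandstone outcrop, field photograph").images[0]
image.save("generated_outcrop.png")
```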
Diffusion model architecture Text to Image
Vision transformer (ViT): generating text, or performing tasks, based on image input from a human. Sample tasks: generating captions from an image; locating an object in a photo; question answering based on a photo. Image to Text.
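For example, image captioning can be sketched with the Hugging Face transformers pipeline (the BLIP checkpoint and file name are assumptions):

```python
from transformers import pipeline

# Generate a caption (text) from an input image.
captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")
print(captioner("thin_section.png"))  # e.g. [{'generated_text': '...'}]
```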
Applications of ViT (Image to Text): identifying minerals from a thin section (prompt: "What minerals are in this sandstone?"); locating faults in a seismic image (prompt: "Where are the faults in this seismic image?").
Challenges in computer vision: my paper in Springer's Lecture Notes in Computer Science (2024)
Image quality issues: images can suffer from quality issues, for example: resolution reduction (blurred image due to camera movement or haze); occlusion (shadowed image due to obstacles blocking the light); over-exposure (the image looks too bright due to excessive light exposure); colour constancy (false colour, with the image tending toward a certain colour). [Figures: shadow; overexposure; yellow constancy (Nuwara and Trinh, 2024)]
Model generalization issues: most models are trained on images of ideal quality. When tested on images with quality issues, model performance drops; the model cannot generalize across different quality issues. [Figures: image with ideal quality; image with shadow; performance degradation (Nuwara and Trinh, 2024)]
Histogram matching: an algorithm that transforms an image based on its histogram. Steps: (1) select a normal image as the Reference image; (2) extract the histogram of the Reference image; (3) select the image to be transformed as the Source image; (4) extract the histogram of the Source image; (5) match the histogram of the Source to the Reference. Result: improved quality and more balanced lighting (Nuwara and Trinh, 2024).
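A minimal sketch of this workflow using scikit-image's match_histograms (the file names are hypothetical):

```python
from skimage import io
from skimage.exposure import match_histograms

reference = io.imread("reference_ideal.png")   # steps 1-2: normal-quality Reference
source = io.imread("source_shadowed.png")      # steps 3-4: Source to be transformed
matched = match_histograms(source, reference, channel_axis=-1)  # step 5: match
io.imsave("source_matched.png", matched.astype("uint8"))
```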
Model stacking: model stacking boosts the performance of an object detection model on low-quality images by combining two or more models so that they balance each other's performance (Nuwara and Trinh, 2024).
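One possible stacking scheme, sketched here as an assumption (the paper's exact method may differ): pool the boxes predicted by two models and suppress duplicates with a simple IoU-based non-maximum suppression:

```python
def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def stack_detections(dets_a, dets_b, iou_thresh=0.5):
    """Each detection is (box, score); keep the best of overlapping pairs."""
    pooled = sorted(dets_a + dets_b, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pooled:
        if all(box_iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```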
Improved result on low-quality images [Figures: shadow on object, before/after; light exposure, before/after (Nuwara and Trinh, 2024)]
Conclusion: computer vision makes a huge impact across broad areas of geoscience. Two use cases were presented, using segmentation and object detection workflows. Generative AI shapes the future of AI implementation in geoscience.