Computer Vision and GenAI for Geoscientists.pptx

Yohanes Nuwara · 47 slides · Jul 28, 2024

About This Presentation

Presentation from a webinar hosted by the Petroleum Engineers Association (PEA) on 28 July 2024. The topic of the webinar was computer vision for petroleum geoscience.


Slide Content

Computer Vision and GenAI in Geoscience YOHANES NUWARA PETROLEUM ENGINEERS ASSOCIATION (PEA) Trondheim, 28.07.2024

Yohanes Nuwara Career: Data scientist at Prores AS, Norway (2024-) Computer vision for porosity and permeability prediction from core images Lead data analyst at APP Sinarmas, Indonesia (2022-2023) Sustainability dashboard for management Expert data scientist at APP Sinarmas, Indonesia (2022-2023) LiDAR, computer vision for remote sensing UAV Research engineer at OYO Corporation, Japan (2020-2022) Distributed Acoustic Sensing (DAS) for earthquake seismology Education: Politecnico di Milano, Italy (2023-) Master’s in Business Analytics and Big Data Bandung Institute of Technology, Indonesia (2015-2019) Bachelor’s in Geophysical Engineering

Outline What is computer vision? Computer vision methods and models Use case 1: Automatic rock typing using segmentation model Use case 2: Boulder detection for seabed mapping Challenges in computer vision What is Generative AI? Generative vision models Conclusion

Computer vision

What is computer vision? Computer vision is a scientific field whose task is to understand images from their pixel information, using traditional image analysis and artificial intelligence

Why is computer vision growing so fast? Rapid growth of computers and hardware chips Bigger, more modern, and more secure data storage Rapid evolution of AI computer vision models More and more advanced optics and camera technologies

Computer Vision in geoscience Seismic interpretation Petrophysics Geology Remote sensing

Methods of computer vision

How do computers see images? Computers see an image as pixels (the unit of an image, with 3 color channels: Red, Green, Blue) Color is represented as the intensity value of each color channel (0 ≤ intensity ≤ 255) Therefore, an image is a 3-D array → (pixel width, pixel height, channels) In remote sensing, an image can be composed of more than 3 color channels Illustration of pixel representation on the LCD screen Color channels of an image
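The pixel representation above can be made concrete with a small NumPy sketch (the array sizes and values here are illustrative, not taken from the slides):

```python
import numpy as np

# A tiny 4x4 RGB image: height x width x 3 channels,
# each intensity an 8-bit value in [0, 255].
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255          # a pure red image: max red channel
print(image.shape)            # (4, 4, 3)

# Each pixel is a triple of channel intensities:
print(image[0, 0])            # [255   0   0]

# A multispectral remote-sensing image simply has more channels:
multispectral = np.zeros((4, 4, 8), dtype=np.uint16)
print(multispectral.shape)    # (4, 4, 8)
```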

Types of computer vision tasks
Image classification: classify objects into two or more classes (e.g. classify malignant versus benign tumors from images)
Image regression: predict a value from an image (e.g. predict the price of a car from images)
Object detection: locate an object in an image with a bounding box (e.g. locate ripe fruits on a tree from images)
Object segmentation: segment the boundary of an object (e.g. segment cracks on roads from images)
Keypoint (or pose) detection: identify the components of an object (e.g. identify parts of an animal's body from images)

Convolutional Neural Network (CNN) A CNN is a type of neural network that processes images (N-dimensional arrays), using hierarchical layers of interconnected neurons and convolution operations to automatically learn and extract features from images and perform identification tasks. Conv layer: learns features from the image Pooling layer: reduces the feature maps and their spatial dimensions Flatten layer: converts the N-dimensional output to 1-dimensional Fully connected layer: a dense neural network that makes predictions based on the features
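The Conv and Pooling layers described above can be sketched in plain NumPy; this is a toy illustration of the two operations, not a full CNN (real models use frameworks such as PyTorch or TensorFlow, and learn the kernel values):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling: shrinks each spatial dimension."""
    h, w = img.shape
    return img[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1., -1.]])        # simple horizontal edge detector
features = conv2d(img, edge)        # what a conv layer computes
pooled = max_pool(features)         # what a pooling layer computes
print(features.shape, pooled.shape) # (4, 3) (2, 1)
```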

Transfer Learning Transfer learning is a concept in neural networks that allows us to “re-use” available models, train them on our use case, and fine-tune them Transfer learning models already come with pretrained weights Residual Net (He et al., 2015) Inception Net (Szegedy et al., 2016) VGG-16 (Simonyan and Zisserman, 2014) MobileNet (Howard et al., 2017)

State-of-the-Art (SOTA) models Computer vision SOTA models are combinations of convolutional networks Generally, SOTA models have 3 components: Backbone, Neck, Head Popular SOTA models: Faster R-CNN (2015), Mask R-CNN (2017), and YOLO (2015-2024) Backbone: processes the image input, learns through a deep CNN, and produces a feature map Neck: generates region proposals Head: localizes objects (bounding boxes, segments)

Segmentation Models U-Net (Ronneberger et al., 2015) Mask R-CNN (He et al., 2017) Detectron2 (Facebook/Meta, 2019) Segment Anything Model (Facebook/Meta, 2023)

Detection Models Template Matching (Graf and Zisserman, 1988) Faster R-CNN (Ren et al., 2015) Mask R-CNN (He et al., 2017) YOLO Models (2015-)

Keypoint Detection Objects consist of a predefined set of keypoints and the connections between them Very popular in human movement analysis (pose detection) A popular model is the 8th version of YOLO (YOLOv8-Pose) “Objects as keypoints” Human movement analysis (Source: OpenVINO)

Use case 1: Automatic rock typing from core

Core image interpretation Drilling activity Core sample Lithology description done by petrophysicists Drilling core presents geological evidence that is used to find the oil in the rocks underneath The drilling core is then brought to the lab to be analysed Lithology description is done by petrophysicists in the lab It’s a very lengthy process!

Automatic rock typing from core images? Instead of humans conducting the lithology description, how about teaching neural networks to describe the lithology (later supervised by humans)?

Labelling and annotation 500 images are carefully segmented by different classes of lithologies, namely: Bioturbated mudstone/sandstone , Massive mudstone/sandstone , Parallel-laminated mudstone/sandstone , Cross-bedded/graded-bedded sandstone , Current-rippled sandstone , Conglomerate , Fissile shale , and Heterolithic

Distribution of lithology classes Samples are too few Imbalance between the number of instances per class can severely affect the performance of a computer vision model Imbalance biases a high accuracy toward the classes that have more instances than the others
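One common way to compensate for such imbalance is inverse-frequency class weighting in the training loss; a minimal sketch with hypothetical instance counts (illustrative numbers, not the actual dataset statistics from the presentation):

```python
import numpy as np

# Hypothetical instance counts for the 8 lithology classes:
counts = np.array([220, 180, 90, 60, 40, 25, 15, 10], dtype=float)

# Inverse-frequency class weights: rare classes get larger weights,
# so their errors contribute more to the training loss.
weights = counts.sum() / (len(counts) * counts)
print(np.round(weights, 2))   # the rarest class gets the largest weight
```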

Data augmentation One strategy to mitigate class imbalance is data augmentation Data augmentation consists of different manipulations of the image, such as rotation, flipping, and color-space shifting Augmentation can also improve model generalization by training the model on different image conditions Note: only some augmentations are useful for a particular use case. In core facies segmentation, where color is important, the color-space augmentations (red box) cannot be used. Why? Because the model relies on color to distinguish the facies classes, so shifting the color space would destroy exactly the cue it needs to learn
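The geometric augmentations mentioned above can be done directly with NumPy; a minimal sketch (using a random stand-in image) that also shows the color-space shift that would be problematic for this use case:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Geometric augmentations preserve color, so they are safe for
# core-facies segmentation where color carries lithology information:
flipped = np.fliplr(img)          # horizontal flip
rotated = np.rot90(img, k=1)      # 90-degree rotation

# A color-space shift changes brightness/hue; useful in other tasks,
# but it would corrupt the color cue that distinguishes facies:
shifted = np.clip(img.astype(int) + 40, 0, 255).astype(np.uint8)

print(flipped.shape, rotated.shape, shifted.shape)
```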

Model training To form the training data, the annotated polygons (segments) need to be converted to a numerical representation, following a conversion workflow to the YOLO format: one row per instance, with the class index followed by the polygon vertex coordinates, e.g. 3 x1 y1 x2 y2 x3 y3 … for a class-3 (parallel-laminated sandstone) annotation Train, validation, and test split: 75%-15%-10%
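The polygon-to-numbers conversion can be sketched as follows. In the YOLO segmentation label format, each instance is one line: the class index followed by the vertex coordinates normalized by image width and height (the polygon and image size below are hypothetical):

```python
def polygon_to_yolo(class_id, polygon, img_w, img_h):
    """Convert pixel-space polygon vertices to a YOLO segmentation
    label line: 'class x1 y1 x2 y2 ...' with coordinates normalized
    to [0, 1] by the image width and height."""
    coords = []
    for x, y in polygon:
        coords += [x / img_w, y / img_h]
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])

# Hypothetical polygon for class 3 (parallel-laminated sandstone)
# on a 640x480 core photograph:
line = polygon_to_yolo(3, [(64, 48), (320, 48), (320, 240)], 640, 480)
print(line)  # 3 0.100000 0.100000 0.500000 0.100000 0.500000 0.500000
```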

Model evaluation How good and accurate is our model? Important metrics for instance segmentation: Classification metrics: Precision, Recall, F1-score Loss: Dice loss, IoU loss Accuracy: Mean Average Precision (mAP50) Confusion matrix to show the False Positives and False Negatives of the model result
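Two of the metrics above can be computed from first principles; a minimal sketch of box IoU and of precision/recall/F1 from confusion-matrix counts (the boxes and counts are made up for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall_f1(tp, fp, fn):
    """Classification metrics from True/False Positive/Negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.1429
print(precision_recall_f1(tp=8, fp=2, fn=4)) # precision 0.8, recall 2/3
```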

Segmentation result After the best model is achieved, it is used to predict lithologies from core images Inference is very fast → milliseconds per core image Can be used as a “quick QC” for petrophysicists to review the automatic rock typing result Original core image Segmented core

Use case 2: Boulder detection for seabed mapping

Seabed mapping for offshore oil infrastructure Offshore oil rigs and infrastructure need careful planning for structural stability A side scan sonar survey is used to map the structure of the seabed and identify obstacles such as boulders Side scan sonar capturing the Port (P) and Starboard (S) sides of the seabed mapping

Keypoint Model for Seabed Boulders Boulders are big rocks deposited on the seabed, with dimensions Length, Width, and Height The Length and Width are calculated as the length (oL) and width (oW) of the detected object How tall is the boulder? Its Height is calculated from its acoustic Shadow (oS) Keypoint representation → the object is the head and the shadow is the tail of the keypoint Boulder center (Head) Shadow center (Tail) (Savini, 2010)
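The height-from-shadow idea rests on the classic side-scan sonar shadow relation: the towfish, the top of the boulder, and the far end of the acoustic shadow are (approximately) collinear, so similar triangles relate height to shadow length. The slides do not give the exact formula used, so this is a sketch with hypothetical survey numbers:

```python
def boulder_height(shadow_len, fish_altitude, range_to_shadow_end):
    """Side-scan sonar shadow relation (similar triangles):
        height / shadow_len = fish_altitude / range_to_shadow_end
    All lengths in the same unit (e.g. meters)."""
    return shadow_len * fish_altitude / range_to_shadow_end

# Hypothetical values: a 3 m shadow, towfish 12 m above the seabed,
# and the shadow ending 45 m from the towfish track:
print(round(boulder_height(3.0, 12.0, 45.0), 2))  # 0.8
```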

Boulder Keypoint annotations Port (P) position Starboard (S) position

Model Training Model: YOLOv8-Pose Training took 50 minutes on an NVIDIA T4 GPU (200 epochs) Accuracy was improved in 2 ways: Augmentation (cropping and zooming, horizontal flip) Hyperparameter evolution (50 iterations of hyperparameter search, searching for the hyperparameters with the best accuracy, minimizing the loss curve) Multiple experiments were done during the search for the optimum hyperparameters YOLOv8-Pose architecture (Wang et al., 2024)

Boulder detection result P position S position

Generative Computer Vision

Generative adversarial networks (GAN) Generating a (synthetic) image that has never existed, based on an image input Pioneered by Ian Goodfellow (2014) Image to Image

GAN architecture A GAN consists of a Generator and a Discriminator Generator: generates a fake image that resembles the input images Discriminator: judges whether the generated image is fake or real Training continues until the discriminator cannot distinguish fake from real Image to Image

Applications of GAN Image to Image Reconstruction of 3D model of tight sandstone (Zhao et al, 2021) Outcrop to seismic generation

Diffusion models Generating (synthetic) image from text input by human Examples: DALL-E by OpenAI, Imagen by Google Text to Image

Diffusion model architecture Text to Image

Vision transformer (ViT) Generating text or performing tasks based on an image input by a human Sample tasks: Generating captions from an image Locating an object in a photo Question answering based on a photo Image to Text

Applications of ViT Image to Text Identifying mineral from thin section Prompt: What minerals are in this sandstone? Locating fault from seismic image Prompt: Where are the faults in this seismic image?

Challenges in computer vision My paper in Springer’s Lecture Notes on Computer Science (2024)

Image quality issues Images can suffer from quality issues, for example: Resolution reduction: blurred image due to camera movement or haze Occlusion: shadowed image due to obstacles blocking the light Over-exposure: the image looks too bright due to excessive light exposure Colour constancy: false colour, with the image tending towards a certain colour Shadow Overexposure Yellow constancy (Nuwara and Trinh, 2024)

Model generalization issues Most models are trained on images with ideal quality When tested on images with quality issues, model performance drops The model cannot generalize to images with different quality issues Image with ideal quality Image with shadow Performance degradation (Nuwara and Trinh, 2024)

Histogram Matching Histogram matching is an algorithm to transform the image based on its histogram Steps: Select a normal image as Reference image Extract the histogram of Reference image Select the image that needs to be transformed (as Source image) Extract the histogram of Source image Match the histogram of Source → Reference Result: Improved quality and balance of lighting (Nuwara and Trinh, 2024)
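The CDF-matching steps above can be sketched for a single-channel image in NumPy (a generic implementation of histogram matching, not necessarily the exact variant used in the paper):

```python
import numpy as np

def match_histogram(source, reference):
    """Match the histogram of a single-channel Source image to that of
    a Reference image via their cumulative distribution functions."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source CDF level, look up the reference value
    # that sits at the same CDF level:
    matched = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched[src_idx].reshape(source.shape)

rng = np.random.default_rng(0)
dark = rng.integers(0, 100, (32, 32))       # under-exposed Source image
bright = rng.integers(100, 256, (32, 32))   # well-exposed Reference image
out = match_histogram(dark, bright)
print(out.min() >= 100, out.max() <= 255)   # True True
```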

Model Stacking Model stacking is used to boost the performance of object detection model on low quality images by combining 2 or more models to balance the performance of each model (Nuwara and Trinh, 2024)
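One simple way to stack detectors is to pool the boxes predicted by each model and merge overlapping detections with non-maximum suppression; a minimal sketch (the actual stacking scheme in the paper may differ):

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over pooled detections,
    with boxes given as (x1, y1, x2, y2)."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Pool detections from two hypothetical models on the same image:
boxes  = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the overlapping pair is merged
```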

Improved result on low-quality images Shadow on object Light exposure (Nuwara and Trinh, 2024) BEFORE AFTER

Conclusion Computer vision makes a huge impact in broad areas of geoscience Two use cases were presented, using segmentation and object detection workflows Generative AI shapes the future of AI implementation in geoscience

Thank you!