Agin Anuradha's Image Caption Generator: Revolutionizing Visual Content Interpretation


About This Presentation

Explore the groundbreaking project by Agin Anuradha that harnesses the power of artificial intelligence to generate descriptive captions for images. This presentation delves into the technology and methodologies behind the image caption generator, demonstrating its potential to enhance accessibility...


Slide Content

Image Caption Generator

Introduction
Objective: build a model that generates captions for images using a combination of deep learning and an attention mechanism.
Key points:
- Image feature extraction with VGG16 (a pre-trained model).
- Caption generation using an encoder-decoder architecture with attention.
- Evaluation using BLEU scores.

Dataset
Dataset: Flickr8k.
Images: 8,000 images of various scenes.
Captions: 5 captions per image.
Source: link to the dataset.
Preprocessing: images resized to 224x224; captions cleaned and tokenized.
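
The captions file pairs each image with its five references. As a concrete illustration, a minimal parsing sketch follows; the file name Flickr8k.token.txt and its tab-separated "<image>.jpg#<n>" key format match the standard Flickr8k distribution but are assumptions about this project's exact setup.

```python
# Minimal sketch: build an image_id -> [5 reference captions] map.
# File name and line format are assumptions (standard Flickr8k layout).
from collections import defaultdict

captions = defaultdict(list)
with open("Flickr8k.token.txt", encoding="utf-8") as f:
    for line in f:
        if "\t" not in line:
            continue
        key, text = line.strip().split("\t", 1)
        image_id = key.split("#")[0]   # drop the "#0".."#4" caption index
        captions[image_id].append(text)
```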

VGG16 Feature Extraction
- The pre-trained VGG16 model is used to extract image features.
- The last classification layer is removed.
- Output: a 4096-dimensional feature vector for each image.
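
A sketch of this step with the Keras VGG16 application; the helper name extract_features is mine, not the author's.

```python
# Sketch: VGG16 with the final 1000-way classifier removed; the fc2
# layer then yields a 4096-d feature vector per 224x224 image.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

vgg = VGG16(weights="imagenet")
feature_extractor = Model(inputs=vgg.input, outputs=vgg.layers[-2].output)

def extract_features(image_path):
    img = load_img(image_path, target_size=(224, 224))          # resize
    x = preprocess_input(np.expand_dims(img_to_array(img), 0))  # (1,224,224,3)
    return feature_extractor.predict(x, verbose=0)[0]           # shape (4096,)
```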

Text Preprocessing
Caption cleaning:
- Convert to lowercase.
- Remove non-alphabetic characters.
- Add startseq and endseq tokens.
Tokenization:
- Convert text to sequences using a Tokenizer.
- Vocabulary size calculated.
- Maximum caption length determined.
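
These steps map directly onto Keras utilities. A minimal sketch, reusing the captions dict from the dataset step (clean_caption is a hypothetical helper name):

```python
import re
from tensorflow.keras.preprocessing.text import Tokenizer

def clean_caption(text):
    """Lowercase, strip non-alphabetic characters, add boundary tokens."""
    text = re.sub(r"[^a-z ]", "", text.lower())
    return "startseq " + " ".join(text.split()) + " endseq"

# Clean in place so later steps all see the same text.
for image_id in captions:
    captions[image_id] = [clean_caption(c) for c in captions[image_id]]
all_captions = [c for caps in captions.values() for c in caps]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(all_captions)
vocab_size = len(tokenizer.word_index) + 1   # +1 because index 0 is padding
max_len = max(len(c.split()) for c in all_captions)
```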

Data Generator
Functionality:
- Generates training data in batches.
- Outputs pairs of image features and tokenized captions.
- Reduces memory usage by avoiding loading all data at once.
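
A sketch of such a generator: each caption of length n yields n-1 training samples (a padded caption prefix plus the image feature predicts the next word), and batches are built lazily so the expanded dataset never sits in memory at once. The function name and signature are assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def data_generator(captions, features, tokenizer, max_len, vocab_size,
                   batch_size=32):
    """Yield ((image_feature, caption_prefix), next_word) batches lazily."""
    X1, X2, y = [], [], []
    while True:                               # Keras consumes it indefinitely
        for image_id, caps in captions.items():
            for cap in caps:
                seq = tokenizer.texts_to_sequences([cap])[0]
                for i in range(1, len(seq)):  # each prefix predicts one word
                    X1.append(features[image_id])
                    X2.append(pad_sequences([seq[:i]], maxlen=max_len)[0])
                    y.append(to_categorical(seq[i], num_classes=vocab_size))
                    if len(X1) == batch_size:
                        yield (np.array(X1), np.array(X2)), np.array(y)
                        X1, X2, y = [], [], []
```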

Model Architecture
Encoder-decoder with attention:
- Encoder: processes image features using Dense and LSTM layers.
- Attention mechanism: aligns image features with the corresponding words in the caption.
- Decoder: generates the next word in the caption using an LSTM.
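
The transcript does not show the exact layer graph, so the following is a plausible minimal sketch rather than the author's code: the 4096-d image vector is projected and reshaped into a few pseudo-regions for the Dot-based attention (my device; attention more commonly runs over convolutional feature maps), the caption prefix is summarized by an LSTM, and a softmax predicts the next word. Layer sizes are assumptions; vocab_size and max_len come from the preprocessing step.

```python
from tensorflow.keras.layers import (Activation, Concatenate, Dense, Dot,
                                     Embedding, Input, LSTM, Reshape)
from tensorflow.keras.models import Model

units = 256  # hypothetical hidden size; the slides do not state one

# Encoder: project the 4096-d VGG16 vector into 8 pseudo-regions to attend over.
img_in = Input(shape=(4096,))
enc = Dense(8 * units, activation="relu")(img_in)
enc = Reshape((8, units))(enc)                      # (batch, 8, units)

# Decoder input: embed the caption prefix and summarize it with an LSTM.
cap_in = Input(shape=(max_len,))
state = LSTM(units)(Embedding(vocab_size, units)(cap_in))   # (batch, units)

# Dot-product attention: score each region against the decoder state.
scores = Dot(axes=(1, 2))([state, enc])             # (batch, 8)
weights = Activation("softmax")(scores)
context = Dot(axes=(1, 1))([weights, enc])          # (batch, units)

out = Dense(units, activation="relu")(Concatenate()([state, context]))
out = Dense(vocab_size, activation="softmax")(out)  # next-word distribution

model = Model(inputs=[img_in, cap_in], outputs=out)
```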

Attention Mechanism
Functionality:
- Focuses on relevant parts of the image when generating each word.
- Attention scores are calculated using the Dot layer.
Importance: helps the model align visual features with the corresponding text more effectively.
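
Stripped of the Keras layers, the mechanism is just dot-product scores, a softmax, and a weighted sum. A toy NumPy illustration with made-up numbers:

```python
import numpy as np

# One decoder state attends over three pseudo-regions of the image.
regions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # (3 regions, 2 dims)
state = np.array([0.9, 0.1])                              # decoder state

scores = regions @ state                         # dot-product alignment scores
weights = np.exp(scores) / np.exp(scores).sum()  # softmax -> attention weights
context = weights @ regions                      # weighted sum = context vector

print(weights)   # ~[0.47, 0.21, 0.32]: region 0 matches the state best
print(context)   # ~[0.63, 0.37]
```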

Model Training
Training setup:
- Epochs: 50
- Batch size: 32
- Loss function: categorical cross-entropy
- Optimizer: Adam
Validation: the dataset is split 90% training / 10% validation.
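
With the generator above, training could be wired up as follows; train_captions is an assumed 90% split of the captions dict, and the steps-per-epoch estimate is mine.

```python
epochs, batch_size = 50, 32
model.compile(loss="categorical_crossentropy", optimizer="adam")

# Each caption of n words contributes n-1 (prefix -> next word) samples.
steps = sum(len(c.split()) - 1
            for caps in train_captions.values() for c in caps) // batch_size

gen = data_generator(train_captions, features, tokenizer,
                     max_len, vocab_size, batch_size)
model.fit(gen, steps_per_epoch=steps, epochs=epochs, verbose=1)
```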

Results: Caption Generation
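
At inference time the model is applied one word at a time: the caption generated so far is fed back in until endseq appears. A greedy-decoding sketch (the helper name generate_caption is mine):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_feature, max_len):
    """Greedily pick the most likely next word until endseq (or max_len)."""
    text = "startseq"
    for _ in range(max_len):
        seq = pad_sequences(tokenizer.texts_to_sequences([text]),
                            maxlen=max_len)
        yhat = model.predict([np.array([photo_feature]), seq], verbose=0)
        word = tokenizer.index_word.get(int(np.argmax(yhat)))
        if word is None or word == "endseq":
            break
        text += " " + word
    return text.replace("startseq", "").strip()
```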

Evaluation with BLEU Scores
Evaluation metrics:
- BLEU-1: measures unigram precision.
- BLEU-2: measures bigram precision.
Scores:
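
A sketch of computing both scores with NLTK's corpus_bleu; test_captions is an assumed held-out split, and the weights tuples select unigram-only and unigram-plus-bigram precision.

```python
from nltk.translate.bleu_score import corpus_bleu

references, hypotheses = [], []
for image_id, caps in test_captions.items():
    references.append([c.split() for c in caps])  # 5 references per image
    hypotheses.append(
        generate_caption(model, tokenizer, features[image_id], max_len).split())

print("BLEU-1:", corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))
print("BLEU-2:", corpus_bleu(references, hypotheses, weights=(0.5, 0.5, 0, 0)))
```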

Challenges
Challenges encountered:
- Handling long captions with complex dependencies.
- Tuning the attention model.
- BLEU score sensitivity to short captions.

Deployment on Streamlit
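
The transcript gives no details of the app, so this is a hypothetical minimal front end reusing the helpers sketched above: upload an image, extract VGG16 features, display the generated caption.

```python
# Hypothetical app.py; names and layout are assumptions, not the author's code.
import streamlit as st
from PIL import Image

st.title("Image Caption Generator")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    st.image(Image.open(uploaded), use_column_width=True)
    uploaded.seek(0)                       # rewind before re-reading the bytes
    feature = extract_features(uploaded)   # load_img also accepts file objects
    st.success(generate_caption(model, tokenizer, feature, max_len))
```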

Conclusion
Key conclusions:
- Successfully built an image captioning model using VGG16 and an LSTM with attention.
- Achieved meaningful results as evaluated by BLEU scores.
Future work:
- Fine-tuning: experiment with the captioning model's architecture and hyperparameters for improved performance.
- Dataset expansion: incorporate additional datasets to increase the diversity and complexity of the training data; for example, the model could be trained on the Flickr30k dataset.

Questions?

Thank You!