Siamese Neural Networks for One-shot Image Recognition
Aman Elahi (M11363025), Umer (M11363040)
SEMINAR-I
Content
- Background
- Key Concepts
- Proposed Architecture
- Pseudocode of the Architecture
- Experiment Results
- Architecture Variations
- Strengths and Weaknesses of the Architecture
- Comparison with Other Architectures
- Conclusion
Background
Humans can recognize a new object category from a single example; standard deep networks, by contrast, require many labeled examples per class. One-shot image recognition aims to close this gap and approach human-like cognition.
Key Concepts: Siamese Architecture
Siamese = CNN structure + weight sharing + one-shot inference strategy.
Traditional CNNs are trained to classify inputs by learning direct mappings from input images to specific class labels. This approach requires large labeled datasets and retraining whenever new classes are introduced. In contrast, Siamese Networks learn a generalizable similarity function that compares image pairs.
Beyond plain CNNs, Siamese Networks employ several unique mechanisms:
- Weight sharing between the twin networks ensures both inputs pass through identical feature extraction.
- Pairwise input training teaches the model to distinguish similar from dissimilar pairs.
- A metric-based learning objective (weighted L1 distance followed by a sigmoid) lets the model evaluate unseen classes without additional retraining.
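The metric-based objective above can be sketched in a few lines. This is a minimal illustration, not the trained model: `similarity_score`, the feature vectors, and the per-component weights `alpha` are all hypothetical stand-ins for learned quantities.

```python
import numpy as np

def similarity_score(f1, f2, alpha, bias=0.0):
    """Weighted L1 distance between two feature vectors, mapped
    through a sigmoid to a probability that the inputs match."""
    d = np.abs(f1 - f2)              # component-wise L1 distance
    logit = np.dot(alpha, d) + bias  # learned per-component weights
    return 1.0 / (1.0 + np.exp(-logit))

# Identical embeddings give zero distance, so the score is sigmoid(bias):
f = np.array([0.2, 0.7, 0.1])
print(similarity_score(f, f, alpha=np.ones(3)))  # 0.5 with zero bias
```

Because the same function scores any pair of embeddings, a new class needs only one reference image to compare against, with no retraining.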
Proposed Architecture: Siamese Neural Network
A simple two-hidden-layer Siamese network for binary classification with logistic prediction p. The structure of the network is replicated across the top and bottom sections to form twin networks, with shared weight matrices at each layer.
Proposed Architecture: Siamese Neural Network
The best convolutional architecture selected for the verification task. The Siamese twin is not depicted, but joins immediately after the 4096-unit fully-connected layer, where the component-wise L1 distance between the two feature vectors is computed.
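The spatial dimensions of the convolutional tower can be traced with simple arithmetic. The filter sizes below follow the commonly cited configuration of the original paper (105x105 input, 'valid' convolutions of 10x10, 7x7, 4x4, 4x4, with 2x2 max-pooling after the first three); treat them as an assumption of this sketch.

```python
# Trace the spatial size of a 105x105 input through the conv stack.
def conv(size, k):  # 'valid' convolution with a k x k filter
    return size - k + 1

def pool(size):     # 2x2 max-pool, stride 2
    return size // 2

s = 105
s = pool(conv(s, 10))   # conv 64@10x10 -> 96, pool -> 48
s = pool(conv(s, 7))    # conv 128@7x7  -> 42, pool -> 21
s = pool(conv(s, 4))    # conv 128@4x4  -> 18, pool -> 9
s = conv(s, 4)          # conv 256@4x4  -> 6 (no pool before the FC layer)
flat = s * s * 256
print(s, flat)          # 6, 9216 -> flattened input to the 4096-unit FC layer
```

This confirms that the 4096-unit fully-connected layer, where the twins join, receives a 9216-dimensional flattened feature map.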
Pseudocode of the Architecture

Algorithm SiameseNetwork(x1, x2):
    Input:  x1 (first input image), x2 (second input image)
    Output: y (similarity score)

    Step 1: Preprocess the input images
        x1 ← Preprocess(x1)
        x2 ← Preprocess(x2)
    Step 2: Extract features with the shared convolutional network
        x1_feature ← ConvolutionalNetwork(x1)
        x2_feature ← ConvolutionalNetwork(x2)
    Step 3: Compute the component-wise L1 distance
        distance ← L1Distance(x1_feature, x2_feature)
    Step 4: Map the weighted distance to a similarity score
        y ← Sigmoid(WeightedSum(distance))
    Return y

This pseudocode mirrors the key steps of the Siamese Network for one-shot image recognition: preprocessing the input images, extracting features with the shared convolutional network, computing the distance between the feature vectors, and producing the final similarity score through the sigmoid activation.
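The steps above can be made runnable with a toy stand-in for the convolutional tower. In this sketch a single shared weight matrix (an assumption, replacing the real conv stack) plays the role of ConvolutionalNetwork; the key property, both inputs passing through the same parameters, is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix stands in for the convolutional tower;
# both inputs go through the SAME parameters (weight sharing).
W = rng.standard_normal((8, 16)) * 0.1
alpha = rng.standard_normal(8) * 0.1   # learned weights on the L1 components

def preprocess(x):
    return (x - x.mean()) / (x.std() + 1e-8)

def tower(x):
    return np.tanh(W @ x)              # shared embedding network

def siamese_forward(x1, x2):
    f1, f2 = tower(preprocess(x1)), tower(preprocess(x2))
    distance = np.abs(f1 - f2)         # component-wise L1 distance
    return 1.0 / (1.0 + np.exp(-np.dot(alpha, distance)))

x = rng.standard_normal(16)
print(siamese_forward(x, x))  # identical inputs -> zero distance -> 0.5
```

Note that the score is symmetric in its arguments, since both the tower and the L1 distance treat the two inputs identically.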
Experiment Results
- The Siamese network achieved 91.63% accuracy on the Omniglot verification task (deciding whether two images belong to the same class), outperforming previous models.
- Applying affine distortions (e.g., rotations, scaling) during training raised verification accuracy to 93.42%.
- For one-shot classification (classifying an image from a single labeled example per class), the network achieved 92% accuracy.
- The network showed strong generalization, performing well on unseen classes in the one-shot task without retraining, unlike traditional CNNs.
Strengths of the Architecture
Weaknesses of the Architecture
Comparison with Other Architectures

| Feature | Siamese Neural Network | ResNet (CNN) | Vision Transformer (ViT) |
|---|---|---|---|
| Architecture Type | Twin CNNs with shared weights | Pure CNN | Transformer (self-attention) |
| Training Objective | Learn similarity between pairs (verification) | Classification | Classification |
| Data Efficiency | Very high (works with few examples) | Medium | Low (needs lots of data) |
| Generalization to New Classes | Strong (classifies unseen classes without retraining) | Weak (needs retraining) | Weak (needs retraining) |
| Training Complexity | Moderate (requires pair construction) | Easy | Difficult (large datasets, high compute) |
| Distance Metric Usage | Yes (e.g., L1 distance + sigmoid) | No | No |
| Pretrained Model Availability | Limited | Widely available | Widely available |
| Use Cases | One-shot learning, signature/face verification | Image classification | Large-scale classification, multi-modal tasks |
| Limitations | Sensitive to pair construction; needs high-quality labeled pairs | Requires large datasets for fine-tuning | Needs large datasets and heavy compute |
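The "requires pair construction" point above is worth making concrete: training data must be turned into balanced same-class and different-class pairs. Below is a minimal sketch; `make_pairs` and the `{label: [images]}` input format are hypothetical helpers, not part of the original paper.

```python
import random

def make_pairs(dataset, n_pairs, seed=0):
    """Build balanced same/different training pairs from a
    {label: [images]} mapping (hypothetical helper)."""
    rnd = random.Random(seed)
    labels = list(dataset)
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:  # positive pair: two images of the same class
            c = rnd.choice(labels)
            a, b = rnd.sample(dataset[c], 2)
            pairs.append((a, b, 1))
        else:           # negative pair: images from two different classes
            c1, c2 = rnd.sample(labels, 2)
            pairs.append((rnd.choice(dataset[c1]), rnd.choice(dataset[c2]), 0))
    return pairs

toy = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
print(make_pairs(toy, 4))  # two positive and two negative pairs
```

How the pairs are sampled (class balance, hard negatives) strongly affects what the similarity function learns, which is why pair construction is listed as a sensitivity of this architecture.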
Conclusion
Thank You!
"If a machine can learn to do anything that a human can do, then it has passed the test of true artificial intelligence." ~ Alan Turing