Presentation about a computer vision project


About This Presentation

A presentation about a project on the robustness of unsupervised learning models to spurious correlations.


Slide Content

On robustness of unsupervised learning models to spurious correlations. Dana & Junior, MVA master, ENS Paris-Saclay.

Introduction. Definition: what is a spurious feature? A feature or variable in a dataset that appears to be associated with the target variable but does not actually have a meaningful or causal relationship with it. E.g., waterbirds appear with water backgrounds, and landbirds with land backgrounds. Definition: what is a group? We assume that each data point has an attribute s ∈ S which is spuriously correlated with the label y, and the groups are defined by a combination of the label and the spurious attribute: G ∈ Y × S.
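To make the group definition concrete, here is a minimal Python sketch (our own illustration, not code from the project) mapping a label y and spurious attribute s to a group id; with binary y and s this yields four groups:

```python
import torch

def group_index(y: torch.Tensor, s: torch.Tensor, n_spurious: int = 2) -> torch.Tensor:
    """Map label y and spurious attribute s to a group id in Y x S.

    With binary y and s this gives 4 groups:
    (y=0, s=0) -> 0, (y=0, s=1) -> 1, (y=1, s=0) -> 2, (y=1, s=1) -> 3.
    """
    return y * n_spurious + s

# Example: blond/non-blond label with a male/female spurious attribute (CelebA-style)
y = torch.tensor([0, 0, 1, 1])
s = torch.tensor([0, 1, 0, 1])
print(group_index(y, s))  # tensor([0, 1, 2, 3])
```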

Problem statement. Hypothesis: the model is more likely to make mistakes on certain groups if it learns the spurious feature. Objective: balance and improve performance across all groups. Hence, we can formulate the tasks as follows: accurately identify the groups, which are not always known in a dataset, and effectively use the group information to improve the model's robustness. Our goal is to examine the vulnerability of the representations of pre-trained models to spurious correlation. Figure 1. An overview of the problem [1]
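Since the objective is to balance performance across groups, the metric of interest is worst-group accuracy. A minimal sketch (the function name and tensor layout are our own assumptions):

```python
import torch

def worst_group_accuracy(preds: torch.Tensor, labels: torch.Tensor,
                         groups: torch.Tensor, n_groups: int = 4):
    """Return per-group accuracies and their minimum (worst-group accuracy)."""
    accs = []
    for g in range(n_groups):
        mask = groups == g
        if mask.any():
            accs.append((preds[mask] == labels[mask]).float().mean())
    accs = torch.stack(accs)
    return accs, accs.min()
```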

Datasets. CelebA: photos of celebrities from the CelebA dataset; target y: hair color (blond vs. non-blond); spurious feature a: gender (female, male); 4 groups defined by the tuples (y, a). Waterbirds: images of birds; target y: type of bird (waterbird or landbird); spurious feature a: type of background (water or land); 4 groups defined by the tuples (y, a). Figure 1: Representative training and test examples for the datasets we consider. The correlation between the label y and the spurious attribute a at training time does not hold at test time. [3]


Literature Review: two settings.
When group information is available, it can be used to improve robustness via: GroupDRO, which dynamically increases the weight of the worst-group loss during minimization; importance weighting to reweight the groups; class balancing; augmenting minority groups by generating synthetic examples with a GAN; or training with ERM first and then finetuning the last layer on balanced data from the training or validation set, or on mixed representations.
When group information is not available during training: accurately find the groups by training two models; a DRO method is then still used once the groups are found.

Methods: ERM, GroupDRO, DFR.
ERM (empirical risk minimization): minimizes the average loss across the training dataset; effective for high average test accuracy on similar (in-distribution) data. Limitations: potential brittleness under distribution shifts, poor performance on underrepresented or imbalanced subgroups, vulnerability to spurious correlations.
GroupDRO (group distributionally robust optimization): minimizes the worst-case loss across subgroups in the training data; focuses on performance equity across minority or underrepresented groups; strengthens robustness against imbalanced datasets; regularization techniques enhance worst-group accuracy; slight compromise on average accuracy for significant gains in fairness and robustness; ideal for applications where fairness and equity are critical.
DFR (deep feature reweighting): a method used to evaluate the quality of the learned feature representation; retrain the classification layer of a deep neural network on a small, controlled, balanced dataset where the spurious correlation does not hold, in order to reweight the importance of spurious and core features.
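For concreteness, a minimal sketch of the online GroupDRO objective described in [3]: group weights are pushed toward the currently worst group with an exponentiated-gradient step, and the weighted group losses are minimized. Variable names and the step size eta below are our own assumptions:

```python
import torch

def group_dro_step(losses: torch.Tensor, groups: torch.Tensor,
                   q: torch.Tensor, eta: float = 0.01):
    """One GroupDRO step on a batch.

    losses: per-example losses, groups: per-example group ids,
    q: running weights over groups (on the simplex).
    Returns the reweighted loss to backpropagate and the updated q.
    """
    n_groups = q.numel()
    group_losses = []
    for g in range(n_groups):
        mask = groups == g
        group_losses.append(losses[mask].mean() if mask.any() else losses.new_zeros(()))
    group_losses = torch.stack(group_losses)
    # exponentiated-gradient update: upweight the groups with the largest loss
    q = q * torch.exp(eta * group_losses.detach())
    q = q / q.sum()
    return (q * group_losses).sum(), q
```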

Method: Progressive Data Expansion. [Diagram: the model update alternates between a warm-up loss and an expansion loss.]
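Our reading of the PDE training loop in [1] is roughly: warm up on a small group-balanced subset, then periodically expand the training set with chunks of the remaining (majority-group) data, resetting optimizer momentum at each expansion. The sketch below is schematic; the schedule, subset sizes, and all names are our own placeholders, not the paper's exact procedure:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def train_pde(model, balanced_subset, remaining_data, warmup_epochs=20,
              expand_every=10, expand_size=100, total_epochs=100, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    current, cursor = balanced_subset, 0
    for epoch in range(total_epochs):
        # expansion step: after warm-up, append a new chunk of majority-group data
        if epoch >= warmup_epochs and (epoch - warmup_epochs) % expand_every == 0:
            new_idx = list(range(cursor, min(cursor + expand_size, len(remaining_data))))
            cursor += len(new_idx)
            if new_idx:
                current = ConcatDataset([current, Subset(remaining_data, new_idx)])
                # reset momentum by re-creating the optimizer
                opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for x, y in DataLoader(current, batch_size=64, shuffle=True):
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
```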

Why PDE? Provides a theoretical framework for progressive data expansion; does not require further finetuning of the model; efficient and faster training; consistently superior worst-group performance.
Table 2. The worst-group and average accuracy (%) of PDE compared with state-of-the-art methods. [1]
Table 1. Training efficiency of PDE and GroupDRO on Waterbirds. [1]
Settings in which each method has been evaluated, with SL = supervised, SSL = self-supervised, AE = autoencoder-based:

         | SL | SSL | AE
ERM      | ✓  | ✓   | ✓
GroupDRO | ✓  | ✓   | ✓
DFR      | ✓  | ✓   | ✓
PDE      | ✓  | ✗   | ✗

Main contributions. Adapted the progressive data expansion method to unsupervised learning models (adapting the code from [1]). Analyzed feature maps in unsupervised models to understand their internal representations and learning mechanisms using Grad-CAM. Investigated the effects of various learning rates (1e-2, 1e-3, 1e-4) on the MAE, DINO, and DINOv2 models, providing insights into optimal training dynamics. Compared the robustness of self-supervised, supervised, and autoencoder-based models against spurious features. We implemented the Grad-CAM code for feature-map visualization (see the sketch below).
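As an illustration of the Grad-CAM visualization mentioned above, here is a minimal hook-based sketch for a CNN backbone such as the DINO ResNet-50 (for ViT backbones the tokens would first need to be reshaped into a spatial grid); the function and its arguments are our own, not the project's exact code:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight the target layer's activations by the spatially
    averaged gradient of the class score, ReLU, upsample, and normalize to [0, 1]."""
    acts, grads = {}, {}
    h_fwd = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h_bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        logits = model(image.unsqueeze(0))              # image: (3, H, W)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)     # GAP over spatial dims
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                            align_corners=False).squeeze()
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    finally:
        h_fwd.remove()
        h_bwd.remove()

# Hypothetical usage with a torchvision ResNet-50:
# heatmap = grad_cam(resnet50_model, img_tensor, resnet50_model.layer4)
```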

Results. Table 3. Results of PDE training with the ERM and GroupDRO methods for DINO, DINOv2, and ResNet.

Feature maps: DINOv2 with a ViT-S backbone trained on Waterbirds; DINO with a ResNet-50 backbone trained on Waterbirds.

Analysis. The initial learning phase has a considerable influence on subsequent training for widely used unsupervised features: the model learns v_c in the warm-up stage. In contrast to the main paper, we found that a smaller learning rate is necessary for features from unsupervised learning. Our feature maps show that the pre-trained DINO model with a ResNet-50 backbone relies on core features for classification on both the Waterbirds and CelebA datasets. DINOv2 on the Waterbirds dataset, in contrast, relies mainly on spurious features for classification, which may explain why the worst-group accuracy is so low (0.486) for this model.

Visualizations. Figure 1. The effect of resetting momentum for PDE with DINO on Waterbirds.
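For reference, resetting SGD momentum (the effect visualized above) can be done either by re-creating the optimizer or by clearing its momentum buffers in place; a small sketch under the assumption that a PyTorch SGD optimizer is used:

```python
import torch

def reset_momentum(optimizer: torch.optim.SGD) -> None:
    """Drop the momentum buffers so the next step starts with fresh momentum."""
    for group in optimizer.param_groups:
        for p in group["params"]:
            optimizer.state[p].pop("momentum_buffer", None)
```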

Graphs with different learning rates. General observation: MAE and DINOv2 don't converge with a large learning rate (1e-2); MAE and DINOv2 give quite good results with lr = 1e-3 and 1e-4. Plots: average accuracy of DINOv2 with lr = 1e-2; average accuracy of MAE with lr = 1e-2.

Further work before submitting the report: finish training MAE; compare with ERM and GroupDRO without PDE; explore further the worse performance of DINOv2 and MAE; for DINOv2 and MAE, try learning rates 1e-5 and 1e-6.

Thank you!

References
[1] Deng, Y., Yang, Y., Mirzasoleiman, B., & Gu, Q. (2023). Robust Learning with Progressive Data Expansion Against Spurious Correlation.
[2] Izmailov, P., Kirichenko, P., Gruver, N., & Wilson, A. G. (2022). On feature learning in the presence of spurious correlations. Advances in Neural Information Processing Systems, 35, 38516-38532.
[3] Sagawa, S., Koh, P. W., Hashimoto, T. B., & Liang, P. (2019). Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. arXiv preprint arXiv:1911.08731.
[4] Shi, Y., Daunhawer, I., Vogt, J. E., Torr, P., & Sanyal, A. (2022). How robust are pre-trained models to distribution shift? In ICML 2022 Workshop on Spurious Correlations, Invariance and Stability.
[5] Li, X., Dai, Y., Ge, Y., Liu, J., Shan, Y., & Duan, L. Y. (2022). Uncertainty modeling for out-of-distribution generalization. arXiv preprint arXiv:2202.03958.