Introduction to Sparse Autoencoders and their applications

ssuser77a975 · 18 slides · Jan 01, 2025

About This Presentation

These presentation slides introduce sparse autoencoders (SAEs). The deck covers the properties of SAEs and their use for LLM interpretability, and demonstrates the connection between vector-quantized autoencoders (VQ-AEs) and SAEs.


Slide Content

Sparse Autoencoders 2024/12/18 Shunsuke Sakai

Autoencoder
・Learns to reconstruct a given input with an encoder-decoder architecture.
・Objective: $\mathcal{L} = \lVert x - g(f(x)) \rVert^2$, where $f$ is the encoder and $g$ is the decoder.
・It can learn a meaningful representation of the data in a fully unsupervised manner.
・Typically, the latent dimension is set lower than the input dimensionality.
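The encoder-decoder objective above can be sketched in a few lines; this is a minimal illustrative example (linear encoder/decoder, random weights, all names assumed), not the slides' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lat = 8, 3                         # latent dimension < input dimension
W_enc = rng.normal(size=(d_lat, d_in)) * 0.1
W_dec = rng.normal(size=(d_in, d_lat)) * 0.1

def encode(x):
    return W_enc @ x                        # z = f(x)

def decode(z):
    return W_dec @ z                        # x_hat = g(z)

def reconstruction_loss(x):
    # mean-squared reconstruction error ||x - g(f(x))||^2 / d_in
    x_hat = decode(encode(x))
    return float(np.mean((x - x_hat) ** 2))

x = rng.normal(size=d_in)
loss = reconstruction_loss(x)
```

In a real model $f$ and $g$ would be trained nonlinear networks; the sketch only shows the data flow and the objective.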

Sparse Latent Representation
・One can consider another constraint, such as the "sparsity" of the latents.
・Denote by $a_j(x)$ the $j$-th element of the latent vector when the model is given input $x$.
・We can impose a constraint such that $a_j$ fires only rarely over the whole training set.
・Then the average activation of $a_j$ over all $m$ training samples can be defined as: $\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j(x^{(i)})$
・If $a_j$ takes values in the range $[0, 1]$, $\hat{\rho}_j$ is a good proxy for the average firing rate.
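Estimating $\hat{\rho}_j$ is a simple per-unit average over the training set. A sketch with toy data, assuming activations in $[0,1]$ (e.g. sigmoid outputs); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_latent = 100, 5

# A[i, j] = a_j(x^(i)): activation of latent unit j on training sample i
A = rng.uniform(0, 1, size=(m, n_latent))

# rho_hat_j = (1/m) * sum_i a_j(x^(i)): average activation per latent unit
rho_hat = A.mean(axis=0)
```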

Sparse Latent Representation
・If you want to impose a target firing ratio $\rho$, you can simply compute the KL divergence between two Bernoulli distributions with means $\rho$ and $\hat{\rho}_j$: $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}$
・This term increases when the average firing ratio $\hat{\rho}_j$ differs from $\rho$.
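The Bernoulli KL penalty translates directly to code; a sketch (the `eps` clipping is an implementation detail I've added to avoid `log(0)`):

```python
import numpy as np

def bernoulli_kl(rho, rho_hat, eps=1e-8):
    """KL(Bern(rho) || Bern(rho_hat)): the per-unit sparsity penalty.

    Zero when rho_hat == rho, and grows as the average firing
    ratio rho_hat moves away from the target rho.
    """
    rho_hat = np.clip(rho_hat, eps, 1 - eps)
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```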

Sparse Latent Representation

Sparse Latent Representation
・Overall, the loss function for the sparse autoencoder can be described as: $\mathcal{L} = \lVert x - g(f(x)) \rVert^2 + \beta \sum_{j} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where $\beta$ weights the sparsity penalty from the previous slide.
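The combined objective can be sketched as reconstruction error plus the weighted KL sparsity term. `beta` and `rho` are hyperparameters, and all names here are illustrative assumptions:

```python
import numpy as np

def bernoulli_kl(rho, rho_hat, eps=1e-8):
    # KL(Bern(rho) || Bern(rho_hat)), clipped to avoid log(0)
    rho_hat = np.clip(rho_hat, eps, 1 - eps)
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

def sae_loss(X, X_hat, Z, rho=0.05, beta=1.0):
    """Sparse-autoencoder objective on a batch.

    X, X_hat: inputs and reconstructions, shape (m, d)
    Z: latent activations in [0, 1], shape (m, n_latent)
    """
    recon = np.mean((X - X_hat) ** 2)       # reconstruction term
    rho_hat = Z.mean(axis=0)                # average activation per unit
    sparsity = np.sum(bernoulli_kl(rho, rho_hat))
    return recon + beta * sparsity
```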

What is the benefit?
・The difference between standard autoencoders and sparse autoencoders comes from the sparsity of the latents.
[Figure: latent activations of a standard autoencoder vs. a sparse autoencoder]

What is the benefit?
・Typically, one neuron corresponds to multiple concepts; e.g., a neuron in an LLM fires for both the input texts "HTML" and "cat" simultaneously.
・Such "superposition" makes it difficult to understand the model's behavior.
・Once we encode the model's representation into a sparse representation, it becomes easier to interpret.

What is the benefit?
・Let's consider the case where the decoder is simply a linear projection, $\hat{x} = Wz$.
・You can see the weights $W$ as a set of $d$-dimensional learnable vectors (each column $w_j$ is called an "atom"): $\hat{x} = \sum_j z_j w_j$
・This form of training is often referred to as "dictionary learning".
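The dictionary-learning view above is easy to make concrete: with a linear decoder, a sparse code selects and weights a few columns ("atoms") of $W$. A sketch with assumed toy sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_atoms = 6, 10
W = rng.normal(size=(d, n_atoms))   # each column W[:, j] is one atom

z = np.zeros(n_atoms)
z[[2, 7]] = [1.5, -0.5]             # sparse code: only two atoms active

# Linear decoding: x_hat = W z = sum_j z_j * w_j,
# i.e. a weighted sum of just the active atoms.
x_hat = W @ z
```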

LLM Interpretability
・In recent years, LLMs have gained widespread interest in the broader community.
・However, these LLMs are usually treated as black boxes, and their behavior is poorly understood.
[Figure: the input text "My heart is broken" passing through a stack of Transformer blocks; what information is preserved?]

Feature Interpretation with SAE
・For the activations of a specific layer $l$, we can get a meaningful representation by training an SAE to reconstruct those original activations.
・We can then extract a sparse representation of these activations.
・By finding inputs that correspond to this sparse representation, we can obtain an explanation of the activations.
[Figure: an SAE attached to a Transformer block's activations]

Variants of Sparse Autoencoder
K-sparse Autoencoders [Makhzani & Frey, 2013]
・Extract the top-K activations instead of the conventional lifetime constraint ($\hat{\rho}_j \approx \rho$).
・Pros: it ensures that at most K activations fire on any input.
・Pros: strong connection with Iterative Thresholding with Inversion (ITI).
Winner-Take-All Autoencoders [Makhzani & Frey, 2014]
・Introduces spatial sparsity constraints with a convolutional SAE.
・Pros: it works well with convolutional autoencoders with the ReLU function.
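The top-K selection step that defines the k-sparse autoencoder can be sketched per sample: keep the k largest activations and zero out the rest (a simplified sketch; the function name is an assumption):

```python
import numpy as np

def topk_activation(z, k):
    """Keep only the k largest activations of z, zeroing the rest,
    so at most k latent units fire on any input."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]        # indices of the k largest values
    out[idx] = z[idx]
    return out
```

In the full model this replaces the KL lifetime penalty: sparsity is enforced exactly by construction rather than encouraged by a loss term.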

The connection to my work
・I am working on image anomaly detection.
・In recent years, reconstruction-based methods have achieved good detection performance.
・Many methods use the vector quantization method (VQ-AE) as an information bottleneck.
[Figure: input image vs. reconstructed image]

VQ-AE is a specific form of top-K SAE!
・I don't have enough time; a detailed explanation will be provided after this presentation.
・Intuitively, a VQ-AE uses a shared weight matrix for the decoder weights and the weights of the last encoder layer, with K set to 1.
・VQ-AE exhibits high detection performance, but in some classes it suffers from poor reconstruction of normal images.
・In our work, we will implement a top-K-SAE-based anomaly detection method and investigate the difference between top-K SAE and VQ-AE.
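One way to see the claimed connection in code: vector quantization picks the single nearest codebook vector, and for unit-norm codes this coincides with a top-1 selection over dot products with a shared weight matrix. This is an illustrative sketch of that equivalence, not the slides' full argument; all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_codes = 4, 8

# Codebook / shared weight matrix; normalized so each code has unit norm.
C = rng.normal(size=(n_codes, d))
C /= np.linalg.norm(C, axis=1, keepdims=True)

x = rng.normal(size=d)

# VQ assignment: index of the nearest codebook entry (Euclidean distance).
vq_idx = int(np.argmin(np.linalg.norm(C - x, axis=1)))

# Top-1 "SAE" assignment: index of the largest dot product with the weights.
top1_idx = int(np.argmax(C @ x))

# For unit-norm codes, ||c - x||^2 = ||x||^2 + 1 - 2 c.x,
# so minimizing distance equals maximizing the dot product.
```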

References