Internal study session material: Mamba - A new era or ephemeral?

NABLAS, May 17, 2024

About This Presentation

This presentation introduces Mamba, which uses a selective state space model (Selective SSM) to focus only on the minimum necessary information and thereby achieve improved computational efficiency.


Slide Content

Mamba: A new era or ephemeral?
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

What Transformers cannot do
Drawbacks of Transformers: a finite context window, and quadratic scaling with respect to the window length, O(L^2).

Generating tokens for a sequence of length L needs roughly L^2 computations, which becomes costly as the sequence length increases.
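
To make the quadratic cost concrete, here is a minimal sketch of naive single-head self-attention (illustrative code, not from the slides; all names are placeholders). The pairwise score matrix has shape [L, L], which is exactly where the O(L^2) time and memory come from.

import numpy as np

def naive_attention(x, Wq, Wk, Wv):
    # x: [L, d]; scores below is [L, L], the O(L^2) bottleneck
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v  # [L, d]

Doubling L quadruples the size of scores, so the cost grows quadratically with sequence length.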

RNN: recurrent neural network
Fast inference, linear complexity.

Drawbacks:
- small memory
- not parallelizable (see the sketch after this list)
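
A minimal sketch (illustrative, not from the slides) of both points: each step costs O(1) in L, so inference is linear, but h[t] depends on h[t-1] through tanh, so the time loop cannot be parallelized during training.

import numpy as np

def rnn_forward(x, Wx, Wh, b):
    # x: [L, d]; the loop is strictly sequential because of the nonlinearity
    h = np.zeros(Wh.shape[0])
    outputs = []
    for t in range(x.shape[0]):
        h = np.tanh(Wx @ x[t] + Wh @ h + b)  # h[t] needs h[t-1] first
        outputs.append(h)
    return np.stack(outputs)  # [L, n]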

SSM: state space model
Input x, output y, hidden state h.
Looks familiar?
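
For reference, the state space equations the slide alludes to (standard in S4/Mamba): h'(t) = A h(t) + B x(t), y(t) = C h(t). Discretized with step size Δ (zero-order hold), this becomes an RNN-like linear recurrence; a minimal sketch, assuming SciPy is available and shapes are illustrative:

import numpy as np
from scipy.linalg import expm

def ssm_recurrence(A, B, C, x, dt):
    # Zero-order hold: A_bar = exp(dt*A), B_bar = A^{-1}(A_bar - I) B
    N = A.shape[0]
    Ab = expm(dt * A)
    Bb = np.linalg.solve(A, (Ab - np.eye(N)) @ B)
    h = np.zeros((N, 1))
    ys = []
    for x_t in x:  # h_t = Ab h_{t-1} + Bb x_t; y_t = C h_t
        h = Ab @ h + Bb * x_t
        ys.append((C @ h).item())
    return np.array(ys)

This is exactly a recurrent model, just without the tanh(), which the later slides exploit.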

Mamba: how to solve the small-memory problem
Matrix A is responsible for updating the hidden state h. But how should the memory be updated as the sequence continues?
HiPPO (High-order Polynomial Projection Operator) provides the answer:
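
For reference, the HiPPO-LegS matrix from the HiPPO/S4 papers, commonly used to initialize A; the formula is standard, the code itself is an illustrative sketch:

import numpy as np

def hippo_legs(N):
    # A[n, k] = -sqrt((2n+1)(2k+1)) if n > k, -(n+1) if n == k, 0 if n < k
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = -np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = -(n + 1)
    return A

This structure lets the hidden state compress the whole history as a polynomial projection, rather than remembering only the recent past.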

Mamba: the A matrix is defined as [N x D]
Input shape: [2, 64, 16] (batch, sequence length L, channels D)
A: [32, 16] (state size N = 32 for each of the D = 16 channels)
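
A shape walk-through matching the numbers on the slide (illustrative; the per-channel diagonal parameterization A = -exp(A_log) follows the Mamba convention):

import numpy as np

Bsz, L, D, N = 2, 64, 16, 32        # batch, length, channels, state size
x = np.random.randn(Bsz, L, D)      # input: [2, 64, 16]
A = -np.exp(np.random.randn(N, D))  # A: [32, 16], one diagonal state matrix per channel
h = np.zeros((Bsz, N, D))           # hidden state carried along the sequence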

SSM: parallel training
What hinders the RNN is its nonlinearity: tanh() prevents fast parallel training.
By its definition, an SSM is rather like an RNN without tanh().

SSM: parallel training

In a way, it is like a convolution operation: unrolling the linear recurrence gives y = K * x with kernel K = (C B_bar, C A_bar B_bar, C A_bar^2 B_bar, ...).
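
A minimal sketch of that view (the standard S4 identity; variable names are illustrative): precompute the kernel K_i = C A_bar^i B_bar, then a single causal convolution replaces the sequential loop.

import numpy as np

def ssm_conv_kernel(Ab, Bb, C, L):
    # K = (C Bb, C Ab Bb, C Ab^2 Bb, ...), length L
    K, AiB = [], Bb
    for _ in range(L):
        K.append((C @ AiB).item())
        AiB = Ab @ AiB
    return np.array(K)

def ssm_as_conv(Ab, Bb, C, x):
    # y_t = sum over i <= t of K_i * x_{t-i}; parallelizable (e.g., via FFT)
    K = ssm_conv_kernel(Ab, Bb, C, len(x))
    return np.convolve(x, K)[:len(x)]

The output matches the sequential ssm_recurrence above, but the convolution form can be trained in parallel.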

Mamba: selective SSM
No content-awareness: input-independent A, B, C cause problems on tasks that require content-awareness (in the Mamba paper, e.g., selective copying and induction heads).

In comparison, these tasks are relatively easy for Transformers, since they dynamically change their attention based on the input sequence: they can selectively "look at" or "attend to" different parts of the sequence.

Mamba: selective SSM
Mamba makes the matrices B, C, and the step size Δ depend on the input, which is similar to attention in a Transformer.

This raises another issue: the model can no longer be trained in parallel as a convolution, the way S4 is.
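
A minimal sketch of the selection mechanism (names and shapes are illustrative; the real Mamba additionally uses a low-rank projection for Δ, but the softplus that keeps Δ positive is mirrored here):

import numpy as np

def selection(x, W_B, W_C, W_dt):
    # x: [L, D] -> per-timestep parameters; since they vary with t,
    # no fixed convolution kernel K exists anymore
    B_t = x @ W_B                    # [L, N]
    C_t = x @ W_C                    # [L, N]
    dt = np.log1p(np.exp(x @ W_dt))  # softplus -> positive step size, [L, D]
    return B_t, C_t, dt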

Parallel scan?
As long as the operation satisfies the associative property, it can be parallelized!

(In the slide's diagram, t indexes the parallel thread.)

It feels like dynamic programming (DP).
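
A minimal sketch of why this works (illustrative code): the linear recurrence h_t = a_t * h_{t-1} + b_t composes associatively, so the whole sequence of (a, b) pairs can be scanned in O(log L) parallel rounds. The Hillis-Steele formulation below makes each round's inner loop independent across positions.

def combine(e1, e2):
    # Applying h -> a1*h + b1 and then h -> a2*h + b2 is again of that form;
    # this composition is associative
    a1, b1 = e1
    a2, b2 = e2
    return (a2 * a1, a2 * b1 + b2)

def parallel_scan(a, b):
    # Hillis-Steele inclusive scan: log2(L) rounds, each loop body independent
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        nxt = list(elems)
        for i in range(step, len(elems)):  # parallel across i on real hardware
            nxt[i] = combine(elems[i - step], elems[i])
        elems = nxt
        step *= 2
    return [b_t for _, b_t in elems]  # h_t, assuming h_{-1} = 0

For example, parallel_scan([0.5] * 4, [1.0] * 4) returns [1.0, 1.5, 1.75, 1.875], the same as running h_t = 0.5 * h_{t-1} + 1 sequentially.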

Hardware-aware algorithm
(Skipped in this presentation.)

Results and thoughts
- Useful.
- Can it replace Transformers?