Autoregressive Models and Beam Search - Mathematical Formulation with Example

MurtazaTaj1 · 9 slides · Oct 08, 2025

About This Presentation

Autoregressive models and beam search - mathematical formulation with a worked example


Slide Content

Autoregressive Models and Beam Search - Mathematical Formulation with Example

1. General Autoregressive Model
Classical AR(p): x_t = φ₁x_{t-1} + φ₂x_{t-2} + ... + φ_p x_{t-p} + ε_t
φ_i: coefficients, ε_t: white noise
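A minimal simulation sketch of the AR(p) recurrence for p = 2 follows; the coefficient values and noise scale are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Minimal sketch: simulate an AR(2) process x_t = phi1*x_{t-1} + phi2*x_{t-2} + eps_t.
# The coefficients and noise scale are illustrative choices, not from the slides.
rng = np.random.default_rng(0)
phi1, phi2 = 0.6, -0.2
x = np.zeros(200)
for t in range(2, len(x)):
    eps_t = rng.normal(scale=1.0)                  # white noise term
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps_t
print(x[-5:])                                      # last few simulated values
```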

2. Autoregressive Sequence Modeling
Chain rule factorization: P(x₁, …, x_T) = ∏_{t=1}^{T} P(x_t | x₁, …, x_{t-1})
Each token depends on all previous tokens
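A tiny numeric illustration of the factorization: the joint probability is the product of the per-step conditionals, and its log is the sum of their logs. The three values used below are the ones implied by the "I love cats" path of the worked example on slide 6.

```python
import math

# P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2)
# Conditionals from the "I love cats" path: P(I|start)=0.55, P(love|I)=0.60, P(cats|I love)=0.50.
conditionals = [0.55, 0.60, 0.50]
joint = math.prod(conditionals)
log_joint = sum(math.log(p) for p in conditionals)
print(round(joint, 3), round(log_joint, 4))        # 0.165 -1.8018
```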

3. Autoregressive Language Model
Neural version (RNN/Transformer): P(x_t | x_{1:t-1}; θ) = Softmax(W h_t), where h_t = f_θ(x_{1:t-1}) is the hidden representation
Training objective: maximize the log-likelihood ∑_t log P(x_t | x_{1:t-1}; θ)
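A sketch of that output layer, with a random hidden state and projection matrix standing in for the network f_θ:

```python
import numpy as np

# P(x_t | x_{1:t-1}; theta) = Softmax(W h_t)
# h_t and W are random stand-ins; in a real model h_t = f_theta(x_{1:t-1}) comes from
# an RNN or Transformer and W is the learned output projection.
vocab = ["I", "You", "love", "hate", "cats", "dogs"]
rng = np.random.default_rng(0)
h_t = rng.normal(size=8)                    # hidden representation (dimension 8 is arbitrary)
W = rng.normal(size=(len(vocab), 8))        # one output row per vocabulary token
logits = W @ h_t
probs = np.exp(logits - logits.max())       # subtract the max for numerical stability
probs /= probs.sum()
print(dict(zip(vocab, probs.round(3))))     # a proper distribution over the vocabulary
```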

4. Decoding Strategies
Greedy: x*_t = argmax_x P(x | x_{1:t-1})
Beam search: keep the top-B partial sequences ranked by cumulative probability
Sampling: draw x_t from the distribution P(x | x_{1:t-1})
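Greedy and sampling can be contrasted on a single next-token distribution; the probabilities below are the start-step values implied by the log-probabilities on slide 6, and beam search itself is sketched after slide 7 below.

```python
import numpy as np

# Greedy always returns the argmax; sampling draws in proportion to probability.
vocab = ["I", "You", "love", "hate", "cats", "dogs"]
p = np.array([0.55, 0.45, 0.0, 0.0, 0.0, 0.0])                # P(x | start) from the example setup
greedy_token = vocab[int(np.argmax(p))]                       # always "I"
sampled_token = np.random.default_rng(1).choice(vocab, p=p)   # "I" ~55% of the time, "You" ~45%
print(greedy_token, sampled_token)
```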

5. Example Setup
Vocabulary: {I, You, love, hate, cats, dogs}
Generate 3 tokens after the start symbol
Transition probabilities are defined for each step
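The transition probabilities can be recovered from the log-values on slides 6-7; the table below collects them, with the two complements the slides do not state (P(love | You) = 0.20 and P(cats | You hate) = 0.10) assumed so that each row sums to 1.

```python
# Toy transition table recovered from the log-probabilities on slides 6-7.
# P(love|You)=0.20 and P(cats|You hate)=0.10 are assumed complements.
PROBS = {
    ():              {"I": 0.55, "You": 0.45},      # choices after the start symbol
    ("I",):          {"love": 0.60, "hate": 0.40},
    ("You",):        {"hate": 0.80, "love": 0.20},
    ("I", "love"):   {"cats": 0.50, "dogs": 0.50},
    ("You", "hate"): {"dogs": 0.90, "cats": 0.10},
}
```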

6. Greedy Search
Step 1: I (-0.5978) vs You (-0.7985) → choose I
Step 2: love (-0.5108) vs hate (-0.9163) → choose love
Step 3: cats (-0.6931) vs dogs (-0.6931), a tie → choose cats
Final sequence: I love cats, logP = -1.8018 (P ≈ 0.165)
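A short sketch that reproduces this greedy trace, reusing the toy transition table from the setup (with the assumed complements noted there):

```python
import math

PROBS = {  # same toy table as in the setup sketch
    (): {"I": 0.55, "You": 0.45}, ("I",): {"love": 0.60, "hate": 0.40},
    ("You",): {"hate": 0.80, "love": 0.20}, ("I", "love"): {"cats": 0.50, "dogs": 0.50},
    ("You", "hate"): {"dogs": 0.90, "cats": 0.10},
}

def greedy_decode(steps=3):
    seq, logp = (), 0.0
    for _ in range(steps):
        dist = PROBS[seq]
        token = max(dist, key=dist.get)    # locally best token; ties keep the first listed ("cats")
        logp += math.log(dist[token])
        seq += (token,)
    return seq, logp

seq, logp = greedy_decode()
print(" ".join(seq), round(logp, 4))       # I love cats -1.8018
```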

7. Beam Search (B = 2)
After Step 2, top partials: 'You hate' (-1.0216), 'I love' (-1.1087)
Step 3 expansions: You hate dogs: logP = -1.1270 (P ≈ 0.324); I love cats / I love dogs: logP = -1.8018 (P ≈ 0.165)
Beam result: You hate dogs, the higher-probability sequence that greedy search missed
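And the beam-search counterpart over the same toy table, which reproduces the numbers above:

```python
import math

PROBS = {  # same toy table as in the setup sketch
    (): {"I": 0.55, "You": 0.45}, ("I",): {"love": 0.60, "hate": 0.40},
    ("You",): {"hate": 0.80, "love": 0.20}, ("I", "love"): {"cats": 0.50, "dogs": 0.50},
    ("You", "hate"): {"dogs": 0.90, "cats": 0.10},
}

def beam_search(beam_size=2, steps=3):
    beams = [((), 0.0)]                                   # (prefix, cumulative log-prob)
    for _ in range(steps):
        candidates = [(prefix + (token,), logp + math.log(p))
                      for prefix, logp in beams
                      for token, p in PROBS[prefix].items()]
        # keep only the top-B partial sequences by total log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

for seq, logp in beam_search():
    print(" ".join(seq), round(logp, 4))
# You hate dogs -1.127
# I love cats -1.8018
```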

8. Key Takeaways
Autoregressive models factorize the joint probability with the chain rule
Greedy decoding is locally optimal and can miss the globally best sequence
Beam search keeps multiple candidates → usually higher-probability results
Log-probabilities are used for numerical stability (sums instead of products)