Shuheng You
06/05/2024
Hallucination of LLMs
Paper Discussion
Background
Hallucination
“Generated content that appears factual but is ungrounded”
•We want to look at the possible underlying mechanisms that lead to this problem
Background
Heuristic Solutions
Chain-of-Verification: use the LLM itself to generate and independently answer verification questions about its draft answer (sketch below)
Chain-of-Verification Reduces Hallucination in Large Language Models. https://arxiv.org/abs/2309.11495
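A minimal sketch of the Chain-of-Verification loop, assuming a hypothetical `llm(prompt) -> str` helper that wraps whatever chat model is available; the prompt wording is illustrative, not the paper's exact templates:

```python
# Hedged sketch of Chain-of-Verification (Dhuliawala et al., 2023).
# `llm` is a hypothetical helper the reader supplies around any chat model.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

def chain_of_verification(question: str) -> str:
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer the question.\nQ: {question}\nA:")

    # 2. Plan verification questions that check the facts in the draft.
    plan = llm(
        "List short verification questions, one per line, that check the facts "
        f"in this answer.\nQ: {question}\nA: {baseline}"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3. Answer each verification question independently, without showing the
    #    draft, so the model cannot simply repeat its own hallucination.
    evidence = []
    for q in checks:
        independent = llm("Answer briefly.\nQ: " + q + "\nA:")
        evidence.append(f"{q} -> {independent}")

    # 4. Revise the draft conditioned on the verification results.
    return llm(
        f"Original question: {question}\nDraft answer: {baseline}\n"
        "Verification Q&A:\n" + "\n".join(evidence) +
        "\nRewrite the answer, keeping only facts supported by the verification."
    )
```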
Background
Heuristic Solutions
LMvLM: use another LLM as an examiner that interacts with the target model to uncover inconsistencies (sketch below)
LM vs LM: Detecting Factual Errors via Cross Examination. https://aclanthology.org/2023.emnlp-main.778/
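A hedged sketch of such a cross-examination loop; `examinee` and `examiner` are hypothetical callables (prompt string to completion string) that the reader wires to two, possibly identical, chat models:

```python
# Sketch of an LM-vs-LM cross-examination (in the spirit of Cohen et al., 2023).

def cross_examine(claim: str, examinee, examiner, rounds: int = 3) -> bool:
    transcript = [f"Claim under examination: {claim}"]
    for _ in range(rounds):
        # The examiner probes for details that should be consistent if the claim is true.
        question = examiner("\n".join(transcript) + "\nAsk one follow-up question:")
        answer = examinee(f"{claim}\nQuestion: {question}\nAnswer:")
        transcript += [f"Q: {question}", f"A: {answer}"]
    # The examiner issues a verdict based on inconsistencies in the transcript.
    verdict = examiner("\n".join(transcript) +
                       "\nGiven the answers above, is the claim factually consistent? yes/no:")
    return verdict.strip().lower().startswith("yes")
```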
Do LLMs Know What They Know?
‣P(True): the probability a model assigns to whether a specific sample is the correct answer to a question
Ask the LLM (few-shot) whether its own sampled answer to a question is correct (scoring sketch below)
Introduction of P(True)
Language Models (Mostly) Know What They Know. https://arxiv.org/abs/2207.05221
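A rough illustration of scoring P(True) with an open-weights causal LM via Hugging Face transformers; `gpt2` is only a placeholder, and the prompt wording approximates (but is not verbatim from) the paper:

```python
# Hedged sketch: P(True) as the normalized probability of the " True" option.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with accessible logits works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def p_true(question: str, proposed_answer: str) -> float:
    prompt = (
        f"Question: {question}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer correct? (A) True (B) False\n"
        "The proposed answer is:"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    true_id = tok(" True", add_special_tokens=False).input_ids[0]
    false_id = tok(" False", add_special_tokens=False).input_ids[0]
    # Normalize over the two options so scores are comparable across prompts.
    return (probs[true_id] / (probs[true_id] + probs[false_id])).item()
```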
Do LLMs Know What They Know?
‣Models can self-evaluate their own samples with reasonable accuracy
Experiment on P(True)
Do LLMs Know What They Know?
‣P(IK): the probability a model assigns to "I know",
i.e. whether it will be able to answer a given question correctly
‣Input: the question itself
‣Output: the probability, produced by an additional binary classification head on top of the model (sketch below)
Introduction of P(IK)
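A hedged sketch of what such a head could look like: a scalar logistic head over the LM's hidden state at the final question token, trained against whether sampled answers turn out correct. The class name, dimensions, and training signal shown here are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of a P(IK) head on top of a frozen LM.
import torch
import torch.nn as nn

class PIKHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # binary "I know" logit

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # last_hidden: (batch, hidden_size) hidden state at the final question token
        return torch.sigmoid(self.score(last_hidden)).squeeze(-1)

# Assumed training signal: for each question, label = fraction of sampled
# answers that were graded correct; fit with binary cross-entropy.
head = PIKHead(hidden_size=4096)
hidden = torch.randn(8, 4096)             # stand-in for LM hidden states
labels = torch.rand(8)                    # stand-in ground-truth P(IK) labels
loss = nn.functional.binary_cross_entropy(head(hidden), labels)
loss.backward()
```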
Do LLMs Know What They Know?
P(IK) for a question about the president of Absurdistan << P(IK) for a question about the president of the US
Visualization of P(IK)
Do LLMs Know What They Know?
We care about both the in-distribution and out-of-distribution performance of P(IK)
•In-distribution performance measures how reliable P(IK) is within the task it was trained on
•Out-of-distribution performance measures how well a trained P(IK) generalizes to a new task
Experiment on P(IK)
Do LLMs Know What They Know?
Ground-truth P(IK): the fraction of generated samples that are actually correct, i.e. correct samples / total generated samples (estimation sketch below)
Experiment on P(IK)
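Under that definition, the ground-truth label for a single question can be estimated roughly as below; `sample_answer` and `is_correct` are hypothetical helpers (a temperature-sampled generation call and a grading function) the reader supplies:

```python
# Minimal sketch of estimating the ground-truth P(IK) label for one question:
# sample several answers and take the fraction that are graded correct.

def ground_truth_p_ik(question: str, reference: str,
                      sample_answer, is_correct, n_samples: int = 30) -> float:
    answers = [sample_answer(question) for _ in range(n_samples)]  # e.g. T=1 sampling
    return sum(is_correct(a, reference) for a in answers) / n_samples
```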
Residual Streams Across Layers
Analyze the hidden states of all L layers and the tokens that can be predicted from them,
given different prompts (some succeed and some fail to elicit the correct answer); see the read-out sketch below
Residual Streams
On Large Language Models' Hallucination with Regard to Known Facts. https://arxiv.org/abs/2403.20009
[Figure: stack of L decoder layers, with the hidden state read out after each layer]
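One way to reproduce this kind of read-out is the "logit lens" style projection: apply the final layer norm and the unembedding matrix to each layer's hidden state and track the probability of a chosen token across layers. The sketch below makes that assumption and uses `gpt2` purely as a stand-in model:

```python
# Hedged sketch: per-layer probability of a target token via a logit-lens read-out.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def token_prob_per_layer(prompt: str, target: str) -> list[float]:
    target_id = tok(target, add_special_tokens=False).input_ids[0]
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    probs = []
    for h in out.hidden_states[1:]:                 # one entry per decoder layer
        # Attribute names below are GPT-2 specific; other architectures differ.
        h_last = model.transformer.ln_f(h[0, -1])   # final layer norm, last position
        logits = model.lm_head(h_last)              # unembedding projection
        probs.append(torch.softmax(logits, dim=-1)[target_id].item())
    return probs

# Comparing these per-layer curves for prompts that succeed vs. fail to elicit
# the correct answer gives the kind of dynamics the paper visualizes.
print(token_prob_per_layer("The capital of France is", " Paris"))
```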
Residual Streams Across Layers
•Success token: the activation of the correct token when given the optimal prompt
•Failed token: the activation of the correct token when given failed prompts
•Hallucinated token: the activation of the incorrect token
Dynamics of Residual Streams
Residual Streams Across Layers
The dynamics of the correct token across the layers of a model
Accuracy of an SVM classifier trained on these dynamics (sketch below):
Use the Pattern as a Classifier
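A sketch of the classification step, assuming the features are the per-layer probability curve of the emitted token (mirroring the plotted dynamics); this feature choice is an assumption, not necessarily the paper's exact pipeline:

```python
# Hedged sketch: SVM separating faithful from hallucinated generations using
# layer-wise dynamics as features (random data stands in for real curves).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# X: one row per generation, columns = probability of the emitted token at each
#    of the L layers (e.g. collected with token_prob_per_layer above).
# y: 1 if the generation was correct, 0 if it hallucinated.
X = np.random.rand(200, 32)       # stand-in for 200 curves over 32 layers
y = np.random.randint(0, 2, 200)  # stand-in labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```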
Issues and Discussion
Issues:
•The methods are more effective on short questions (especially single-token answers) and often fail when given longer ones
•Only applicable to open-source LLMs, since access to model internals (logits and hidden states) is required
Discussion:
•Do you think these methods are practical in production scenarios?
•If not, what do you think are the drawbacks and potential problems?
From the Two Papers