Language Modeling: Putting a curve to the bag of words


Slide Content

Language Modeling
Putting a curve to the bag of words
Courtesy of Chris Jordan

What models we covered in
class so far
•Boolean
•Extended Boolean
•Vector Space
–TF*IDF
•Probabilistic Modeling
–log P(D|R) / P(D|N)

Probability Ranking Principle
“If a reference retrieval system's response to each
request is a ranking of the documents in the
collection in order of decreasing probability of
relevance to the user who submitted the request,
where the probabilities are estimated as
accurately as possible on the basis of whatever
data have been made available to the system for
this purpose, the overall effectiveness of the
system to its user will be the best that is
obtainable on the basis of those data.”
- Robertson

Bag of words? What bag?
•Documents are vectors of term
occurrences
•Assumption of exchangeability
•What is this really?
–A hyperspace where each dimension is
represented by a term
–Values are term occurrences

Can we model this bag?
•Binomial Distribution
–Bernoulli / Success Fail Trials
–e.g. Flipping a coin: chance of getting a head
•Multinomial
–Probability of events occurring
–e.g. Flipping a coin: chance of head, chance of tail
–e.g. Die Roll: chance of 1, 2, …, 6
–e.g. Document: chance of a term occurring

Review
•What is the Probability Ranking
Principle?
•What is the bag of words model?
•What is exchangeability?
•What is a binomial?
•What is a multinomial?

Some Terminology
•Term: t
•Vocabulary: V = {t1, t2, …, tn}
•Document: dx = tdx1 … tdxm, each term ∈ V
•Corpus: C = {d1, d2, …, dk}
•Query: Q = q1 q2 … qi, each term ∈ V

Language Modeling
•A document is represented by
multinomial
•Unigram model
–A piece of text is generated by each term
independently
•p(t1 t2 … tn) = p(t1) p(t2) … p(tn)
•p(t1) + p(t2) + … + p(tn) = 1
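
As a rough illustration (not from the slides), here is a minimal Python sketch of the unigram model: a document defines a multinomial over its terms, and a piece of text is scored as a product of independent term probabilities. The function names and the toy document are made up.

    import math
    from collections import Counter

    def unigram_model(tokens):
        # Multinomial over terms: p(t) = freq(t) / total term count, so the p(t) sum to 1.
        counts = Counter(tokens)
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}

    def sequence_log_prob(model, tokens):
        # Unigram independence: log p(t1 t2 ... tn) = log p(t1) + log p(t2) + ... + log p(tn)
        return sum(math.log(model[t]) for t in tokens)

    doc = "the cat sat on the mat".split()
    model = unigram_model(doc)
    print(model["the"])                               # 2/6 ≈ 0.333
    print(sequence_log_prob(model, ["the", "cat"]))   # log(2/6) + log(1/6)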

Why Unigram
•Easy to implement
–Reasonable performance
•Word order and structure not captured
–How much benefit would they add?
•Open question
•More parameters to tune in complex models
–Need more data to train
–Need more time to compute
–Need more space to store

Enough… how do I retrieve
documents?
•p(Q|d) = p(q1|d) p(q2|d) … p(qn|d)
•How do we estimate p(q|d)?
–Maximum Likelihood Estimate
–MLE(q|d) = freq(q|d) / ∑freq(i|d)
•Probability Ranking Principle
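
A sketch of query-likelihood retrieval built on the MLE above, again with made-up documents and helper names; documents are ranked by p(Q|d), which is how the Probability Ranking Principle is applied here.

    from collections import Counter

    def mle(term, doc_tokens):
        # MLE(q|d) = freq(q, d) / total number of terms in d
        return Counter(doc_tokens)[term] / len(doc_tokens)

    def query_likelihood(query_tokens, doc_tokens):
        # p(Q|d) = p(q1|d) p(q2|d) ... p(qn|d)
        score = 1.0
        for q in query_tokens:
            score *= mle(q, doc_tokens)   # one unseen query term drives the whole score to 0
        return score

    docs = {"d1": "the cat sat on the mat".split(),
            "d2": "the dog chased the cat".split()}
    query = "cat mat".split()
    print(sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True))
    # ['d1', 'd2']: d1 contains both query terms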

Review
•What is the unigram model?
•Is the language model a binomial or
multinomial?
•Why use the unigram model?
•Given a query, how do we use a
language model to retrieve documents?

What is wrong with MLE
•Creates 0 probabilities for terms that do
not occur
•0 probabilities break the similarity scoring function
•Is a 0 probability sensible?
–Can a word never ever occur?

How can we fix this?
•How do we get around the zero
probabilities?
–New similarity function?
–Remove zero probabilities?
•Build a different model?

Smoothing Approaches
•Laplace / Additive
•Mixture Models
–Interpolation
•Jelinek Mercer
•Dirichlet
•Absolute Discounting
–Backoff

Laplace
•Just add 1 to all term frequencies
•Where have you seen this before?
•Is this a good idea?
–Strengths
–Weaknesses
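
A sketch of Laplace (add-one) smoothing on the same toy setup; the vocabulary is assumed to be given up front.

    from collections import Counter

    def laplace(term, doc_tokens, vocab):
        # Add 1 to every term count, so unseen terms get a small non-zero probability.
        return (Counter(doc_tokens)[term] + 1) / (len(doc_tokens) + len(vocab))

    doc = "the cat sat on the mat".split()
    vocab = {"the", "cat", "sat", "on", "mat", "dog"}
    print(laplace("the", doc, vocab))   # (2 + 1) / (6 + 6) = 0.25
    print(laplace("dog", doc, vocab))   # (0 + 1) / (6 + 6), no longer zero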

Interpolation
•Mixture model approach
–Combine probability models
•Traditionally combine document model
with the corpus model
•Is this a good idea?
–What else is the corpus model used for?
–Strengths
–Weaknesses
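
A sketch of Jelinek-Mercer interpolation, one of the mixture models named in the smoothing list above; the mixing weight λ = 0.5 is an arbitrary choice here, not a recommended value.

    from collections import Counter

    def jelinek_mercer(term, doc_tokens, corpus_tokens, lam=0.5):
        # p(q|d) = lambda * p_ML(q|d) + (1 - lambda) * p_ML(q|C)
        p_doc = Counter(doc_tokens)[term] / len(doc_tokens)
        p_corpus = Counter(corpus_tokens)[term] / len(corpus_tokens)
        return lam * p_doc + (1 - lam) * p_corpus

    doc = "the cat sat on the mat".split()
    corpus = doc + "the dog chased the cat".split()
    print(jelinek_mercer("dog", doc, corpus))   # non-zero: the corpus model supplies the mass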

Backoff
•Only add probability mass to terms that
are not seen
•What does this do to the probability
model?
–Flatter?
•Is this a good idea?

Are there other sources of
probability mass?
•Document Clusters
•Document Classes
•User Profiles
•Topic models

Review
•What is wrong with 0 probabilities?
•How does smoothing fix it?
•What is smoothing really doing?
•What is Interpolation?
–What is that mixture model really
representing?
•What can we use to mix with the
document model?

Bored yet? Let’s do something
complicated
•Entropy - Information Theory
–H(X) = -∑ p(x) log p(x)
–Good for data compression
•Relative Entropy
–D(p||q) = ∑p(x) log (p(x)/q(x))
–Not a true distance measure
–Used to find differences between
probability models
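
A small sketch of both quantities, using log base 2 (bits); the two distributions are toy examples.

    import math

    def entropy(p):
        # H(X) = -sum p(x) log p(x)
        return -sum(px * math.log2(px) for px in p.values() if px > 0)

    def relative_entropy(p, q):
        # D(p||q) = sum p(x) log (p(x) / q(x)); assumes q(x) > 0 wherever p(x) > 0
        return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

    p = {"a": 0.5, "b": 0.5}
    q = {"a": 0.9, "b": 0.1}
    print(entropy(p))                                       # 1.0 bit
    print(relative_entropy(p, q), relative_entropy(q, p))   # asymmetric: not a true distance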

Ok… that’s nice
•What does relative entropy give us?
–Why not just subtract probabilities?
–On your calculators, calculate
p(x) log (p(x)/q(x)) for
•p(x) = .8, q(x) = .6
•p(x) = .6, q(x) = .4
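
A quick check of the exercise, using natural log (the choice of base only rescales the numbers):

    import math

    contrib = lambda p, q: p * math.log(p / q)   # one term's contribution to D(p||q)

    print(contrib(0.8, 0.6))   # ≈ 0.230
    print(contrib(0.6, 0.4))   # ≈ 0.243
    # p - q is 0.2 in both cases, yet the contributions differ: the log of the ratio p/q
    # is weighted by p itself, so this is not the same as just subtracting probabilities.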

Clarity Score
•Calculate the relative entropy between
the result set and the corpus
–Positive correlation between high clarity
score / relative entropy and query
performance
–So what is that actually saying?
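
A rough sketch of the idea only; the published clarity score estimates the query's language model from the top-ranked documents with smoothing, but the core computation is this relative entropy against the corpus model. The token lists here are made up.

    import math
    from collections import Counter

    def lm(tokens):
        counts = Counter(tokens)
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}

    def clarity(result_tokens, corpus_tokens):
        # D(result-set model || corpus model): high when the results use focused, atypical vocabulary
        p, q = lm(result_tokens), lm(corpus_tokens)
        return sum(px * math.log2(px / q[t]) for t, px in p.items() if q.get(t, 0) > 0)

    results = "language model smoothing unigram language model".split()
    corpus = results + "the cat sat on the mat the dog chased the cat".split()
    print(clarity(results, corpus))   # > 0: the result set diverges from the corpus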

Relative Entropy Query
Expansion
•Relevance Feedback
•Blind Relevance Feedback
•Expand query with terms that contribute
the most to relative entropy
•What are we doing to the query when
we do this?
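
A sketch of choosing expansion terms by their contribution to the relative entropy between a (blind) feedback model and the corpus model; the cutoff k and all names are illustrative.

    import math
    from collections import Counter

    def lm(tokens):
        counts = Counter(tokens)
        total = sum(counts.values())
        return {t: c / total for t, c in counts.items()}

    def expansion_terms(feedback_tokens, corpus_tokens, k=5):
        # Keep the k terms with the largest p(t|R) log (p(t|R) / p(t|C)).
        p, q = lm(feedback_tokens), lm(corpus_tokens)
        contrib = {t: pt * math.log2(pt / q[t]) for t, pt in p.items() if q.get(t, 0) > 0}
        return sorted(contrib, key=contrib.get, reverse=True)[:k]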

Controlled Query Generation
•Some of my research
•p(x) log (p(x)/q(x)) is a good term
discrimination function
•Regulate the construction of queries for
evaluating retrieval algorithms
–First real controlled reaction experiments
with retrieval algorithms

Review
•Who is the father of Information
Theory?
•What is Entropy?
•What is Relative Entropy?
•What is the Clarity Score?
•What are the terms that contribute the
most to relative entropy?
–Are they useful?

You have been a good class
•Introduced to the language model for
information retrieval
•Documents represented as multinomial
distributions
–Generative model
–Queries are generated
•Smoothing
•Applications in IR

Questions for me?

Questions for you
•What is the Maximum Likelihood
Estimate?
•Why is smoothing important?
•What is interpolation?
•What is entropy?
•What is relative entropy?
•Does language modeling make sense?