languagemodel.ppt.Web Intelligence compu

JohnrayMendoza 7 views 17 slides May 17, 2024

Slide 1 of 17

About This Presentation

It's about language model

Size: 285.44 KB

Language: en

Added: May 17, 2024

Slides: 17 pages

Slide Content

1
Language Model
CSC4170 Web Intelligence and Social Computing
Tutorial 8
Tutor: Tom Chao Zhou
Email: [email protected]

2
Outline
Language models
Finite automata and language models
Types of language models
Multinomial distributions over words
Query likelihood model
Application
Q&A
Reference

3
Language Models (LMs)
How can we come up with good queries?
Think of words that would likely appear in a relevant document.
Idea of LM:
A document is a good match to a query if the document model is
likely to generate the query.

4
Language Models (LMs)
Generative Model:
Recognize or generate strings.
The full set of strings that can be generated is called the language of the
automaton.
Language Model:
A function that puts a probability measure over strings drawn from some
vocabulary.

5
Language Models (LMs)
Example 1:
Calculate the probability of a word sequence.
Multiply the probabilities that the model gives to each word in the
sequence, together with the probability of continuing or stopping
after producing each word.
P(frog said that toad likes frog)=(0.01*0.03*0.04*0.01*0.02*0.01)
 *(0.8*0.8*0.8*0.8*0.8*0.8*0.2)
 =0.000000000001573
Most of the time, we will omit to include STOP and (1-STOP)
probabilities.

6
Language Models (LMs)
Example 2:
P(s|M
1)>P(s|M
2)

7
Language Models (LMs)
Basic LM using chain rule:
Unigram language model:
Throws away all conditioning context.
Most used in Information Retrieval.
Bigram language model:
Condition on the previous term.

8
Language Models (LMs)
Unigram LM:
Bag-of-words model.
Multinomial distributions over
words. 


Mi
dtd
i
tfL
1
,
The length of document d. M is
the size of the vocabulary.
multinomial coefficient, can leave out in practical
calculations.

9
Query Likelihood Model
Query likelihood model:
Rank document by P(d|q)
Likelihood that document d is
relevant to the query.
Using Bayes rule:
P(q) is the same for all
documents.
P(d) is treated as uniform
across all d. )|()|( dqPqdP 

10
Query Likelihood Model
Multinomial + Unigram:
Retrieve based on a language model:
Infer a LM for each document.
Estimate P(q|M
di).
Rank the documents according to these probabilities.
Multinomial coefficient for the query q.
Can be ignored.

11
Query Likelihood Model
Estimating the query generation probability:
Maximum Likelihood Estimation (MLE) + unigram LM
Limitations:
If we estimate P(t|Md)=0, documents will only give a query nonzero
probability if all of the query terms appear in the document.
Occurring words are poorly estimated, the probability of words
occurring once in the document is overestimated, because their
one occurrence was partly by chance.

12
Query Likelihood Model
Estimating the query generation probability:
Maximum Likelihood Estimation (MLE) + unigram LM
Smoothing:
Use the whole collection to smooth.
Linear Interpolation (Jelinek-Mercer Smoothing)
Bayesian Smoothing

13
Query Likelihood Model
Query likelihood model with linear interpolation:
Query likelihood model with Bayesian smoothing:
 


qt d
Cd
L
MtPMtP
dPqdP )
)|()|(
()()|(



14
Query Likelihood Model
Example using unigram + MLE + linear interpolation:
d1: Xyzzy reports a profit but revenue is down
d2: Quorus narrows quarter loss but revenue decreases further
λ=1/2
query: revenue down
ranking: d1>d2

15
Application
Community-based Question Answering (CQA) System:
Question Search.
Given a queried question, find a semantically equivalent question
for the queried question.
General Search Engine
Given a query, rank documents.

16
Questions?

17
Reference
Multinomial distribution:
http://en.wikipedia.org/wiki/Multinomial_distribution
Likelihood function:
http://en.wikipedia.org/wiki/Likelihood
Maximum likelihood:
http://en.wikipedia.org/wiki/Maximum_likelihood

languagemodel.ppt.Web Intelligence compu

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

languagemodel.ppt.Web Intelligence compu

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

8-top-ai-courses-for-customer-support-representatives-in-2025.pptx

7-essential-ai-courses-for-call-center-supervisors-in-2025.pptx

25-essential-ai-courses-for-user-support-specialists-in-2025.pptx

8-essential-ai-courses-for-insurance-customer-service-representatives-in-2025.pptx

Know for Certain

PPT OPD LES 3ertt4t4tqqqe23e3e3rq2qq232.pptx