UNIT 1 D.pptx: Natural Language Processing

About This Presentation

natural language processing


Slide Content

What is Interpolation in NLP?

Basics of interpolation in NLP: Interpolation means blending information from different sources to make a better estimate or guess. In NLP, it usually means mixing probabilities from several language models, such as unigram, bigram, and trigram models, instead of using just one.

Simple explanation: Suppose we want to guess the next word in a sentence. Don't use just one way to guess (like the bigram only or the trigram only). Instead, take a bit from each model and combine them for a more reliable answer.

Why use interpolation? Relying on just one model or one slice of the data may not be strong enough and may miss important clues. Mixing several levels of information gives a more accurate and trustworthy result, especially when data is limited or incomplete.

Real-life analogy: Imagine a student's grade is calculated from both Math and Science marks. Instead of using only Math or only Science, combine both with certain weights (like 40% Math + 60% Science). Interpolation does the same: it combines information from different sources with weights.

NLP Example 1: To fill in the blank "The weather today is ___": the unigram model says "hot" is common on its own, the bigram model says "very hot" is common, and the trigram model says "really extreme hot" is common. Instead of relying on just one, interpolation combines all of them (like 0.2 × unigram + 0.3 × bigram + 0.5 × trigram) to guess the best word.
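To make the weighted mix concrete, here is a minimal Python sketch of linear interpolation over n-gram estimates. The probabilities, weights, and words are invented illustration values, not taken from any real corpus.

```python
# Minimal sketch of linear interpolation of n-gram probabilities.
# All numbers below are made-up illustration values.

def interpolated_prob(word, p_unigram, p_bigram, p_trigram,
                      lambdas=(0.2, 0.3, 0.5)):
    """Blend unigram, bigram, and trigram estimates with fixed weights."""
    l1, l2, l3 = lambdas
    return (l1 * p_unigram.get(word, 0.0)
            + l2 * p_bigram.get(word, 0.0)
            + l3 * p_trigram.get(word, 0.0))

# Hypothetical estimates for the blank in "The weather today is ___."
p_unigram = {"hot": 0.10, "cold": 0.08}   # P(word)
p_bigram = {"hot": 0.30, "cold": 0.05}    # P(word | "is")
p_trigram = {"hot": 0.50, "cold": 0.02}   # P(word | "today is")

for w in ("hot", "cold"):
    print(w, interpolated_prob(w, p_unigram, p_bigram, p_trigram))
# "hot" gets the highest blended probability, so it is chosen for the blank.
```

Because the weights sum to 1, the blended value is still a valid probability estimate, and a word missing from one model can still be rescued by the others.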

NLP Example 2: Suppose we have two thermometers, one old and one new. Instead of trusting only one, we use both readings in a certain mix (maybe 40% old + 60% new) to get the best estimate of the real temperature.

Key points: Interpolation = smart mixing. Take a little knowledge from each source and blend it for a better prediction or answer. This approach is widely used in NLP for estimating word probabilities, language modeling, and dealing with sparse data.

What is sparse data? Sparse data means mostly empty or zero values. In NLP, it happens because most word combinations are rare or never appear.

Why does sparse data happen in NLP? Language has a huge number of possible word pairs or groups, but in real sentences only a few word combinations actually occur. So the big table of all possible word pairs is mostly empty.
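A small sketch of why this happens, using a toy sentence invented for illustration: even for a handful of words, most cells of the bigram table stay at zero.

```python
# Count bigrams in a tiny corpus and compare with all possible word pairs.
from collections import Counter
from itertools import pairwise  # Python 3.10+

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
bigram_counts = Counter(pairwise(corpus))

possible = len(vocab) ** 2      # every pair the table could hold
observed = len(bigram_counts)   # pairs that actually occur
print(f"{observed} of {possible} possible bigrams observed")
# Only 9 of 49 cells are non-zero here; with a real vocabulary of tens of
# thousands of words, the table is almost entirely zeros -- that is sparsity.
```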

Part of Speech (POS) Tagging in NLP. What is POS tagging? POS tagging means assigning a label (tag) to each word in a sentence. The label shows the word's grammatical role: noun, verb, adjective, etc. This helps the machine understand sentence structure.

Why is POS tagging important? It helps computers understand the meaning and grammar of text. It is useful in tasks like translation, sentiment analysis, and information extraction. It also helps distinguish different meanings of the same word based on context.

POS: Simple Example. Sentence: The quick brown fox jumps over the lazy dog. POS tags: The (Determiner) quick (Adjective) brown (Adjective) fox (Noun) jumps (Verb) over (Preposition) the (Determiner) lazy (Adjective) dog (Noun).

How does POS tagging work? Step 1: Break the sentence into words (tokenization). Step 2: Assign each word a tag based on dictionaries or machine learning. Step 3: Use the tags to understand the sentence's meaning.
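A minimal sketch of this tokenize-then-tag workflow using NLTK, assuming the library and its tokenizer and tagger models have been installed and downloaded; the tag names (DT, JJ, NN, ...) come from the Penn Treebank tagset used by NLTK's default tagger.

```python
# Step 1: tokenize, Step 2: tag, Step 3: use the (word, tag) pairs downstream.
import nltk

sentence = "The quick brown fox jumps over the lazy dog."
tokens = nltk.word_tokenize(sentence)   # Step 1: split into words
tags = nltk.pos_tag(tokens)             # Step 2: assign a tag to each word
print(tags)
# Prints a list of (word, tag) pairs, e.g. ('The', 'DT'), ('quick', 'JJ'),
# ('fox', 'NN'), ... where DT = determiner, JJ = adjective, NN = noun.
```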

POS tagging: key points. POS tagging tells us what each word does in the sentence. It is a basic, crucial step for many NLP applications.

Stochastic Tagging and Transformation-Based Tagging in NLP. We know that POS tagging gives each word a tag showing its role (noun, verb, adjective, etc.) and hence helps computers understand sentences. Stochastic => probability-based tagging: the model learns from many examples how likely a word is to carry a certain tag, and it guesses tags based on the probability of words and tag sequences. Example: the word "play" is usually a verb (I play cricket) and sometimes a noun (a play).

Stochastic tagging example. Sentence: "I want to play." The model sees that "play" is often a verb after "to", so it tags "play" as a verb here. It uses statistics gathered from previously seen texts to decide.
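A minimal counting sketch of this idea. The tiny hand-tagged corpus below is invented for illustration; the tag chosen for "play" depends on how often each tag was seen for that word after the given previous tag.

```python
# Pick the most frequent tag for a word given the previous tag.
from collections import Counter, defaultdict

tagged_corpus = [
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("play", "VERB"),
    ("the", "DET"), ("play", "NOUN"), ("was", "VERB"), ("good", "ADJ"),
    ("we", "PRON"), ("love", "VERB"), ("to", "PART"), ("play", "VERB"),
]

counts = defaultdict(Counter)
prev_tag = "<s>"
for word, tag in tagged_corpus:
    counts[(prev_tag, word)][tag] += 1
    prev_tag = tag

def most_likely_tag(prev_tag, word):
    """Return the tag seen most often for this word after this previous tag."""
    tag_counts = counts[(prev_tag, word)]
    return tag_counts.most_common(1)[0][0] if tag_counts else "UNKNOWN"

print(most_likely_tag("PART", "play"))  # after "to"  -> VERB
print(most_likely_tag("DET", "play"))   # after "the" -> NOUN
```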

What is Transformation-Based Tagging (Brill Tagging)? It starts with a simple guess for each word's tag, then uses rules learned from data to correct the tags step by step. It combines rule-based and machine-learning ideas.

Transformation-based tagging example. Initial tags: "The (Det) cooking (Verb) is (Verb) good (Adj)." Rule: if a word ending in "-ing" comes after "The", change its tag from verb to noun. Corrected tags: "The (Det) cooking (Noun) is (Verb) good (Adj)." Here, Det means determiner.
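A minimal sketch of applying one such transformation rule. The rule and the tags mirror the slide's example; a real Brill tagger learns many rules like this automatically from data.

```python
# Apply one Brill-style correction rule to an initial tag sequence.

def apply_rule(tagged):
    """If a word ending in 'ing' follows the word 'The'/'the' and is tagged
    Verb, change its tag to Noun."""
    corrected = list(tagged)
    for i in range(1, len(corrected)):
        word, tag = corrected[i]
        prev_word, _ = corrected[i - 1]
        if prev_word.lower() == "the" and word.endswith("ing") and tag == "Verb":
            corrected[i] = (word, "Noun")
    return corrected

initial = [("The", "Det"), ("cooking", "Verb"), ("is", "Verb"), ("good", "Adj")]
print(apply_rule(initial))
# [('The', 'Det'), ('cooking', 'Noun'), ('is', 'Verb'), ('good', 'Adj')]
```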

Summary:
Aspect | Stochastic Tagging | Transformation-Based Tagging
Approach | Uses probabilities/statistics | Starts with initial tags, improves with rules
How it learns | From large labeled data | Learns rules from data
Best for | When lots of data and patterns exist | When combining rules and patterns
Example | Choosing "play" as verb based on chance | Changing "cooking" from verb to noun by rule

Issues in POS Tagging. Issue 1 – Ambiguity: Words can have multiple meanings or tags. Example: "book" can be a noun (a book) or a verb (to book a seat). Context decides the correct tag, but machines sometimes get it wrong.

Issue 2 – Unknown Words (Out-of-Vocabulary Words) Words that didn’t appear in training data cause problems. Example: New slang or names like "Zoomer" may be wrongly tagged.

Issue 3 – Idiomatic Expressions Phrases with special meanings are hard to tag. Example: "Kick the bucket" means “to die,” but word-by-word tags confuse meaning.

Issue 4 – Domain Dependence Models trained on one type of text (news, books) may fail on others (medical, tweets). Words behave differently in different domains.

Issue 5 – Data Sparsity: There are insufficient examples for rare words or tags, so models struggle to tag rare cases correctly.

Summary of issues in POS Tagging:
Issue | What Happens? | Example
Ambiguity | Multiple meanings confuse tagging | "Book" (noun or verb)
Unknown Words | Not seen in training | "Zoomer" (slang)
Idiomatic Phrases | Meaning differs from the parts | "Kick the bucket"
Domain Differences | Models fail outside the trained domain | Medical vs. news text
Data Sparsity | Rare words/tags are hard to tag | Uncommon words in the corpus

Example: Issues in POS Tagging. Real-life example: the word "bat" can mean a flying animal (noun), a piece of sports equipment (noun), or to hit (verb).

Issues in POS Tagging: Hidden Markov Model (HMM) in NLP. HMM is a probabilistic model that guesses the most likely sequence of POS tags. It looks at the previous tag to predict the current one (the Markov assumption) and uses probabilities estimated from training data. HMMs face difficulty with new or unseen words because they rely on learned probabilities; this is called the "unknown observation" problem.
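A minimal sketch of HMM tagging with the Viterbi algorithm over two tags. The transition and emission probabilities are invented toy values, not learned from a real corpus, and the unseen word shows the "unknown observation" problem directly.

```python
# Toy HMM: find the most likely tag sequence with the Viterbi algorithm.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.6, "VERB": 0.4}
# P(current tag | previous tag): the Markov assumption, only the previous tag matters.
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
# P(word | tag): words never seen with a tag get probability 0.
emit_p = {
    "NOUN": {"dogs": 0.4, "play": 0.1},
    "VERB": {"dogs": 0.0, "play": 0.5},
}

def viterbi(words):
    """Return (probability, tag sequence) of the most likely tag path."""
    V = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), [s]) for s in states}]
    for word in words[1:]:
        layer = {}
        for s in states:
            layer[s] = max(
                (V[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(word, 0.0),
                 V[-1][prev][1] + [s])
                for prev in states
            )
        V.append(layer)
    return max(V[-1].values())

print(viterbi(["dogs", "play"]))     # (0.084, ['NOUN', 'VERB'])
print(viterbi(["zoomers", "play"]))  # probability 0.0: the unknown observation problem
```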

Main issues of HMM in POS tagging: (1) unknown words, (2) data sparsity, (3) limited context, (4) ambiguity.

Issue 1 – Unknown Words: HMM struggles to tag words not seen in training data. Example: new slang, names, or technical terms confuse the model.

Issue 2 – Data Sparsity: Some word-tag or tag-tag combinations are rare or missing in training data. This leads to zero or wrong probabilities, causing tagging errors.

Issue 3 – Limited Context: HMM only looks at the previous tag to decide the current tag. It ignores long-distance dependencies and wider sentence context.

Issue 4 – Ambiguity: Words with multiple possible tags (like "book" as noun/verb) confuse the model. If the probabilities are close, the wrong tag might be assigned.

Real-Life Example: Guessing Weather by Actions. Imagine guessing the weather (hidden states) by watching activities (observations): someone carrying an umbrella → likely raining. But what if a new activity never seen before appears? HMM struggles with unknown activities (unknown words) and only considers yesterday's weather (the previous tag), not a longer history.

Real-Life Example: Guessing Weather by Actions (continued). One day a friend says they went "jogging," an activity we have never heard of before. Since we never saw "jogging" in the past, we don't know how it relates to the weather. Our guess about the weather becomes uncertain because the model has no data about this new activity.

Maximum Entropy Models (MaxEnt) in POS Tagging in NLP. MaxEnt is a probability model that predicts POS tags. It uses many features (the word, its suffix, surrounding words) to decide the best tag. It works on the idea of "maximum entropy": pick the model that makes the fewest assumptions while still fitting the data, i.e., the model that uses the most clues with the least added assumptions.

How does MaxEnt work? It looks at all the information around a word, not just previous tags. It combines clues like the word itself, its prefix or suffix (like "-ing"), the previous and next words, capitalization, etc. It weights all of these clues to calculate the probability of each possible tag.
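A minimal sketch of a MaxEnt-style tagger, assuming scikit-learn is available: hand-written features for each word are fed into multinomial logistic regression, which is the usual way maximum-entropy classifiers are implemented. The tiny training set is invented for illustration.

```python
# Feature-based (MaxEnt-style) tag prediction with logistic regression.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(words, i):
    """Clues about the word at position i: the word, its suffix, neighbours."""
    return {
        "word": words[i].lower(),
        "suffix3": words[i][-3:].lower(),
        "prev": words[i - 1].lower() if i > 0 else "<s>",
        "next": words[i + 1].lower() if i < len(words) - 1 else "</s>",
        "capitalized": words[i][0].isupper(),
    }

# Invented toy training sentences with their tags.
train_sents = [
    (["I", "want", "to", "play"], ["PRON", "VERB", "PART", "VERB"]),
    (["The", "play", "was", "good"], ["DET", "NOUN", "VERB", "ADJ"]),
    (["They", "like", "cooking"], ["PRON", "VERB", "NOUN"]),
]

X, y = [], []
for words, tags in train_sents:
    for i, tag in enumerate(tags):
        X.append(features(words, i))
        y.append(tag)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = ["We", "want", "to", "play"]
print(clf.predict(vec.transform([features(test, 3)])))  # likely ['VERB']
```

Each feature gets a learned weight, and the weighted clues are combined into a probability for every tag, which is exactly the "many clues, weighted" idea described above.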

MaxEnt: Real-Life Example of a Job Interview Decision. Imagine a recruiter deciding whether to hire a candidate. They use many features: experience, skills, education, interview performance. MaxEnt is like the recruiter, using many clues together to make the best decision.

Why is MaxEnt useful? Because it can handle complex and rich information. It doesn't need the "previous tag only" assumption that HMM makes. It is better at tagging ambiguous or unknown words because it uses multiple clues.

Issues with the MaxEnt model: it takes longer to train, needs a careful choice of features, and can be computationally heavy. In short: slow training and careful feature design.