Language Model (N-Gram) Suriandi Jayakrista S.T., M.T. Kalbis Institute
N-gram Model: a probabilistic N-gram model is used to predict the next likely word from the previous N-1 words. Statistical models of word sequences like this are often called language models (LMs). Estimators such as N-grams assign a probability to each candidate next word, and these probabilities can be combined to score an entire sentence. N-gram models are among the most important models in speech and language processing, both for estimating the probability of the next word and of a whole sequence.
N-grams take the desired number of consecutive words from a sentence. For example, for the sentence "Fahmi adalah anak yang rajin" ("Fahmi is a diligent child"), the N-grams are: Unigrams: Fahmi, adalah, anak, yang, rajin. Bigrams: Fahmi adalah, adalah anak, anak yang, yang rajin. Trigrams: Fahmi adalah anak, adalah anak yang, anak yang rajin; and so on.
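As a quick illustration, the sketch below rebuilds these unigrams, bigrams, and trigrams in a few lines of Python; the variable names are ours, not part of any library.

```python
# Build unigrams, bigrams, and trigrams from the example sentence
# by pairing each word with its neighbours.
sentence = "Fahmi adalah anak yang rajin"
tokens = sentence.split()

unigrams = tokens
bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
trigrams = [" ".join(triple) for triple in zip(tokens, tokens[1:], tokens[2:])]

print(unigrams)   # ['Fahmi', 'adalah', 'anak', 'yang', 'rajin']
print(bigrams)    # ['Fahmi adalah', 'adalah anak', 'anak yang', 'yang rajin']
print(trigrams)   # ['Fahmi adalah anak', 'adalah anak yang', 'anak yang rajin']
```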
What is an N-gram? An n-gram is a subsequence of n items from a given sequence. Unigram: n-gram of size 1. Bigram: n-gram of size 2. Trigram: n-gram of size 3. Items can be phonemes, syllables, letters, words, or anything else, depending on the application.
Example: the dog smelled like a skunk. Bigrams: "# the", "the dog", "dog smelled", "smelled like", "like a", "a skunk", "skunk #". Trigrams: "# the dog", "the dog smelled", "dog smelled like", "smelled like a", "like a skunk", "a skunk #".
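A small, hypothetical helper like the one below (the ngrams name and pad parameter are our own) reproduces these lists by padding the sentence with the "#" boundary marker:

```python
# General n-gram extractor: pad the sentence with "#" boundary markers,
# then slide a window of length n over the padded token list.
def ngrams(tokens, n, pad="#"):
    padded = [pad] + tokens + [pad]                       # sentence boundaries
    return [" ".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]

tokens = "the dog smelled like a skunk".split()
print(ngrams(tokens, 2))   # ['# the', 'the dog', ..., 'a skunk', 'skunk #']
print(ngrams(tokens, 3))   # ['# the dog', 'the dog smelled', ..., 'a skunk #']
```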
How to predict the next word? Assume a language has T word types in its lexicon; how likely is word x to follow word y? Solutions: (1) Estimate the likelihood of x occurring in new text, based on its general frequency of occurrence estimated from a corpus: popcorn is more likely to occur than unicorn. (2) Condition the likelihood of x occurring on the context of previous words (bigrams, trigrams, ...): mythical unicorn is more likely than mythical popcorn.
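The sketch below contrasts the two solutions; all counts and the corpus size are made-up placeholders, used only to show the shape of the computation:

```python
# Illustrative placeholder counts (not from a real corpus) to contrast
# the two estimates of how likely a word is.
corpus_size = 1_000_000
unigram_counts = {"popcorn": 150, "unicorn": 20}
bigram_counts = {("mythical", "unicorn"): 12, ("mythical", "popcorn"): 0}
count_mythical = 40                                  # occurrences of "mythical"

# Unconditioned general frequency: popcorn beats unicorn.
p_popcorn = unigram_counts["popcorn"] / corpus_size  # 0.00015
p_unicorn = unigram_counts["unicorn"] / corpus_size  # 0.00002

# Conditioned on the previous word "mythical": unicorn beats popcorn.
p_unicorn_given_mythical = bigram_counts[("mythical", "unicorn")] / count_mythical  # 0.3
p_popcorn_given_mythical = bigram_counts[("mythical", "popcorn")] / count_mythical  # 0.0
```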
Statistical View: The task of predicting the next word can be stated as attempting to estimate the probability function P(w_n | w_1, ..., w_{n-1}). In other words, we use a classification of the previous history words to predict the next word: on the basis of having looked at a lot of text, we know which words tend to follow other words.
N-Gram Models model sequences using the statistical properties of N-grams. The idea goes back to Shannon: given a sequence of letters, what is the likelihood of the next letter? From training data, derive a probability distribution for the next letter given a history of size n.
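A minimal sketch of this idea, assuming plain Python with a toy training string standing in for real training data:

```python
from collections import Counter, defaultdict

# Shannon-style sketch: from training text, build a distribution over the
# next letter given the previous n letters.
def next_letter_distribution(text, n):
    counts = defaultdict(Counter)
    for i in range(len(text) - n):
        history, nxt = text[i:i + n], text[i + n]
        counts[history][nxt] += 1
    # normalize each history's counts into probabilities
    return {h: {c: k / sum(cnt.values()) for c, k in cnt.items()}
            for h, cnt in counts.items()}

model = next_letter_distribution("the dog smelled like a skunk", n=2)
print(model["th"])   # {'e': 1.0} -- in this tiny text, "e" always follows "th"
```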
Markov assumption: the probability of the next word depends only on the previous k words, i.e. P(w_n | w_1, ..., w_{n-1}) ≈ P(w_n | w_{n-k}, ..., w_{n-1}). Common N-grams: unigram P(w_n), bigram P(w_n | w_{n-1}), trigram P(w_n | w_{n-2}, w_{n-1}).
Using N-Grams: for N-gram models we want P(w_1, w_2, ..., w_n). By the chain rule we can decompose a joint probability, e.g. P(w_1, w_2, w_3), as follows: P(w_1, w_2, ..., w_n) = P(w_1 | w_2, w_3, ..., w_n) P(w_2 | w_3, ..., w_n) ... P(w_{n-1} | w_n) P(w_n).
Example: compute the product of the component conditional probabilities: P(the mythical unicorn) = P(the) * P(mythical | the) * P(unicorn | the mythical). How do we calculate these probabilities?
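For illustration only, the sketch below plugs made-up probabilities into this product; the following slides explain where real values would come from.

```python
# Made-up probabilities, purely to show the shape of the product.
p_the = 0.02                          # P(the)
p_mythical_given_the = 0.001          # P(mythical | the)
p_unicorn_given_the_mythical = 0.2    # P(unicorn | the mythical)

p_sentence = p_the * p_mythical_given_the * p_unicorn_given_the_mythical
print(p_sentence)                     # about 4e-06 with these illustrative numbers
```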
Training and Testing: N-gram probabilities come from a training corpus. An overly narrow corpus means the probabilities don't generalize; an overly general corpus means the probabilities don't reflect the task or domain. A separate test corpus is used to evaluate the model.
A Simple Bigram Example: estimate the likelihood of the sentence "I want to eat Chinese food." P(I want to eat Chinese food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(Chinese | eat) P(food | Chinese) P(<end> | food). What do we need to calculate these likelihoods? Bigram probabilities for each word-pair sequence in the sentence, calculated from a large corpus.
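A minimal sketch of this computation, assuming a bigram probability table is already available; the sentence_probability function and the values in bigram_p are illustrative placeholders of our own, not actual corpus figures.

```python
# Multiply bigram probabilities along the sentence, including the
# <start> and <end> markers. Probability values are placeholders.
def sentence_probability(words, bigram_p, start="<start>", end="<end>"):
    padded = [start] + words + [end]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        prob *= bigram_p.get((prev, cur), 0.0)   # unseen bigram -> probability 0
    return prob

bigram_p = {("<start>", "I"): 0.25, ("I", "want"): 0.32, ("want", "to"): 0.65,
            ("to", "eat"): 0.26, ("eat", "Chinese"): 0.02,
            ("Chinese", "food"): 0.52, ("food", "<end>"): 0.56}

print(sentence_probability("I want to eat Chinese food".split(), bigram_p))
```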
Early Bigram Probabilities from BERP: The Berkeley Restaurant Project (BeRP) is a testbed for a speech recognition system being developed by the International Computer Science Institute in Berkeley, CA.
P(I want to eat British food) = P(I | <start>) P(want | I) P(to | want) P(eat | to) P(British | eat) P(food | British) = .25 * .32 * .65 * .26 * .001 * .60 ≈ .0000081. N-gram models can be trained by counting and normalizing.
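A sketch of that counting-and-normalization step over a tiny stand-in corpus (not the BERP data); train_bigrams is a hypothetical helper, not a library function.

```python
from collections import Counter, defaultdict

# Counting and normalization: P(cur | prev) = count(prev, cur) / count(prev).
def train_bigrams(sentences, start="<start>", end="<end>"):
    counts = defaultdict(Counter)
    for sent in sentences:
        padded = [start] + sent.split() + [end]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return {prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
            for prev, nxt in counts.items()}

toy_corpus = ["I want to eat British food", "I want to eat Chinese food"]
probs = train_bigrams(toy_corpus)
print(probs["I"])     # {'want': 1.0}
print(probs["eat"])   # {'British': 0.5, 'Chinese': 0.5}
```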