Introduction to Natural Language Processing

siddiquitanveer1 33 views 41 slides Mar 07, 2025


Introduction to Natural
Language Processing
Lecture 1: NLP (Elective)
Tanveer J Siddiqui
J. K. Institute of Applied Physics
University of Allahabad

Objective of NLP?
To build computational models of natural language (NL) for its
analysis and generation.
Motivations:
Technological
Cognitive and Linguistic
tjs

Natural language processing originated from
machine translation research.
NLP vs. NLU
NLU involves the interpretation of language, whereas
natural language processing includes both
understanding (interpretation) and generation
(production).

Tools
Grammar formalisms
Algorithms and data structures
Formalisms for representing world knowledge
NLP inherits results from AI, CS, linguistics, logic,
and philosophy

Theoretical linguists are interested in identifying
rules that capture linguistic generalizations.
Psycholinguists are interested in producing
theories that explain how humans produce and
comprehend natural language.
Computational linguistics (CL) studies language from a computational point of
view.
- It deals with the application of linguistic theories and
computational techniques to natural language
processing.

Computational models: ‘knowledge-driven’ and ‘data-driven’

Goal of a Language?
Language serves a communicative function.
NLP focuses on the study of language as a
means of communication.

Communication requires a common language
and shared knowledge about the domain in
question.

Information Transfer
1. The speaker wants to convey some
information.
2. The speaker decides what to convey.
3. The speaker decides how to code it in language.
- The utterance is the only thing actually received by the
hearer, and from it she gets the information.
The hearer must extract it.
How? By decoding.

We can analyze various phenomena in
language from the viewpoint of how they
code information:
word order, case endings, ...
conflict -

Information-based approach
Provides a natural connection between
- syntax
- semantics, and
- pragmatics
Provides theories of communication at different
levels and integrates them to give a general theory
of communication
+ knowledge representation (KR) and its use

Problems in coding
The gender of the speaker is not coded in the
pronoun ‘I’ or the verb.
The hearer is still able to decode it.
There are several sources of knowledge that are
used in decoding the information from an
utterance.

Sources of Knowledge
Language Knowledge
Grammar
Lexicon
Pragmatics & Discourse
Background Knowledge
General World Knowledge (Common Sense)
Domain Specific
Context
Culture

Listener model…

Other factors?
Language tries to maintain regularity
across constructions for
- ease of acquisition
- ease of coding (or decoding)

Where does the Grammar fit in?
A system of rules that relates information to its
coding in language
(There is a computational requirement)
Syntax?
When the system of rules relates information to
coding devices at the language level and not at
the world knowledge level, it is called syntax.
However, world knowledge has a strong
influence on coding.

How does World Knowledge influence
Coding?
1. It influences the fundamental coding conventions.
2. It also affects the coding being used.
This blurs the boundary between syntax and semantics.
The separation is kept for ease of processing and
grammar writing.
Syntax uses language coding devices
Semantics –
Anaphora…

Syntax will not be studied to identify an innate
autonomous level, but to relate it to semantics
& world knowledge to accomplish the overall
task of communication of information.

Natural language processing concerns the
development of computational models of aspects
of human language processing, such as:
- Reading and interpreting a textbook
- Writing a letter
- Translating a document
- Searching for useful information

NLP is Interdisciplinary
Theoretical Linguistics
Computational Linguistics
Artificial Intelligence
This list is not exhaustive.

Theoretical Linguistics
Typical Questions
What is language?
What is knowledge of language?
How can it be finitely characterized?
What linguistic forms are there?
How do linguistic forms constrain meaning?
How is knowledge of language acquired given
limited exposure?
Formal language theory
(Syntactic Structures, Chomsky, 1957)

Theoretical Linguistics: Tools
and Methods
Empirical studies (study of frequencies,
conditional probabilities, etc.)
Formal Language Theory: provides usable
definitions of grammatical knowledge
Transformational Grammar: handles
identity of meaning between non-identical
sentences
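
As an illustration of the empirical studies mentioned above, the following is a minimal sketch of estimating conditional probabilities of word pairs by relative frequency. The function name `bigram_conditional_probs` and the toy corpus are assumptions made for this example; they do not come from the lecture.

```python
from collections import Counter

def bigram_conditional_probs(tokens):
    """Estimate P(next | previous) by relative frequency over adjacent word pairs."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count each word only where it has a successor, so that the
    # probabilities for a given previous word sum to 1.
    context_counts = Counter(tokens[:-1])
    return {
        (prev, nxt): count / context_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }

# Toy corpus, invented for illustration.
corpus = "pooja plays veena and pooja plays chess".split()
probs = bigram_conditional_probs(corpus)
print(probs[("pooja", "plays")])  # 1.0: "pooja" is always followed by "plays"
print(probs[("plays", "veena")])  # 0.5: "plays" is followed by "veena" half the time
```

Real empirical studies would use a large corpus and usually some form of smoothing; this sketch only shows the counting step.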

Transformational Grammar
Chomsky’s Problem: Linguists wanted to explain how
the sentences
Pooja plays veena.
Veena is played by Pooja.
have the same meaning, despite having different surface
structures (the roles of subject and object are inverted).
Chomsky’s Answer:
Both sentences are generated from the
same “deep structure”, in which the “deep subject”
is Pooja and the “deep object” is veena.
These considerations led to a model for NL grammar
that employs two levels of syntactic representation.
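
The idea of two surface forms sharing one deep structure can be sketched in a few lines. This is a toy illustration, not Chomsky's formalism: the two regular-expression patterns, the `BASE_FORM` mini-lexicon, and the `deep_structure` function are all assumptions invented for this example, and they cover only three-word active sentences and "X is VERBed by Y" passives.

```python
import re

# Hypothetical mini-lexicon mapping inflected verb forms to a base form.
BASE_FORM = {"plays": "play", "played": "play"}

# Toy surface patterns: "X plays Y" and "Y is played by X".
ACTIVE = re.compile(r"^(\w+) (\w+) (\w+)\.?$")
PASSIVE = re.compile(r"^(\w+) is (\w+) by (\w+)\.?$")

def deep_structure(sentence):
    """Return (verb, deep_subject, deep_object), or None if no toy pattern matches."""
    text = sentence.strip().lower()
    match = PASSIVE.match(text)
    if match and match.group(2) in BASE_FORM:
        surface_subject, verb, agent = match.groups()
        # In the passive, the surface subject is the deep object.
        return (BASE_FORM[verb], agent, surface_subject)
    match = ACTIVE.match(text)
    if match and match.group(2) in BASE_FORM:
        subject, verb, obj = match.groups()
        return (BASE_FORM[verb], subject, obj)
    return None

active = deep_structure("Pooja plays veena.")
passive = deep_structure("Veena is played by Pooja.")
print(active)             # ('play', 'pooja', 'veena')
print(active == passive)  # True
```

Both sentences map to the same triple, which is the sense in which they share a deep subject and a deep object despite their different surface structures.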

Deep and Surface Structure
[Figure: surface structures of “Pooja plays veena” and “Veena is played by Pooja”]

Deep and Surface Structure
[Figure: deep structure of “Pooja plays veena”]

Computational Linguistics
How can linguistic theory be made concrete
enough to test?
How can we represent grammatical and
lexical knowledge efficiently?
Given a grammar and a lexicon, how is the
structure of the sentence actually identified?
What are the properties of particular grammar
formalisms?
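
To make the question of identifying sentence structure from a grammar and a lexicon concrete, here is a minimal sketch of a top-down (recursive-descent) parser. The grammar, the lexicon, and the function names `parse`, `parse_sequence`, and `parse_sentence` are assumptions invented for this example; the approach works only for grammars without left recursion and is one of many possible parsing strategies.

```python
# Toy grammar and lexicon, invented for illustration.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"pooja": "N", "veena": "N", "the": "Det", "plays": "V"}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can cover words from `start`."""
    if symbol not in GRAMMAR:
        # Word category: look the next word up in the lexicon.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, words, start):
            yield (symbol, *children), end

def parse_sequence(symbols, words, start):
    """Yield (list_of_trees, next_position) for a sequence of symbols."""
    if not symbols:
        yield [], start
        return
    for first_tree, middle in parse(symbols[0], words, start):
        for rest_trees, end in parse_sequence(symbols[1:], words, middle):
            yield [first_tree] + rest_trees, end

def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in parse_sentence("Pooja plays the veena."):
    print(tree)
```

A sentence that the toy grammar cannot derive, such as "plays Pooja", yields an empty list.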

Tools and Methods
Analysis and Generation algorithms
Grammar Formalism

Artificial Intelligence
What is the role of language in an intelligent agent?
How does language function as a communicative
activity for shared problem solving?
Tools and Methods
Knowledge Representation
Formal Reasoning
Agent Technology

Examples of NLP Applications
Text-based applications
NL understanding
Dialogue systems
Multimodal applications

Text-based Applications
Processing of written texts, such as books,
newspapers, reports, etc.
- Finding appropriate documents on certain topics
from a text database
- Extracting information from messages, articles,
web pages, etc.
- Translating documents from one language to
another

NL Understanding
Some applications require a deeper level
of analysis
The system must extract enough
information to determine whether the
articles meet the criterion defined by the
query.
How much understanding is necessary to
achieve the purpose?

Dialogue-based Applications
Dialogue-based applications involve man-machine
communication
- NL database query systems
- Automated customer service
Differences from text-based applications
- The language used is less formal
- Use of acknowledgments and clarification
sub-dialogues

Multimodal Applications
Involve two or more modalities of
communication
- Text
- Speech
- Gesture
- Image
Text-to-speech, speech-to-text, multimodal
document generation, spoken translation
systems, and spoken dialogue systems.

Successful NLP depends on…
Explicit criteria of success
Putting limits on the need for outside knowledge
and experience
Availability of cheap computing power: no longer
a problem
Encoding knowledge of how language works:
still a long way to go

Some NLP systems
Weather report generation
Babel Fish translation system (AltaVista)
Eurotra (MT for European languages), MU for
Japanese and English, Anusaaraka among
Indian languages

ELIZA (Weizenbaum, 1966): ELIZA is one of
the earliest natural language understanding
programs. It uses syntactic patterns to mimic
human-like conversation with the user. Here is a
sample conversation:
Eliza: Hello. I am ELIZA. How can I help you?
You: I am feeling a little bit sleepy.
Eliza: How long have you been feeling a little bit
sleepy?
You: For almost half an hour.
Eliza: Please go on.
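
The pattern-based behaviour in the sample conversation can be approximated with a few regular expressions. This is a minimal sketch, not Weizenbaum's original script: the rules in `RULES`, the fallback `DEFAULT_REPLY`, and the function `respond` are assumptions chosen to reproduce the exchange above.

```python
import re

# Hypothetical rules: (pattern, response template).
# These are illustrative and are not taken from the original ELIZA script.
RULES = [
    (re.compile(r"^i am (.+)$", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"^i feel (.+)$", re.IGNORECASE), "Why do you feel {0}?"),
]
DEFAULT_REPLY = "Please go on."

def respond(utterance):
    """Return the reply for the first matching pattern, or a generic prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Echo the matched fragment of the user's input back in the reply.
            return template.format(match.group(1))
    return DEFAULT_REPLY

print(respond("I am feeling a little bit sleepy."))
# How long have you been feeling a little bit sleepy?
print(respond("for almost half an hour"))
# Please go on.
```

The program has no model of meaning: it only reuses the user's own words, which is why such systems can appear to converse without understanding.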

History (J & M)
1940s and 1950s
Automaton and probabilistic (information-theoretic)
models
Chomsky first defined a finite-state
language as a language generated by a
finite-state grammar. These early models
led to the field of formal language theory.
The second foundational insight was the
development of probabilistic algorithms
(inspired by Shannon’s work).
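
A finite-state language can be illustrated with a small recognizer. The following sketch assumes a hypothetical automaton that accepts "the", then any number of "old", then "man"; the state names, the transition table, and the function `accepts` are invented for this example.

```python
# Hypothetical automaton: "the", then zero or more "old", then "man".
TRANSITIONS = {
    ("q0", "the"): "q1",
    ("q1", "old"): "q1",
    ("q1", "man"): "q2",
}
START = "q0"
ACCEPTING = {"q2"}

def accepts(words):
    """Return True if the word sequence drives the automaton to an accepting state."""
    state = START
    for word in words:
        state = TRANSITIONS.get((state, word))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("the old old man".split()))  # True
print(accepts("the man".split()))          # True
print(accepts("man the".split()))          # False
```

The set of word sequences accepted by such a machine is a finite-state language; the loop on "old" shows how a finite device can accept an unbounded set of sentences.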

History: 1957-1970
Two camps
- Symbolic
Two lines of research:
1. Inspired by the work of Chomsky and others
on formal language theory, and the work of
linguists and computer scientists on parsing
2. From AI (focus on reasoning and logic)
- Stochastic: statistics
Bayesian system for text recognition
First on-line corpus: the Brown Corpus

1970-1983
Four Paradigms:
Stochastic Paradigm
Logic-based paradigm
Natural Language Understanding
(SHRDLU-NLU, LUNAR-Q/A)
Discourse modeling

1983-1993
Return of
- finite-state models and
- empiricism,
which had lost popularity in the late 1950s
and early 1960s

Merging of fields
- Probabilistic and data-driven
models had become standard

How Does a Child Learn Language?
All children are born with the ability to learn language (Noam
Chomsky). He believed that all babies possess a "language
acquisition device." Children are born with the ability to
produce speech simply by hearing words and sentences
spoken by adults around them (Vander Zanden).
If children merely repeated what they heard, they would not be able to create original,
unique sentences of their own. Instead, children listen to
adults speak and then form a rule system that they then apply
in other situations.

Innateness Hypothesis
The Innateness Hypothesis holds that, to a large
extent, the organization of human language
(i.e. the "grammar") is innate, that is, inborn.