MOSIUOA WESI NLP UNIT 5 THIRD SEMESTER AUCE

MosiuoaWesi, Oct 09, 2025

About This Presentation

This unit explores how language is used beyond individual sentences. It covers discourse structure, coherence, cohesion, dialogue systems, conversational agents, and speech acts. Students learn how machines understand and manage conversations, resolve anaphora, and maintain context in dialogue.

MOSIUOA WESI – ANDHRA UNIVERSITY – VISAKHAPATNAM 530001

Unit V: Discourse Analysis and Lexical Resources
1. Introduction to Discourse Analysis and Lexical Resources
Discourse analysis studies language beyond the sentence level: how sequences of sentences form coherent texts and conversations. Whereas syntactic and semantic analysis usually treat one sentence at a time, discourse analysis examines context, reference across sentences, coherence, and overall structure. Lexical resources, on the other hand, provide structured information about words, their meanings, and the relations between them, which many NLP tasks depend on.
2. Discourse Segmentation and Coherence
Discourse segmentation divides a text into meaningful units such as sentences, clauses, or elementary discourse units (the minimal spans that discourse relations connect). Coherence refers to how these segments are logically connected to convey a unified message. A coherent text allows a reader, human or machine, to understand the relationships between its different parts.
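A minimal rule-based sketch of segmentation, assuming plain English text: sentences are split at '.', '!' or '?' followed by whitespace, and each sentence is further split before a small, hypothetical list of discourse connectives (`CONNECTIVES`). Real segmenters use trained models and handle abbreviations, quotations, and unmarked clause boundaries far more carefully.

```python
import re
from typing import List

# Hypothetical, deliberately tiny set of connectives treated as clause boundaries.
CONNECTIVES = ("because", "although", "but", "so")


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace (naive: breaks on 'Dr.')."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_clauses(sentence: str) -> List[str]:
    """Split before each connective, dropping an optional preceding comma."""
    pattern = r",?\s+(?=(?:" + "|".join(CONNECTIVES) + r")\b)"
    return [c.strip() for c in re.split(pattern, sentence) if c.strip()]


def segment(text: str) -> List[str]:
    """Sentence split first, then clause split within each sentence."""
    return [clause for s in split_sentences(text) for clause in split_clauses(s)]


print(segment("John missed the bus because it left early. He walked to work, but he arrived on time."))
# ['John missed the bus', 'because it left early.', 'He walked to work', 'but he arrived on time.']
```

Keeping each connective at the start of its new segment preserves the cue that a later discourse-parsing step can use to label the relation between segments.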
3. Reference Phenomena and Anaphora Resolution
Reference phenomena occur when certain expressions refer to entities in the discourse, typically ones mentioned earlier. Anaphora resolution is the task of identifying the antecedent, i.e. the earlier expression that a pronoun or other referring expression points to.
For example, in 'John entered the room. He sat down.', 'He' refers to 'John'.

Techniques include:
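- Hobbs algorithm: searches the syntactic parse trees of the current and preceding sentences for an antecedent
- Centering theory: prefers antecedents that are the most salient entities in the local discourse
- Machine learning approaches: learn to score candidate antecedents from annotated data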
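The techniques listed above are more sophisticated than the baseline sketched here, which is neither the Hobbs algorithm nor centering: it simply links each pronoun to the most recent preceding noun phrase that agrees with it in gender and number. The mention list and its gender/number features are hand-supplied assumptions; a real system would obtain them from a parser and tagger.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative agreement features for a few English pronouns:
# (gender, number), where gender "any" matches every gender.
PRONOUN_FEATURES: Dict[str, Tuple[str, str]] = {
    "he": ("m", "sg"), "him": ("m", "sg"),
    "she": ("f", "sg"), "her": ("f", "sg"),
    "it": ("n", "sg"),
    "they": ("any", "pl"), "them": ("any", "pl"),
}

# A mention is (text, gender, number); pronouns may leave the features as None.
Mention = Tuple[str, Optional[str], Optional[str]]


def agrees(pronoun: str, gender: Optional[str], number: Optional[str]) -> bool:
    """True if a candidate's gender and number are compatible with the pronoun."""
    p_gender, p_number = PRONOUN_FEATURES[pronoun]
    return number == p_number and (p_gender == "any" or gender == p_gender)


def resolve_pronouns(mentions: List[Mention]) -> Dict[int, Optional[int]]:
    """Map each pronoun's index to its antecedent's index (None if nothing agrees)."""
    links: Dict[int, Optional[int]] = {}
    for i, (text, _, _) in enumerate(mentions):
        pronoun = text.lower()
        if pronoun not in PRONOUN_FEATURES:
            continue  # only pronouns need resolving in this baseline
        links[i] = None
        # Scan backwards so the most recent candidate is tried first.
        for j in range(i - 1, -1, -1):
            cand_text, gender, number = mentions[j]
            if cand_text.lower() in PRONOUN_FEATURES:
                continue  # link to full noun phrases only
            if agrees(pronoun, gender, number):
                links[i] = j
                break
    return links


mentions: List[Mention] = [("John", "m", "sg"), ("the room", "n", "sg"), ("He", None, None)]
print(resolve_pronouns(mentions))  # {2: 0}
```

Here 'the room' is skipped because it is neuter, so 'He' links to 'John'. Recency plus agreement is a common baseline, but it fails whenever the nearest agreeing noun phrase is not the intended antecedent, which is exactly what syntactic (Hobbs) and salience-based (centering) methods try to address.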
4. Coreference Resolution
Coreference resolution identifies all expressions that refer to the same entity and groups them into chains. Unlike anaphora resolution, which typically deals with pronouns, coreference resolution also covers names, full noun phrases, and other referring expressions.

It plays a key role in tasks such as information extraction, summarization, and dialogue systems.
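Because "refers to the same entity" is transitive, coreference output is usually represented as clusters (chains) of mentions rather than as isolated pairs. The sketch below assumes a hypothetical pairwise resolver has already produced links between mention indices, and merges those links into chains with a union-find (disjoint-set) structure.

```python
from typing import Dict, List, Tuple


def cluster_mentions(n_mentions: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge pairwise coreference links into clusters of mention indices."""
    parent = list(range(n_mentions))  # each mention starts in its own cluster

    def find(x: int) -> int:
        # Follow parent pointers to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two clusters

    clusters: Dict[int, List[int]] = {}
    for i in range(n_mentions):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


mentions = ["Marie Curie", "the physicist", "Paris", "She", "the city"]
links = [(0, 1), (1, 3), (2, 4)]  # hypothetical output of a pairwise resolver
for cluster in cluster_mentions(len(mentions), links):
    print([mentions[i] for i in cluster])
# ['Marie Curie', 'the physicist', 'She']
# ['Paris', 'the city']
```

Note that 'Marie Curie' and 'She' end up in one chain even though no link connects them directly; the transitive merge is what turns pairwise decisions into entities.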
5. Lexical Resources for NLP
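Lexical resources are structured collections of linguistic information, such as lexical databases and annotated corpora. They are essential for tasks such as part-of-speech (POS) tagging, parsing, word sense disambiguation (WSD), and machine translation.
Common lexical resources and related tools include:
- Porter Stemmer: a rule-based algorithm for stemming words
- Lemmatizer: converts words to their base (dictionary) form
- Penn Treebank: a corpus annotated with POS tags and syntactic parse trees
- Brill's Tagger: a transformation-based POS tagger
- WordNet: a lexical database for English that groups words into synonym sets (synsets) linked by relations such as hypernymy
- PropBank and FrameNet: resources for semantic roles (PropBank) and semantic frames (FrameNet)
- Brown Corpus and BNC (British National Corpus): large text corpora of about one million words of American English and 100 million words of British English, respectively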
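Brill's Tagger is transformation-based: every word first receives its most frequent tag, then an ordered list of learned rules rewrites tags in context. The sketch below hard-codes a hypothetical mini-lexicon and the textbook rule "change NN to VB when the previous tag is TO" instead of learning rules from an annotated corpus such as the Penn Treebank, and it uses a single rule template (previous tag), whereas Brill's tagger learns rules from many templates.

```python
from typing import Dict, List, Tuple

# Hypothetical most-frequent-tag lexicon using Penn Treebank tags.
MOST_FREQUENT_TAG: Dict[str, str] = {
    "secretariat": "NNP", "is": "VBZ", "expected": "VBN",
    "to": "TO", "race": "NN", "tomorrow": "NN", "the": "DT",
}

# A rule (from_tag, to_tag, previous_tag) means:
# change from_tag to to_tag when the preceding word is tagged previous_tag.
Rule = Tuple[str, str, str]
RULES: List[Rule] = [("NN", "VB", "TO")]


def brill_tag(words: List[str], rules: List[Rule]) -> List[Tuple[str, str]]:
    # Step 1: initial tagging with the most frequent tag (unknown words default to NN).
    tags = [MOST_FREQUENT_TAG.get(w.lower(), "NN") for w in words]
    # Step 2: apply each transformation, in order, across the whole sentence.
    for from_tag, to_tag, prev_tag in rules:
        updated = tags[:]  # conditions are checked against tags from before this rule
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                updated[i] = to_tag
        tags = updated
    return list(zip(words, tags))


print(brill_tag("Secretariat is expected to race tomorrow".split(), RULES))
# [('Secretariat', 'NNP'), ('is', 'VBZ'), ('expected', 'VBN'), ('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```

'race' starts as NN (its most frequent tag in the toy lexicon) and is corrected to VB after 'to', while in 'the race' it keeps NN because the rule's context does not match.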
6. Porter Stemmer and Lemmatizer
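The Porter Stemmer is a widely used rule-based algorithm that strips common morphological suffixes from English words, reducing them to stems that need not be real words. Lemmatization, on the other hand, uses a vocabulary and morphological analysis to return the base or dictionary form (lemma) of a word. Both map 'running' to 'run', but the Porter Stemmer reduces 'studies' to the non-word 'studi', whereas a lemmatizer returns 'study'; given part-of-speech information, a lemmatizer can also map 'better' (adjective) to 'good'. Lemmatization is therefore generally more accurate, while stemming is simpler and faster because it needs no dictionary.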
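To make the contrast concrete, the sketch below implements only Step 1a of the Porter Stemmer, its plural-suffix rules (SSES → SS, IES → I, SS → SS, S → nothing); the full algorithm has further steps, e.g. Step 1b handles -ed and -ing endings such as 'running'. The lemmatizer is a hypothetical lookup table keyed by word and part of speech, standing in for a real lexicon.

```python
from typing import Dict, Tuple


def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter Stemmer: strip plural suffixes (longest match first)."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS: caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I: ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS: caress -> caress
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing): cats -> cat
    return word


# Hypothetical miniature lemma dictionary keyed by (word, part of speech).
LEMMAS: Dict[Tuple[str, str], str] = {
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the dictionary form; fall back to the lowercased word itself."""
    return LEMMAS.get((word.lower(), pos), word.lower())


for w in ["caresses", "ponies", "studies", "cats"]:
    print(w, "->", porter_step_1a(w))
print(lemmatize("studies", "NOUN"), lemmatize("better", "ADJ"))  # study good
```

The stemmer needs no data at all but produces stems like 'studi'; the lemmatizer returns real words but only for entries its dictionary covers.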
7. Discourse Parsing and Understanding
Discourse parsing identifies the relations that hold between discourse segments, such as cause-effect, elaboration, contrast, and temporal sequence. It produces structured representations such as discourse trees or graphs, which enable a deeper understanding of texts and conversations.
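A very shallow illustration of relation labeling, assuming the segments are already given and that each relation is signalled by an explicit connective at the start of the second segment. The connective-to-relation table is a hypothetical toy lexicon; real discourse parsers learn such mappings, handle ambiguous connectives such as 'since' (temporal or causal), and must also infer implicit relations that have no connective at all.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon mapping explicit connectives to relation labels.
CONNECTIVE_RELATIONS: Dict[str, str] = {
    "because": "cause-effect",
    "so": "cause-effect",
    "for example": "elaboration",
    "however": "contrast",
    "but": "contrast",
    "then": "temporal",
    "afterwards": "temporal",
}


def label_relation(segment: str) -> Optional[str]:
    """Return the relation signalled by a connective opening the segment, if any."""
    lowered = segment.lower()
    for connective, relation in CONNECTIVE_RELATIONS.items():
        # Require a following space or comma so "so" does not match "soon".
        if lowered.startswith(connective + " ") or lowered.startswith(connective + ","):
            return relation
    return None


def relate_adjacent(segments: List[str]) -> List[Tuple[int, int, Optional[str]]]:
    """Label the relation between each pair of neighbouring segments."""
    return [(i, i + 1, label_relation(segments[i + 1])) for i in range(len(segments) - 1)]


segments = [
    "The roads were icy",
    "because it snowed overnight.",
    "However, the buses ran on time.",
    "Then the sun came out.",
]
print(relate_adjacent(segments))
# [(0, 1, 'cause-effect'), (1, 2, 'contrast'), (2, 3, 'temporal')]
```

This yields a flat list of labelled links between neighbours; building a full discourse tree or graph would additionally decide how larger spans attach to one another.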
8. Algorithms and Techniques for Discourse Analysis
Key algorithms and techniques include:
- Hobbs algorithm for anaphora resolution
- Centering theory for discourse coherence
- Discourse relation parsers
- Neural coreference resolution models
- Graph-based approaches for discourse structure
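Centering theory can be made concrete through its transition types. For each utterance Un, the forward-looking centers Cf(Un) are its entities ranked by salience (commonly subject > object > others); the preferred center Cp(Un) is the top-ranked one; and the backward-looking center Cb(Un) is the highest-ranked entity of Cf(Un-1) that is realized in Un. Comparing Cb and Cp across utterances yields Continue, Retain, Smooth-Shift, or Rough-Shift, with Continue preferred over Retain, Retain over Smooth-Shift, and Smooth-Shift over Rough-Shift. In this sketch the ranked entity lists are hand-supplied assumptions, and "NOCB" is an extra label for utterances that share no entity with the previous one.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], cur_cf: List[str]) -> Optional[str]:
    """Cb(Un): the highest-ranked entity of Cf(Un-1) that is realized in Un."""
    for entity in prev_cf:  # prev_cf is ordered from most to least salient
        if entity in cur_cf:
            return entity
    return None


def centering_transitions(utterances: List[List[str]]) -> List[str]:
    """Classify the transition into each utterance after the first."""
    labels: List[str] = []
    prev_cb: Optional[str] = None  # Cb(U1) is undefined
    for n in range(1, len(utterances)):
        cb = backward_center(utterances[n - 1], utterances[n])
        cp = utterances[n][0] if utterances[n] else None  # Cp(Un) = top of Cf(Un)
        if cb is None:
            label = "NOCB"  # no entity carried over from the previous utterance
        elif prev_cb is None or cb == prev_cb:
            label = "Continue" if cb == cp else "Retain"
        else:
            label = "Smooth-Shift" if cb == cp else "Rough-Shift"
        labels.append(label)
        prev_cb = cb
    return labels


# Each inner list is a hand-ranked Cf (subject first), e.g.
# U1 "John went to the store."       U2 "He bought milk."
# U3 "The cashier greeted him."      U4 "The cashier counted the change."
# U5 "The manager checked the change."
discourse = [
    ["John", "store"],
    ["John", "milk"],
    ["cashier", "John"],
    ["cashier", "change"],
    ["manager", "change"],
]
print(centering_transitions(discourse))
# ['Continue', 'Retain', 'Smooth-Shift', 'Rough-Shift']
```

A discourse dominated by Continue transitions reads as more coherent than one full of shifts, which is why centering is also used to prefer antecedents that keep the same entity in focus.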
9. Applications of Discourse Analysis and Lexical Resources
These concepts are applied in:
- Information extraction and retrieval
- Text summarization
- Machine translation
- Conversational AI and chatbots
- Sentiment analysis and opinion mining
10. Summary
Discourse analysis and lexical resources are crucial for understanding language in context.
While discourse analysis helps machines interpret meaning across sentences, lexical
resources provide the foundation for semantic and syntactic understanding. Together, they
enable robust NLP applications that go beyond isolated sentences to process entire texts
and dialogues effectively.