This unit covers the structure of sentences and how syntax is processed in NLP. Students learn about Context-Free Grammars (CFG), grammar rules, and parsing techniques like top-down, bottom-up, CYK, and dynamic programming. It also discusses dependency grammar, ambiguity handling, probabilistic CFG, and feature structures with unification.
MOSIUOA WESI – ANDHRA UNIVERSITY – VISAKHAPATNAM 530001
Unit III: Syntactic Analysis
1. Introduction to Syntactic Analysis
Syntactic analysis, also known as parsing, is the process of analyzing a sentence’s
grammatical structure to determine how words combine to form valid phrases and
sentences. It involves defining grammar rules and using them to construct parse trees or
dependency graphs. This step is crucial for enabling computers to understand sentence
structures and is a foundation for higher-level language processing tasks.
2. Context-Free Grammars (CFGs)
A Context-Free Grammar (CFG) is a formal grammar used to describe the syntax of a
language. It consists of terminals, non-terminals, a start symbol, and production rules. CFGs
are widely used in NLP because they can generate recursive structures found in natural
languages.
Example Production Rules:
S → NP VP
NP → Det N
VP → V NP
Det → 'the'
N → 'dog'
V → 'chased'
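The toy grammar above can be encoded directly in Python. The sketch below (dict-based; the helper name `generate` is illustrative, not from any library) expands the start symbol left-to-right, always taking the first production, to show how the rules derive a sentence:

```python
# Toy CFG from above: non-terminal -> list of right-hand sides,
# each right-hand side a tuple of symbols. Terminals are plain words.
GRAMMAR = {
    "S":   [("NP", "VP")],
    "NP":  [("Det", "N")],
    "VP":  [("V", "NP")],
    "Det": [("the",)],
    "N":   [("dog",)],
    "V":   [("chased",)],
}

def generate(symbol):
    """Expand a symbol depth-first, using the first production each time."""
    if symbol not in GRAMMAR:          # terminal: the word itself
        return [symbol]
    words = []
    for part in GRAMMAR[symbol][0]:    # deterministic: first rule only
        words.extend(generate(part))
    return words

print(" ".join(generate("S")))  # the dog chased the dog
```

With only one production per non-terminal the derivation is deterministic; a real grammar would have several, and a parser's job is the inverse: recovering which productions could have produced a given sentence.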
3. Grammar Rules for English and Treebanks
Grammar rules for English define how words combine into phrases and clauses. These rules
can be used to build parse trees that represent sentence structure. Treebanks are annotated
corpora containing sentences with their syntactic parse trees. They serve as training and
evaluation data for parsers.
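Treebank parse trees are typically stored as bracketed strings. A minimal sketch of reading one such string into nested Python lists (a simplified reader, assuming well-formed Penn-Treebank-style brackets, not a full treebank loader):

```python
def read_tree(s):
    """Parse a bracketed tree string like
    "(S (NP (Det the) (N dog)) (VP ...))" into nested lists:
    [label, child, child, ...], where leaf children are plain words."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = parse(i)     # recurse into a subtree
            else:
                child, i = tokens[i], i + 1   # a leaf word
            children.append(child)
        return [label] + children, i + 1      # skip the closing ")"

    tree, _ = parse(0)
    return tree

tree = read_tree("(S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))")
print(tree[0])   # S
print(tree[1])   # ['NP', ['Det', 'the'], ['N', 'dog']]
```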
4. Normal Forms for Grammar
Normal forms such as Chomsky Normal Form (CNF) and Greibach Normal Form (GNF)
restructure CFGs to make parsing algorithms more efficient. CNF restricts production
rules to the forms A → BC (exactly two non-terminals) or A → a (a single terminal),
a restriction that algorithms such as CYK rely on to fill their charts systematically.
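The CNF restriction is easy to check mechanically. A small predicate (illustrative; it assumes rules are given as a left-hand symbol plus a tuple right-hand side, and that the non-terminal set is known):

```python
def is_cnf_rule(lhs, rhs, nonterminals):
    """CNF allows exactly A -> B C (two non-terminals) or A -> a (one terminal)."""
    if len(rhs) == 2:
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        return rhs[0] not in nonterminals   # a single terminal, not a unit rule
    return False

NTS = {"S", "NP", "VP", "Det", "N", "V"}
print(is_cnf_rule("S", ("NP", "VP"), NTS))   # True: binary rule
print(is_cnf_rule("Det", ("the",), NTS))     # True: terminal rule
print(is_cnf_rule("VP", ("V",), NTS))        # False: unit rule over a non-terminal
```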
5. Dependency Grammar
Unlike phrase structure grammars, dependency grammar focuses on binary relations
between words in a sentence. Each word except the root depends on exactly one head
word, forming a dependency tree whose edges represent grammatical relations such as
subject and object. This representation is compact and widely used in modern NLP
applications such as machine translation and information extraction.
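A dependency tree can be stored as one head index per word, with 0 standing for an artificial ROOT. The sketch below (hand-annotated example; the annotation is illustrative, not drawn from a treebank) also checks the well-formedness condition that every word reaches ROOT without cycles:

```python
# 1-based token indices: 1=the 2=dog 3=chased 4=the 5=cat
# heads[i-1] is the head of token i; 0 means the token is the root.
words = ["the", "dog", "chased", "the", "cat"]
heads = [2, 3, 0, 5, 3]   # the->dog, dog->chased, chased->ROOT, the->cat, cat->chased

def is_tree(heads):
    """True if every token's head chain reaches ROOT (0) without a cycle."""
    for i in range(1, len(heads) + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:            # revisited a node: cycle
                return False
            seen.add(j)
            j = heads[j - 1]
        # chain ended at ROOT: this token is fine
    return True

print(is_tree(heads))       # True
print(is_tree([2, 1, 0]))   # False: tokens 1 and 2 point at each other
```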
6. Syntactic Parsing and Ambiguity
Syntactic parsing involves analyzing a sentence to produce its syntactic structure.
Ambiguity occurs when a sentence has multiple valid parses. For example, 'I saw the
man with the telescope' can mean either that the seeing was done with a telescope (the
prepositional phrase attaches to the verb) or that the man had the telescope (it
attaches to the noun phrase).
Parsers must handle ambiguity efficiently, often using probabilistic methods to select the
most likely parse.
7. Parsing Methods
Parsing methods can be broadly classified into:
- Top-down parsing
- Bottom-up parsing
- Chart parsing
- Dynamic programming parsing
Dynamic programming parsing (such as CYK) stores intermediate results to avoid
redundant computations.
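The dynamic-programming idea behind CYK can be sketched in a few lines. The recognizer below (a minimal sketch; the grammar is an assumed CNF version of the earlier toy grammar, with 'cat' added) fills a triangular chart where `chart[i][j]` holds every non-terminal that derives the span `words[i:j]`:

```python
from itertools import product

# CNF rules, indexed by right-hand side: rhs tuple -> set of left-hand symbols.
RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"):  {"VP"},
    ("the",):     {"Det"},
    ("dog",):     {"N"},
    ("cat",):     {"N"},
    ("chased",):  {"V"},
}

def cyk(words):
    """CYK recognition: chart[i][j] = non-terminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # length-1 spans
        chart[i][i + 1] = set(RULES.get((w,), set()))
    for span in range(2, n + 1):                        # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # every split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= RULES.get((b, c), set())
    return "S" in chart[0][n]

print(cyk("the dog chased the cat".split()))   # True
print(cyk("dog the chased".split()))           # False
```

Each cell is computed once and reused by every larger span that contains it, which is exactly the "stores intermediate results" point above: the chart has O(n²) cells and each is filled in O(n) splits, giving the familiar O(n³) bound (times a grammar factor).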
8. Probabilistic Parsing
Probabilistic parsing uses probabilities associated with grammar rules to select the most
likely parse tree. A Probabilistic CFG (PCFG) extends a CFG by assigning a probability to
each production rule.
The CYK algorithm is a bottom-up parsing algorithm that can be applied to CNF grammars,
and its probabilistic version (Probabilistic CYK) uses probabilities to rank parse trees.
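Extending CYK to a PCFG replaces sets of non-terminals with best-probability scores. A sketch (the rule probabilities below are illustrative, not estimated from data): `best[i][j][A]` holds the highest probability of any tree in which A derives `words[i:j]`.

```python
# PCFG rules: (lhs, rhs) -> probability. For each lhs, its rule
# probabilities should sum to 1 (here: N rewrites as dog or cat).
PRULES = {
    ("S",   ("NP", "VP")): 1.0,
    ("NP",  ("Det", "N")): 1.0,
    ("VP",  ("V", "NP")):  1.0,
    ("Det", ("the",)):     1.0,
    ("N",   ("dog",)):     0.5,
    ("N",   ("cat",)):     0.5,
    ("V",   ("chased",)):  1.0,
}

def pcyk(words):
    """Probabilistic CYK: return the probability of the best S-rooted parse."""
    n = len(words)
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # terminal rules
        for (lhs, rhs), p in PRULES.items():
            if rhs == (w,) and p > best[i][i + 1].get(lhs, 0.0):
                best[i][i + 1][lhs] = p
    for span in range(2, n + 1):                        # binary rules, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in PRULES.items():
                    if len(rhs) == 2 and rhs[0] in best[i][k] and rhs[1] in best[k][j]:
                        cand = p * best[i][k][rhs[0]] * best[k][j][rhs[1]]
                        if cand > best[i][j].get(lhs, 0.0):
                            best[i][j][lhs] = cand      # keep the max (Viterbi)
    return best[0][n].get("S", 0.0)

print(pcyk("the dog chased the cat".split()))   # 0.25 (= 0.5 for 'dog' x 0.5 for 'cat')
```

Taking the max over split points and rules is the Viterbi step; with backpointers added, the same chart yields the most likely parse tree rather than just its probability.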
9. Feature Structures and Unification
Feature structures are attribute-value pairs that capture additional information about
words and phrases, such as number, gender, or tense. Unification combines compatible
feature structures to ensure grammatical agreement in parsed structures.
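Unification can be sketched over flat attribute-value dictionaries (a simplification: real feature structures are recursive and support variables, but the agreement idea is the same):

```python
def unify(f1, f2):
    """Unify two flat feature structures; return the merge, or None on a clash."""
    out = dict(f1)
    for key, val in f2.items():
        if key in out and out[key] != val:
            return None          # conflicting values: unification fails
        out[key] = val           # compatible or new feature: merge it in
    return out

# A singular subject NP against a verb requiring a 3rd-person singular subject:
print(unify({"num": "sg"}, {"num": "sg", "person": 3}))  # {'num': 'sg', 'person': 3}
# Agreement violation, e.g. "the dog chase":
print(unify({"num": "sg"}, {"num": "pl"}))               # None
```

Failure of unification is how a feature-based grammar rejects sentences like "the dog chase the cat" even though the bare CFG rules would accept them.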
10. Summary
Syntactic analysis is essential for understanding sentence structure in NLP. It involves
defining grammatical rules, constructing parse trees or dependency trees, and handling
ambiguity. Probabilistic methods and feature structures enhance parsing accuracy and
linguistic richness, forming a foundation for semantic interpretation.