Natural Language Processing Unit-3 PPT

Slide Content

UNIT-III Prepared by R.Ch.S.N.P.SaiRam

Syllabus: Introduction to phrases, cleaning text data, shallow parsing and chunking, shallow parsing with Conditional Random Fields (CRF), lexical semantics, word sense disambiguation, WordNet.

Cleaning Text Data
Before we can analyze text using Natural Language Processing (NLP), we need to clean the data. This process is called text preprocessing and helps improve the accuracy of NLP models.

Why is Cleaning Text Data Important?
When we work with text in NLP, the data is often messy. There can be:
- Capitalization differences (e.g., "Hello" vs. "hello")
- Punctuation marks (e.g., "hello!" vs. "hello")
- Extra spaces and special characters (e.g., "@hello_world")
- Different word forms (e.g., "running" vs. "run")
- Unnecessary words (e.g., "the", "is", "and")
If we don't clean the text properly, the machine learning model or NLP algorithm may not work well. Cleaning helps improve accuracy and efficiency. Now, let's go through each step with detailed explanations.

Here are the most important steps:
1. Lowercasing
Convert all letters to lowercase so that words like "Hello" and "hello" are treated the same.
Example:
Before: "Hello WORLD!"
After: "hello world!"
2. Removing Punctuation & Special Characters
Punctuation marks (periods, commas, exclamation marks, etc.) and special characters (.,!?@#) are symbols that usually add little meaning when analyzing text, so we remove them.
Example:
Before: "Hello, world!! How's it going?"
After: "Hello world Hows it going"
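A minimal sketch of these two steps using only the Python standard library (the sample sentence is just an illustration):

import string

text = "Hello, world!! How's it going?"

# Step 1: lowercase everything
lowered = text.lower()

# Step 2: strip punctuation characters with a translation table
cleaned = lowered.translate(str.maketrans("", "", string.punctuation))

print(cleaned)  # hello world hows it going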

Cont..
3. Tokenization (Splitting Text into Words or Sentences)
Break text into smaller parts (words or sentences) so that we can analyze them separately.
Example:
Sentence tokenization:
Before: "Hello world. NLP is fun!"
After: ["Hello world.", "NLP is fun!"]
Word tokenization:
Before: "Hello world"
After: ["Hello", "world"]
4. Removing Stopwords
Stopwords are common words like is, the, and, in, at, etc., which don't add much meaning to the sentence.
Example:
Before: "The cat is on the table."
After: "cat table"
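A hedged sketch of both steps with NLTK (assumes nltk is installed and the tokenizer and stopword resources have been downloaded):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time downloads (newer NLTK versions may also need "punkt_tab")
nltk.download("punkt")
nltk.download("stopwords")

print(sent_tokenize("Hello world. NLP is fun!"))  # ['Hello world.', 'NLP is fun!']

tokens = word_tokenize("The cat is on the table.")
stop_words = set(stopwords.words("english"))

# Keep only alphabetic tokens that are not stopwords
content = [t for t in tokens if t.lower() not in stop_words and t.isalpha()]
print(content)  # ['cat', 'table']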

Cont..
5. Stemming & Lemmatization
Reduce words to their root form to make analysis easier.
Stemming (removes suffixes, sometimes roughly):
Before: "playing, played, plays"
After: "play"
Lemmatization (gives the correct base word):
Before: "better"
After: "good"
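A short comparison sketch with NLTK's PorterStemmer and WordNetLemmatizer (assumes the wordnet corpus has been downloaded; note the lemmatizer needs a part-of-speech hint to map "better" to "good"):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming: crude suffix stripping
print([stemmer.stem(w) for w in ["playing", "played", "plays"]])  # ['play', 'play', 'play']

# Lemmatization: dictionary lookup; pos="a" marks "better" as an adjective
print(lemmatizer.lemmatize("better", pos="a"))  # good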

Why Do We Clean Text in NLP?
- Improves accuracy in machine learning models.
- Reduces storage and processing time.
- Helps in search engines, chatbots, and translation systems.
- Makes text analysis and sentiment detection more efficient.

import spacy

# Load spaCy's English model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Hello!!! NLP is exciting. It helps computers understand human language."

# Process text using spaCy (convert to lowercase)
doc = nlp(text.lower())

# Remove stopwords and punctuation, keep only useful words (lemmas)
cleaned_tokens = [
    token.lemma_
    for token in doc
    if not token.is_stop and not token.is_punct
]

# Print results
print("Original Text:", text)
print("Cleaned Text:", " ".join(cleaned_tokens))

Introduction to Phrases
A phrase is a group of words that works together as a single unit in a sentence. Unlike a full sentence, a phrase does not have both a subject and a verb. Instead, it helps to add meaning to a sentence. Think of a phrase as a small puzzle piece that helps complete the bigger picture (the sentence).

Why Are Phrases Important in NLP?
In Natural Language Processing (NLP), understanding phrases helps in:
- Breaking down sentences into meaningful parts.
- Extracting information (e.g., identifying names, places, and actions).
- Improving machine translation and text analysis.

Shallow Parsing and Chunking
Shallow parsing, also known as chunking, is a technique used in Natural Language Processing (NLP) to group the words in a sentence into meaningful chunks (or phrases) without analyzing the deep structure of the sentence. These chunks usually represent noun phrases (NPs), verb phrases (VPs), prepositional phrases (PPs), etc. It's called "shallow" because it does not examine the complete structure of the sentence (as deep parsing does); it just identifies the major chunks.

Why is Chunking Useful?
It helps break the sentence down into smaller pieces that are easier to understand. For example, chunking can help identify key parts of a sentence, like who is doing the action and what is happening.

Types of Phrases
Noun Phrase (NP): a group of words that acts as a noun.
Examples: "The little cat", "A big red balloon"
Verb Phrase (VP): a group of words that includes the main verb and sometimes helping verbs.
Examples: "is running fast", "has been studying all day"
Prepositional Phrase (PP): a phrase that starts with a preposition and gives extra details.
Examples: "on the table", "under the big tree"

Example
Sentence: "The quick brown fox jumps over the lazy dog."
Noun Phrase (NP): a group of words that works as a noun; it could be a person, place, thing, or idea.
Example: "The quick brown fox" → this tells us what the sentence is talking about (the subject).
Verb Phrase (VP): a group of words that contains a verb (action or state) and its related components.
Example: "jumps" → this tells us what the subject (the fox) is doing.
Prepositional Phrase (PP): a group of words that begins with a preposition (e.g., in, on, over, under) and gives more information about something in the sentence.
Example: "over the lazy dog" → this tells us where the fox is jumping.
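A minimal chunking sketch for this sentence using NLTK's RegexpParser; the chunk grammar below is a simplified illustration, not a complete English grammar, so the exact chunks depend on how the POS tagger labels each word:

import nltk

# One-time downloads (newer NLTK may name the tagger "averaged_perceptron_tagger_eng")
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The quick brown fox jumps over the lazy dog."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # [('The', 'DT'), ('quick', 'JJ'), ...]

# Toy grammar: NP = optional determiner + adjectives + nouns; PP = preposition + NP
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  PP: {<IN><NP>}
"""
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)  # typically shows (NP The quick brown fox) and (PP over (NP the lazy dog))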

Why Should You Care About Chunking?
- Makes processing easier: instead of understanding a sentence word by word, you can look at the bigger chunks (like nouns and verbs) to get a clearer idea of what's going on.
- Helps with many NLP tasks: it's useful in things like information extraction, machine translation, and speech recognition.

import spacy

# Load the English NLP model
nlp = spacy.load("en_core_web_sm")

# Sample sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Process the sentence using spaCy
doc = nlp(sentence)

# Extract noun phrases (NP) and verb phrases (VP)
print("Noun Phrases:")
for chunk in doc.noun_chunks:
    print(f"- {chunk.text}")

print("\nVerb Phrases:")
for token in doc:
    if token.pos_ == "VERB":
        print(f"- {token.text}")

Cont..
spacy.load("en_core_web_sm") loads a small English NLP model (en_core_web_sm). This model provides:
- Tokenization (splitting text into words)
- Part-of-Speech (POS) tagging
- Named Entity Recognition (NER)
- Dependency parsing
- Lemmatization
doc.noun_chunks extracts noun phrases (NP), and checking token.pos_ == "VERB" extracts the verbs that head verb phrases (VP). The output shows the extracted phrases.
To install:
pip install spacy
python -m spacy download en_core_web_sm

Shallow Parsing with Conditional Random Fields (CRF)
Shallow parsing, also known as chunking, is a technique in Natural Language Processing (NLP) where we identify phrases (or "chunks") in a sentence instead of analyzing the full grammar. For example, in the sentence "John is going to New York.", shallow parsing might identify:
- "John" → noun phrase (NP)
- "is going" → verb phrase (VP)
- "to New York" → prepositional phrase (PP)
Unlike full parsing, which builds a complete syntax tree, shallow parsing only finds meaningful chunks without deep structure.

What is CRF (Conditional Random Fields)?
CRF is a machine learning model used for sequence labeling tasks, meaning it looks at patterns across sequences of words. Imagine you want to classify each word in a sentence as:
- Person (e.g., "John")
- Location (e.g., "New York")
- Other (e.g., "is", "going", "to")
CRFs work by considering the context of words rather than labeling each word independently. For example, in the sentence "John went to New York", a CRF understands that "New" and "York" together are likely a location rather than two separate words. Similarly, it recognizes patterns, like a verb usually following a noun.

Why use CRF for Shallow Parsing?
- Understands context: unlike simpler models, CRFs consider nearby words when labeling text.
- Improves accuracy: helps in chunking, POS tagging, and Named Entity Recognition (NER).
- More reliable: unlike Naïve Bayes or Hidden Markov Models (HMMs), it avoids making incorrect independence assumptions about words.

How Does CRF Work? (Step by Step)
Let's break down how CRF helps in shallow parsing (a code sketch follows below):
Step 1: Feature extraction. CRF first extracts features from each word. Features could be:
- The word itself (e.g., "Sachin")
- The previous word (e.g., "Mr.")
- The next word (e.g., "plays")
- The part of speech (e.g., noun)
Step 2: Context awareness. CRF looks at the relationships between words instead of just classifying each word separately.
Step 3: Assigning labels. After training on a large dataset, CRF assigns labels based on the patterns it has learned.
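A minimal sketch of these steps with the sklearn-crfsuite library (pip install sklearn-crfsuite). The tiny training set, the chunk labels, and the word2features helper are illustrative assumptions, not a real corpus:

import sklearn_crfsuite

def word2features(sent, i):
    # Step 1: extract features for the word at position i
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<START>",
        "next_word": sent[i + 1].lower() if i < len(sent) - 1 else "<END>",
    }

# Toy training data: one tokenized sentence with illustrative chunk labels
train_sents = [["John", "is", "going", "to", "New", "York"]]
train_labels = [["B-NP", "B-VP", "I-VP", "B-PP", "I-PP", "I-PP"]]

X_train = [[word2features(s, i) for i in range(len(s))] for s in train_sents]
y_train = train_labels

# Steps 2-3: the CRF learns label patterns over the whole sequence, not word by word
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)

print(crf.predict(X_train))  # predicted chunk labels for each token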

Example
For a sentence like "Barack Obama visited London.", you can use a CRF to label the words as:
- "Barack" → Person
- "Obama" → Person
- "visited" → Verb
- "London" → Location
Here, the CRF learns from the context in which the words appear and uses that to correctly label the named entities (e.g., Person, Location).

Real-World Applications of CRF
- Chatbots: identifying names, dates, and locations in user queries.
- Search engines: understanding search queries better.
- Spam detection: identifying spam messages based on patterns.
- Medical NLP: extracting patient symptoms and diseases from medical records.

Lexical Semantics in NLP
Lexical semantics is a subfield of linguistics and natural language processing (NLP) that focuses on the meanings of words, their relationships, and how they combine to form meaningful expressions. It plays a crucial role in NLP tasks such as machine translation, sentiment analysis, and information retrieval.

What is Lexical Semantics?
"Lexical" means related to words. "Semantics" means related to meaning. Lexical semantics is the study of word meanings, synonyms, antonyms, and how words relate to each other.

Cont..
Lexical semantics deals with the meaning of words and their relationships with other words in a language. Since words can have multiple meanings depending on context, NLP systems must analyze these meanings to understand text correctly.
Example: the word "light" can mean:
- Brightness: "The room is full of light."
- Not heavy: "This bag is very light."
To avoid confusion, lexical semantics helps NLP systems determine the correct meaning based on context.

Why is Lexical Semantics Important in NLP?
Lexical semantics helps NLP systems process language accurately by understanding word meanings, relationships, and contexts. Applications in NLP:
- Chatbots & virtual assistants (Alexa, Siri, Google Assistant): help in understanding different meanings of words and improve responses to user queries.
- Search engines (Google, Bing, Yahoo): understand synonyms to provide better search results. Example: searching for "low-cost flights" also shows "cheap flights".

Cont..
- Machine translation (Google Translate, DeepL): avoids incorrect translations by considering word context. Example: "He is a bright student." Correct translation (Spanish): "Él es un estudiante inteligente." Incorrect translation: "Él es un estudiante brillante." (brillante = shiny, not smart)
- Sentiment analysis (customer reviews, social media): detects positive and negative words in a review. Example: "The movie was awful!" (negative sentiment 😞)

Key Concepts in Lexical Semantics
A. Word Meaning
Each word has a meaning that depends on:
- The context in which it is used
- The other words around it
Example:
- "She wore a beautiful ring on her finger." (ring = jewelry 💍)
- "The phone started to ring loudly." (ring = sound 🔔)
Since the same word can have different meanings, NLP systems need Word Sense Disambiguation (WSD) to interpret them correctly, as the sketch below shows.
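A quick WSD sketch using the classic Lesk algorithm from NLTK (assumes the wordnet corpus has been downloaded; Lesk is a simple baseline, so its guesses are not always right):

import nltk
from nltk.wsd import lesk

nltk.download("wordnet")

# Disambiguate "ring" in two different contexts
sense1 = lesk("She wore a beautiful ring on her finger".split(), "ring")
sense2 = lesk("The phone started to ring loudly".split(), "ring", pos="v")

print(sense1, "-", sense1.definition())
print(sense2, "-", sense2.definition())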

Cont..
B. Word Relationships
Synonyms (words with similar meanings)
Example: happy 😊 = joyful = cheerful = glad
NLP applications use synonyms to improve search results and recommendations. Example: a search for "cheap hotels" can also show results for "affordable hotels".
Antonyms (words with opposite meanings)
Example: hot 🔥 vs. cold ❄️
Sentiment analysis tools use antonyms to understand positive and negative emotions in text.

Cont..
Homonyms (words that sound/look the same but have different meanings)
Example:
- "Can you pass the bat?" (cricket bat 🏏)
- "A bat is flying in the sky!" (animal 🦇)
NLP systems must recognize the correct meaning by analyzing the context.
Hyponymy & hypernymy (word hierarchy relationships)
- Hyponym (specific word): rose is a hyponym of flower 🌹
- Hypernym (general category): flower is a hypernym of rose, tulip, and sunflower 🌻

How Do NLP Models Use Lexical Semantics?
NLP models use machine learning and deep learning to process lexical semantics. Here are some methods:
A. Word embeddings (vector representations of words)
- Convert words into numerical representations that capture meaning.
- Popular techniques: Word2Vec, GloVe, BERT.
- Example: king - man + woman ≈ queen (a semantic relationship; see the sketch below).
B. Knowledge graphs (word relationship networks)
- WordNet is a lexical database that connects words through synonyms, antonyms, and hypernyms.
C. Contextual understanding (transformers & deep learning)
- BERT (Bidirectional Encoder Representations from Transformers) helps NLP models understand words in context.
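A hedged sketch of the king - man + woman analogy using gensim's pretrained GloVe vectors (pip install gensim; the first call downloads the vectors, and the top result is usually, though not guaranteed to be, "queen"):

import gensim.downloader as api

# Load small pretrained GloVe vectors (downloaded on first use)
model = api.load("glove-wiki-gigaword-50")

# Vector arithmetic: king - man + woman -> nearest neighbors
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=3)
for word, score in result:
    print(f"{word}: {score:.3f}")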

import nltk
from nltk.corpus import wordnet

# One-time download of the WordNet corpus
nltk.download("wordnet")

word = "car"
synsets = wordnet.synsets(word)  # Get synsets (meanings) of the word

if synsets:
    first_synset = synsets[0]  # Choose the first meaning

    # Synonyms: lemma names across all senses
    synonyms = {lemma.name() for syn in synsets for lemma in syn.lemmas()}

    # Antonyms: lemmas that have an antonym entry
    antonyms = {
        lemma.antonyms()[0].name()
        for syn in synsets
        for lemma in syn.lemmas()
        if lemma.antonyms()
    }

    hypernyms = {hypernym.name() for hypernym in first_synset.hypernyms()}  # General category
    hyponyms = {hyponym.name() for hyponym in first_synset.hyponyms()}  # Specific types

    print("Synonyms:", synonyms)
    print("Antonyms:", antonyms)
    print("Hypernyms (General category):", hypernyms)
    print("Hyponyms (Specific types):", hyponyms)
else:
    print("No meanings found for the word.")

Applications of Lexical Semantics in NLP
- Chatbots & virtual assistants: help understand user queries (e.g., Alexa, Google Assistant).
- Search engines: improve search results using synonyms and word meanings.
- Machine translation: enhances translation accuracy (e.g., Google Translate).
- Sentiment analysis: understands emotions in text (e.g., reviews, social media).