Natural language processing (Python)

sumit786raj 6,173 views 29 slides Mar 20, 2013

About This Presentation

A brief overview of Natural Language Processing using the Python module NLTK. The demonstration code can be found at the GitHub link given in the references slide.

Slide Content

Natural Language Processing
Using Python
Presented by:-
Sumit Kumar Raj
1DS09IS082
ISE,DSCE-2013

Table of Contents
• Introduction
• History
• Methods in NLP
• Natural Language Toolkit
• Sample Codes
• Feeling Lonely?
• Building a Spam Filter
• Applications
• References

What is Natural Language Processing?
• Computer-aided text analysis of human language.
• The goal is to enable machines to understand human language and extract meaning from text.
• It is a field of study at the intersection of computer science, artificial intelligence, and computational linguistics, and it relies heavily on machine learning.

History
• 1948: first NLP application
– dictionary look-up system
– developed at Birkbeck College, London
• 1949: American interest
– WWII code breaker Warren Weaver
– He viewed German as English in code.
• 1966: over-promised, under-delivered
– Machine translation worked only word by word.
– NLP drew the first hostility from research funders.
– NLP gave AI a bad name before AI had a name.

Natural language processing is heavily used throughout web technologies:
• Search engines
• Site recommendations
• Spam filtering
• Knowledge bases and expert systems
• Automated customer support systems
• Sentiment analysis
• Consumer behavior analysis

Context
Little sister: What’s your name?
Me: Uhh….Sumit..?
Sister: Can you spell it?
Me: Yes. S-U-M-I-T…..
Sister: WRONG! It’s spelled “I-T”

Ambiguity
“I shot the man with ice cream.”
- A man with ice cream was shot.
- A man had ice cream shot at him.

Methods :-
1) POS Tagging :-
• In corpus linguistics, part-of-speech (POS) tagging is also called grammatical tagging or word-category disambiguation.
• It is the process of marking up each word in a text as corresponding to a particular part of speech.
• POS tagging is harder than just having a list of words and their parts of speech, because the same word can take different parts of speech in different contexts.
• Consider the example:
The sailor dogs the barmaid.
Here “dogs”, usually a plural noun, is used as a verb.
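A minimal sketch of why a plain word list falls short, using the example sentence from this slide. The tiny lexicon and the single context rule are made up for illustration; real taggers, including the one in NLTK, learn such behaviour from annotated corpora rather than from a hand-written rule.

```python
# Toy lexicon: each word maps to its most common tag (hypothetical data).
LEXICON = {
    "the": "DT",
    "sailor": "NN",
    "dogs": "NNS",      # usually a plural noun
    "barmaid": "NN",
}

def lookup_tag(words):
    """Tag each word with its most common tag, ignoring context."""
    return [(w, LEXICON.get(w.lower(), "NN")) for w in words]

def context_tag(words):
    """Lookup tagging plus one hand-written context rule.

    Rule (an illustrative assumption, not a general truth): a word tagged NNS
    that sits between a singular noun and a determiner is re-tagged as a
    third-person singular verb (VBZ).
    """
    tagged = lookup_tag(words)
    for i in range(1, len(tagged) - 1):
        word, tag = tagged[i]
        if tag == "NNS" and tagged[i - 1][1] == "NN" and tagged[i + 1][1] == "DT":
            tagged[i] = (word, "VBZ")
    return tagged

sentence = "The sailor dogs the barmaid".split()
print(lookup_tag(sentence))   # "dogs" comes out as a noun (NNS)
print(context_tag(sentence))  # "dogs" comes out as a verb (VBZ)
```

The lookup alone mislabels “dogs”; only the surrounding words reveal that it is a verb here.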

2) Parsing :-
• In the context of NLP, parsing may be defined as the process of assigning structural descriptions to sequences of words in a natural language.
• Applications of parsing include:
– simple phrase finding, e.g. for proper name recognition
– full semantic analysis of text, e.g. information extraction or machine translation
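To make "assigning structural descriptions" concrete, here is a small sketch that counts the parse trees of the ambiguous sentence from the Ambiguity slide using the CKY algorithm. The grammar and lexicon are toy assumptions written for this one sentence, and "ice-cream" is treated as a single token to keep the grammar small.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the action
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "shot": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "ice-cream": ["NP"],
}

def count_parses(words, start="S"):
    """Count distinct parse trees for `words` with the CKY algorithm."""
    n = len(words)
    # chart[(i, j)][A] = number of ways category A can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]

print(count_parses("I shot the man with ice-cream".split()))  # 2
```

The two parses correspond to the two readings: the prepositional phrase attaches either to "the man" or to the act of shooting.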

3) Speech Recognition :-
• It is concerned with mapping a continuous speech signal into a sequence of recognized words.
• The main problems are variation in pronunciation and homonyms.
• In the sentence “the boy eats”, a bigram model is sufficient to model the relationship between “boy” and “eats”.
• In “The boy on the hill by the lake in our town…eats”, the two words are too far apart for a bigram model to relate them.
• Bigram and trigram models have proven extremely effective for obvious, short-range dependencies.
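A minimal sketch of a bigram model, assuming a made-up four-sentence corpus and plain maximum-likelihood estimates without smoothing. It shows the point of this slide: the model only ever relates adjacent words.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus (hypothetical data).
corpus = [
    "the boy eats",
    "the boy sleeps",
    "the girl eats",
    "the boy on the hill eats",
]
unigrams, bigrams = train_bigrams(corpus)
print(bigram_prob("boy", "eats", unigrams, bigrams))   # adjacent in one of three cases
print(bigram_prob("hill", "eats", unigrams, bigrams))  # the model credits "hill", not "boy"
```

In the last corpus sentence the subject is still "boy", but the bigram counts attribute "eats" to "hill", the word that happens to precede it.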

4) Machine Translation :-
• It involves translating text from one natural language to another.
• Approaches:
- simple word substitution, with some changes in ordering to account for grammatical differences
- translating the source language into an underlying meaning representation, or interlingua
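The first approach can be sketched in a few lines. The English-to-Spanish dictionary and the word classes below are hypothetical and cover only the example sentence; the single reordering rule (adjective before noun becomes noun before adjective) stands in for "changes in ordering".

```python
# Hypothetical mini dictionary and word classes (illustration only).
DICTIONARY = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "fast": "rápido"}
ADJECTIVES = {"red", "fast"}
NOUNS = {"car"}

def translate(sentence):
    """Word-by-word substitution with one reordering rule."""
    words = sentence.lower().split()
    # Reordering: English adjective-noun becomes noun-adjective.
    i = 0
    while i < len(words) - 1:
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Substitution: words missing from the dictionary pass through unchanged.
    return " ".join(DICTIONARY.get(w, w) for w in words)

print(translate("the red car is fast"))  # el coche rojo es rápido
```

Anything beyond such hand-picked sentences (agreement, idioms, word sense) quickly breaks this scheme, which is what motivates the second approach.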

5) Stemming :-
• In linguistic morphology and information retrieval, stemming is the process of reducing inflected words to their stem.
• The stem need not be identical to the morphological root of the word.
• Many search engines treat words with the same stem as synonyms as a kind of query broadening, a process called conflation.
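A rough sketch of stemming and conflation, assuming a deliberately crude suffix-stripping rule set. This is not the Porter algorithm or any NLTK stemmer; the suffix list and the minimum stem length are arbitrary choices for illustration.

```python
from collections import defaultdict

# Suffixes are tried in this order; a crude rule set, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def conflate(words):
    """Group words that share a stem, as in query broadening."""
    groups = defaultdict(list)
    for word in words:
        groups[crude_stem(word)].append(word)
    return dict(groups)

words = ["connect", "connected", "connecting", "connects", "argued", "argues"]
print(conflate(words))
```

Here "argued" and "argues" share the stem "argu", which is not itself a word: the stem only has to be shared, not to be the morphological root.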

Natural Language Toolkit
• NLTK is a leading platform for building Python programs that work with human language data.
• It provides a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
• At the time of this talk (2013), NLTK was available only for Python 2.5 – 2.6; current NLTK releases require Python 3.
• Download: http://www.nltk.org/download
• Install: pip install nltk (the slides used easy_install nltk)
• Prerequisites
– NumPy
– SciPy

Let’s dive into some code!

Part of Speech Tagging
from nltk import pos_tag, word_tokenize

# The tokenizer and tagger models must be downloaded once with nltk.download().
sentence1 = ('this is a demo that will show you how '
             'to detects parts of speech with little effort '
             'using NLTK!')
tokenized_sent = word_tokenize(sentence1)  # split the sentence into word tokens
print(pos_tag(tokenized_sent))             # list of (word, tag) pairs

Output as shown on the slide (tags may differ between NLTK versions):
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('demo', 'NN'), ('that', 'WDT'),
('will', 'MD'), ('show', 'VB'), ('you', 'PRP'), ('how', 'WRB'), ('to', 'TO'),
('detects', 'NNS'), ('parts', 'NNS'), ('of', 'IN'), ('speech', 'NN'),
('with', 'IN'), ('little', 'JJ'), ('effort', 'NN'), ('using', 'VBG'),
('NLTK', 'NNP'), ('!', '.')]

Fun things to Try

Feeling lonely?
Eliza is there to talk to you all day! What human could ever do that for you?
from nltk.chat import eliza
eliza.eliza_chat()  # starts the chatbot

Therapist
---------
Talk to the program by typing in plain English, using normal upper- and lower-case letters and punctuation. Enter "quit" when done.
========================================================================
Hello. How are you feeling today?

Let’s build something even cooler

Let’s write a spam filter!
A program that analyzes legitimate emails (“ham”) as well as “spam” and learns the features that are associated with each.
Once trained, we should be able to run this program on incoming mail and have it reliably label each one with the appropriate category.

“Spambot.py” (continued)
1. Extract one of the archives from the site into your working directory.
2. Create a Python script; let’s call it “spambot.py”.
3. Your working directory should contain the “spambot” script and the folders “spam” and “ham”.
from nltk import (word_tokenize, WordNetLemmatizer, NaiveBayesClassifier,
                  classify, MaxentClassifier)
from nltk.corpus import stopwords
import random
import os, glob, re
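The slides do not show how the emails are read from disk. One possible sketch, assuming each email is a separate .txt file inside the “spam” and “ham” folders; the helper name load_texts, the file extension, and the encoding handling are assumptions, not part of the original code.

```python
import glob
import os

def load_texts(folder):
    """Read every .txt file in `folder` and return the contents as a list of strings."""
    texts = []
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        # Old mail archives often contain odd bytes; replace them instead of failing.
        with open(path, encoding="utf-8", errors="replace") as handle:
            texts.append(handle.read())
    return texts

# Folder names follow the setup steps; the .txt extension is an assumption.
spamtexts = load_texts("spam")
hamtexts = load_texts("ham")
print(len(spamtexts), "spam and", len(hamtexts), "ham emails loaded")
```

If a folder is missing, the corresponding list is simply empty, so the script still runs and reports zero emails.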

“Spambot.py” (continued)
# Label each item with the appropriate label and store them as a list of tuples.
mixedemails = [(email, 'spam') for email in spamtexts]
mixedemails += [(email, 'ham') for email in hamtexts]
# Let's give them a nice shuffle.
random.shuffle(mixedemails)
From this list of randomly ordered but labeled emails, we will define a “feature extractor” which outputs a feature set that our program can use to statistically compare spam and ham.

“Spambot.py” (continued)
# Assumed setup, not shown on the slides: the lemmatizer and the stop-word list.
wordlemmatizer = WordNetLemmatizer()
commonwords = stopwords.words('english')

def email_features(sent):
    features = {}
    # Normalize words: lower-case and lemmatize each token.
    wordtokens = [wordlemmatizer.lemmatize(word.lower())
                  for word in word_tokenize(sent)]
    for word in wordtokens:
        # If the word is not a stop-word, then let's consider it a "feature".
        if word not in commonwords:
            features[word] = True
    return features

# Let's run each email through the feature extractor and collect it in a "featureset" list.
featuresets = [(email_features(n), g) for (n, g) in mixedemails]

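“Spambot.py” (continued)
# The training step is not shown on these slides; training on all feature sets is assumed here.
classifier = NaiveBayesClassifier.train(featuresets)
while True:
    # input() is Python 3; on Python 2 use raw_input().
    featset = email_features(input("Enter text to classify: "))
    print(classifier.classify(featset))
We can now directly input new email and have it classified as either spam or ham.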
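To show roughly what a naive Bayes classifier does with such feature sets, here is a small standard-library sketch. It is not NLTK's implementation: the add-one smoothing, the handling of unseen words, and the four toy emails are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(featuresets):
    """featuresets: list of (features_dict, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)   # label -> word -> number of emails containing it
    vocabulary = set()
    for features, label in featuresets:
        label_counts[label] += 1
        for word in features:
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify_email(features, model):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)              # log prior
        for word in features:
            if word in vocabulary:                   # skip words never seen in training
                # Add-one smoothing keeps unseen (word, label) pairs above zero.
                score += math.log((word_counts[label][word] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (hypothetical), in the same shape as the feature sets above.
toy = [
    ({"win": True, "money": True, "now": True}, "spam"),
    ({"cheap": True, "money": True}, "spam"),
    ({"meeting": True, "tomorrow": True}, "ham"),
    ({"lunch": True, "tomorrow": True, "now": True}, "ham"),
]
model = train(toy)
print(classify_email({"money": True, "cheap": True}, model))    # spam
print(classify_email({"meeting": True, "lunch": True}, model))  # ham
```

Each word contributes independently to the score, which is the "naive" assumption; the label whose words best match the email wins.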

Applications :-
• Conversion from natural language to computer language and vice versa.
• Translation from one human language to another.
• Automatic checking of grammar and writing techniques.
• Spam filtering
• Sentiment analysis

Conclusion:-
NLP plays a very important role in new human-machine interfaces. When we look at some of the products built on NLP technologies, we can see that they are both very advanced and very useful.
But there are many limitations. For example, the language we speak is highly ambiguous, which makes it very difficult to understand and analyze. Also, with so many languages spoken all over the world, it is very difficult to design a system that is 100% accurate.
These problems get more complicated when we think of different people speaking the same language in different styles.
Intelligent systems are being experimented with right now.
We will be able to see improved applications of NLP in the near future.

References :-
• http://en.wikipedia.org/wiki/Natural_language_processing
• Eric Brill and Raymond J. Mooney, “An Overview of Empirical Natural Language Processing”
• Ben W. Medlock, “Investigating Classification for Natural Language Processing Tasks”, University of Cambridge
• Shankar Ambady, “Natural Language Processing and Machine Learning Using Python”
• http://www.slideshare.net
• http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol1/hks/index.html
• http://googlesystem.blogspot.in/2012/10/google-improves-results-for-natural/
Code from: https://github.com/shanbady/NLTK-Boston-Python-Meetup

Any Questions ???

Thank You...
Reach me @:
facebook.com/sumit12dec
9590 285 524
