Text summarization

AkashKarwande, 17 slides, Dec 30, 2017

About This Presentation

Automatic text summarization is the process of reducing the text content and retaining the
important points of the document. Generally, there are two approaches for automatic text summarization:
Extractive and Abstractive. The process of extractive based text summarization can be divided into two
ph...

Slide Content

Text Summarization Using NLP
Presented by Akash N. Karwande (2016MNS011)
Guided by Prof. R.K. Chavan

Introduction
The goal of summarization is to produce a shorter version of a source text while preserving the meaning and key content of the original document. A well-written summary can significantly reduce the amount of work needed to digest large amounts of text.

Types of Text Summarization
There are two types of summaries: extractive and abstractive.

Extractive summaries
Extractive summaries are created by reusing portions (words, sentences, etc.) of the input document. The system extracts text from the entire collection without modifying it. Most summarization research today focuses on extractive summarization.
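A minimal sketch of one common extractive approach, word-frequency scoring, is shown below. The slides do not say which scoring method was used, so the scoring rule, the tiny stopword list, and the function names are all assumptions made for illustration. It uses only the Python standard library in place of the NLTK tokenizers.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; NLTK ships a much fuller one).
STOPWORDS = {"a", "an", "the", "is", "it", "of", "to", "and", "in", "that", "by", "for"}

def split_sentences(text):
    """Naive rule: split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, unmodified and in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    # Score of a sentence = sum of the document-wide frequencies of its content words.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]

summary = extractive_summary("John owns a car. The car is a Toyota. He likes tea.")
```

Because the selected sentences are copied verbatim from the input, the result is extractive by construction.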

Abstractive summaries
Abstractive summarization requires deep understanding of, and reasoning over, the text. It writes its own summary of the input text instead of reusing the same words or sentences. It determines the essential meaning of each element, such as words, sentences, and paragraphs.

Natural Language Toolkit
The Natural Language Toolkit (NLTK) is a leading platform for building Python programs that work with human language data. Natural language processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics that deals with interactions between computers and human (natural) languages. NLTK provides a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

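Continued…
The following NLTK components and supporting Python libraries are used in text summarization: word tokenizer, sentence tokenizer, stopwords, BeautifulSoup, numpy, tagging, and parsing. BeautifulSoup and numpy are separate packages rather than part of NLTK.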

Linguistic Preprocessing for Automatic Summarization
Fig. Pipeline architecture of an information extraction process

Sentence Segmentation
Converts raw text into sentences, returned as a list of strings, using a sentence tokenizer.
Input text: John owns a car. It is a Toyota.
Output:
Segm1: John owns a car.
Segm2: It is a Toyota.
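The segmentation step can be sketched with a simple rule-based splitter. This is a standard-library stand-in rather than the NLTK sentence tokenizer the slides refer to (`nltk.sent_tokenize`, which uses a trained model), and the abbreviation list is a hypothetical one chosen for the example.

```python
# Illustrative abbreviations that end in a period but do not end a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e."}

def segment_sentences(text):
    """Split raw text into a list of sentence strings.

    A sentence ends at a whitespace-delimited token whose last character is
    '.', '!' or '?', unless that token is a known abbreviation.
    """
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences

segments = segment_sentences("John owns a car. It is a Toyota.")
```

For the slide's input, `segments` holds the two sentences listed as Segm1 and Segm2.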

Tokenization
Identifies the word tokens in a given sentence and provides them as a list, using a word tokenizer.
Input: John owns a car.
Output: [[John], [owns], [a], [car], [.]]
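A rough regular-expression tokenizer illustrates the idea. It is an approximation written for this example, not NLTK's `nltk.word_tokenize`, and it treats contractions and hyphenated words more crudely than a real tokenizer would.

```python
import re

def tokenize(sentence):
    """Return word tokens and punctuation marks as separate list items.

    The pattern matches either a run of word characters or any single
    character that is neither a word character nor whitespace.
    """
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("John owns a car.")
print(tokens)
```

The result is a flat Python list of strings, one per token, with the final period kept as its own token.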

Part-of-Speech Tagging (POS Tagging)
Assigns the appropriate part-of-speech tag to each word. POS tags are useful for extracting nouns, adverbs, and adjectives, which carry meaningful information about the text. The tagger generates a list of tuples with POS annotations.
Input: [[John], [owns], [a], [car], [.]]
Output (shown as a bracketed tree after grouping the tagged words into phrases): (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
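To make the "list of tuples" output concrete, here is a toy lookup tagger. It is only a sketch: the lexicon is hand-written for the example sentences, whereas a real tagger such as `nltk.pos_tag` is a trained statistical model. The tags follow the Penn Treebank names used on the slide.

```python
# Hand-written lexicon covering only the example sentences (an assumption).
LEXICON = {
    "john": "NNP", "toyota": "NNP",
    "owns": "VBZ", "is": "VBZ",
    "a": "DT", "car": "NN", "it": "PRP", ".": ".",
}

def lookup_tag(tokens, default="NN"):
    """Tag each token by dictionary lookup; unknown words fall back to `default`."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

tagged = lookup_tag(["John", "owns", "a", "car", "."])

# Tags starting with "NN" (NN, NNP, ...) mark nouns, which the slide
# singles out as carriers of meaningful information.
nouns = [word for word, tag in tagged if tag.startswith("NN")]
```

Here `tagged` pairs each token with its tag, and `nouns` keeps only "John" and "car".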

Entity Detection
Identifies entities in predefined categories such as persons, locations, quantities, and organizations. Named entity recognition (NER) provides entity detection for linguistic processing. An NER system uses linguistic grammar-based techniques as well as statistical models to identify entities.
Input: (NP (NNP John)) (VP (VBZ owns) (NP (DT a) (NN car))) (. .)
Output: John -> Person
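The grammar-based side of entity detection can be sketched as a rule plus a gazetteer: group consecutive proper nouns into a candidate name, then look the name up in a list of known entities. The gazetteer below is hypothetical and covers only the example; NLTK's own `nltk.ne_chunk` relies on a trained statistical model instead.

```python
# Hypothetical gazetteer mapping known names to entity categories.
GAZETTEER = {"John": "PERSON", "Toyota": "ORGANIZATION"}

def detect_entities(tagged_tokens):
    """Return (name, category) pairs for runs of proper nouns (tag NNP)."""
    entities, span = [], []
    # The sentinel pair flushes a proper-noun run that ends the input.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag == "NNP":
            span.append(word)
            continue
        if span:
            name = " ".join(span)
            entities.append((name, GAZETTEER.get(name, "UNKNOWN")))
            span = []
    return entities

entities = detect_entities(
    [("John", "NNP"), ("owns", "VBZ"), ("a", "DT"), ("car", "NN"), (".", ".")]
)
```

For the slide's sentence this yields a single entity, John, in the person category.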

Relation Detection
Identifies possible relations between two or more chunked sentences. A co-reference chain relates two or more sentences by linking pronouns to their corresponding nouns, so that pronouns can be replaced with the nouns they refer to.
Input text: John owns a car. It is a Toyota. (in the form of a parse tree)
Output: "a car" -> "a Toyota"; "It" -> "a Toyota"
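Pronoun replacement can be illustrated with a deliberately naive heuristic: replace "it" with the most recent preceding "determiner + common noun" phrase. This is an assumption made for the sketch, not the co-reference method used in the presentation, and it ignores gender, number, and syntax. Note that it links "It" to "a car", the earlier member of the chain shown on the slide.

```python
def resolve_pronouns(tagged_sentences, pronouns=("it",)):
    """Replace listed pronouns with the latest 'DT NN' phrase seen so far.

    `tagged_sentences` is a list of sentences, each a list of (word, tag) pairs.
    Returns a list of token lists with the pronouns substituted.
    """
    last_np = None
    resolved = []
    for sentence in tagged_sentences:
        out = []
        for i, (word, tag) in enumerate(sentence):
            if word.lower() in pronouns and last_np is not None:
                out.append(last_np)
                continue
            out.append(word)
            # Remember a determiner followed by a common noun as the antecedent.
            if tag == "DT" and i + 1 < len(sentence) and sentence[i + 1][1] == "NN":
                last_np = f"{word} {sentence[i + 1][0]}"
        resolved.append(out)
    return resolved

resolved = resolve_pronouns([
    [("John", "NNP"), ("owns", "VBZ"), ("a", "DT"), ("car", "NN"), (".", ".")],
    [("It", "PRP"), ("is", "VBZ"), ("a", "DT"), ("Toyota", "NNP"), (".", ".")],
])
```

After resolution the second sentence no longer depends on the first for its subject, which matters when an extractive summarizer keeps only one of the two.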

Conclusion
Automatic text summarization has been shown to be useful for natural language processing tasks such as question answering and text classification, and for related fields of computer science such as information retrieval. It also reduces the time needed to search for information.

Future work
Our summarization results show that removing all sentences that do not contain any geographic information may lead to a loss of information, since links may exist between the removed sentences. We will therefore analyse this issue in detail by studying graph-based algorithms that capture the relationships between sentences.
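One family of graph-based algorithms of this kind ranks sentences with a PageRank-style iteration over a sentence-similarity graph (the idea behind TextRank). The sketch below is an illustration under assumptions: the slides do not name a specific algorithm, and the Jaccard word-overlap similarity, the damping factor, and the function names are choices made for this example.

```python
import re

def word_set(sentence):
    """Set of lowercased alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank on the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weight between distinct sentences i and j; no self-loops.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores

scores = rank_sentences(["John owns a car.", "The car is a Toyota.", "Tea is hot."])
```

In this small example the middle sentence shares words with both of its neighbours, so it ends up with the highest score. A sentence that links others together is kept even if it would fail a keyword filter on its own.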
