Using lexical chains for text summarization

16 slides, Dec 10, 2014

About This Presentation

We investigate one technique to produce a summary of an original text without requiring its full semantic interpretation, but instead relying on a model of the topic progression in the text derived from lexical chains. We present a new algorithm to compute lexical chains in a text, merging several...


Slide Content

10/12/14 Anthony-Claret Onwutalobi
University of Helsinki, Finland
Using lexical chains for Text summarization
Key concepts:
•Text summarization: producing a summary, i.e. a text that is produced from one or more texts, contains a significant portion of the information in the original text(s), and presents it in condensed form
•Lexical chain: a sequence of related words in a text, spanning short distances (adjacent words or sentences) or long distances (the entire text). E.g.: Rome → capital → city → inhabitant
•Source: http://en.wikipedia.org/wiki/Lexical_chain

Goal of Text summarization
•Automated summarization tools can help people grasp the main concepts of information sources in a short time.
•The motivation for this work is to build such a tool, one that is computationally efficient and creates summaries automatically.
•Reduce the size of a document while preserving its content.

A few attempts to achieve the goal and their constraints
•Frequency-based method: the most frequent words are assumed to represent the most important concepts of the text. These words are treated as key words and collected into a frequency table.
•Constraints:
•This method ignores the semantic content of words and their relationships with other words or phrases.
•Cue phrase method: assumes that the first paragraph, or the first sentence of each paragraph, contains topic information, and that sentences containing certain cue words, such as "significantly", "impossible", and "hardly", carry key topic information.
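A minimal Python sketch of the frequency-based method described above: content words are counted into a frequency table and the most frequent ones are taken as key words. The stop-word list, the regex tokenizer, and the names `frequency_table` and `key_words` are illustrative assumptions, not part of the original presentation.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "it", "has", "are"}


def frequency_table(text: str) -> Counter:
    """Count content words (lowercased, stop words removed)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def key_words(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent content words, assumed to be the key words."""
    return [word for word, _ in frequency_table(text).most_common(k)]
```

For example, `key_words("The computer is fast. A computer has memory. Memory is cheap.", 2)` returns `["computer", "memory"]`. The table counts "computer" and "machine" as unrelated strings, which is the semantic constraint listed on the slide.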

Cont.
•Constraint (cue phrase or topic-based method):
•Style-specific: articles differ in format and writing style, which makes fixed cue words and positions difficult to apply across them
•Advantage of the two techniques:
•Easy computation
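The cue phrase method can be sketched in a few lines too. This Python version gives each sentence one point for opening a paragraph and one point per cue word it contains; the cue words are the ones from the slide, while the equal weights, the naive sentence splitter, and the name `cue_phrase_scores` are assumptions for illustration.

```python
import re

# Cue words taken from the slide; a real system would need a genre-specific list.
CUE_WORDS = {"significantly", "impossible", "hardly"}


def cue_phrase_scores(paragraphs: list[str]) -> list[tuple[str, int]]:
    """Score each sentence: +1 if it opens a paragraph, +1 per cue word it contains."""
    scored = []
    for paragraph in paragraphs:
        # Naive sentence split after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for i, sentence in enumerate(sentences):
            words = re.findall(r"[a-z]+", sentence.lower())
            position_bonus = 1 if i == 0 else 0
            scored.append((sentence, position_bonus + sum(w in CUE_WORDS for w in words)))
    return scored
```

Both the cue list and the position rule are hard-coded, which is exactly why the method is style-specific: a genre with different conventions needs different cues, although the computation itself stays cheap.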

Overcoming the limitations of the two methods
•Lexical chains are used to determine the central theme of the text. The chains are created from semantically related words, and the concept represented by the strongest chain is taken as the theme of the text.
•Lexical chains are sequences of words in a text that represent the same topic. Because each word is placed in a chain with words related to one of its senses, building the chains also addresses the problem of word sense disambiguation (WSD).
•Lexical chains can be computed in a source document by grouping (chaining) a set of words that are semantically related (i.e., that share a flow of meaning).
•Computing lexical chains requires an ontology or lexical database that predefines the semantic relations between words.
•Identity, synonymy, and hypernymy/hyponymy are the relations that can cause words to be grouped into the same lexical chain.
•The WordNet thesaurus is used for this purpose.
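To make the three relations concrete, here is a small Python sketch that names the relation between two words. A tiny hand-written dictionary stands in for WordNet (every entry in `SYNONYMS` and `HYPERNYM` is hypothetical), and the hypernym search is limited to two levels, matching the depth used when forming the chains.

```python
from typing import Optional

# Toy stand-in for WordNet; every entry here is hypothetical.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"machine", "device"})}
HYPERNYM = {"car": "vehicle", "vehicle": "conveyance", "computer": "machine"}  # child -> parent


def hypernyms(word: str, depth: int = 2) -> list[str]:
    """Ancestors of `word` in the toy hierarchy, at most `depth` levels up."""
    ancestors: list[str] = []
    while word in HYPERNYM and len(ancestors) < depth:
        word = HYPERNYM[word]
        ancestors.append(word)
    return ancestors


def relation(w1: str, w2: str) -> Optional[str]:
    """Name the relation that would let w2 join a chain containing w1, or None."""
    if w1 == w2:
        return "identity"
    if frozenset({w1, w2}) in SYNONYMS:
        return "synonym"
    if w2 in hypernyms(w1):
        return "hypernym"  # w2 is more general than w1
    if w1 in hypernyms(w2):
        return "hyponym"   # w2 is more specific than w1
    return None
```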

WordNet Thesaurus
•WordNet groups synonyms that express the same concept into sets (synsets) and links related sets to each other
•Using WordNet, lexical chains are constructed by calculating the semantic distance between words
•Strong lexical chains are then selected, and the sentences related to those strong chains are chosen as the summary
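One common way to define semantic distance over a taxonomy is the length of the shortest path between two words. The Python sketch below runs breadth-first search over a toy child-to-parent map (`PARENT` is hypothetical data standing in for WordNet's hypernym links); the path-length measure and the `max_distance` threshold in `related` are assumptions, not necessarily the distance used in the presentation.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent) standing in for WordNet hypernym links.
PARENT = {
    "computer": "machine",
    "printer": "machine",
    "machine": "device",
    "device": "artifact",
    "capital": "city",
    "city": "location",
}


def semantic_distance(a: str, b: str) -> float:
    """Edges on the shortest path between a and b; infinity if they are unconnected."""
    graph: dict[str, set[str]] = {}
    for child, parent in PARENT.items():  # treat every link as undirected
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:  # breadth-first search from a
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return float("inf")


def related(a: str, b: str, max_distance: int = 2) -> bool:
    """Assumed relatedness test: the words are close enough in the taxonomy."""
    return semantic_distance(a, b) <= max_distance
```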

Four steps in Text summarization
•Segmentation of the original source text
•Construction of lexical chains
•Identification of strong chains
•Extraction of significant sentences
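The first step, segmentation, can be as simple as splitting the source text into sentences so that later steps can refer back to sentence positions. This Python sketch splits after `.`, `!` or `?` followed by whitespace; that rule is a deliberately naive assumption and will mis-split abbreviations such as "e.g.".

```python
import re


def segment(text: str) -> list[str]:
    """Split raw text into sentences (naive: breaks after ., ! or ? plus whitespace)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens_by_sentence(sentences: list[str]) -> list[list[str]]:
    """Lowercased word tokens per sentence, keeping sentence order for later steps."""
    return [re.findall(r"[a-z]+", s.lower()) for s in sentences]
```

Keeping the sentence order matters for the last step, which extracts the sentence where a representative word first appears.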

Overall Design of the Proposed System (ATS)

Three steps for constructing lexical chains
•Select a set of candidate words.
•For each candidate word, find an appropriate chain, relying on a relatedness criterion among the members of the chains.
•If such a chain is found, insert the word into it and update the chain accordingly.
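The three steps map onto a short greedy loop. In this Python sketch the relatedness criterion is passed in as a function, a word joins the first chain that has a related member, and a word with no related chain starts a new chain; the last two choices are assumptions, since the slide only covers the case where a chain is found.

```python
from typing import Callable


def build_chains(candidates: list[str],
                 related: Callable[[str, str], bool]) -> list[list[str]]:
    """Greedy chaining over candidate words using a pluggable relatedness test."""
    chains: list[list[str]] = []
    for word in candidates:                  # step 1: iterate over candidate words
        for chain in chains:                 # step 2: look for a chain with a related member
            if any(related(word, member) for member in chain):
                chain.append(word)           # step 3: insert the word and update the chain
                break
        else:
            chains.append([word])            # assumption: unmatched words start a new chain
    return chains
```

With a relatedness test that links rome/capital, capital/city, city/inhabitant, and banana/fruit, the candidates `["rome", "capital", "banana", "city", "inhabitant", "fruit"]` yield the slide's Rome chain plus a separate banana/fruit chain.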

FORMATION OF LEXICAL CHAINS (Using WordNet)
1. For each noun instance (candidate word, cw):
   i. Collect the sense numbers and synset offsets of the word from WordNet.
   ii. For each of these senses, find the words that have the following relationships with it:
      •Synonyms
      •Hypernyms (up to 2 levels of depth)
      •Hyponyms (up to 2 levels of depth)
   iii. Put the pair (cw, weight) at the end of the linked list that holds all such pairs for that sense, stored in a hash table indexed by the sense offset.
2. For each noun instance and each of its corresponding lexical chains:
   •Keep the word instance in the lexical chain to which it contributes the most.
   •Update the score of that lexical chain.
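A minimal Python sketch of the two-phase procedure, under stated assumptions: `WORD_SENSES` and `NEARBY_SENSES` are a hypothetical sense inventory standing in for WordNet lookups, the weights follow the presentation's scoring scheme (1 for identity/synonymy, 0.5 for hypernymy/hyponymy), a Python list replaces the linked list, each distinct word is treated as one instance, and ties in phase 2 go to the first chain encountered.

```python
from collections import defaultdict

# Hypothetical sense inventory standing in for WordNet lookups.
# word -> sense offsets of the word itself (identity/synonymy, weight 1.0)
WORD_SENSES = {"computer": [102], "machine": [102, 201], "engine": [201]}
# sense offset -> offsets reached via hypernym/hyponym links within 2 levels (weight 0.5)
NEARBY_SENSES = {102: [300], 201: [300]}
SYNONYM_WEIGHT, HYPER_HYPO_WEIGHT = 1.0, 0.5


def build_sense_table(nouns: list[str]) -> dict[int, list[tuple[str, float]]]:
    """Phase 1: hash table indexed by sense offset, each holding (cw, weight) pairs."""
    table: dict[int, list[tuple[str, float]]] = defaultdict(list)
    for cw in nouns:
        for offset in WORD_SENSES.get(cw, []):
            table[offset].append((cw, SYNONYM_WEIGHT))
            for near in NEARBY_SENSES.get(offset, []):
                table[near].append((cw, HYPER_HYPO_WEIGHT))
    return table


def keep_best_chain(table):
    """Phase 2: keep each word only in the chain it contributes most to; score chains."""
    best: dict[str, tuple[int, float]] = {}
    for offset, pairs in table.items():
        for cw, weight in pairs:
            if cw not in best or weight > best[cw][1]:  # ties keep the first chain seen
                best[cw] = (offset, weight)
    chains: dict[int, list[tuple[str, float]]] = defaultdict(list)
    for cw, (offset, weight) in best.items():
        chains[offset].append((cw, weight))
    scores = {offset: sum(w for _, w in pairs) for offset, pairs in chains.items()}
    return dict(chains), scores
```

For the nouns `["computer", "machine", "engine"]`, the shared hypernym sense 300 collects four half-weight entries in phase 1, but phase 2 moves every word to a sense where it contributes 1.0, leaving chain 102 with a score of 2.0.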

Example
•Assume the sentences “John has a computer. The machine is an IBM.” and that, through their senses, synonyms, hypernyms, and hyponyms, the nouns map to the following sense indices: John (0), computer (1, 2), machine (0, 2, 3), unit (3).
•Words are put in the same chain if they are related by identity, synonymy, or hypernymy/hyponymy up to 2 levels. The table below depicts the resulting lexical chains.

Example
Chain   | Sense index | Sense meaning | Element 1      | Element 2      | Element 3
Chain 1 | 0           | Person        | {John, 1}      | {machine, 0.5} |
Chain 2 | 1           | Unit          | {computer, 1}  |                |
Chain 3 | 2           | Device        | {computer, 1}  | {machine, 1}   |
Chain 4 | 3           | Organization  | {machine, 0.5} | {unit, 0.5}    |
Chain 5 | 4           |               |                |                |
…
Chain N | N-1         |               |                |                |
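The example table can be checked by encoding it by hand in Python and applying the "keep the word in the chain it contributes most to" rule. The per-chain sums follow the aggregate scoring used for strong chains. Breaking a weight tie in favour of the higher-scoring chain is an assumption added here, because "computer" contributes 1 to both Chain 2 and Chain 3.

```python
# The example table, encoded by hand: chain -> list of (word, weight) elements.
EXAMPLE_CHAINS = {
    "Chain 1 (person)": [("John", 1.0), ("machine", 0.5)],
    "Chain 2 (unit)": [("computer", 1.0)],
    "Chain 3 (device)": [("computer", 1.0), ("machine", 1.0)],
    "Chain 4 (organization)": [("machine", 0.5), ("unit", 0.5)],
}


def chain_scores(chains):
    """Aggregate score of each chain: the sum of its element weights."""
    return {name: sum(w for _, w in members) for name, members in chains.items()}


def best_chain_per_word(chains):
    """Map each word to the chain it contributes most to."""
    scores = chain_scores(chains)
    best = {}
    for name, members in chains.items():
        for word, weight in members:
            key = (weight, scores[name])  # assumption: ties go to the higher-scoring chain
            if word not in best or key > best[word][1]:
                best[word] = (name, key)
    return {word: name for word, (name, _) in best.items()}
```

With that rule, both "computer" and "machine" end up in Chain 3 (device), the chain with the highest total (2.0), while "John" stays in Chain 1 and "unit" in Chain 4.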

Lexical Chains (cont…)
•Scoring scheme:
•Identical word = 1
•Synonym = 1
•Hypernym/hyponym = 0.5
•Data structures:
•element = ["candidate word", weight]
•chain = [element1, element2, …, elementN]
•lexical_chains = [hashed chain1, hashed chain2, …, hashed chainN]
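The data structures and scoring scheme translate directly into Python dataclasses. Keying `lexical_chains` by sense offset (a dict rather than a list of hashed chains) and naming the relations with strings are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Weights from the scoring scheme on the slide.
WEIGHTS = {"identical": 1.0, "synonym": 1.0, "hypernym": 0.5, "hyponym": 0.5}


@dataclass
class Element:
    candidate_word: str
    weight: float


@dataclass
class Chain:
    elements: list[Element] = field(default_factory=list)

    def add(self, word: str, relation: str) -> None:
        """Append an element whose weight depends on its relation to the chain."""
        self.elements.append(Element(word, WEIGHTS[relation]))

    @property
    def score(self) -> float:
        """Chain score: sum of its element weights."""
        return sum(e.weight for e in self.elements)


# lexical_chains: chains looked up by a hash key (assumed here to be a sense offset).
lexical_chains: dict[int, Chain] = {}

# Usage: build one chain under a hypothetical sense offset 2.
chain = lexical_chains.setdefault(2, Chain())
chain.add("computer", "identical")
chain.add("machine", "synonym")
chain.add("device", "hypernym")
```

Here `lexical_chains[2].score` is 1 + 1 + 0.5 = 2.5.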

IDENTIFYING STRONG LEXICAL CHAINS
1. Compute the aggregate score of each chain by summing the scores of the individual elements in the chain.
2. Select the chains whose score is above the mean of the scores of all chains computed in the document.
3. For each strong chain, identify the representative words, i.e., those whose contribution to the chain is maximal.
4. Choose the sentence that contains the first appearance of a representative chain member in the text.
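A minimal Python sketch of these four steps. It assumes the chains are already built (chain id mapped to `(word, weight)` elements), that sentences arrive as token lists in text order, that a word's contribution is the sum of its weights in the chain, and that one sentence is extracted per strong chain; the name `select_summary_sentences` is hypothetical.

```python
def select_summary_sentences(sentences: list[list[str]],
                             chains: dict[str, list[tuple[str, float]]]) -> list[int]:
    """Return indices (in text order) of the sentences chosen for the summary."""
    if not chains:
        return []
    # 1. Aggregate score of each chain: sum of its element scores.
    scores = {cid: sum(w for _, w in elems) for cid, elems in chains.items()}
    # 2. Strong chains: score strictly above the mean chain score.
    mean = sum(scores.values()) / len(scores)
    strong = [cid for cid, s in scores.items() if s > mean]
    picked: set[int] = set()
    for cid in strong:
        # 3. Representative words: members with the maximal summed contribution.
        contribution: dict[str, float] = {}
        for word, weight in chains[cid]:
            contribution[word] = contribution.get(word, 0.0) + weight
        top = max(contribution.values())
        representatives = {w for w, c in contribution.items() if c == top}
        # 4. First sentence in which any representative word appears.
        for index, tokens in enumerate(sentences):
            if representatives & set(tokens):
                picked.add(index)
                break
    return sorted(picked)
```

Returning sorted indices keeps the extracted sentences in their original order, so the summary reads in the same sequence as the source text.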
