About This Presentation

Use of distributional semantics in Natural Language Processing


Slide Content

Introduction to Distributional Semantics

Based on the great ESSLLI tutorial by Evert & Lenci

André Freitas
Insight Centre for Data Analytics

Insight Workshop on Distributional Semantics
Galway, 2014

Outline

- Contemporary Semantics
- Distributional Semantics
- Compositional-Distributional Semantics
- Take-away message

Contemporary Semantics

Shift in the Semantics Landscape

Philosophical → Scientific / Formal

[Figure: a page of first-order logic exercises, illustrating the formal turn in semantics]

[Figure: corroboration between semantic models ("Semantics") and a "Complex World" — semantics as a complex phenomenon]

- Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
- If these idealizations are removed, it is not at all clear that modern semantics can give a full account of all but the simplest models/statements.

Real World

Baroni et al., 2012; Sahlgren, 2013

What is Distributional Semantics?

Meaning

- Word meaning is usually represented in terms of some formal, symbolic structure, either external or internal to the word.
- External structure:
  - Associations between different concepts
- Internal structure:
  - Feature (property, attribute) lists
- The semantic properties of a word are derived from the formal structure of its representation
  - e.g. inference algorithms, etc.

Semantics = meaning representation model (data) + inference model

Formal Representation of Meaning

- Modelling fine-grained lexical inferences:

John → john
chases → λx.λy.[chase(x, y)]
a → λP.λQ.∃x[P(x) ∧ Q(x)]
bat → λx.[bat(x)]
John chases a bat → ∃x[bat(x) ∧ chase(john, x)]

John chases a bat ⊨ John chases an animal
kill → λx.λy.[kill(x, y)] = λx.λy.[CAUSE(x, BECOME(DEAD(y)))]

Formal Representation of Meaning (Problems)

- Different meanings
  - bat (animal), bat (artefact)
- Word meaning acquisition
- Meaning variation in context
  - clever politician, clever tycoon
- Meaning evolution
- Ambiguity, vagueness, inconsistency

→ Lack of flexibility, limited scalability

Distributional Hypothesis

"Words occurring in similar (linguistic) contexts tend to be semantically similar."

- He filled the wampimuk with the substance, passed it around and we all drank some.
- We found a little, hairy wampimuk sleeping behind the tree.

Weak and Strong DH (Lenci, 2008)

- Weak DH:
  - Word meaning is reflected in linguistic distributions.
  - By inspecting a sufficiently large number of distributional contexts we may have a useful surrogate representation of meaning.

- Strong DH:
  - A cognitive hypothesis about the form and origin of semantic representations.

Contextual Representation

- An abstract structure that accumulates encounters with words in various (linguistic) contexts.
- For our purposes, context is equated with linguistic context.

Distributional Semantic Models (DSMs)

"The dog barked in the park. The owner of the dog put him on the leash since he barked."

contexts = nouns and verbs in the same sentence

dog → bark: 2, park: 1, leash: 1, owner: 1
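A minimal sketch of this counting step in Python; the context vocabulary and the lemma mapping are fixed by hand for this toy example, where a real system would use a POS tagger and lemmatizer:

from collections import Counter

# Contexts = nouns and verbs in the same sentence as the target "dog".
# Hand-picked here; a real pipeline would identify them by POS tag.
CONTEXTS = {"bark", "park", "owner", "leash"}
LEMMAS = {"barked": "bark"}

text = ("The dog barked in the park. "
        "The owner of the dog put him on the leash since he barked.")

counts = Counter()
for sentence in text.rstrip(".").split(". "):
    tokens = [LEMMAS.get(w.lower(), w.lower()) for w in sentence.split()]
    if "dog" in tokens:  # only sentences containing the target
        counts.update(t for t in tokens if t in CONTEXTS)

print(counts)  # Counter({'bark': 2, 'park': 1, 'owner': 1, 'leash': 1})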

Distributional Semantic Models (DSMs)

Vector Space Model (VSM): distributional matrix = targets × contexts

targets \ contexts:

        leash  walk  run  owner  leg  bark
dog       3     5     1     5     4    2
cat       0     3     3     1     5    0
lion      0     3     2     0     1    0
light     0     0     0     0     0    0
bark      1     0     0     2     1    0
car       0     0     4     3     0    0

Semantic Similarity & Relatedness

Cosine: cos(x, y) = Σ_i x_i y_i / (√(Σ_i x_i²) · √(Σ_i y_i²))

[Figure: "dog", "cat" and "bark" plotted as vectors over the context axes "run" and "bark"]
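A minimal sketch of the cosine computation over rows of the toy matrix above (numpy assumed):

import numpy as np

# Rows of the toy distributional matrix
# (contexts: leash, walk, run, owner, leg, bark).
dog = np.array([3, 5, 1, 5, 4, 2])
cat = np.array([0, 3, 3, 1, 5, 0])
car = np.array([0, 0, 4, 3, 0, 0])

def cosine(x, y):
    # cos(x, y) = sum_i x_i * y_i / (|x| * |y|)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine(dog, cat))  # ~0.72: dog is closer to cat ...
print(cosine(dog, car))  # ~0.42: ... than to car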

Semantic Similarity & Relatedness

- Semantic similarity: two words sharing a high number of salient features (attributes)
  - synonymy (car/automobile)
  - hyperonymy (car/vehicle)
  - co-hyponymy (car/van/truck)

- Semantic relatedness (Budanitsky & Hirst 2006): two words semantically associated without being necessarily similar
  - function (car/drive)
  - meronymy (car/tyre)
  - location (car/road)
  - attribute (car/fast)

Distributional Semantic Models (DSMs)

- Computational models that build contextual semantic representations from corpus data
- The semantic context of a word is represented by a vector
- Vectors are obtained through the statistical analysis of the linguistic contexts of a word
- Salience of contexts (cf. context weighting scheme)
- Semantic similarity/relatedness as the core operation over the model

DSMs as Commonsense Reasoning

[Figure: the dog/bark/leash vector space, annotated "Commonsense is here"]

DSMs as Commonsense Reasoning

[Figure: screenshots of Wikipedia articles ("Wife", "Cochlear physiology", "Zeta function regularization") as examples of formal, encyclopedic knowledge]

DSMs as Commonsense Reasoning

∀x Loves(x, FOPC)
∀x Whale(x) ⇒ Mammal(x)
∀x Grackle(x) ⇒ Black(x)
∀x (∀y Dog(y) ⇒ Loves(x, y)) ⇒ (∀z Cat(z) ⇒ Hates(x, z))
∃x (Cat(x) ∧ Color(x, Black) ∧ Owns(Mary, x))
∃x (∀y Dog(y) ⇒ Loves(x, y)) ∧ (∀z Cat(z) ⇒ Hates(x, z))

[Figure: the bark/run/leash vector space next to the logic formulas]

Semantic best-effort

Demonstration (EasyESA)

http://treo.deri.ie/easyesa/

Applications

- Semantic search
- Question answering
- Approximate semantic inference
- Word sense disambiguation
- Paraphrase detection
- Text entailment
- Semantic anomaly detection

Alternative Names for DSMs

- Corpus-based semantics
- Statistical semantics
- Geometrical models of meaning
- Vector semantics
- Word (semantic) space models

Definition of DSMs

A DSM is a tuple ⟨T, C, R, W, M, d, S⟩:

- T: target elements, i.e. the words for which the DSM provides a contextual representation
- C: contexts, with which the targets T co-occur
- R: relation between the targets T and the contexts C
- W: context weighting scheme
- M: distributional matrix, T × C
- d: dimensionality reduction function, d : M → M′
- S: distance measure between the vectors in M′
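As a reading aid, the tuple can be mirrored directly in code; a hypothetical sketch (the field types are illustrative assumptions, not from the slides):

from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class DSM:
    T: Sequence[str]      # target words
    C: Sequence[str]      # contexts the targets co-occur with
    R: str                # relation between targets and contexts (e.g. "same sentence")
    W: Callable[[np.ndarray], np.ndarray]  # context weighting scheme
    M: np.ndarray         # distributional matrix, |T| x |C|
    d: Callable[[np.ndarray], np.ndarray]  # dimensionality reduction, M -> M'
    S: Callable[[np.ndarray, np.ndarray], float]  # distance measure over M'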

Building a DSM

1. Pre-process a corpus (targets, contexts)
2. Count the target-context co-occurrences
3. Weight the contexts (optional)
4. Build the distributional matrix
5. Reduce the matrix dimensions (optional)

- Parameters:
  - Corpus
  - Context type
  - Weighting scheme
  - Similarity measure
  - Number of dimensions

- A parameter configuration determines the DSM (LSA, ESA, ...); a sketch of the recipe follows below.
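A minimal end-to-end sketch of the recipe, assuming a word-window context type, positive PMI weighting and truncated SVD as the reduction step; all three are parameter choices, and the toy corpus is made up:

import numpy as np
from collections import Counter

corpus = ["the dog barked in the park",
          "the owner put the dog on the leash",
          "the cat ran in the park"]
window = 2  # parameter: context type = +/-2 word window

# 1-2) Count the target-context co-occurrences.
counts = Counter()
for sentence in corpus:
    toks = sentence.split()
    for i, t in enumerate(toks):
        for c in toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]:
            counts[t, c] += 1

# 4) Build the distributional matrix.
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for (t, c), f in counts.items():
    M[idx[t], idx[c]] = f

# 3) Weight the contexts: positive PMI.
p = M / M.sum()
pt, pc = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
with np.errstate(divide="ignore"):
    ppmi = np.maximum(0.0, np.nan_to_num(np.log(p / (pt * pc))))

# 5) Reduce the matrix dimensions with truncated SVD (k is a parameter).
k = 5
U, s, _ = np.linalg.svd(ppmi)
M_reduced = U[:, :k] * s[:k]  # reduced vectors, one row per word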

Parameters

- Corpus pre-processing
  - Stemming/lemmatization
  - POS tagging
  - Syntactic dependencies
- Context ("context engineering")
  - Document
  - Paragraph
  - Passage
  - Word windows
  - Words
  - Linguistic features
  - Linguistic patterns
  - Verbs with nouns as contexts
  - Verbs with adverbs as contexts
  - etc.

Effect of Parameters

Nearest neighbours of "dog":

2-word window    30-word window
cat              kennel
horse            puppy
fox              pet
pet              bitch
rabbit           terrier
pig              rottweiler
animal           canine
mongrel          cat
sheep            to bark
pigeon           Alsatian

Context Weighting

- Smoothing frequency differences: from raw counts to log-frequency.
- Association measures (Evert 2005) are used to give more weight to contexts that are more significantly associated with a target word.

Context Weighting Measures (Kiela & Clark, 2014)

Following the paper, f_ij is the co-occurrence frequency of target i with context j, f_i and f_j the marginal frequencies, f the total count, N the number of contexts and n_j the number of targets co-occurring with context j:

- None:    w_ij = f_ij
- TF-IDF:  w_ij = log(f_ij) × log(N / n_j)
- TF-ICF:  w_ij = log(f_ij) × log(N / f_j)
- Okapi BM25, ATC, LTU: length-normalised TF-IDF variants (see Kiela & Clark, 2014, for the full formulas)
- MI:      w_ij = log( p(t_i, c_j) / (p(t_i) p(c_j)) )
- PosMI:   w_ij = max(0, MI)
- T-Test:  w_ij = (p(t_i, c_j) − p(t_i) p(c_j)) / sqrt(p(t_i) p(c_j))
- χ²:      see (Curran, 2004, p. 83)
- Lin98a:  w_ij = (f_ij · f) / (f_i · f_j)
- Lin98b:  w_ij = −1 × log(n_j / N)
- Gref94:  w_ij = log(f_ij + 1) / log(n_j + 1)

Two of these schemes are sketched in code below.
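Minimal sketches of TF-IDF and T-Test over a raw count matrix F (targets × contexts); the reading of N and n_j follows the TF-IDF analogy above and assumes every target and context occurs at least once:

import numpy as np

def tf_idf(F):
    # w_ij = log(f_ij) * log(N / n_j), set to zero where f_ij = 0
    N = F.shape[1]                   # number of contexts
    n = np.count_nonzero(F, axis=0)  # n_j: targets co-occurring with context j
    with np.errstate(divide="ignore", invalid="ignore"):
        w = np.log(F) * np.log(N / n)
    return np.where(F > 0, w, 0.0)

def t_test(F):
    # w_ij = (p(t_i, c_j) - p(t_i) p(c_j)) / sqrt(p(t_i) p(c_j))
    p = F / F.sum()
    pt = p.sum(axis=1, keepdims=True)  # p(t_i)
    pc = p.sum(axis=0, keepdims=True)  # p(c_j)
    return (p - pt * pc) / np.sqrt(pt * pc)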

Similarity Measures (Kiela & Clark, 2014)

- Euclidean:  1 / (1 + sqrt(Σ_i (u_i − v_i)²))
- Cityblock:  1 / (1 + Σ_i |u_i − v_i|)
- Chebyshev:  1 / (1 + max_i |u_i − v_i|)
- Cosine:     (u · v) / (|u| |v|)
- Jaccard:    Σ_i min(u_i, v_i) / Σ_i max(u_i, v_i)
- Dice:       2 Σ_i min(u_i, v_i) / Σ_i (u_i + v_i)
- Jensen-Shannon: 1 − ½ (D(u ‖ (u+v)/2) + D(v ‖ (u+v)/2))

(plus correlation, Lin, Tanimoto and α-skew variants; see the paper)
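The distance measures above can be turned into similarities with 1 / (1 + d), as in the list; a minimal sketch using scipy, reusing the dog/cat rows of the toy matrix:

import numpy as np
from scipy.spatial import distance

u = np.array([3, 5, 1, 5, 4, 2], dtype=float)  # "dog" row of the toy matrix
v = np.array([0, 3, 3, 1, 5, 0], dtype=float)  # "cat" row

for name, d in [("euclidean", distance.euclidean),
                ("cityblock", distance.cityblock),
                ("chebyshev", distance.chebyshev)]:
    print(name, 1.0 / (1.0 + d(u, v)))

print("cosine", 1.0 - distance.cosine(u, v))  # scipy returns cosine *distance*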

What is the best parameter configuration?

- The best parameter configuration depends on the task.
- Systematic exploration of the parameters:

Kiela & Clark, "A Systematic Study of Semantic Vector Space Model Parameters", University of Cambridge Computer Laboratory.

DSM Instances

- Latent Semantic Analysis (Landauer & Dumais 1996)
- Hyperspace Analogue to Language (Lund & Burgess 1996)
- Infomap NLP (Widdows 2004)
- Random Indexing (Karlgren & Sahlgren 2001)
- Dependency Vectors (Padó & Lapata 2007)
- Explicit Semantic Analysis (Gabrilovich & Markovitch, 2008)
- Distributional Memory (Baroni & Lenci 2009)

Compositional Semantics

Paraphrase Detection

I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same time calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.

=?

I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.

Compositional Semantics

- Can we extend DS to account for the meaning of phrases and sentences?
- Compositionality: the meaning of a complex expression is a function of the meaning of its constituent parts.

Compositional Semantics

- Words that act as functions transforming the distributional profile of other words (e.g., verbs, adjectives, ...).
- Words whose meaning is directly determined by their distributional behaviour (e.g., nouns).

Compositional Semantics

- Mixture function: old + dog
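A minimal sketch of the mixture view; the vectors are made up, and the pointwise-multiplicative variant is another common mixture choice:

import numpy as np

old = np.array([1.0, 0.2, 0.0, 3.1])  # hypothetical vector for "old"
dog = np.array([0.1, 2.4, 3.0, 0.5])  # hypothetical vector for "dog"

old_dog_add = old + dog  # additive mixture: old + dog
old_dog_mul = old * dog  # pointwise-multiplicative variant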

Compositional Semantics

- Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases.

dogs chase cats → (CHASE × cats) × dogs

S : (X ⊗ Y) ⊗ Z
├─ DP : Z — dogs (vector)
└─ S\DP : X ⊗ Y
   ├─ (S\DP)/DP : X — chase (3rd-order tensor)
   └─ DP : Y — cats (vector)

Baroni et al., 2012
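A sketch of this function-application view: the transitive verb is a 3rd-order tensor contracted first with the object vector, then with the subject vector (random toy data; numpy's einsum does the contractions):

import numpy as np

d = 4                            # toy embedding dimensionality
CHASE = np.random.rand(d, d, d)  # 3rd-order tensor for "chase"
cats = np.random.rand(d)         # object vector
dogs = np.random.rand(d)         # subject vector

chase_cats = np.einsum("ijk,k->ij", CHASE, cats)   # CHASE x cats: a matrix (S\DP)
sentence = np.einsum("ij,j->i", chase_cats, dogs)  # (CHASE x cats) x dogs: a sentence vector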

Formal Model

- Distributional Semantics & Category Theory

Coecke, Sadrzadeh & Clark, "Mathematical Foundations for a Compositional Distributional Model of Meaning"

Take-away message

- Low acquisition effort
- A simple way to build a commonsense KB
- Semantic approximation as a built-in construct
- Semantic best-effort
- Simple to use
- DSMs are evolving fast (compositional and formal grounding)

Distributional semantics brings a promising approach for building semantic models that work in the real world.

Great Introductory References

- Evert & Lenci, ESSLLI Tutorial on Distributional Semantics, 2009 (many slides were taken or adapted from this great tutorial).
- Turney & Pantel, From Frequency to Meaning: Vector Space Models of Semantics, 2010.
- Baroni et al., Frege in Space: A Program for Compositional Distributional Semantics, 2012.
- Kiela & Clark, A Systematic Study of Semantic Vector Space Model Parameters, 2014.