Recognising and Interpreting Named Temporal Expressions

leonderczynski 1,104 views 23 slides Sep 10, 2013

About This Presentation

Paper: http://derczynski.com/sheffield/papers/named_timex.pdf

This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or...


Slide Content

Recognising and Interpreting Named Temporal Expressions
Matteo Brucato
Leon Derczynski
Hector Llorens
Kalina Bontcheva
Christian S. Jensen

How do we talk about times?
●Calendar
●Closed class of terms
–tomorrow | today | yesterday
–[next | last ] [ week | month | year]
–[1 - 31] [January – December]
●Really deterministic
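
The closed-class patterns above can be sketched as a single regular expression. This is a minimal illustration of the three patterns listed on the slide, not a full timex recogniser; the names `CALENDAR_TIMEX` and `find_calendar_timexes` are made up for this example.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# One alternative per pattern on the slide:
#   tomorrow | today | yesterday
#   [next | last] [week | month | year]
#   [1 - 31] [January - December]
CALENDAR_TIMEX = re.compile(
    r"\b(?:tomorrow|today|yesterday"
    r"|(?:next|last)\s+(?:week|month|year)"
    rf"|(?:[12][0-9]|3[01]|[1-9])\s+(?:{MONTHS}))\b",
    re.IGNORECASE,
)


def find_calendar_timexes(text):
    """Return the surface strings of all closed-class matches in text."""
    return [m.group(0) for m in CALENDAR_TIMEX.finditer(text)]
```

Named expressions such as "Christmas" fall outside every alternative, which is the gap the rest of the talk addresses.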

Wow, it's super-deterministic!

Credit: Kevin Knight

… sometimes
●TempEval-2 timex recall: 66 – 88 %
●TempEval-2 normalisation: 55 – 85 %
●~150 rules needed to get to 81% (Angeli & Uszkoreit '13)
●We can get the structured expressions OK
●But what about the rest?

Unstructured time mentions
–Christmas
–Michaelmas
–Halloween
–Easter
●Can we learn how to recognise these?

Time expression diversity
●Current corpora too small to hold much linguistic variation
●Note characteristic knee in distribution (cf. Montemurro)

Named Temporal Expressions
●New class of timexes
–Doesn't look like a timex
–Doesn't sound like a timex
–… is, in fact, a timex

How can we mine and extract NTEs?
●Expensive to annotate and hope they appear
●Prefer an automated approach
–> Let's mine Wikipedia!
●432 English NTEs found

NTEs in Wikipedia
●Gives term and text description
●Problem: no good as a gazetteer, some entries are polysemous (e.g. Carnival)
●Problem: recall limited with gazetteers
●Solution: build statistical tagger

Building statistical NTE tagger
●Use list of NTEs to annotate sentences
–CoNLL format, I/O binary labels
●Only use monosemous expressions
●Visit linked data searching for expressions
●If many entities found, expression is polysemous
–SELECT DISTINCT ?r {?r rdfs:label "carnival"@en}
–Not monosemous
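
The polysemy check above can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the threshold for "many" entities (more than one) is a guess since the slide gives no number, and actually sending the query to a linked-data endpoint is left out.

```python
def label_query(expression, lang="en"):
    """Build the SPARQL query shown on the slide for one expression.

    A real endpoint would also need the rdfs: prefix to be declared.
    """
    escaped = expression.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT DISTINCT ?r {{?r rdfs:label "{escaped}"@{lang}}}'


def is_polysemous(resources, max_entities=1):
    """Treat an expression as polysemous if its label matches many entities.

    `resources` is the list of ?r bindings returned for the label query.
    `max_entities=1` is an assumed cut-off, not a value from the talk.
    """
    return len(set(resources)) > max_entities
```

For example, a label that resolves to both a festival and a film would be filtered out of the monosemous seed list.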

Building statistical NTE tagger
●If a sentence contains a monosemous NTE, also annotate any polysemous NTEs
●Assume that they will occur in their temporal sense
–“While it might not have the retail significance of Christmas, Halloween or Secretary's Day, Groundhog Day remains perhaps the weirdest American holiday.”
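
The annotation heuristic above can be sketched like this. It is a rough illustration, assuming whitespace-tokenised sentences and case-insensitive exact phrase matching; the function names are hypothetical, and what happens to sentences without any monosemous NTE is not stated on the slides (here they simply stay all-O).

```python
def find_spans(tokens, phrase_tokens):
    """Return (start, end) token offsets where phrase_tokens occurs."""
    n = len(phrase_tokens)
    if n == 0:
        return []
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase_tokens]


def annotate(tokens, monosemous, polysemous):
    """Return one I/O label per token.

    Polysemous NTEs are labelled only when the sentence also contains
    a monosemous NTE, on the assumption that they are then temporal.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]

    def mark(phrases):
        found = False
        for phrase in phrases:
            for start, end in find_spans(lowered, phrase.lower().split()):
                labels[start:end] = ["I"] * (end - start)
                found = True
        return found

    if mark(monosemous):
        mark(polysemous)
    return labels


def to_conll(tokens, labels):
    """Render one sentence as tab-separated token/label lines."""
    return "\n".join(f"{t}\t{l}" for t, l in zip(tokens, labels))
```

Calling `annotate("Groundhog Day beats Carnival".split(), ["Groundhog Day"], ["Carnival"])` labels all three NTE tokens, while "Carnival" on its own in a sentence is left unlabelled.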

NTE recognition results
●Baseline: gazetteer of timexes in existing resources
●2:1 train:eval split, strict matching evaluation
●Also found new NTEs!
–European Cup
–Dayton Peace Agreement
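
Strict matching, as used in the evaluation above, counts a predicted span as correct only if both boundaries agree exactly with a gold span. A minimal sketch, assuming spans are represented as `(start, end)` token offsets; the function name is made up for this example and is not the authors' scorer.

```python
def strict_prf(gold_spans, predicted_spans):
    """Precision, recall and F1 where only exact span matches count."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

A prediction that overlaps a gold NTE but is one token too long earns no credit under this scheme, so it penalises boundary errors as heavily as misses.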

How do we normalise NTEs?
●Target representation: TIMEX3
–January 2nd, 1980 → 1980-01-02
–Summer 2012 → 2012-SU
–now → PRESENT_REF
●Statistical learning won't manage
●Use dedicated tool, TIMEN
–Open normalisation toolkit
–Anyone can contribute
–SotA normalisation performance
–Takes a document with entity boundaries marked

Using NTE descriptions
●We have semi-structured descriptions
–“six weeks after Easter”
–“last Friday in June”
–“end of week 17”
–“tenth day of Tishrei”
●How to convert these to rules?

NTE normalisation rule extraction
●Create simple parser to cover majority of NTEs
–“June 25th”
–“Last Sunday in March”
●Covers [x%] of NTE descriptions
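
A simple description parser of the kind described above might look like the sketch below. It handles only the two example shapes on the slide (a fixed month and day, and an ordinal weekday within a month) and returns `None` for everything else. The function name and the exact grammar are assumptions; this is not TIMEN's rule format.

```python
import calendar
import datetime
import re

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"]  # index matches date.weekday()
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "last": -1}


def resolve_description(description, year):
    """Map an NTE description to a date in the given year, or None."""
    text = " ".join(description.lower().split())

    # Shape 1: "June 25th"
    fixed = re.fullmatch(r"([a-z]+) (\d{1,2})(?:st|nd|rd|th)?", text)
    if fixed and fixed.group(1) in MONTHS:
        month = MONTHS.index(fixed.group(1)) + 1
        try:
            return datetime.date(year, month, int(fixed.group(2)))
        except ValueError:  # e.g. "June 31st"
            return None

    # Shape 2: "Last Sunday in March"
    relative = re.fullmatch(r"([a-z]+) ([a-z]+) (?:in|of) ([a-z]+)", text)
    if relative:
        ordinal, weekday, month_name = relative.groups()
        if ordinal in ORDINALS and weekday in WEEKDAYS and month_name in MONTHS:
            month = MONTHS.index(month_name) + 1
            last_day = calendar.monthrange(year, month)[1]
            days = [datetime.date(year, month, d) for d in range(1, last_day + 1)]
            matches = [d for d in days if d.weekday() == WEEKDAYS.index(weekday)]
            return matches[ORDINALS[ordinal]]
    return None
```

Descriptions such as "six weeks after Easter" or "tenth day of Tishrei" fall through to `None` here; they need an anchor event or another calendar.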

Normalisation + NTEs
●Evaluation
●Two corpora:
–SotA (TempEval-3)
–Purpose built to be hard to normalise (TimenEval)
●On TempEval-3 (restricted newswire): 0.7% error reduction
●On TimenEval (varied genre): 4.3% error reduction

Outstanding issues: Spatial variation
●Labo[u]r Day
–May 1 in much of the world
–first Monday in May in Australia's QLD and NT
●Summer
–Official vs. informal
–North vs. south

Outstanding issues: Easter
●Commonly used as an offset
●Non-trivial to determine
●“Computus”
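
To show why Easter is non-trivial, here is a sketch of one well-known computus, the anonymous Gregorian algorithm (often attributed to Meeus, Jones and Butcher), for Western Easter Sunday. The slides do not say which method the authors use, so this is only an illustration; `easter_offset` is a hypothetical helper for descriptions like "six weeks after Easter".

```python
import datetime


def western_easter(year):
    """Date of Easter Sunday in the Gregorian calendar for the given year."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


def easter_offset(year, weeks=0, days=0):
    """Date a given number of weeks/days after (or before) Easter Sunday."""
    return western_easter(year) + datetime.timedelta(weeks=weeks, days=days)
```

For instance, `easter_offset(2013, weeks=6)` anchors "six weeks after Easter" for 2013. Eastern Orthodox Easter uses a different calculation and is not covered here.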

Outstanding issues: Multiple calendars
●Gregorian (quite popular)
–Not particularly rational in the first place
●Lunar (China)
●Astrological
●Hebrew
●… and so on

Outstanding issues: Forms of expression
●Orthographic variation:
–Martin Luther King Day
–MLK Day
●Regional variation:
–autumn
–fall

Resources provided
●Corpus of NTEs
●Rules integrated into TIMEN in next release
–around November 2013

Thank you for your time!
Do you have any questions?