A BERT model for Humanitarian Document Geolocation

Uploaded by kkalimeri on Feb 26, 2025 (17 slides)


About This Presentation

Best paper award at GoodIT 2024. This paper presents geolocation biases and proposes a new BERT model that improves geolocation for humanitarian documents.


Slide Content

Leave no Place Behind: Improved Geolocation in Humanitarian Documents
Enrico M. Belliardo, Kyriaki Kalimeri, Yelena Mejova
ISI Foundation, Turin, Italy
GoodIT, September 6, 2023

Zaatari refugee camp
The world’s largest camp for Syrian refugees, in Jordan. Opened in July 2012; now a permanent settlement.

If I search for Zaatari on Google Maps, I find a car wash in Italy.

Information overload in the humanitarian sector
DEEP: a collaborative analysis platform for effective aid responses.

Geolocation extraction from text
Geographic locations can be ambiguous and can be written in many ways and languages. Location databases (gazetteers) are Western-biased.
Sources: https://unsdg.un.org/2030-agenda/universal-values/leave-no-one-behind, https://www.theguardian.com/news/datablog/2015/apr/28/the-hidden-biases-of-geodata
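Because one place can be spelled in many ways, matching names against a gazetteer usually starts with some normalization. The following is a minimal sketch, assuming simple Unicode accent folding, case folding and punctuation stripping; the slides do not say which normalization the authors used, and `normalize_toponym` is a hypothetical helper.

```python
import re
import unicodedata


def normalize_toponym(name: str) -> str:
    """Fold a place name into a crude matching key (hypothetical helper)."""
    # Decompose accented letters, e.g. "á" -> "a" + combining acute accent
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = stripped.casefold()
    # Assumption: apostrophe variants are simply deleted ("Za'atari" -> "zaatari")
    folded = re.sub(r"['\u2018\u2019`]", "", folded)
    # Any other punctuation (hyphens, commas, ...) becomes a space
    folded = re.sub(r"[^\w\s]", " ", folded)
    # Collapse runs of whitespace
    return " ".join(folded.split())


variants = ["Zaatari", "Za'atari", "ZA\u2019ATARI"]
print({normalize_toponym(v) for v in variants})  # {'zaatari'}
```

Normalization only merges spelling variants; it does not resolve ambiguity (the camp in Jordan and the Italian car wash still share one key), which is the job of disambiguation.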

Geolocation extraction from text
Geotagging: the extraction of text fragments that may be a location ("toponyms").
Geocoding: the disambiguation of a toponym to a specific geographic location.
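The two stages can be illustrated with a toy gazetteer. This is a minimal sketch, not the authors' system: geotagging here is naive longest-match phrase lookup, and geocoding uses a simple "most populous candidate" baseline. The gazetteer entries, IDs, population figures and function names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    place_id: str    # hypothetical ID, not a real GeoNames identifier
    name: str
    country: str     # ISO 3166-1 alpha-2 code
    population: int  # toy figures for illustration only


# Toy gazetteer: one surface form may map to several places (ambiguity)
GAZETTEER = {
    "zaatari": [
        Place("toy:zaatari-camp", "Zaatari", "JO", 1000),
        Place("toy:zaatari-carwash", "Zaatari", "IT", 0),
    ],
    "central syria": [Place("toy:central-syria", "Central Syria", "SY", 0)],
}


def geotag(text: str, max_len: int = 3) -> list[str]:
    """Geotagging: find gazetteer phrases in the text, longest match first."""
    tokens = re.findall(r"\w+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i : i + n])
            if phrase in GAZETTEER:
                found.append(phrase)
                i += n
                break
        else:  # no gazetteer phrase starts at token i
            i += 1
    return found


def geocode(toponym: str) -> Place:
    """Geocoding baseline (assumption): pick the most populous candidate."""
    return max(GAZETTEER[toponym], key=lambda p: p.population)


for t in geotag("Latest events in central Syria; aid reached Zaatari."):
    print(t, "->", geocode(t).place_id)
```

A population prior is only a baseline; it picks the camp here solely because of the toy numbers and can fail for small or recently created places, which is why richer disambiguation features matter.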

Data
Download the humanitarian documents and reports listed in HumSet and convert HTML and PDF files to text: 15,661 documents from 45 projects, covering 33 countries. We annotate a sample for geotagging and geocoding.
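The HTML half of the conversion step can be done with the Python standard library alone. The following is a minimal sketch using `html.parser`, assuming that `<script>`/`<style>` contents should be dropped and that a small set of block-level tags should start new lines; the slides do not name the tool the authors used, and PDF extraction (which needs a third-party library) is not shown.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Assumption: these tags start a new line of output text
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse whitespace inside each line and drop empty lines
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


page = ("<html><head><style>p{}</style></head><body><h1>Situation report</h1>"
        "<p>Flooding in <b>Sindh</b>.</p><script>x=1</script></body></html>")
print(html_to_text(page))
```

Inline tags such as `<b>` are left out of `BLOCK` so that words inside them stay on the same line as the surrounding sentence.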

Geotagging (finding toponyms in text)
469 English-language documents coded by DEEP annotators using the Label Studio app. The sample was stratified by country and filtered to have enough text. Documents were pre-annotated with the union of spaCy (en_core_web_md) and RoBERTa (xlm-roberta-base-wikiann-ner) predictions.
"Literal" vs. "associative" toponyms (as defined by Gritta et al.). Literal: "latest events in central Syria". Associative: "Syria Red Cross aided border regions".
Total of 11,025 toponyms.
Gritta, Milan, Mohammad Taher Pilehvar, and Nigel Collier. "A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics." Language Resources and Evaluation 54 (2020): 683-712.
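Pre-annotating with the union of two NER systems means keeping every location span proposed by either model, so annotators mostly delete or adjust spans rather than type them. The following is a minimal sketch over already-computed predictions, assuming spans are `(start, end, label)` character offsets and that only location-like labels count (`GPE`, `LOC`, `FAC` in spaCy's English scheme, `LOC` in WikiANN). Running the models is not shown, the helper names are hypothetical, and the slides do not say how overlapping spans were handled; this sketch merges them.

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive

# Assumption: which NER labels are treated as toponym candidates
LOCATION_LABELS = {"GPE", "LOC", "FAC"}


def location_spans(preds) -> set[Span]:
    """Keep (start, end) of predictions whose label looks like a location."""
    return {(s, e) for s, e, label in preds if label in LOCATION_LABELS}


def union_spans(*prediction_sets) -> list[Span]:
    """Union of location spans from several models; overlapping spans are merged."""
    spans = sorted(set().union(*(location_spans(p) for p in prediction_sets)))
    merged: list[Span] = []
    for start, end in spans:
        if merged and start < merged[-1][1]:  # overlaps the previous span
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


text = "Syria Red Cross aided border regions near Deir ez-Zor"
spacy_preds = [(0, 5, "GPE"), (6, 15, "ORG")]      # hypothetical model output
roberta_preds = [(0, 5, "LOC"), (42, 53, "LOC")]   # hypothetical model output
for s, e in union_spans(spacy_preds, roberta_preds):
    print(text[s:e])
```

Deciding whether "Syria" in "Syria Red Cross" is literal or associative is left to the human annotators; the pre-annotation only proposes spans.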

Geotagging (finding toponyms in text)

Geocoding (identifying geolocations/GPS)
Relating toponyms to their unique GeoNames ID with a custom-built tool. Toponyms were pre-matched using a search engine built on GeoNames location names, selecting only administrative divisions (AD), populated places (PPL), mountains (MT), seas (SEA), lakes (LK), islands (ISL) and airports (AIR).
Result: 561 unique document/toponym match pairs from 39 documents, 474 of which have non-empty matches, spanning 78 countries.
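Restricting candidates to these location types can be sketched against the GeoNames table dump, whose tab-separated rows carry the feature class and feature code at 0-based columns 6 and 7. This is a minimal sketch under stated assumptions: "AD" is read as GeoNames feature class `A` (which also covers countries), the other categories are read as feature-code prefixes, a plain dictionary stands in for the search engine the slides mention, and all rows below are made up.

```python
import csv
import io

# Column positions in the GeoNames "geoname" table dump (tab-separated, no header)
ID, NAME, ASCIINAME, ALTNAMES, FCLASS, FCODE = 0, 1, 2, 3, 6, 7

# Assumption: "AD" = feature class A; remaining slide categories as code prefixes
CODE_PREFIXES = ("PPL", "MT", "SEA", "LK", "ISL", "AIR")


def keep(row) -> bool:
    return row[FCLASS] == "A" or row[FCODE].startswith(CODE_PREFIXES)


def build_name_index(tsv_lines) -> dict[str, set[str]]:
    """Map lowercase names (incl. alternate names) to sets of GeoNames IDs."""
    index: dict[str, set[str]] = {}
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not keep(row):
            continue
        names = {row[NAME], row[ASCIINAME], *filter(None, row[ALTNAMES].split(","))}
        for name in names:
            index.setdefault(name.lower(), set()).add(row[ID])
    return index


def fake_row(gid, name, alt, fclass, fcode) -> str:
    """Build a 19-column line in the dump layout; values are NOT real GeoNames data."""
    cols = [""] * 19
    cols[ID], cols[NAME], cols[ASCIINAME], cols[ALTNAMES] = gid, name, name, alt
    cols[FCLASS], cols[FCODE] = fclass, fcode
    return "\t".join(cols)


dump = io.StringIO("\n".join([
    fake_row("1", "Zaatari", "Za'atari", "P", "PPL"),
    fake_row("2", "Zaatari", "", "S", "BLDG"),  # a building: filtered out
    fake_row("3", "Syria", "Syrian Arab Republic", "A", "PCLI"),
]))
index = build_name_index(dump)
print(index["zaatari"], index["syrian arab republic"])
```

`csv.QUOTE_NONE` matters for the real dump, whose names can contain quote characters that are not meant as CSV quoting.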

Geocoding (identifying geolocations/GPS)

Annotations available at: https://github.com/embelliardo/HumSet_geolocation_annotations (see paper)

Improving geotagging
Tuning the spaCy and RoBERTa models on the new data. Evaluation on exact matches and also on partial matches; strict setting: testing on an unseen country.
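The two matching criteria and the strict split can be sketched as follows. This is a minimal sketch, assuming spans are `(start, end)` character offsets, that a partial match means any character overlap, and that each document carries a `country` field; the function names and the document format are hypothetical, and the exact scoring used in the paper may differ.

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: list[Span], pred: list[Span], partial: bool = False):
    """Precision, recall, F1 for span extraction; partial=True counts any overlap as a hit."""
    match = overlaps if partial else (lambda a, b: a == b)
    hits_pred = sum(any(match(p, g) for g in gold) for p in pred)  # correct predictions
    hits_gold = sum(any(match(g, p) for p in pred) for g in gold)  # recovered gold spans
    precision = hits_pred / len(pred) if pred else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def leave_country_out(docs: list[dict], country: str):
    """Strict setting: train on all other countries, test only on `country`."""
    train = [d for d in docs if d["country"] != country]
    test = [d for d in docs if d["country"] == country]
    return train, test


gold = [(0, 5), (42, 53)]
pred = [(0, 5), (42, 46), (60, 65)]  # one exact hit, one partial hit, one miss
print(prf(gold, pred), prf(gold, pred, partial=True))
```

Partial matching credits a prediction like "Deir" for the gold span "Deir ez-Zor", so it is always at least as high as exact matching.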

Improving geocoding
Introducing FeatureRank: search for candidate locations (using exact match or Okapi BM25F); compute the country distribution of all guesses in the document; rank candidates by features including whether the candidate is a capital or a country, its administrative level, its population, and the document's country distribution.
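The ranking step can be sketched as a score over per-candidate features plus the document's country distribution. This is a minimal sketch under stated assumptions, not the authors' FeatureRank: candidate search (exact match or BM25F) is assumed to have already run, the feature list follows the slide, but the linear score, the weights, the log-population scaling, the vote-splitting rule and the `Candidate` fields are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    place_id: str     # hypothetical ID
    country: str
    is_country: bool
    is_capital: bool
    admin_level: int  # hypothetical encoding: 0 = country, higher = finer
    population: int


# Hypothetical weights; in practice they would be learned or tuned
WEIGHTS = {"country": 1.0, "capital": 0.5, "admin": -0.2, "population": 0.1, "doc_country": 2.0}


def country_distribution(candidates_per_toponym) -> dict[str, float]:
    """Share of each country among all guesses; each toponym spreads one vote."""
    votes: Counter = Counter()
    for cands in candidates_per_toponym.values():
        for c in cands:
            votes[c.country] += 1 / len(cands)
    total = sum(votes.values())
    return {k: v / total for k, v in votes.items()}


def score(c: Candidate, dist: dict[str, float]) -> float:
    return (WEIGHTS["country"] * c.is_country
            + WEIGHTS["capital"] * c.is_capital
            + WEIGHTS["admin"] * c.admin_level
            + WEIGHTS["population"] * math.log10(1 + c.population)
            + WEIGHTS["doc_country"] * dist.get(c.country, 0.0))


def feature_rank(candidates_per_toponym):
    """Sort each toponym's candidates by score, best first."""
    dist = country_distribution(candidates_per_toponym)
    return {t: sorted(cands, key=lambda c: score(c, dist), reverse=True)
            for t, cands in candidates_per_toponym.items()}


# Toy document mentioning Zaatari and Amman (populations set to 0 here)
candidates = {
    "zaatari": [Candidate("toy:camp", "JO", False, False, 3, 0),
                Candidate("toy:carwash", "IT", False, False, 3, 0)],
    "amman": [Candidate("toy:amman", "JO", False, True, 2, 0)],
}
print([c.place_id for c in feature_rank(candidates)["zaatari"]])  # ['toy:camp', 'toy:carwash']
```

Here the document-level country distribution (Jordan 0.75, Italy 0.25) is what separates the two otherwise identical "Zaatari" candidates.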

Extracting locations in HumSet
Annotating 6,733 documents, extracting 13,967 distinct locations.

Next steps
Expand to event detection; quantitative extraction; time extraction; entity grouping into events; summarization; analysis.
