AIDR Tutorial (Artificial Intelligence for Disaster Response)

mimran15 531 views 12 slides Jan 28, 2017

About This Presentation

This is a short tutorial on AIDR.


Slide Content

AIDR Tutorial
Muhammad Imran
Research Scientist
Qatar Computing Research Institute, HBKU
Doha, Qatar
http://aidr.qcri.org/

Outline
• Data collection in AIDR
• Data classification in AIDR
• Data view/download in AIDR

Data Collection in AIDR
• Twitter data collection strategies that AIDR supports
– By keywords
– By geographical regions
    • Strict: coordinates strictly inside the geo boundaries
    • Approximate: tweets from a place that overlaps with the geo boundaries
– By following Twitter users
– By keywords + regions
    • Tweets that match any of the keywords and fall within the geo boundaries
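The difference between the strict and approximate region modes can be sketched as follows. This is a minimal illustration, not AIDR's actual implementation: the `BBox` type and both function names are hypothetical, boxes are assumed to be given as (west, south, east, north) in degrees, and boxes that cross the antimeridian are ignored.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Hypothetical bounding box in degrees."""
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude


def strict_match(lon: float, lat: float, box: BBox) -> bool:
    """Strict mode: the tweet's exact coordinates lie inside the box."""
    return box.west <= lon <= box.east and box.south <= lat <= box.north


def approximate_match(place: BBox, box: BBox) -> bool:
    """Approximate mode: the tweet's place box overlaps the collection box."""
    return (place.west <= box.east and place.east >= box.west
            and place.south <= box.north and place.north >= box.south)


sf = BBox(-122.75, 36.8, -121.75, 37.8)
print(strict_match(-122.42, 37.77, sf))
print(approximate_match(BBox(-123.0, 37.5, -122.5, 38.5), sf))
```

A tweet tagged only with a place (no exact coordinates) can never pass the strict test, which is why the approximate mode usually returns more tweets.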

Data Collection Using Keywords
• Keywords limit = 400
• One keyword can be a single word like “Suffolk” or a phrase like “Suffolk accident”
• One keyword/phrase cannot be longer than 60 bytes (1 char = 1 byte)
• Generic keywords collect irrelevant tweets
• Specific keywords are more likely to collect relevant tweets
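The two limits above can be checked before a collection is started. The sketch below is an assumption-laden illustration: the function name, the comma-separated input format, and the use of UTF-8 to count bytes are all hypothetical choices (the slide's "1 char = 1 byte" holds for plain ASCII keywords).

```python
MAX_KEYWORDS = 400        # limit stated on the slide
MAX_KEYWORD_BYTES = 60    # limit stated on the slide


def validate_keywords(raw: str):
    """Split a comma-separated keyword string and report limit violations.

    Returns (keywords, problems); problems is empty when all limits hold.
    """
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    problems = []
    if len(keywords) > MAX_KEYWORDS:
        problems.append(f"{len(keywords)} keywords given, limit is {MAX_KEYWORDS}")
    for keyword in keywords:
        size = len(keyword.encode("utf-8"))  # assumed byte-counting rule
        if size > MAX_KEYWORD_BYTES:
            problems.append(f"{keyword!r} is {size} bytes, limit is {MAX_KEYWORD_BYTES}")
    return keywords, problems


print(validate_keywords("Suffolk, Suffolk accident"))
```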

Keywords Examples

Location-based Collection
• Bounding boxes do not act as filters for other filter parameters. For example:
keyword=twitter&locations=-122.75,36.8,-121.75,37.8
would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
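The OR behaviour described above can be made concrete with a small sketch. It is only an illustration: `matches_filter` is a hypothetical name, keyword matching is simplified to a case-insensitive substring test, and the box is assumed to be ordered (west, south, east, north) as in the example parameters.

```python
from typing import Optional, Sequence, Tuple


def matches_filter(text: str,
                   coords: Optional[Tuple[float, float]],
                   keywords: Sequence[str],
                   box: Tuple[float, float, float, float]) -> bool:
    """Return True if the tweet matches a keyword OR lies inside the box."""
    lowered = text.lower()
    keyword_hit = any(k.lower() in lowered for k in keywords)
    geo_hit = False
    if coords is not None:  # non-geo tweets carry no coordinates
        lon, lat = coords
        west, south, east, north = box
        geo_hit = west <= lon <= east and south <= lat <= north
    return keyword_hit or geo_hit


sf_box = (-122.75, 36.8, -121.75, 37.8)
# Non-geo tweet that mentions the keyword: still collected.
print(matches_filter("I love Twitter", None, ["twitter"], sf_box))
# Geo tweet from San Francisco without the keyword: also collected.
print(matches_filter("Foggy morning", (-122.42, 37.77), ["twitter"], sf_box))
```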

Following Twitter Users
For each user specified, the tool will collect:
• Tweets created by the user.
• Tweets which are retweeted by the user.
• Replies to any Tweet created by the user.
• Retweets of any Tweet created by the user.
• Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).
The tool will not collect:
• Tweets mentioning the user (e.g. “Hello @twitterapi!”).
• Manual Retweets created without pressing a Retweet button (e.g. “RT @twitterapi The API is great”).
• Tweets by protected users.
Use a comma-separated list of Twitter user IDs (http://gettwitterid.com/)
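Because the follow list takes numeric user IDs rather than screen names, it can help to sanity-check the list first. The helper below is a hypothetical sketch (its name and its behaviour of dropping duplicates are assumptions, not part of AIDR); the IDs in the example are placeholders.

```python
def parse_user_ids(raw: str) -> list:
    """Parse a comma-separated string of numeric Twitter user IDs.

    Blank entries are skipped, duplicates are dropped (first one wins),
    and anything that is not purely ASCII digits raises ValueError.
    """
    ids = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a numeric user ID: {part!r}")
        if part not in ids:
            ids.append(part)
    return ids


# Placeholder IDs for illustration only.
print(",".join(parse_user_ids("12345, 67890,12345")))
```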

Classifier UI

Detailed Information of Classifiers

Data Classification in AIDR
• Define classifiers (name, description)
– Define labels (name, description)
– Having a “miscellaneous” category will be helpful
• Wait around 15-20 minutes (for fast collections) or 30-40 minutes (for slow collections)
• Start tagging

Classifier Generation
• Check the classifier status (UI)
– First classifier/model will be up after 50 labeled tweets, ideally equally distributed among labels
– If no model appears after 50 tags, keep tagging
• Human-tagged items (the more the better)
• 40 more needed to re-train (next classifier target)
• Machine-tagged items (keep an eye on misclassifications)
• Quality (ideally AUC above 90 but not equal to 100)
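For readers unfamiliar with AUC, the sketch below shows one standard way to compute it: the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. It is not AIDR's code. The function names are hypothetical, and `quality_ok` assumes the quality figure is reported as a percentage.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for items in the class, 0 otherwise.
    scores: classifier confidence for the class, higher means more likely.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def quality_ok(auc_percent: float) -> bool:
    """Slide's guideline, assuming AUC is shown in percent."""
    return 90 < auc_percent < 100


value = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(value, quality_ok(value * 100))
```

In this toy example three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75 (75%), which falls short of the guideline.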