Digital Language Resources_2024.pdf detailed

stu2203598065 21 views 75 slides May 28, 2024

About This Presentation

Lesson for English Methodists


Slide Content

DIGITAL LANGUAGE RESOURCES
AND THEIR USE IN EDUCATION
Assist. Prof. Rositsa Dekova, PhD
The Paisii Hilendarsky University of Plovdiv
Department of English Studies

Digital Language Resources
Digital dictionaries
Monolingual & Bilingual
Picture dictionaries
Learning games
Annotated Corpora
BulNC
BNC
COCA
Parallel corpora
Bulgarian-English Parallel Corpus
Lexical Semantic Databases
WordNet
FrameNet
Educators & Computers

Digital dictionaries
Learner’s dictionaries and thesauri
http://www.collinsdictionary.com/
http://www.merriam-webster.com/
http://thesaurus.com/
http://www.eurodict.com/
http://www.lingvozone.com/free-online-dictionary

Picture dictionaries
https://kids.wordsmyth.net/we/
WILD (Wordsmyth Illustrated Learner's Dictionary)
http://www.opdome.com/
http://www.anglomaniacy.pl/index.html
Vocabulary & Grammar
Songs & Printables

Learning English can be fun
https://learningapps.org/
http://www.abcya.com/
https://www.education.com/worksheet-generator/
http://www.teachers-direct.co.uk/resources/
http://puzzlemaker.discoveryeducation.com/
http://worksheets.theteacherscorner.net/
http://tools.atozteacherstuff.com/
https://www.puzzle-maker.com/
http://www.armoredpenguin.com/

Word search example
Word search generated by
http://puzzlemaker.discoveryeducation.com
African Animals

Word search example

Word search example
Word search from the database of
http://www.teachers-direct.co.uk/
January

Monday

Word search example

Word search example
black
blue
brown
green
grey
orange
purple
red
white
yellow

Interactive games
http://gamestolearnenglish.com/
http://www.eslgamesplus.com/
http://www.learninggamesforkids.com/
https://learningapps.org/
http://www.abcya.com/
...

ANNOTATED CORPORA
Corpus: a large body of machine-readable, naturally occurring linguistic evidence.
Annotated corpus: a corpus enhanced with various types of linguistic information
Morphological: POS tagging
Semantic: tagging with word senses
Syntactic: tagging for syntactic information
…

TEXT SEGMENTATION

TEXT SEGMENTATION
Electronic text is just a sequence of characters.
Before any processing is done, the text has to be segmented into linguistic units such as words, punctuation, numbers, alphanumerics (e.g. H2O), etc.
This process is called TOKENIZATION, and the segmented units are called TOKENS.
The process of segmenting the text into sentences is called SENTENCE SPLITTING.
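
Tokenization and sentence splitting can be sketched with two regular expressions. This is a minimal illustration, not the method used by any particular corpus tool: the patterns, the function names and the sample text are assumptions, and the sentence rule deliberately ignores abbreviations such as "Dr.".

```python
import re

# One token is a run of letters/digits (optionally with internal apostrophes,
# as in "don't"), or any single non-space punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*|[^\sA-Za-z0-9]")

# Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split raw text into sentences; abbreviations are not handled."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


def tokenize(sentence):
    """Segment one sentence into word, number, alphanumeric and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


text = "Plants need H2O. Don't they need light, too?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Note that the alphanumeric H2O stays a single token, while punctuation marks become tokens of their own.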

‘HIGH-LEVEL’ TEXT SEGMENTATION
Intra-sentential segmentation:
Named Entities
Syntactic chunking (segmentation of noun groups and verb groups)
Inter-sentential segmentation:
Grouping of sentences and paragraphs into discourse topics called TEXT TILES.

LEMMATIZATION
Reduces inflectional forms, and sometimes derivationally related forms, of a word to its base or dictionary form, which is known as the lemma.
For instance:
am, are, is → be
car, cars, car's, cars' → car
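
The lemmatization examples can be reproduced with a simple lookup. The sketch below assumes a toy lexicon containing only the forms shown on the slide; the names LEMMA_LEXICON and lemmatize are illustrative, and a real lemmatizer would rely on a full morphological dictionary or on rules.

```python
# Toy lemma lexicon built only from the slide's examples (an assumption).
LEMMA_LEXICON = {
    "am": "be", "are": "be", "is": "be",
    "car": "car", "cars": "car", "car's": "car", "cars'": "car",
}


def lemmatize(token):
    """Return the lemma of a token, or the lowercased token if it is unknown."""
    key = token.lower()
    return LEMMA_LEXICON.get(key, key)


print([lemmatize(t) for t in ["Cars", "are", "car's", "boy"]])
```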

PART-OF-SPEECH TAGGING
Parts of speech
The morphological and syntactic classes to which words can be assigned.
POS tagging
Automatic assignment of descriptors, called tags, to input tokens.

THE TAGSET
The tagset includes all the tags that will be used in the POS tagging.
We could use a very coarse tagset:
N, V, Adj, Adv, Prep...
A more commonly used set is finer-grained:
NN, NNS, NNP, NNPS, VB, VBG, VBN, VBP, VBZ…
The level of granularity used in the tagset directly affects the search possibilities.
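
The effect of granularity on searching can be illustrated with a small sketch. The mapping from fine-grained to coarse tags below is an assumption based on the usual meaning of these tags, and the four-word tagged sample is invented for illustration.

```python
# Illustrative grouping of the fine-grained tags into coarse classes (an assumption).
FINE_TO_COARSE = {
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "VB": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
}


def coarsen(tag):
    """Map a fine-grained tag to its coarse class; unknown tags are returned unchanged."""
    return FINE_TO_COARSE.get(tag, tag)


def search(tagged_tokens, wanted_tag, coarse=False):
    """Return the words whose (optionally coarsened) tag equals wanted_tag."""
    return [
        word for word, tag in tagged_tokens
        if (coarsen(tag) if coarse else tag) == wanted_tag
    ]


# Invented sample of (word, tag) pairs.
sample = [("dogs", "NNS"), ("bark", "VBP"), ("Rex", "NNP"), ("barks", "VBZ")]
print(search(sample, "VBZ"))             # only third-person singular verbs
print(search(sample, "V", coarse=True))  # any verb form
```

With the coarse tagset alone, the first query could not be expressed at all: the distinction between VBP and VBZ is lost.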

EXAMPLE
The_DT
little_JJ
boy_NN1
quickly_RB
ate_VVD
the_DT
green_JJ
apple_NN1
./.
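
Tagged output in this word_TAG notation is easy to process. The sketch below assumes that the tag follows the last underscore (or the last slash, as in the final "./."); the function name is illustrative.

```python
def parse_tagged_token(item):
    """Split 'word_TAG' (or 'word/TAG') into a (word, tag) pair."""
    separator = "_" if "_" in item else "/"
    word, _, tag = item.rpartition(separator)
    return word, tag


tagged_text = "The_DT little_JJ boy_NN1 quickly_RB ate_VVD the_DT green_JJ apple_NN1 ./."
pairs = [parse_tagged_token(item) for item in tagged_text.split()]

# Once the tags are separated, searching by tag is a simple filter.
adjectives = [word for word, tag in pairs if tag == "JJ"]
print(pairs)
print(adjectives)
```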

CASES OF AMBIGUITY
They love_V summer_Adj vacations.
Their love_N started in the summer_N.
Every plant_N needs water and light.
We should all plant_V at least one tree in our life.

CASES OF AMBIGUITY
They_PNP
are_VBB
flying_NN1_VBG
planes_NN2
./.
Coreference Resolution:
Who/what are ‘they’?
Time_NN1_VVB
flies_VBZ_NNS
like_PRP_VVB
an_AT0
arrow_NN1
./.
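
When a tagger cannot decide, it may attach more than one tag to a token, as in flying_NN1_VBG. A rough sketch of how such ambiguity tags multiply the possible readings of a sentence follows; it assumes that the word itself contains no underscore, and the function names are illustrative.

```python
from math import prod


def parse_ambiguous(item):
    """Split 'word_TAG1_TAG2' into (word, [TAG1, TAG2]); 'word/TAG' is also accepted."""
    separator = "_" if "_" in item else "/"
    word, *tags = item.split(separator)
    return word, tags


def count_readings(tagged_sentence):
    """Number of distinct tag sequences licensed by the ambiguity tags."""
    parsed = [parse_ambiguous(item) for item in tagged_sentence.split()]
    return prod(len(tags) for _, tags in parsed)


planes = "They_PNP are_VBB flying_NN1_VBG planes_NN2 ./."
arrow = "Time_NN1_VVB flies_VBZ_NNS like_PRP_VVB an_AT0 arrow_NN1 ./."
print(count_readings(planes))
print(count_readings(arrow))
```

Three two-way ambiguities in one short sentence already give eight candidate tag sequences, which is why disambiguation is the hard part of tagging.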

THEY ARE FLYING PLANES

Examples of Taggers and Parsers
CLAWS WWW tagger (Free web tagging service for English)
http://ucrel.lancs.ac.uk/claws/trial.html
The Stanford Parser online
http://nlp.stanford.edu:8080/parser/
Shallow Parsing Demo
Syntactic Tree Generator URL
An app that builds syntactic trees from labelled
bracket notations.

https://demo.allennlp.org/
Constituency Parsing:
Breaks a text into sub-phrases
(constituents).

https://demo.allennlp.org/
Reading comprehension: answers questions about a passage of text.
Sentiment analysis: predicts whether an input is positive or negative.
Coreference resolution: finds all expressions that refer to the same entity in a text.
Language modeling: generates the most likely next words.
…

Sentiment Analysis Examples
She's certainly creating a stir with her ground-breaking mix of rap and folk.
RoBERTa Large: The model is very confident that the sentence has a positive sentiment.
It was a complete flop because I couldn’t hear her properly.
RoBERTa Large: The model is very confident that the sentence has a negative sentiment.
Speak Out Upper-Intermediate, p. 120, ex. 4A

Sentiment Analysis Examples
I just hope she doesn’t go mainstream and boring like all the other alternative stars.
RoBERTa Large: The model is somewhat confident that the sentence has a positive sentiment.
GloVe-LSTM: The model is very confident that the sentence has a negative sentiment.
Speak Out Upper-Intermediate, p. 120, ex. 4A

Specific Applications
For visually impaired and reading-impaired students:
https://www.naturalreaders.com/online/
Online Text-to-Speech
Free Chrome extension
Dyslexia Font
http://www.robobraille.org/robobraille-projects
Convert a file into an alternative, accessible format
Teaching Guides

BRITISH NATIONAL CORPUS (BNC)
A 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century.
Available online at:
https://www.english-corpora.org/bnc/

BRITISH NATIONAL CORPUS (BNC)
The written part of the BNC (90%) includes
extracts from regional and national newspapers,
specialist periodicals and journals for all ages and interests,
academic books and popular fiction,
published and unpublished letters and memoranda,
school and university essays, etc.
The spoken part (10%) consists of
orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age groups, regions and social classes in a demographically balanced way)
spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins

THE CORPUS OF CONTEMPORARY
AMERICAN ENGLISH (COCA)
The Corpus of Contemporary American
English (COCA) is the largest freely-available
corpus of English, and the only large and
balanced corpus of American English.
The corpus was created by Mark Davies of
Brigham Young University.
Available online at:
https://www.english-corpora.org/coca/

COCA
Contains one billion words of text in eight genres: spoken, fiction, popular magazines, newspapers, academic texts, TV and movie subtitles, blogs, and other web pages.
Updated regularly: 25+ million words included for each year from 1990 to 2019.
Suitable for looking at current, ongoing changes in the language.

THE COCA SEARCH ENGINE
Searches for exact words or phrases, wildcards, lemmas, parts of speech, or any combination of these.
Searches for surrounding words (collocates) within a ten-word window.
Limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions:
by genre or even between sub-genres (or domains)
over time
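
The idea of a collocate window can be sketched in a few lines. This is not the COCA implementation: the function name, the invented sample sentence and the default of five tokens on each side (one possible reading of a ten-word window) are assumptions.

```python
from collections import Counter


def collocates(tokens, node, window=5):
    """Count the words found within `window` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        # Left context plus right context, excluding the node occurrence itself.
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts


# Invented sample text, already tokenized and lowercased.
tokens = "the black cat sat on the black mat near a black hole".split()
result = collocates(tokens, "black", window=1)
print(result.most_common())
```

Ranking the counts by frequency gives a list comparable in spirit to a table of collocates for a search word.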

Results for collocates of black

SEMANTICALLY-BASED QUERIES OF THE CORPUS
Contrast and compare the collocates of two related words (little/small, democrats/republicans, men/women).
Determine the difference in meaning or use between these words.
Find the frequency and distribution of synonyms for nearly 60,000 words.
Compare the frequency in different genres.
Create your own lists of semantically related words, and then use them directly as part of the query.

SPECIFIC USES OF COCA
To look at recent changes in English:
morphology (new suffixes -friendly and -gate)
syntax (including prescriptive rules, quotative like, so not ADJ, the get passive, resultatives, and verb complementation)
semantics (such as changes in meaning with web, green, or gay)
lexis, including word and phrase frequency by year, to produce lists of all words that have had large shifts in frequency between specific historical periods.

PARALLEL CORPORA
Alignment (of bitexts)
Differences in grammatical structure
with the sun not shining - нямаше слънце (literally ‘there was no sun’)
Differences in lexical structure
the thermometer walks inch by inch up to the top of the glass - и термометърът пълзи сантиметър по сантиметър до върха на скалата (literally ‘and the thermometer crawls centimetre by centimetre to the top of the scale’)
No lexicalization
It wasn't the butler coming back. - Не беше икономът. (literally ‘It was not the butler.’)
It’s this way - Положението е такова (literally ‘The situation is such’)

INTELLIGENT SEARCHES IN BG-EN PARALLEL CORPUS
The Bulgarian National Corpus search engine is available at:
http://search.dcl.bas.bg/
The syntax allows search by (combinations of) word forms, grammatical tags, semantic relations.
Thanks to the alignment, the corresponding sentences in parallel documents are also accessible.
The hits are paginated and the matches are highlighted.
The user is able to view the detailed information for a given sentence in the hit set: the sentence metadata, its context, and correspondence(s) in the other languages.

SEARCH ASSISTANT

Lexical Semantic Networks
Electronic language resources which define notions through their relations with other notions.
LSNs are knowledge representation schemes involving nodes and links (arcs or arrows) between nodes.
The nodes represent objects or concepts.
The links represent relations between nodes.
The links are directed and labeled.
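
A network of nodes with directed, labeled links can be sketched as a small graph class. The class name, the relation labels (is_a, part_of) and the sample links are assumptions chosen for illustration; they are not the format of any particular database.

```python
class SemanticNetwork:
    """A directed graph whose links carry a relation label."""

    def __init__(self):
        self._links = {}  # source node -> list of (label, target node)

    def add_link(self, source, label, target):
        """Add a directed link source --label--> target."""
        self._links.setdefault(source, []).append((label, target))

    def related(self, source, label):
        """Follow links with the given label outward from `source`."""
        return [
            target for link_label, target in self._links.get(source, [])
            if link_label == label
        ]


net = SemanticNetwork()
net.add_link("rose", "is_a", "plant")
net.add_link("plant", "is_a", "living organism")
net.add_link("branch", "part_of", "tree")

print(net.related("rose", "is_a"))
print(net.related("plant", "is_a"))
```

Because the links are directed, asking for what "plant" is a kind of does not return "rose"; the inverse relation would need its own link.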

A classical taxonomy tree
adult = grown-up human
man = male adult
woman = female adult
child = young human
boy = male child
girl = female child
human
  adult [+adult]
    man [+male]
    woman [-male]
  child [-adult]
    boy [+male]
    girl [-male]
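
In a taxonomy tree each node inherits the features of its ancestors. A minimal sketch of that inheritance follows; the dictionary layout and the function name are assumptions, while the nodes and features are those of the tree above.

```python
# Each node maps to (parent, feature it adds); the root adds no feature.
TAXONOMY = {
    "human": (None, None),
    "adult": ("human", ("adult", True)),
    "child": ("human", ("adult", False)),
    "man": ("adult", ("male", True)),
    "woman": ("adult", ("male", False)),
    "boy": ("child", ("male", True)),
    "girl": ("child", ("male", False)),
}


def features(node):
    """Collect the features of a node and of all its ancestors."""
    result = {}
    while node is not None:
        parent, feature = TAXONOMY[node]
        if feature is not None:
            name, value = feature
            result.setdefault(name, value)
        node = parent
    return result


print(features("girl"))
print(features("adult"))
```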

Lattice structure

Lattice structure with multiple classifications
[Figure: lattice of feature bundles combining ±human, ±male and ±adult]

WORDNET: http://wordnet.princeton.edu/
A large lexical semantic database of English
Originally developed at Princeton University (Miller, 1990)
EuroWordNet: http://www.illc.uva.nl/EuroWordNet/
BalkaNet: http://www.dblab.upatras.gr/balkanet/index.htm
Each wordnet represents a unique language-internal system of lexicalizations
In addition, the wordnets are linked to an Inter-Lingual-Index, based on the Princeton wordnet

WORDNET STRUCTURE
Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
Each synset is linked to other synsets by means of a small number of “conceptual relations.”
WordNet really consists of four sub-nets, one each for nouns, verbs, adjectives and adverbs, with few cross-POS pointers.

WORDNET STRUCTURE
http://wordnet.princeton.edu/man/wngloss.7WN.html
Each synonym set (SYNSET) encodes the relation of equivalence between a number of lexical items (LITERALS), where each literal:
has a unique meaning (specified by the value of SENSE)
pertains to one and the same part of speech (specified as the value of POS)
represents one and the same lexical meaning (specified as the value of DEF, the definition)
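
The fields named above (SENSE, POS, DEF) suggest a simple record structure. The sketch below is an illustrative data model, not the actual WordNet file format; the class names are assumptions, the definition string is a placeholder, and the literals are those of the car synset quoted in the slides on synonymy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Literal:
    """One lexical item together with its sense number (SENSE)."""
    word: str
    sense: int


@dataclass
class Synset:
    """A set of literals sharing one part of speech (POS) and one definition (DEF)."""
    pos: str
    definition: str
    literals: list = field(default_factory=list)

    def words(self):
        """Return the bare word forms of all literals in the synset."""
        return [literal.word for literal in self.literals]


car = Synset(
    pos="n",
    definition="(placeholder definition)",  # not the real WordNet gloss
    literals=[
        Literal("auto", 1), Literal("car", 2), Literal("automobile", 2),
        Literal("machine", 3), Literal("motorcar", 1),
    ],
)
print(car.words())
```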


An example: learn (WordNet)

BulNet: http://dcl.bas.bg/bulnet/
A lexical semantic network of Bulgarian
comprises around 49,189 synonym sets distributed into nine parts of speech
open-class words: nouns, verbs, adjectives and adverbs
closed-class words: pronouns, prepositions, conjunctions, particles and interjections

STRUCTURE
Each synset is linked to its counterpart in PWN 3.0 by means of a unique identification number (ID).
The common synsets in the Balkan languages are marked as common concepts subsets (BCS).
In the monolingual database a synset should be linked to at least one other synset through an intralingual relation.
Non-obligatory information may also be encoded, such as examples of usage and stylistic, morphological or syntactic properties.

RELATIONS IN BULNET
Synonymous sets are linked through various relations:
SEMANTIC
Synonymy, antonymy, hypernymy, hyponymy, meronymy, holonymy, entailment, inclusion, causation, etc.
MORPHOSEMANTIC
BE IN STATE
MORPHOLOGICAL
DERIVED
PARTICLE
EXTRALINGUISTIC

SEMANTIC RELATIONS
SYNONYMY: a semantic relation of equivalence between literals belonging to the same POS.
The synonyms form the synonym set, also called a SYNSET.
For example:
The lexical units
{auto:1, car:2, automobile:2, machine:3, motorcar:1}
form a synset as they refer to the same concept.

SEMANTIC RELATIONS
ANTONYMY: a semantic relation of opposition, established between two members belonging to one and the same POS.
Examples:
man - woman
Hyponyms of two antonyms (nouns) should also be antonymous pair by pair:
man - woman
actor - actress

SEMANTIC RELATIONS
HYPERNYMY and HYPONYMY: semantic relations between synsets which correspond to the notion of class inclusion: if W1 is a kind of W2, then W2 is a hypernym of W1 and W1 is a hyponym of W2.
Example:
rose < plant < living organism
Multi-parent relations:
actress < actor
actress < female
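
Since "is a kind of" is transitive, the hypernyms of a word can be collected by following the links upwards. A minimal sketch using only the examples above follows; the dictionary and the function names are assumptions, and real wordnets link synsets rather than bare words.

```python
# Direct hypernyms taken from the examples above; a word may have several parents.
HYPERNYMS = {
    "rose": ["plant"],
    "plant": ["living organism"],
    "actress": ["actor", "female"],
}


def all_hypernyms(word):
    """Return every direct and inherited hypernym of `word`, nearest first."""
    found = []
    queue = list(HYPERNYMS.get(word, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(HYPERNYMS.get(current, []))
    return found


def is_kind_of(w1, w2):
    """True if w2 is a (possibly indirect) hypernym of w1."""
    return w2 in all_hypernyms(w1)


print(all_hypernyms("rose"))
print(is_kind_of("rose", "living organism"))
```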

SEMANTIC RELATIONS
MERONYMY and HOLONYMY
Semantic relations linking synsets denoting wholes with those denoting their parts: if W1 has a W2, and W2 is a part, portion or member of W1, then W2 is a meronym of W1 and W1 is a holonym of W2.

TYPES OF MERONYMY
PART OF:
branch - tree
book - library
MEMBER OF:
tree - forest
player - team - league
PORTION OF:
drop - liquid
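
The three types of meronymy can be stored as labeled (part, relation, whole) triples and queried in both directions. In the sketch below the relation labels and function names are assumptions, and the chain player - team - league is read as two MEMBER OF links.

```python
# (part, relation, whole) triples taken from the examples above.
MERONYMY = [
    ("branch", "part_of", "tree"),
    ("book", "part_of", "library"),
    ("tree", "member_of", "forest"),
    ("player", "member_of", "team"),
    ("team", "member_of", "league"),
    ("drop", "portion_of", "liquid"),
]


def holonyms(part):
    """Return the wholes that `part` belongs to, with the relation type."""
    return [(relation, whole) for p, relation, whole in MERONYMY if p == part]


def meronyms(whole):
    """Return the parts of `whole`, with the relation type."""
    return [(relation, part) for part, relation, w in MERONYMY if w == whole]


print(holonyms("tree"))
print(meronyms("tree"))
```

The word tree shows that the same item can be a holonym (of branch) and a meronym (of forest) at once.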

APPLICATIONS
options for synonym selection
queries for semantic relations of a word in the
language's lexical system
antonymy, holonymy, etc.
explanatory definition queries
translation equivalents for a lexical item

THE RELATIONS IN BULNET
The large number of relations encoded in BulNet effectively illustrates the semantic and derivational richness of Bulgarian.
This offers diverse opportunities for numerous applications of the multilingual database.


FRAMENET (Fillmore and Baker 2001, 2010)
A lexical database of English that is both human- and machine-readable.
Based on annotated examples of how words are used in actual texts.
Tries to capture human insight into how a word can be used and converts it into semantic knowledge that is machine-readable.
Available online at:
http://www.icsi.berkeley.edu/~framenet

FRAME SEMANTICS (Fillmore, 1976, 1985)
A semantic frame is a structure used to define the meaning of a word.
Example frame: Cutting
Frame elements are the separate elements which make up a frame.
An Agent cuts an Item into Pieces using an Instrument.
Lexical units are the words that evoke a particular frame.
carve.v, chop.v, cube.v, cut.v, dice.v, fillet.v, mince.v, pare.v, slice.v
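
A frame, its frame elements and its lexical units can be modelled as a simple record. The sketch below is an illustration only and does not reproduce the FrameNet data format; the class and function names are assumptions, while the Cutting data comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A semantic frame with its frame elements and the lexical units that evoke it."""
    name: str
    elements: tuple
    lexical_units: tuple


CUTTING = Frame(
    name="Cutting",
    elements=("Agent", "Item", "Pieces", "Instrument"),
    lexical_units=(
        "carve.v", "chop.v", "cube.v", "cut.v", "dice.v",
        "fillet.v", "mince.v", "pare.v", "slice.v",
    ),
)


def evoked_frames(lemma, pos, frames):
    """Return the names of the frames evoked by the lexical unit lemma.pos."""
    unit = f"{lemma}.{pos}"
    return [frame.name for frame in frames if unit in frame.lexical_units]


print(evoked_frames("slice", "v", [CUTTING]))
print(evoked_frames("eat", "v", [CUTTING]))
```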


FRAMENET EXAMPLES:

OTHER TOOLS
https://quizlet.com/
https://kahoot.com/
https://wordwall.net/
www.quizizz.com
www.bamboozle.com
https://slovored.com/
https://bg.padlet.com/
https://trello.com/create-first-board
https://app.magicschool.ai/

Uses of Digital Language Resources
EDUCATION
Intelligent searches for particular language
phenomena, i.e. search by (combinations of) word
forms, grammatical tags, semantic relations;
Collocations;
Word and phrase frequencies;
Recent changes in the language;
Translation equivalents;
Semantic structure of the words and their use;
etc.

THANK YOU FOR YOUR ATTENTION!

Let’s play now!

References
Davies, Mark. 2010. The Corpus of Contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing 25(4): 447-464. First published online October 27, 2010.
The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
Burnard, Lou (ed.). 2007. Reference Guide for the British National Corpus (XML Edition). URL: http://www.natcorp.ox.ac.uk/XMLedition/URG/

Miller, George A. 1995. WordNet: A Lexical Database for English. Communications of the ACM 38(11): 39-41.
Fellbaum, Christiane (ed.). 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Koeva, S., T. Tinchev and S. Mihov. 2004. Bulgarian Wordnet - structure and validation. Romanian Journal of Information Science and Technology 7(1-2): 61-78. ISSN 1453-8245.
Koeva, S. 2008. Derivational and morphosemantic relations in Bulgarian Wordnet. In Intelligent Information Systems XVI. Warsaw: Academic Publishing House, 359-389. ISBN 978-93-60434-44-4.

Ruppenhofer, J. et al. 2010. FrameNet II: Extended Theory and Practice. https://framenet2.icsi.berkeley.edu/docs/r1.5/book.pdf
Fillmore, Charles. Introduction to FrameNet. https://framenet.icsi.berkeley.edu/fndrupal/sites/default/files/FNintroCJF.ppt
Fillmore, Charles J. 1985. Frames and the Semantics of Understanding. Quaderni di Semantica 6(2): 222-53.