LOB Corpora: Important aspects a translator needs to know

MeibisN 64 views 15 slides Sep 03, 2024

About This Presentation

Why the LOB Corpus is important
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, t...


Slide Content

Lancaster-Oslo-Bergen Corpus

Business and Legal Translation. Comparable bilingual corpora: The LOB Corpus

What is a corpus? It is a collection of electronically stored semiotic data that has been designed according to specific corpus design criteria to be maximally representative of (a particular variety of) language or other semiotic systems (Butler, 2004).

From the definition: A corpus can be processed by software (electronically stored data). It is concerned with meaning making, so it can include gestures as well as words (semiotic). Researchers carefully decide what to include and exclude, and in what proportion (designed according to criteria). It is a valid sample of a language variety or of another semiotic system (representative). It consists of naturally occurring examples of language, spoken or written. What we find out about the corpus lets us draw conclusions about the language or semiotic system it represents.

What is a corpus? It is a principled and large collection (body) of authentic texts that are stored in a computer and analyzed using software designed for corpus analysis. “Principled” means that data collection is not done randomly but follows a planned operation. “Authentic” means genuine communication of people going about their normal business (Sinclair, 1996).

What is a corpus? Computer-readable semiotic data (which makes the analysis easier, faster and more accurate). Authentic material (people produced it on particular social occasions, or it has otherwise been deemed authentic). Designed to be representative.

Comparable corpus: A comparable corpus is one corpus in a set of two or more monolingual corpora, typically each in a different language, built according to the same principles. The content is therefore similar, and results can be compared between the corpora even though they are not translations of each other (and are therefore not aligned).
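One simple way to compare results between two comparable corpora of different sizes is to normalize word counts to a rate per million tokens. The sketch below is only an illustration: the two sample texts, the tokenizer, and the function names `tokenize` and `per_million` are hypothetical and are not taken from the LOB Corpus or from any corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def per_million(tokens):
    """Map each word to its frequency per million tokens."""
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


# Hypothetical miniature "corpora": similar legal content, not translations.
sample_en = "The contract binds both parties. The parties sign the contract."
sample_es = "El contrato obliga a ambas partes. Las partes firman el contrato."

freq_en = per_million(tokenize(sample_en))
freq_es = per_million(tokenize(sample_es))

# Compare a candidate term pair across the two corpora.
print("contract:", round(freq_en["contract"]))
print("contrato:", round(freq_es["contrato"]))
```

Because the corpora are not aligned, a comparison like this only suggests candidate equivalences; a translator still has to check the terms in context.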

Comparable bilingual corpus: Normally specialized collections of similar source texts in the two languages. Such corpora can be 'mined' for terminology and other equivalences.

The LOB Corpus exists in two main versions: the original version and a POS-tagged version. In the tagged corpus each word is accompanied by a word-class tag, assigned through a combination of automatic tagging programs and manual pre- and post-editing.

Tagged versions: Each word is accompanied by a word-class tag. There is no syntactic bracketing. The tagged corpus comes in two formats: (I) a horizontal format, with a running text where each word is immediately followed by its associated tag; (II) a vertical format, where each word is on a separate line together with its associated tag, some 'special information' and a reference number.
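The difference between the two formats can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the sample line, the underscore separator between word and tag, and the function names `parse_horizontal` and `to_vertical` are chosen for the example, and the running number merely stands in for the reference number and 'special information' of the real vertical format.

```python
def parse_horizontal(line, sep="_"):
    """Split a running-text line of word<sep>tag items into (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the LAST separator so words containing it stay intact.
        word, found, tag = item.rpartition(sep)
        if not found:  # item carries no tag
            word, tag = item, ""
        pairs.append((word, tag))
    return pairs


def to_vertical(pairs, start=1):
    """Render one word per line as: running number, word, tag (tab-separated)."""
    return "\n".join(
        f"{number}\t{word}\t{tag}"
        for number, (word, tag) in enumerate(pairs, start)
    )


# Hypothetical tagged sentence in a horizontal layout.
sample = "the_ATI court_NN adjourned_VBD ._."

pairs = parse_horizontal(sample)
print(to_vertical(pairs))
```

The horizontal layout is easier to read as text, while the vertical layout is easier to process column by column, for example when counting tags.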

Bibliography
https://books.google.com.pa/books?id=AyRwW9YtuRsC&pg=PA1&lpg=PA1&dq=examples+of+horizontal+and+vertical+tag+for+lob+corpus&source=bl&ots=RdSY-3LlVh&sig=ACfU3U3X3JgRERmO6-8ZceXojEjvkkTrhw&hl=es-419&sa=X&ved=2ahUKEwjhlpP22Lr3AhXXkmoFHUcHBkgQ6AF6BAgiEAM#v=onepage&q=examples%20of%20horizontal%20and%20vertical%20tag%20for%20lob%20corpus&f=false
https://wmtang.org/corpus-linguistics/a-glossary-of-corpus-types/
https://www.youtube.com/watch?v=GWVFWgRgeOA
https://varieng.helsinki.fi/CoRD/corpora/LOB/bibliography.html
https://search.r-project.org/CRAN/refmans/corpora/html/LOBStats.html
https://www1.essex.ac.uk/linguistics/external/clmt/w3c/corpus_ling/content/history.html
https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199922765.001.0001/oxfordhb-9780199922765-miscMatter-10