LOB CORPORA._Important aspects a translator needs to know

MeibisN 64 views 15 slides Sep 03, 2024

About This Presentation

Why the LOB Corpus is important
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, t...

Size: 4.39 MB

Language: en

Added: Sep 03, 2024

Slides: 15 pages

Slide Content

Lancaster-Oslo-Bergen Corpus

Business and Legal Translation. Comparable bilingual corpora: the LOB Corpus

What is a corpus? It is a collection of electronically stored semiotic data that has been designed according to specific corpus design criteria to be maximally representative of (a particular variety of) language or other semiotic systems (Butler, 2004).

From the definition: A corpus can be processed by software (electronically stored data). It covers meaning-making in general, so it can include gestures as well as language (semiotic). The researchers carefully decide what to include and exclude, and in what proportion (designed according to specific criteria). It is a valid sample of a language variety or of another semiotic system (representative). It consists of naturally occurring examples of language, spoken or written. From what we find in the corpus we can draw conclusions about the language or semiotic system it represents.

What is a corpus? It is a principled and large collection (body) of authentic texts that are stored in a computer and analyzed using software designed for corpus analysis. “Principled” means that data collection is not done randomly but follows a plan. “Authentic” means genuine communication between people going about their normal business (Sinclair, 1996).

What is a corpus? Computer-readable semiotic data (this makes the analysis easier, faster and more accurate). Authentic material (people produced it on particular social occasions, or it has otherwise been deemed authentic). Designed to be representative.

Comparable corpus: A comparable corpus is one corpus in a set of two or more monolingual corpora, typically each in a different language, built according to the same principles. The content is therefore similar, and results can be compared between the corpora even though the texts are not translations of each other (and therefore they are not aligned).

Comparable bilingual corpus: Normally specialized collections of similar source texts in the two languages. Such corpora can be 'mined' for terminology and other equivalences.
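Because the texts in a comparable corpus are not aligned, comparison usually works on summary figures such as word frequencies, not on sentence pairs. The following is a minimal sketch of that idea, not a method described in the slides: the function name `relative_frequencies`, the two sample texts and the normalisation to occurrences per 1,000 words are all assumptions made for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text, per=1000):
    """Return each word's frequency normalised to `per` running words.

    Hypothetical helper: tokenisation here is a simple regex split,
    which is far cruder than what real corpus tools do.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}


# Invented sample texts standing in for two monolingual corpora
# built on the same principles (here: contract language).
english_sample = "The contract binds the parties. The parties sign the contract."
spanish_sample = "El contrato obliga a las partes. Las partes firman el contrato."

english_freqs = relative_frequencies(english_sample)
spanish_freqs = relative_frequencies(spanish_sample)

# Candidate equivalents can then be compared by normalised frequency.
print("contract:", english_freqs["contract"])
print("contrato:", spanish_freqs["contrato"])
```

Normalising by corpus size matters because the two corpora will rarely contain exactly the same number of words, so raw counts are not directly comparable.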

The LOB Corpus exists in two main versions: the original version and a POS-tagged version. In the tagged corpus each word is accompanied by a word-class tag, assigned through a combination of automatic tagging programs and manual pre- and post-editing.

Tagged versions: Each word is accompanied by a word-class tag. There is no syntactic bracketing. The tagged text comes in two formats. I: a horizontal format, with a running text where each word is immediately followed by its associated tag. II: a vertical format, where each word is on a separate line together with its associated tag, some 'special information' and a reference number.
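The difference between the two formats can be sketched in a few lines of code. This is an illustration under stated assumptions, not the actual LOB file layout: it assumes that word and tag are joined by an underscore in the horizontal format, the sample sentence and its tags are only illustrative, and the reference-number scheme (`A01:001`) is hypothetical. The 'special information' field of the real vertical format is left out.

```python
def horizontal_to_vertical(line, ref_prefix="A01", sep="_"):
    """Turn one line of word_TAG running text into (ref, word, tag) rows.

    Assumptions: `sep` joins word and tag, and the reference number is a
    hypothetical prefix plus the token's position in the line.
    """
    rows = []
    for position, token in enumerate(line.split(), start=1):
        if sep in token:
            # Split on the LAST separator so words containing "_" survive.
            word, _, tag = token.rpartition(sep)
        else:
            # No tag found: keep the word and leave the tag empty.
            word, tag = token, ""
        rows.append((f"{ref_prefix}:{position:03d}", word, tag))
    return rows


# Horizontal format: running text, each word followed by its tag.
sample = "the_ATI corpus_NN is_BEZ tagged_VBN ._."

# Vertical format: one word per line with its tag and a reference number.
for ref, word, tag in horizontal_to_vertical(sample):
    print(f"{ref}\t{word}\t{tag}")
```

The horizontal format is easier to read as text, while the vertical format is easier to sort, count and filter, since every token occupies one line with fixed fields.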

