Introduction to Corpus Linguistics for Beginners

Karlina Denistia, 28 slides, Sep 24, 2024


INTRODUCTION TO CORPUS LINGUISTICS
@karlinakuning, Karlina_Denistia

Corpus Linguistics – Karlina Denistia
https://scholar.google.de/citations?hl=en&user=D2U9r3cAAAAJ&view_op=list_works&sortby=pubdate


Corpus linguistics

OUTLINE
  Background story
  What is corpus linguistics?
  Sources of corpus data
  Which sources for which research?


Language rules and systems
Both of these are acceptable sentences:
  We worked out the problem
  We worked the problem out
These two sentences, however, may not be equally acceptable:
  We worked out it
  We worked it out
The first one is likely to sound strange to many native speakers of English.

Language variation:
  Speaker
  Context
  Necessity


What is corpus linguistics?
Corpus linguistics describes language variation and use by looking at large amounts of text that have actually been produced:
  Written: news writing, text messaging, or academic writing
  Oral: news reporting, face-to-face conversation, or academic lectures
A corpus is a representative collection of language that can be used to make statements about language use. It:
  contains a fairly large number of examples
  can be read by a computer
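To make "statements about language use" concrete, here is a minimal sketch of the most basic corpus operation: counting how often each word occurs. The three-sentence toy corpus reuses the example sentences from the earlier slide, and the tokenizer is a deliberately naive assumption (lowercase letters and apostrophes only), not a recommended standard.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive rule)."""
    return re.findall(r"[a-z']+", text.lower())

# Toy stand-in for a corpus; a real corpus would hold far more text.
toy_corpus = [
    "We worked out the problem.",
    "We worked the problem out.",
    "We worked it out.",
]

# Flatten all sentences into one list of tokens, then count them.
tokens = [tok for sentence in toy_corpus for tok in tokenize(sentence)]
frequencies = Counter(tokens)

# Print the words from most to least frequent.
for word, count in frequencies.most_common():
    print(f"{word}\t{count}")
```

With millions of words instead of three sentences, the same kind of count becomes evidence about which patterns speakers actually use.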


Sources of corpus data
Containing real-world examples:
  books, papers, letters, spoken language, dialogues, Twitter, news, chat history, song lyrics, Facebook posts, movie subtitles, etc.
Size: millions of words

Electronically available and computer-processable
  e.g., PDF → optical character recognition → text file
  e.g., audio file → speech-to-text (by Siri) → text file
Built using a semi-automated process (e.g., web crawlers)
Manually typed text, or news copied and pasted from the internet into a file?
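Once the sources have been converted to plain text files, loading them is straightforward. The sketch below is only an illustration: the function name `load_corpus`, the file names, and the sample sentences are all hypothetical, and a temporary folder stands in for a real corpus directory. It assumes the OCR or speech-to-text step has already been done with another tool.

```python
import tempfile
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dictionary."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        raw = path.read_text(encoding="utf-8")
        # Collapse runs of spaces, tabs, and line breaks left by OCR or copy-paste.
        corpus[path.name] = " ".join(raw.split())
    return corpus

# Hypothetical demo files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "news_01.txt").write_text(
        "Breaking   news:\nrain expected.", encoding="utf-8"
    )
    Path(folder, "chat_01.txt").write_text("see you\tnext week", encoding="utf-8")
    corpus = load_corpus(folder)

# Corpus size in running words, split on whitespace.
total_words = sum(len(text.split()) for text in corpus.values())
print(len(corpus), "texts,", total_words, "words")
```

Keeping one text per file makes it easy to report corpus size as both a number of texts and a number of words.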


https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013&word=


What does it mean to say "I am doing corpus linguistics"?
  It is empirical, analyzing the actual patterns of use in natural language texts.
  It utilizes a large and principled collection of natural texts, known as a "corpus", as the basis for analysis.
  It makes extensive use of computers for analysis, using both automatic and interactive techniques.
  It depends on both quantitative and qualitative analytical techniques.
(Biber, Conrad, & Reppen, 1998: 4)
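One common way to combine automatic and interactive techniques is a concordance, or keyword-in-context (KWIC) view: the computer finds every occurrence of a word, and the researcher reads the surrounding context. The sketch below is a minimal illustration rather than how any particular tool works; the function name `kwic` and the window size are assumptions, and the sample text reuses the example sentences from the earlier slide.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

sample = "We worked out the problem. We worked the problem out. We worked it out."
hits = kwic(sample, "out")

# Align the keyword in a central column, as concordance displays usually do.
for left, key, right in hits:
    print(f"{left:>20}  {key}  {right}")
```

The number of hits is the quantitative side; reading each line to see what precedes and follows the keyword is the qualitative side.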


Break and think: What can we do with this corpus?
  Morphology: Indonesian affix productivity
  Semantics: figurative language with 'head'
  Syntax: adverb mobility in Indonesian
  Language use: new words in Indonesian corpora
  Pragmatics: formal and informal constructions
Any other ideas?


Which corpus for which research?
British National Corpus
  4,048 texts (a variety of texts written in British English)
  Around 100 million words
Lake District corpus
  28 texts (texts about the Lake District written in British English between 1700 and 1900)
  273,861 words
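Because these two corpora differ so much in size, raw counts from them cannot be compared directly. A common remedy is to normalize frequencies, for example per million words. The sketch below uses the corpus sizes given above; the hit counts are hypothetical numbers chosen only for illustration, not real search results.

```python
def per_million(raw_count, corpus_size):
    """Convert a raw frequency into occurrences per million words."""
    return raw_count / corpus_size * 1_000_000

# Corpus sizes from the slide (the BNC figure is approximate).
BNC_SIZE = 100_000_000
LAKE_DISTRICT_SIZE = 273_861

# Hypothetical hit counts for the same word in each corpus.
hypothetical_bnc_hits = 5_000
hypothetical_lake_hits = 50

bnc_rate = per_million(hypothetical_bnc_hits, BNC_SIZE)
lake_rate = per_million(hypothetical_lake_hits, LAKE_DISTRICT_SIZE)

print(f"BNC: {bnc_rate:.1f} per million words")
print(f"Lake District corpus: {lake_rate:.1f} per million words")
```

In this made-up case the word has far fewer raw hits in the smaller corpus but a higher normalized rate, which is why size matters when choosing and comparing corpora.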

Know the aim of your research (Gabrielatos, 2013)

Which one will you use?
British National Corpus
  4,048 texts (a variety of texts written in British English)
  Around 100 million words
Lake District corpus
  28 texts (texts about the Lake District written in British English between 1700 and 1900)
  273,861 words

Summary
  Corpus linguistics opens up more possibilities for describing linguistic phenomena based on language use.
  There are various kinds of corpora that can serve as the source of information for language research.
  The choice of corpus depends on the research question(s).

Any questions?
See you next week.
Note: you need to download AntConc for our next meeting.

References
Biber, D., S. Conrad & R. Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Crawford, William J., and Eniko Csomay. 2016. Doing Corpus Linguistics. New York: Routledge.
Gabrielatos, Costas. 2013. Sketching Muslims: A Corpus Driven Analysis of Representations Around the Word 'Muslim' in the British Press 1998-2009. Applied Linguistics, 34(3): 255-278.