OUTLINE
Background story
What is corpus linguistics?
Sources of corpus data
Which sources for which research?
Language rules and systems
Both of these are acceptable sentences:
We worked out the problem.
We worked the problem out.
These two sentences, however, may not be equally acceptable:
We worked out it.
We worked it out.
The first one is likely to sound strange to many native speakers of English.
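Corpus evidence can be used to check intuitions like this one. As a minimal sketch (the function `count_particle_orders` and the sample sentences are hypothetical, invented for illustration rather than taken from a real corpus), the two word orders with the pronoun object "it" can be counted with regular expressions:

```python
import re
from collections import Counter

def count_particle_orders(texts, verb="worked", particle="out"):
    """Count 'VERB it PARTICLE' (split) vs 'VERB PARTICLE it' (joined) in a list of texts."""
    v, p = re.escape(verb), re.escape(particle)
    # \b keeps matches to whole words, so "itself" is not counted as "it"
    split = re.compile(rf"\b{v}\s+it\s+{p}\b", re.IGNORECASE)
    joined = re.compile(rf"\b{v}\s+{p}\s+it\b", re.IGNORECASE)
    counts = Counter(split=0, joined=0)
    for text in texts:
        counts["split"] += len(split.findall(text))
        counts["joined"] += len(joined.findall(text))
    return counts

if __name__ == "__main__":
    # Hypothetical sample sentences, invented for illustration
    sample = ["We worked it out.", "They finally worked it out.", "We worked out the problem."]
    print(dict(count_particle_orders(sample)))  # {'split': 2, 'joined': 0}
```

The raw matches still need to be read: a hit such as "we worked out it was late" is a different construction (out followed by a clause), so counting is usually combined with qualitative checking.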
Language variation: speaker, context, necessity
What is corpus linguistics?
Corpus linguistics describes language variation and use by looking at large amounts of texts that have actually been produced:
Written: news writing, text messaging or academic writing
Spoken: news reporting, face-to-face conversation or academic lectures
A corpus is a representative collection of language that can be used to make statements about language use. It contains a fairly large number of examples and is stored in a form that can be read and processed by a computer.
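Making statements about language use usually starts with counting. A minimal sketch of a word frequency list, assuming the corpus is simply a list of plain-text strings and using a deliberately simple regular-expression tokenizer (both are assumptions for illustration, not how any particular tool works):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return runs of ASCII letters, keeping internal ' and -."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def frequency_list(corpus):
    """Count how often each word form occurs across all texts in the corpus."""
    counts = Counter()
    for text in corpus:
        counts.update(tokenize(text))
    return counts

if __name__ == "__main__":
    corpus = ["We worked out the problem.", "Face-to-face conversation is spoken language."]
    freq = frequency_list(corpus)
    print(freq["face-to-face"], sum(freq.values()))  # 1 10
```

Tools such as AntConc tokenize more carefully; this sketch ignores digits and non-ASCII letters.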
Sources of corpus data
Corpora contain real-world examples: books, papers, letters, spoken language, dialogues, Twitter, news, chat history, song lyrics, Facebook posts, movie subtitles, etc.
Size: often millions of words
Corpus data must be electronically available and computer-processable, e.g.:
PDF → optical character recognition → text file
Audio file → speech-to-text (e.g., by Siri) → text file
Corpora can be built using a semi-automated process (e.g., web crawlers).
What about manually typewritten text, or news copied and pasted from the internet into a file?
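Once the sources have been converted to text files, they can be read in as a corpus. A minimal sketch, assuming the files are UTF-8 plain text in one folder; `load_corpus` and `corpus_size` are hypothetical helper names, and the word count is a rough whitespace split:

```python
from pathlib import Path
import tempfile

def load_corpus(folder, pattern="*.txt", encoding="utf-8"):
    """Read every file matching `pattern` in `folder` into {file name: text}, sorted by name."""
    corpus = {}
    for path in sorted(Path(folder).glob(pattern)):
        corpus[path.name] = path.read_text(encoding=encoding)
    return corpus

def corpus_size(corpus):
    """Rough corpus size in words: whitespace-separated tokens summed over all texts."""
    return sum(len(text.split()) for text in corpus.values())

if __name__ == "__main__":
    # Demo with two hypothetical files written to a temporary folder
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "news_001.txt").write_text("We worked out the problem.", encoding="utf-8")
        Path(folder, "chat_001.txt").write_text("we worked it out lol", encoding="utf-8")
        corpus = load_corpus(folder)
        print(len(corpus), "texts,", corpus_size(corpus), "words")  # 2 texts, 10 words
```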
What does it mean to say "I am doing corpus linguistics"?
Corpus linguistics:
is empirical, analyzing the actual patterns of use in natural language texts
utilizes a large and principled collection of natural texts, known as a "corpus", as the basis for analysis
makes extensive use of computers for analysis, using both automatic and interactive techniques
depends on both quantitative and qualitative analytical techniques
(Biber, Conrad & Reppen, 1998: 4)
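A keyword-in-context (KWIC) concordance combines both kinds of analysis: the number of hits is quantitative, and reading each hit in its context is qualitative. A minimal sketch of a concordance view of the kind AntConc displays; the function name `kwic` and the fixed character window are assumptions, not AntConc's implementation:

```python
import re

def kwic(text, keyword, width=25):
    """Return concordance lines for every whole-word, case-insensitive match of `keyword`."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in one column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = "We worked out the problem. Later we worked it out again."
    hits = kwic(sample, "worked")
    print(len(hits))  # quantitative: number of hits (2)
    for line in hits:  # qualitative: read each hit in context
        print(line)
```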
Break and think: what can we do with this corpus?
Morphology: Indonesian affix productivity
Semantics: figurative language with 'head'
Syntax: adverb mobility in Indonesian
Language use: new words in Indonesian corpora
Pragmatics: formal and informal constructions
Any other ideas?
Which corpus for which research?
British National Corpus: 4,048 texts (a variety of texts in British English), around 100 million words
Lake District corpus: 28 texts (texts about the Lake District, written in British English between 1700 and 1900), 273,861 words
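Because these two corpora differ so much in size, raw counts from them cannot be compared directly. A common practice is to normalize frequencies, for example per million words. A minimal sketch; the function name `per_million` and the raw counts in the demo are hypothetical:

```python
def per_million(raw_count, corpus_words):
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be positive")
    # Multiply first so integer inputs divide without extra rounding error
    return raw_count * 1_000_000 / corpus_words

if __name__ == "__main__":
    # Hypothetical raw counts for the same word in the two corpora
    bnc = per_million(5_000, 100_000_000)  # BNC: around 100 million words (rounded)
    lake = per_million(50, 273_861)        # Lake District corpus: 273,861 words
    print(f"BNC: {bnc:.1f} pmw, Lake District: {lake:.1f} pmw")  # BNC: 50.0 pmw, Lake District: 182.6 pmw
```

With these made-up counts the word looks far more frequent in the BNC (5,000 vs 50 hits), but relative to corpus size it is more than three times as frequent in the Lake District corpus.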
Know the aim of your research (Gabrielatos, 2013)
Which one will you use: the British National Corpus (around 100 million words) or the Lake District corpus (273,861 words)?
Summary
Corpus linguistics opens up more possibilities for describing linguistic phenomena based on language use.
There are various kinds of corpora that can be used as sources of information for language research.
Which corpora to choose depends on the research question(s).
Any questions? See you next week.
Note: you need to download AntConc for our next meeting.
References
Biber, D., S. Conrad & R. Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Crawford, W. J. & E. Csomay. 2016. Doing Corpus Linguistics. New York: Routledge.
Gabrielatos, C. 2013. Sketching Muslims: A Corpus Driven Analysis of Representations Around the Word 'Muslim' in the British Press 1998-2009. Applied Linguistics, 34(3): 255-278.