Introducing Corpus Linguistics
Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com
Module description
• Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics
Teaching/learning strategies
• With a dual focus on 'why' and 'how to' in corpus-based language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions
• The module also engages students in extensive reading and interaction with corpus data outside of class
Aims of the module
• The module aims to
– provide an introduction to corpus linguistics;
– familiarise students with major corpus resources and tools;
– pass on essential knowledge and skills for building DIY corpora;
– keep students up to date with the latest developments in corpus research;
– develop students' ability in corpus-based language studies.
– …but rarely a random collection of text
– Corpora “are generally assembled with particular purposes in mind, and are often assembled to be (informally speaking) representative of some language or text type.” (Leech 1992)
– Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning.” (Leech 1997)
• One of the strengths of corpus data lies in its empirical and attested nature
• Recommended reading
– See the module syllabus at the course website
– www.lancs.ac.uk/fass/projects/corpus/ZJU/CL_syllabus.htm (password for unzipping ebooks: lancs)
Learning outcomes
• On successful completion of the module, students will be able to
– understand the major theoretical frameworks in corpus linguistics and formulate research questions that are amenable to corpus research;
– think critically about the strengths and weaknesses of the corpus methodology and decide when and how to interface it with other methodologies;
– become familiar with major corpus resources and tools and develop DIY corpora when necessary;
– apply the corpus-based approach in their own research.
What is not a corpus?
• A list of words is not a corpus
– Building blocks of language
• A text archive is not a corpus
– A random collection of texts
• A collection of citations is not a corpus
Reading list
• Set text
– McEnery, A., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London & New York: Routledge.
– Wynne, M. (2005) Developing Linguistic Corpora. Oxford: Oxbow Books. Available online at http://www.ahds.ac.uk/creating/guides/linguisticcorpora
Outline of this session
• Lecture: introducing key concepts and debates in corpus linguistics
– What is and is not a corpus?
– Why use corpora?
– Corpora vs. intuitions
– The corpus methodology
– A brief history of Corpus Linguistics
– Nature and applications of corpus-based studies
Assessment
• Option A
– A 1,000-word essay that critically reviews a corpus exploration tool or a corpus-based study (40%)
– A 2,500-word project report (60%)
• Lab: testing your intuitions + exploring online resources
What is a corpus?
• The word corpus comes from Latin (“body”) and the plural is corpora
• A corpus is a body of naturally occurring language
• Option B
– One 3,500-word essay based on a research project of your own choice (100%)
• Deadline: Friday 31 May 2013
• Submission
– A Word copy as email attachment
• “A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.” (MXT 2006: 5)
– A short quotation which contains a word or phrase that is the reason for its selection
• A collection of quotations is not a corpus
– A short selection from a text chosen on internal criteria by human beings
Contents
1) Introducing corpus linguistics
2) Corpus design and types of corpora
3) Data capture and markup
4) Corpus annotation
5) Making statistical claims
6) Corpus analysis (1): concordance and wordlist
7) Corpus analysis (2): keyword analysis
8) Corpora in lexicographic and lexical studies
9) Corpora in grammatical studies
10) Corpora in diachronic studies
11) Corpora in language variation research
12) Corpora in sociolinguistic studies
13) Corpora in language education
14) Corpora in literary and stylistic studies
15) Corpora in critical discourse analysis
16) Corpora in contrastive and translation studies
– … pools together the intuitions of a great number of speakers
– … makes linguistic analysis more objective
• This module
– …introduces the theoretical and practical issues of using corpora in linguistic studies – …explores how the corpus-based approach and other methodologies can be combined in linguistic studies
CL timetable
Mon   Tues  Weds  Thurs Fri
25    26    27    28    29
1     2     3     4     5
8     9     10    11    12
7th April (Sun): Friday timetable
CL timetable
• 27/03 (Wed) 18:30-21:30 E6-224
• 28/03 (Thu) 13:15-16:40 E6-219
• 29/03 (Fri) 14:05-17:30 E6-219
• 03/04 (Wed) 18:30-21:30 E6-224
• 07/04 (Fri) 14:05-17:30 E6-219
• 10/04 (Wed) 18:30-21:30 E6-224
• 11/04 (Thu) 13:15-16:40 E6-219
• 12/04 (Fri) 14:05-17:30 E6-219
• A text is not a corpus
– Intended to be read in different ways