
Where to stay in the UK and in Italy: a comparative study of the language of holiday accommodation advertisements

by Beatrice Stellin
Università degli Studi di Padova
Facoltà di Lettere e Filosofia
Facoltà di Scienze Politiche
Dipartimento di Lingue e Letterature anglo-germaniche e slave

Supervisor: Prof. Erik Castello
Degree thesis by: Beatrice Stellin (Student ID No. 614513 MZL)

2 - An introduction to Corpora

A corpus can be defined as "a large collection of authentic texts that have been gathered in electronic form according to a specific set of criteria" (Bowker and Pearson, 2002: 9) and for a specific purpose, that is to say, in order to represent a given language variety, genre, discourse domain, the work of an author or even the language of a particular period of time.
A corpus is usually characterized by four main features that distinguish it from other types of text collections: it is made up of authentic texts, its format is electronic, it is large, and it is compiled according to specific criteria. Corpora can also be of different sizes, depending on their purpose. Accordingly, there can be small specialized corpora or very large ones.
It could be argued that there are as many different types of corpora as there are types of investigations. However, as Bowker and Pearson (2002: 11-12) pointed out, some broad categories of corpora can be identified and contrasted as follows:

  1. general reference corpus vs. special purpose corpus: the former can be taken as representative of a particular language, and thus be used to make general observations on it. By contrast, the latter focuses on a particular text type or variety of a given language. The two types can nevertheless be compared in order to identify the features of a specialized language that differ from general language;
  2. written vs. spoken corpus: a written corpus contains written texts, whereas a spoken corpus consists of transcriptions of spoken materials;
  3. monolingual vs. multilingual corpus: the former contains texts in a single language, whereas the latter contains texts in two or more languages and can be parallel or comparable - it is parallel if it contains texts in language A and their translations into language B, and comparable if it does not contain translated texts;
  4. synchronic vs. diachronic corpus: the synchronic type can be seen as a snapshot of a language used during a limited period of time, while the diachronic one can be used to study how language has evolved over time;
  5. open vs. closed corpus: the former can always be expanded, whereas a closed one cannot be enlarged once it has been created;
  6. learner corpus: a corpus that contains texts written by learners of a foreign language.
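
The comparison between a special purpose corpus and a general reference corpus mentioned in point 1 can be illustrated with a minimal Python sketch. It assumes that relative frequency (occurrences per 1,000 tokens) is an acceptable basis for comparison, which is only one of several possible measures; the function name `relative_frequencies`, the simple tokenization rule and the two toy "corpora" are invented for illustration and are not taken from Bowker and Pearson (2002).

```python
import re
from collections import Counter


def relative_frequencies(texts):
    """Return each word's frequency per 1,000 tokens across the given texts."""
    # Very rough tokenization (an assumption): lowercase alphabetic strings.
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 1000 * n / total for word, n in counts.items()}


# Invented toy data standing in for real corpora.
special = ["Cosy cottage with sea view", "Charming cottage near the beach"]
reference = [
    "The view from the office was dull",
    "She walked to the beach with the dog",
]

special_freq = relative_frequencies(special)
reference_freq = relative_frequencies(reference)

# Words that are relatively more frequent in the specialized corpus,
# ordered by the size of the difference.
overused = sorted(
    (w for w in special_freq if special_freq[w] > reference_freq.get(w, 0.0)),
    key=lambda w: special_freq[w] - reference_freq.get(w, 0.0),
    reverse=True,
)
print(overused[:3])
```

With real corpora of very different sizes, normalizing by corpus length in this way is what makes the two frequency lists comparable at all.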

In addition to this, corpora offer a number of advantages over other types of resources (e.g. dictionaries, printed texts, subject field experts, intuition, etc.) and thus can be used as helpful complements to such resources. The advantages include the fact that:

  1. they are in electronic form, which allows them to be searched more easily, to be larger and to be constantly updated;
  2. they consist of authentic instances of language use, which allows one to find out what people say or do not say and how often they do so;
  3. they can be used to conduct new investigations or to verify or reject hypotheses.
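
As an illustration of how electronic form makes a corpus easy to search, the following minimal Python sketch produces a simple keyword-in-context listing. The function name `concordance`, the `width` parameter and the two example sentences are hypothetical and serve only to show the principle; dedicated concordancing software offers far more.

```python
import re


def concordance(texts, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on either side of each hit."""
    lines = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented example sentences, not drawn from any real corpus.
toy_corpus = [
    "A quiet room with a view of the lake",
    "Every room has a private bathroom",
]
hits = concordance(toy_corpus, "room")
for line in hits:
    print(line)
```

The number of lines returned is also the raw frequency of the keyword, so the same search answers both what people say and how often they say it.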

According to Bowker and Pearson (2002: 45), there are some basic guidelines to follow when designing a special purpose corpus. Particular attention has to be paid to issues such as size, number of texts, medium, subject, text type, authorship, language and publication date. How each of these criteria is set depends on the researcher's needs and on the goals of the project.
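
One possible way of keeping track of these design criteria is to store them as metadata for each text and to filter candidate texts against them. The sketch below is only an illustration under that assumption: the class `TextRecord`, its field names, the function `select` and the sample records are all hypothetical and do not describe the procedure followed in this study.

```python
from dataclasses import dataclass


@dataclass
class TextRecord:
    """Metadata for one candidate text, mirroring the design criteria listed above."""
    title: str
    medium: str            # e.g. "web" or "print"
    subject: str
    text_type: str
    author: str
    language: str
    publication_year: int
    word_count: int


def select(records, language, text_type, earliest_year):
    """Keep only the records that satisfy the chosen criteria."""
    return [
        r for r in records
        if r.language == language
        and r.text_type == text_type
        and r.publication_year >= earliest_year
    ]


# Invented records for illustration only.
candidates = [
    TextRecord("Seaside B&B", "web", "accommodation", "advertisement",
               "owner", "en", 2008, 120),
    TextRecord("Hotel review", "web", "accommodation", "review",
               "guest", "en", 2009, 300),
    TextRecord("Agriturismo in collina", "web", "accommodation", "advertisement",
               "owner", "it", 2009, 150),
]

english_ads = select(candidates, language="en", text_type="advertisement",
                     earliest_year=2005)

# Size and number of texts can then be checked against the project's targets.
corpus_size = sum(r.word_count for r in english_ads)
print(len(english_ads), corpus_size)
```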