Table of Contents
4. Problems Concerning the Research of Collocations
5. Advanced Methods in Corpus Linguistics
6. Example for Dictionary-Making: The Lexeme Sweet
6.1. Using Statistical Data Exemplified on BNCWeb and LDOCE
6.2. Including Statistics in Dictionary-Making
Corpora of present-day English are used for dictionary-making, and concordances and collocations play an important role in this process. This essay explains the terms collocation and concordance, provides examples, and shows problems that may occur during the linguistic research process.
The study of collocation started in the 1950s. The term itself was coined by Firth. He said that “collocations of a given word are statements of the habitual or customary places of that word”, i.e. the “characteristic co-occurrence of patterns of words” (McEnery et al. 2006:149). This also involves statistical information, since it says something about the frequency of lexemes. To make such research possible, linguists need corpora of spoken texts, written texts, or both, typically in computer-based form.
Furthermore, Firth explains the term collocation in more detail:
Meaning by collocation is an abstraction on the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words. One of the meanings of night is its collocability with dark, and of dark, of course, collocation with night (McEnery et al. 2006:145-146).
Firth also said that collocation has to be observed in connection with specific registers, genres, authors, and texts (McEnery et al. 2006:146). He exemplified the connection with registers by providing the sentence You silly ass!. In this colloquial register, ass co-occurs with other collocates than in formal registers, e.g. silly, dumb, young. He also found that Edward Lear, in his limericks, used the lexeme man with the collocate old but not with young, as might be the case in texts by other authors or in other genres.
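Firth's notion of habitual co-occurrence can be made countable. The following is only a minimal sketch: the function name `collocates`, the window size and the toy text are assumptions made for illustration and are not taken from Firth or from McEnery et al. It counts which words appear within a few tokens of a node word such as night.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    # Very rough tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


# Invented toy text, not corpus data.
toy_text = "The dark night fell. It was a dark and quiet night. The night was dark."
night_collocates = collocates(toy_text, "night", window=3)
print(night_collocates.most_common(5))
```

On a real corpus the raw counts would be dominated by grammatical words such as the, which is one reason why statistical measures are used in addition to plain frequency.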
Where semantics is concerned, statistics becomes important. In the 1960s Halliday wanted to describe the distance between collocates in a text and therefore needed data, quantitative analysis and statistical facts. He introduced the term probability in those studies and said that collocational restriction is needed to group lexical items into lexical sets (McEnery et al. 2006:146). Restrictions were also part of John Sinclair's research in the 1960s. He used computer-based data to examine that field and found that the position of collocations is restricted. In the 1980s Sinclair presented a concept with fewer restrictions for collocations. He argued that words can be grouped together even though they are not adjacent. Sinclair also coined the terms downward and upward collocation. Downward collocation means that a more frequent node occurs with less frequent collocates; in that case semantic analysis is possible. Upward collocation means that a less frequent node occurs with more frequent collocates. In this second case the collocates are mostly grammatical lexemes or superordinates, and upward collocation is statistically less frequent than downward collocation (McEnery et al. 2006:146).
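The distinction between upward and downward collocation rests on a simple frequency comparison between node and collocate. The sketch below illustrates that comparison only; the function `classify_collocates` and all frequency figures are hypothetical and do not come from Sinclair's data or from any corpus.

```python
from collections import Counter


def classify_collocates(node, collocates, freq):
    """Label each collocate as 'upward' or 'downward' relative to the node."""
    labels = {}
    for word in collocates:
        if freq[word] > freq[node]:
            labels[word] = "upward"    # collocate is more frequent than the node
        elif freq[word] < freq[node]:
            labels[word] = "downward"  # collocate is less frequent than the node
        else:
            labels[word] = "equal"     # same frequency, neither direction
    return labels


# Hypothetical overall frequencies, invented for illustration.
freq = Counter({"the": 60000, "of": 30000, "night": 350, "dark": 120, "starry": 4})
labels = classify_collocates("night", ["the", "of", "dark", "starry"], freq)
print(labels)
```

In this invented example the grammatical lexemes the and of come out as upward collocates of night, while dark and starry come out as downward collocates, which matches the tendency described above.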
Concordances can also help to identify collocations. They show how close the words are to each other. With modern computer-based corpora, such as BNCWeb, concordances can be structured easily. The user can see whether a word stands more frequently to the right or to the left of the target word. This helps to survey the positions of lexemes in spoken or written texts and also to describe the positions of word classes, e.g. whether grammatical lexemes are more frequent at the beginning or at the end of sentences. It may also help to examine the context of a word.
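How a concordance shows the left and right neighbours of a target word can be sketched in a few lines. This is not how BNCWeb works internally; the functions `kwic` and `side_counts`, the context width and the toy text are assumptions made purely for illustration.

```python
import re


def kwic(text, node, width=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines


def side_counts(lines, word):
    """Count how often `word` appears to the left and to the right of the node."""
    left = sum(l.count(word) for l, _, _ in lines)
    right = sum(r.count(word) for _, _, r in lines)
    return left, right


# Invented toy text, not corpus data.
toy_text = "She drank sweet tea. The tea was sweet and hot. Sweet dreams followed."
tea_lines = kwic(toy_text, "tea", width=3)
for left, node, right in tea_lines:
    # Right-align the left context so that the node words line up in a column.
    print(f"{' '.join(left):>25}  [{node}]  {' '.join(right)}")

sweet_left, sweet_right = side_counts(tea_lines, "sweet")
```

Comparing the two counts returned by `side_counts` is a crude version of checking whether a word tends to stand to the left or to the right of the target word.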
Hoey et al. say that concordance lines in corpora help to identify what de Saussure called langue. With the help of concordances, formal patterns or tendencies of a language can be shown (Hoey et al. 2007:154).
One major result of concordance studies was that the length of lexical units became predictable. An example of this is provided by Hoey et al. (2007:154-155). They used a corpus to examine the word endure and found texts such as
(1) that smokers will have to endure 12-hour flights by becoming
(2) remember having had to endure a certain amount of misery
(3) the animals often have to endure hours trapped in the midst
They found that to endure is used to describe something which is imposed by force, or a situation in which people have to face something unpleasant. This becomes evident when the contexts to the left and to the right of to endure are observed. There is a repetitive structure: on the left side stands the expression of force, usually with the words be and have, and on the right side stands a description of the unpleasant situation. If this procedure is applied to other words, certain structures in a language can be observed with the help of concordances.
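The repetitive structure around endure can be checked mechanically on the three concordance lines quoted above. The following sketch is an illustration only and not the procedure of Hoey et al.; the list `HAVE_BE_FORMS`, the helper `split_at_node` and the choice of a three-word left window are assumptions.

```python
from collections import Counter

# The three concordance lines quoted above from Hoey et al. (2007:154-155).
lines = [
    "that smokers will have to endure 12-hour flights by becoming",
    "remember having had to endure a certain amount of misery",
    "the animals often have to endure hours trapped in the midst",
]

# Assumed list of word forms of have and be, chosen for this illustration.
HAVE_BE_FORMS = {"have", "has", "had", "having",
                 "be", "is", "are", "was", "were", "been", "being"}


def split_at_node(line, node="endure"):
    """Split a concordance line into the tokens left and right of the node."""
    tokens = line.split()
    i = tokens.index(node)
    return tokens[:i], tokens[i + 1:]


left_hits = Counter()
for line in lines:
    left, right = split_at_node(line)
    # Inspect only the three words directly before the node.
    for tok in left[-3:]:
        if tok in HAVE_BE_FORMS:
            left_hits[tok] += 1

print(left_hits)
```

With only three lines this proves nothing statistically, but it shows how the pattern of a form of have followed by to before endure could be counted across a larger set of concordance lines.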