Synonymy and Register. A Corpus-based Study in American English

by Claus Arnold (Author)

Bachelor Thesis 2012 48 Pages

American Studies - Linguistics


Table of Contents

1 Introduction

2 Background
2.1 Synonymy
2.1.1 Degrees of Synonymy
2.1.2 Foreign Loans as a Source of Synonymy in English
2.2 Register
2.2.1 Components of a Register
2.2.2 Register and Formality
2.2.3 The Registers in COCA

3 Study
3.1 Finding Sample Synonyms
3.2 Method

4 Analysis
4.1 Couch vs. Sofa
4.2 Enemy vs. Foe
4.3 Noon vs. Midday
4.4 Start [V], Begin, and Commence
4.5 Stroll, Amble, and Saunter as Verbs
4.6 Annual vs. Yearly as Adjectives
4.7 Adept vs. Skillful / Skilful
4.8 Maybe vs. Perhaps as Adverbs
4.9 Synonymous Adverbs Referring to Frequency and Infrequency
4.10 Probably vs. likely [ADV]

5 Discussion
5.1 Preference for One Synonym over Another
5.2 Patterns of Register-based Distribution
5.3 Origin as a Constraint

6 Conclusion



1 Introduction

One way of structuring the vocabulary of a language is in terms of sense relationships. Among these, synonymy is a major type. It is the notion that certain distinct lexemes differ hardly or not at all in meaning (cf. Lyons 1995: 60). In English, for instance, maybe and perhaps, or enemy and foe, are considered synonyms by dictionaries and thesauruses (cf. Webster's Third New International Dictionary, Unabridged; henceforth MW), and by native speakers as well.[1] Indeed, the metalinguistic awareness of speakers, as expressed in interviews or in internet forums, may be useful for examining this linguistic phenomenon, among others. From the semantic perspective, synonymy can be investigated by analyzing whether two (or more) supposed synonyms agree with each other in all, or nearly all, of their meaning components. However, the concept of synonymy is problematic insofar as it contradicts the assumption that a linguistic system develops to be as economical as possible (cf. Kortmann 2005: 201). This apparent contradiction could be resolved by investigating the quantitative and qualitative differences between alleged synonyms in actual use.

In order to do so, I will adopt a pragmatic perspective and carry out a corpus-based study with a random selection of eleven synonym groups from English, each of them containing two or three lexemes which are assumed to be synonyms. First of all, the study will compare the overall frequencies of the alleged synonyms, i.e. their quantitative differences. As regards qualitative differences, I will then focus on register as a factor, disregarding – for the sake of conciseness and precision – other factors such as regional or social varieties, collocations, or diachronic development. Thus, I will concentrate on American English and the period 2005-2009. The corpus employed is The Corpus of Contemporary American English (Davies 2008-), henceforth COCA.

Before presenting the intention of the study, it is necessary to introduce the concept of formality. Although the degree of formality is frequently used for characterizing both lexemes and registers[2], there is no clear-cut definition of formality. For the moment, it should suffice to point out that the registers used in the study are expected to be in a relative order on a supposed informality-formality scale, with spoken language being the least formal register and academic prose the most formal register (cf. 2.2.3).

As the study is twofold, there are two assumptions I will scrutinize extensively:

- If two lexemes are synonymous, they usually differ in their overall frequency. One of them will be preferred by speakers.
- The synonyms (of one group) differ in their quantitative distribution over the registers. More concretely, the usage of the more frequent synonym is expected to be higher in the rather informal registers, especially in spoken language, whereas the usage of the rarer synonym is expected to be higher in the more formal registers, especially in academic prose.
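
The two assumptions above can be made concrete in a small sketch. It is only one possible operationalization and not the procedure used in this study: the function names, the comparison of register shares, and the restriction to the spoken and academic registers are assumptions made for illustration. The counts in the usage example are invented placeholders, not COCA data.

```python
from typing import Dict

# The five COCA registers named in this paper.
REGISTERS = ["spoken", "fiction", "magazine", "newspaper", "academic"]


def total(counts: Dict[str, int]) -> int:
    """Overall frequency of a lexeme: the sum of its register counts."""
    return sum(counts[r] for r in REGISTERS)


def register_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Proportion of a lexeme's occurrences that falls into each register."""
    n = total(counts)
    if n == 0:
        raise ValueError("lexeme does not occur in any register")
    return {r: counts[r] / n for r in REGISTERS}


def check_assumptions(name_a: str, counts_a: Dict[str, int],
                      name_b: str, counts_b: Dict[str, int]) -> Dict[str, object]:
    """Check both assumptions for a pair of synonyms (hypothetical helper)."""
    # Assumption 1: one synonym is preferred, i.e. more frequent overall.
    if total(counts_a) >= total(counts_b):
        frequent, rare = (name_a, counts_a), (name_b, counts_b)
    else:
        frequent, rare = (name_b, counts_b), (name_a, counts_a)

    # Assumption 2: the frequent synonym leans towards spoken language,
    # the rarer synonym towards academic prose.
    freq_shares = register_shares(frequent[1])
    rare_shares = register_shares(rare[1])
    return {
        "preferred": frequent[0],
        "frequent_leans_spoken": freq_shares["spoken"] > rare_shares["spoken"],
        "rare_leans_academic": rare_shares["academic"] > freq_shares["academic"],
    }


if __name__ == "__main__":
    # Invented placeholder counts, NOT corpus data.
    word_a = {"spoken": 50, "fiction": 20, "magazine": 15, "newspaper": 10, "academic": 5}
    word_b = {"spoken": 2, "fiction": 3, "magazine": 5, "newspaper": 5, "academic": 5}
    print(check_assumptions("word_a", word_a, "word_b", word_b))
```

Comparing shares rather than raw counts keeps the second check independent of the first: a rare synonym can still be relatively concentrated in academic prose even if its absolute count there is lower.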

Therefore, I will investigate if differences in the [...] enemy. [...] in the more formal registers.

First of all, I will survey the concepts of synonymy and register (chapter 2). Chapter 3 is designed to present the procedure and method of my study. In chapter 4, the core of this paper, I analyze each sample group [...] the results affirm my assumptions presented in the previous two paragraphs (chapter 5).

2 Background

2.1 Synonymy

Synonymy is usually defined as sameness or extensive similarity of meaning concerning two or more expressions (cf. Lyons 1995: 60, Crystal 2003: 164). Another definition is given by Cruse (2002: 486): “We shall take synonymy, then, to be a relation of similarity/identity of meaning between senses associated with two (or more) different lexical forms.” The meaning of a word consists of the descriptive meaning, also called the denotation, and the non-descriptive meaning, or connotation.[3] The descriptive meaning is the central meaning, or the summary of semantic features of an expression (cf. Gramley & Pätzold 2003: 25). By contrast, the non-descriptive meaning refers to “regional, social, stylistic or temporal aspects” (Gramley & Pätzold 2003: 25). This will be illustrated by examples when I explain descriptive synonymy. For the sake of simplicity and brevity, whenever the term meaning is used in this paper without any attribute, it will refer to the descriptive meaning.

When synonymy is dealt with in detail, the definitions presented above will not suffice, as this sense relationship is a gradable phenomenon. Therefore, 2.1.1 is designed to present the degrees of synonymy mentioned by Lyons (1995) and Cruse (2002). Subsequently, 2.1.2 covers the impact of French and Latin loanwords on the frequency of synonyms, as the etymological differences of synonyms will be regarded in the corpus-based study (cf. chapter 4) and its discussion (cf. chapter 5).

2.1.1 Degrees of Synonymy

Degrees of synonymy may be considered signposts rather than clear-cut categories, as basically each case of synonymy has its very own characteristics. In the literature on synonymy, one usually finds three major degrees: absolute synonymy, descriptive (or propositional) synonymy, and near-synonymy (or plesionymy).[4] According to Lyons (1995: 61), two (or more) lexemes qualify as absolute synonyms if they satisfy all of these three conditions: first, all their descriptive meanings are identical; second, they are synonymous in all contexts; and third, they have identical meanings on all dimensions of meaning, descriptive and non-descriptive. The first condition might be the easiest to fulfill, as many lexemes may be, according to Lyons (1995: 61), identical in their meaning(s) with other lexemes. Take, for instance, forest and woods, predict and foretell, or maybe and perhaps.

It is, however, extremely difficult for two lexemes to meet all of the three conditions mentioned by Lyons (1995: 61). Hence, absolute synonyms are extremely rare, if they exist at all (cf. Crystal 2010: 109). A reason for this is mentioned by Cruse (2002: 488): “[T]here is very little semiotic motivation for [absolute synonymy] in a natural language: the only possible utility for absolute synonyms is aesthetic, to avoid repetition of forms”.

If the existence of absolute synonymy is doubted, it is reasonable to equate synonymy with descriptive synonymy, as Kortmann (2005: 201) suggests. This type of synonymy refers to the identity or extensive similarity of descriptive meanings. Cruse (2002: 489) prefers a different term for descriptive synonymy, namely propositional synonymy, for the proposition of a sentence is not changed if an expression is substituted by a synonym. Thus, Lyons (1968: 450 cited in Cruse 2002: 489) explains: “If one sentence, S1, implies another, S2, and if the converse also holds, S1 and S2 are equivalent; . . . . If now the two equivalent sentences have the same syntactic structure and differ from one another only in that where one has lexical item x, the other has y, then x and y are synonymous.” Thus, mutual implication is the key condition of descriptive (or propositional) synonymy; Lyons (1968: 445) defines implication as the impossibility of explicitly asserting S1 and, at the same time, explicitly denying S2. After this fairly theoretical approach, descriptive synonymy should be illustrated by means of an example. I will follow Lyons and test descriptive synonymy with two equivalent sample sentences:

(1) Maybe he has already arrived.
(2) Perhaps he has already arrived.

Both sentences have the same syntactic structure and differ only regarding one lexeme, namely maybe / perhaps. In addition to the fact that the sentences imply each other, the differing expressions maybe and perhaps have the same descriptive meaning. Therefore, they can be considered descriptive (or propositional) synonyms.
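
The mutual-implication test can also be stated compactly. The notation below is not taken from Lyons or Cruse; it is only a shorthand sketch, in which $S[x]$ stands for a sentence frame containing the lexical item $x$ and $\models$ for "implies" in the sense quoted above.

```latex
\[
  \bigl(S[x] \models S[y]\bigr) \;\land\; \bigl(S[y] \models S[x]\bigr)
  \;\Longrightarrow\; x \text{ and } y \text{ are descriptive synonyms,}
\]
\[
  \text{where } S_1 \models S_2 \text{ holds if one cannot assert } S_1
  \text{ and at the same time deny } S_2 .
\]
```

In the example, $S[\,\cdot\,]$ is the frame "__ he has already arrived", with $x$ = maybe and $y$ = perhaps.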

If two lexemes qualify as descriptive synonyms, they are likely to differ in non-descriptive meanings. Kortmann (2005: 201) lists and illustrates major types of differences:

Descriptive synonyms may differ with regard to their connotations (dog – mongrel . . .), with regard to stylistic level or register (begin – commence . . .), with regard to regional and social variety (e.g. differences between American and British English), or with regard to their collocations (e.g. a big/large house, but Big/?Large Brother is watching you).

Connotation is what Cruse (2002: 491) and Lyons (1995: 64) call “expressive meaning”. The descriptive synonyms dog and mongrel differ in expressive meaning insofar as the first lexeme is expressively neutral, whereas the second is expressively negative, or has a pejorative connotation. In the synonym group die / pass away / kick the bucket, the first is neutral, the second has an additional component of respect, and the third an element of disrespect (cf. Cruse 2002: 491).

Register is another factor by which synonyms can be distinguished. Generally speaking, it refers to a variety of language used in a specific situation. For instance, a technical term such as patella tends to be used in a professional context, while an everyday term such as knee-cap is likely to be employed in most other contexts. Since this paper focuses on register differences with regard to synonyms, more details about register follow in 2.2.

Last, a third type of synonymy should be mentioned: if lexemes are more or less similar but not identical in meaning, their relationship is defined as near-synonymy, for which Cruse (2002: 491) prefers the term plesionymy. As conditions for near-synonyms are much less strict than those for absolute or propositional synonyms, they are much more frequent than “real” synonyms (cf. Cruse 2002: 490f.). Near-synonymy can be investigated by the or rather test, as in the sentence He was murdered, or rather executed. Or rather signals a difference that is, however, relatively minor; thus, murder and execute qualify as near-synonyms.

2.1.2 Foreign Loans as a Source of Synonymy in English

A large number of synonyms can be traced back to the numerous French and Latin loans of the late Middle Ages and the beginning of the early modern period. The influx of French words started with the Norman Conquest of England in 1066 and continuously increased until the 14th century. Latin words were borrowed in large numbers between the 14th and the 16th century (cf. Crystal 2003: 46ff., Gramley & Pätzold 2003: 28ff.). Borrowings brought about an enormous number of word doublets, i.e. of new (near-)synonyms.

Usually the etymological origin accounts for the different domains of reference in which the related lexemes are employed. For instance, many animal terms, such as pig/sow, cow, or calf, are of Germanic descent; by contrast, the terms for their meat are of French origin, such as pork, beef, and veal. Synonymy between a Latin loanword and a native English lexeme is found, e.g., with adjectives referring to kinship, such as maternal/motherly, paternal/fatherly, soror(i)al/sisterly, and fraternal/brotherly. As Gramley and Pätzold (2003: 30) hint, the Latin loans are often used in scholarly and formal contexts or in certain collocations, as in fraternal twins.

Generally, Anglo-Saxon terms form the basis of the English vocabulary, whereas French and Latin loans tend to be peripheral and abstract terms. From this, Gramley and Pätzold (2003: 30) infer a correlation between the origin of a word and its preferred register:

[T]he more formal the style and the more specialized . . . the subject matter, the higher the number of loans will usually be. In everyday language, the English word will often be preferred because it is vague and covers many shades of meaning, while loan words tend to be more precise and restricted and so more difficult to handle.

Crystal (2003: 48) confirms these observations: “The Old English word is usually the more popular one, with the French word more literary, and the Latin word more learned.” We will see whether the results of the register-based analysis of synonyms (cf. chapter 4) support the presumed correlation between etymological origin on the one hand, and register and frequency on the other.

2.2 Register

Having touched on register in the preceding sections, I will now deal with this concept in more detail. A register is any kind of variety that is associated with a particular situation (cf. Biber & Conrad 2009: 6). That is, a single speaker uses different registers according to the different situations he or she is in; hence register is context-dependent. In contrast, dialects do not depend on situations, but are varieties that are associated with a specific group of speakers, especially those living in the same region or country (geographic dialects), and those belonging to the same gender, ethnicity, or social class (social dialects).

Registers can be studied on many different levels of specificity (cf. Biber & Conrad 2009: 10). Hence, spoken vs. written language would be an extremely rough division into registers. The registers in COCA (cf. 2.2.3) are still fairly general. In contrast, linguistic articles would form a much more specific register. Biber and Conrad (2009: 15ff.) emphasize that register is to be distinguished from genre and style, as these are different approaches to analyzing text varieties. Register analysis may be applied to any lexico-grammatical features, whereas genre analysis focuses on specialized expressions, rhetorical organization, and formatting, which are interpreted as conventional. From the style perspective one might examine the same linguistic features as from the register perspective, but interpret them in terms of their aesthetic value instead of their function. In the following, I will deal with the components which characterize a register, as presented by Cruse (2002) and, in more detail, by Biber and Conrad (2009).

2.2.1 Components of a Register

Cruse (2002: 492) presents a “traditional sub-division of register” into field (area of discourse), mode (written or spoken) and style/tenor (degree of formality). Even though fairly simplified, this sub-division may give a first notion of aspects by which situations can be analyzed in terms of linguistic variation. A detailed introduction to register, genre, and style is provided by Biber and Conrad (2009). According to these linguists, a description of a register has three major components: the situational context, the linguistic features, and the functional relationship (cf. Biber & Conrad 2009: 6f.).

First, the situational context is composed of many different characteristics. Biber and Conrad (2009: 40ff.) subsume them under the following categories:

- participants, i.e. addressors and addressees, of a text[5], and their relations among each other;
- channel, i.e. mode (speaking vs. writing) or medium;
- production circumstances, e.g. real time or planned;
- setting, i.e. time and place;
- communicative purposes;
- topic.

Second, the linguistic features correspond to words and structures which are frequent in the text investigated. Third, investigating the functional relationship between the situational context and the linguistic features presupposes that the lexical and grammatical characteristics of a text are used because they are particularly well suited to the purposes and situational context of the register.

Finally, it should be mentioned that this paper follows a different approach from that of Biber and Conrad (2009). While these linguists focus on analyzing registers, my study investigates a linguistic feature (synonymy), and register is “only” the primary parameter. Hence, it can be called a register-based study, but it definitely differs from a register study.

2.2.2 Register and Formality

A characteristic not mentioned by Biber and Conrad (2009) is formality. Indeed, there seems to be no uniform definition of this concept. The reason for this is that the degree of formality probably results from the interplay of the various situational characteristics listed by Biber and Conrad (2009: 40ff.). Each of the characteristics may have at least a minor impact on the degree of formality.

Gramley and Pätzold (2003: 15) relate formality to tenor, or style[6], one of their four components of register. They suggest that the degree of formality is ultimately determined by the relationship between the people communicating with language:

The closer the sender (speaker or writer) feels to his or her addressee(s), the more informal the language which the sender can use. Conversely, the more distant the personal relationship, the more formal the personal tenor is likely to be. (Gramley & Pätzold 2003: 15)

Consequently, the closeness between the people concerned may influence the degree of formality more than differences of medium do. For instance, the language in an e-mail to a friend is likely to be much more informal than that in an e-mail to a business partner. By contrast, the news on TV may not differ as much from the news on the radio in terms of formality.

Topic may be another major factor which contributes to the degree of formality, especially regarding vocabulary choice (cf. Biber & Conrad 2009: 46). This holds, for instance, for the sciences or particular industrial or business branches in which technical terms are usually preferred to everyday terms. Frequently these terms are synonyms: a physician is likely to use patella instead of the popular term knee-cap when talking to colleagues. The technical term may be both more exact and more prestigious, as speakers are expected to use it in a certain field in order to maintain their status in, or their affiliation with, this field.

As regards the sample synonyms in my study, the degree of formality is estimated on the basis of dictionaries and the metalinguistic awareness of speakers in forums. Without any language data, such estimates are hard to prove. It is much easier, though still difficult, to characterize registers. I intend to do so in the next section, dealing with the registers accounted for in the study.

2.2.3 The Registers in COCA

COCA is divided into five registers which appear fairly general. Therefore, it should be summarized briefly what is meant by spoken, fiction, magazine, newspaper, and academic.[7] The label spoken is fairly imprecise, as in COCA this register corresponds to conversation from “more than 150 TV and radio programs” (COCA). Fiction includes not only parts of novels, short stories, plays, and beginnings of novels, but also movie scripts. Magazine refers to nearly 100 different popular magazines from a variety of specific domains. The register titled newspaper consists of ten newspapers from across the USA, with different sections of each newspaper represented. Under academic, nearly 100 academic journals are subsumed, which cover a wide range of disciplines.

The order of the COCA registers – beginning with spoken, followed by fiction, magazine, newspaper, and finally academic – is presumably not random. The first one may be the least formal one, and the degree of formality may increase with each register. Accordingly, academic prose is the most formal register. Two important questions may be raised: How can this relation between the register and the formality be motivated? Which aspects contradict this order of the registers in line with the formality?

Most evidently, the registers can be divided in terms of mode, namely into one spoken and four written registers. This has extensive consequences for other aspects of the situational contexts (cf. Biber et al. 1999: 16): spoken language, or conversation, is directly interactive. Sharing the same physical and temporal contexts, interlocutors are much closer to each other than, for instance, a sender of a magazine article and his or her addressee(s). Interlocutors also have little time to plan what they want to say. They are likely to deal with more personal topics than a writer of a newspaper article. Lack of time for deliberation may, therefore, be one of the factors that generally influence the choice between synonyms in favor of the more common and, hence, more informal word.

Fiction shares with the other three written registers the lack of direct interactiveness and of an immediate shared situation. Like magazines and newspapers, it is addressed to a wide public audience, whereas academic journals are addressed to a comparatively specialist audience (cf. Biber et al. 1999: 15). However, in some important aspects fiction has more in common with spoken language than with newspapers, magazines, and academic prose.

First of all, fictional texts often include direct speech (in plays it even forms the vast majority of the text), which may be close to real spoken language, although it is only fictional. Second, text in fiction is less restricted in terms of space and specific external requirements. Hence, the author is freer and more creative in employing language. He or she could thus write, apart from direct speech, in less formal language. Yet, in general I expect that authors of fiction, when they do not use direct speech, aim to avoid common terms as often as possible and replace them with their more formal synonyms. In summary, fiction is a very comprehensive register, and ultimately the degree of formality depends on the literary genre, the topics, and the usage of spoken language by means of direct speech.

Popular magazines and newspapers want to inform about and evaluate present events and situations (cf. Biber et al. 1999: 16). Since space is limited and potential queries of the addressee cannot be answered directly, information has to be concise and precise. Formal terms may fulfill these conditions better than informal terms and therefore be used more frequently in these registers than in spoken language. Reasons of tradition and prestige may also explain why formal terms are likely to be used more in newspapers and magazines than comparatively informal and common ones. However, there may be differences according to the editors of the newspaper or magazine and their communicative purposes, often depending on a political bias.

More than any of the four other registers, academic language may be marked by a high usage of technical terms, which are, as mentioned, not only less ambiguous, but also more formal than everyday terms. This also relates to the relationship between the sender and the addressee, who share scientific rather than personal interests. Academic texts have the same main communicative purpose as newspapers and magazines, namely to inform. But while newspapers and magazines often deal with current newsworthy events and focus on simply reporting, academic prose covers rather time-independent subjects which are not only reported but also analyzed and explained (cf. Biber & Conrad 2009: 118). For analysis, formal terms may be more effective, as their use, being less common, may support the argument and convince the reader more than a common term would.

In conclusion, spoken language is likely to be the least formal of the registers in COCA; newspapers, magazines, and academic journals are arguably associated with a higher degree of formality. Fiction is a special case, as it has aspects of a rather informal status and those of a more formal one. Based on the discussion in this chapter and for the sake of simplicity, I will henceforth refer to spoken and fiction as the more informal registers and magazine, newspaper, and academic as the more formal registers. However, it should be kept in mind that formality may be neither the only nor the most influential aspect of a register as regards the choice between synonyms; for instance, the respective topic, collocations, or the personal style of the author may also have an impact.
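
The two-way grouping of registers can be expressed as a small helper. This is an illustrative sketch only; the constant and function names are hypothetical, and the thesis does not state that shares were computed in this way.

```python
from typing import Dict

# Grouping adopted in this chapter: two "more informal" and three
# "more formal" registers.
INFORMAL = ("spoken", "fiction")
FORMAL = ("magazine", "newspaper", "academic")


def formal_share(counts: Dict[str, int]) -> float:
    """Fraction of a lexeme's occurrences found in the more formal registers."""
    informal = sum(counts.get(r, 0) for r in INFORMAL)
    formal = sum(counts.get(r, 0) for r in FORMAL)
    if informal + formal == 0:
        raise ValueError("lexeme does not occur in any register")
    return formal / (informal + formal)
```

Because the grouping is uneven (three formal registers against two informal ones), a lexeme spread evenly over five equally sized registers would already have a formal share of 3/5. A share is therefore better judged against that baseline, or against the share of its synonym, than against one half.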

3 Study

3.1 Finding Sample Synonyms

In order to carry out a study with reliable results, I intended to find lexemes which are very close, if not identical, in their meaning(s). Adequate synonyms were searched for by discussing lexemes with native speakers, consulting works on synonymy – especially Lyons (1995), Cruse (2002), and Kortmann (2005) – and reading suggestions on the internet, especially in various forums.[8] In the next step I drew on Webster's and The Free Dictionary (henceforth TFD), surveying the adequacy of numerous synonyms for the purposes of the study.

In the end, I selected eleven sample groups with two or three synonyms each. It should be pointed out that the selection was random, apart from the fact that, for the sake of variation, I included synonyms from each of the four major grammatical word classes: nouns, verbs, adjectives, and adverbs. Note that the synonyms chosen do not necessarily represent all synonyms, or all those of any particular category, be it etymological, semantic, morphological, phonetic, or grammatical. A minor condition for the selection was to find pairs of synonyms with one lexeme of Germanic origin and one of Romance origin, in order to find some hints at a correlation between formality and descent (cf. Gramley & Pätzold 2003: 30).

3.2 Method

For examining quantitative and qualitative differences between synonyms, linguistic data are required. I decided to do a corpus-based study, as corpora enable one to investigate many linguistic phenomena in a fairly reliable and exact manner (cf. Biber & Conrad 2009: 73f.). COCA is a suitable corpus for the purpose of my study: it is comprehensive and the different registers are quite equal in number of words[9] – two factors which allow for more reliable data. It also provides the frame for this study with its three constants: American English as the diatopic variety, the period from 2005 to 2009 as the time span (rendering the study synchronic and contemporary), and the division into five registers (spoken, fiction, magazine, newspaper, and academic). The independent variables in the study are the synonyms investigated; the dependent variables are their overall frequency and their frequency in each particular register. Finally, note that all grammatical forms of each sample lexeme were accounted for: for example, the study of enemy included both the singular enemy and the plural enemies, and when I speak of stroll, I include stroll, strolls, strolling, and strolled.
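
The counting step described above, pooling all grammatical forms of a lexeme, might be sketched as follows. The helper names are hypothetical, and the normalization to occurrences per million words is an added assumption: since the registers are described as roughly equal in size, the study may well have relied on raw frequencies instead.

```python
from typing import Dict, Iterable


def lemma_count(form_counts: Dict[str, int], forms: Iterable[str]) -> int:
    """Sum the counts of all listed word forms of one lexeme.

    Forms missing from form_counts contribute zero; forms not listed
    (e.g. derived nouns) are ignored.
    """
    return sum(form_counts.get(form, 0) for form in forms)


def per_million(count: int, register_size: int) -> float:
    """Normalize a raw count to occurrences per million words."""
    if register_size <= 0:
        raise ValueError("register_size must be positive")
    return count * 1_000_000 / register_size


# Word forms taken from the example in the text.
STROLL_FORMS = ["stroll", "strolls", "strolling", "strolled"]
ENEMY_FORMS = ["enemy", "enemies"]
```

Normalizing by register size only matters to the extent that the registers differ in word count; with equally sized registers it leaves the ranking of the raw counts unchanged.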


[1] Cf. <http://forum.wordreference.com/showthread.php?t=313584> (17 May 2012).

[2] Cf. Cruse (2002: 491), Gramley and Pätzold (2004: 15).

[3] Terms vary between different linguists; compare e.g. Lyons (1995) with Gramley and Pätzold (2004).

[4] This paper will use the terms by Lyons (1995: 60ff.) and mention those of Cruse (2002: 488). Again, there may be alternative designations, such as complete, strict, or perfect synonymy instead of absolute synonymy, depending on the author.

[5] This paper will use text to refer to both spoken and written entities of language, as Biber and Conrad (2009: 5) do. A conversation may be considered a text, just like a newspaper article or a poem (cf. Biber & Conrad 2009: 5).

[6] Note that Gramley and Pätzold (2004: 15) subsume style under register, while Biber and Conrad (2009: 15) clearly separate the concepts.

[7] A detailed description of the registers of COCA is found on <http://corpus.byu.edu/coca>, in the drop-down menu “Help/information/contact,” scrolling down and clicking “TEXT/TYPES.”

[8] A sample discussion on synonyms can be found on <http://boards.straightdope.com/sdmb/showthread.php?t=604306> (17 May 2012).

[9] See Appendix, p. ii.


Institution / College
Johannes Gutenberg University Mainz
Keywords: Synonymy, American English, Semantics, Synonymous, Corpus, COCA, Register, Language Level

