# Modelling extremal stock returns in a stable Paretian environment

Diploma Thesis 2003 132 Pages

## Excerpt

## Table of Contents

List of Tables

List of Figures

1. Introduction: The Empirical Distribution of Stock Returns

1.1 Stock returns, volatility, and asset allocation

1.2 Stylised facts of stock return distributions

2. Models for Stock Return Distributions

2.1 An overview of full parametric return distribution models

2.2 Basic approaches to inference about extremal returns

3. Technical Background for Tail Inference

3.1 Extreme Value Theory (EVT)

3.2 The stable Paretian model

4. Estimation of the Stable Paretian Index $\alpha$

4.1 Desirable properties of an estimator and evaluation criteria

4.2 Estimation methodologies and prerequisites

4.3 Tail estimators

4.3.1 Intuition behind tail estimation

4.3.2 Where does the tail start?

4.3.3 Hill estimator (1975)

4.3.4 Modifications of the Hill estimator

4.3.5 Generalisations of the Hill estimator

4.3.6 Pickands estimator (1975)

4.3.7 Generalisations of the Pickands estimator

4.3.8 De Haan-Resnick estimator (1980)

4.3.9 De Haan-Pereira estimator (1999)

4.4 Estimation via the Peaks over Threshold (PoT) method

4.4.1 ML techniques

4.4.2 Method of probability-weighted moments (PWM)

4.4.3 Falk-Marohn estimator (1999)

4.4.4 Choice of the threshold level

4.5 Characteristic function techniques

4.5.1 Method of Moments estimators (MME)

4.5.2 Minimum Distance estimators (MDE)

4.5.3 Regression-type estimators

4.6 Maximum Likelihood estimators (MLE)

4.6.1 Algorithms for computation of the stable Paretian PDF

4.6.2 The ML estimation procedure

4.7 Quantile-based estimators

4.7.1 Quantile estimator by Fama and Roll (1971)

4.7.2 McCulloch estimator (1986)

4.8 Other approaches

4.9 Which estimator is the best one?

5. An Empirical Comparison of Estimators

5.1 Simulation study of tail estimators

5.1.1 Performance with Pareto data and small sample properties

5.1.2 Performance with Fréchet data

5.1.3 Performance with stable data

5.2 Consideration of modified tail estimators

5.3 Application to stock return data

5.3.1 Daily stock returns

5.3.2 Weekly stock returns

6. Conclusion & Summary

Appendix to Main Text

List of References

List of Abbreviations

List of Symbols

Declaration (Versicherung)

Curriculum Vitae Hendrik Kohleick (Lebenslauf)

## List of Tables

**Table 1:** Bias and standard deviation of tail estimators for different sample fractions with Pareto ([illustration not visible in this excerpt]1.5) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 2:** Bias and standard deviation of tail estimators for different sample fractions with Pareto ([illustration not visible in this excerpt]1.9) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 3:** Asymptotic behaviour of estimator *bias* for different rules of selecting the sample fraction (percentage rule ([illustration not visible in this excerpt]0.22) vs. [illustration not visible in this excerpt] rule) with Pareto (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500, [illustration not visible in this excerpt] 2,000).

**Table 4:** Asymptotic behaviour of tail estimator *standard deviation* for different rules of selecting the sample fraction with Pareto (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500, [illustration not visible in this excerpt]50, [illustration not visible in this excerpt] 2,000).

**Table 5:** Asymptotic behaviour of tail estimator *mean squared error (MSE)* for different rules of selecting the sample fraction with Pareto (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500, [illustration not visible in this excerpt]50, [illustration not visible in this excerpt] 2,000).

**Table 6:** Bias and standard deviation of tail estimators for different sample fractions with Fréchet (**[illustration not visible in this excerpt]** 1.5) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 7:** Bias and standard deviation of tail estimators for different sample fractions with Fréchet (**[illustration not visible in this excerpt]** 1.9) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 8:** Bias of tail estimators for different sample fractions with stable Paretian (**[illustration not visible in this excerpt]** 1.5) data ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Table 9:** Bias of tail estimators for different sample fractions with stable Paretian (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Table 10:** Bias of tail estimators for different sample fractions with stable Paretian (**[illustration not visible in this excerpt]** 1.9) data ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Table 11:** Bias and standard deviation of original tail estimators and two modifications for different sample fractions with Pareto (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 12:** Bias and standard deviation of original tail estimators and two modifications for different sample fractions with Fréchet (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500 replications, [illustration not visible in this excerpt] 4,000).

**Table 13:** Bias of original tail estimators and two modifications for different sample fractions with stable Paretian (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Table 14:** Overview of key figures of daily logarithmic equity and index returns (13 April 88 – 12 April 03).

**Table 15:** Overview of key figures of weekly logarithmic equity and index returns (13 April 88 – 12 April 03, Wednesday to Wednesday).

**Table 16:** Accurate sample fraction for each of the applied tail estimators.

**Table 17:** Tail index estimates for daily returns of stock price indices (13 April 88 – 12 April 03).

**Table 18:** Tail index estimates for daily returns of five blue-chip equities (13 April 88 – 12 April 03).

**Table 19:** Tail index estimates for weekly returns of stock price indices (13 April 88 – 12 April 03).

**Table 20:** Tail index estimates for weekly returns of five blue-chip equities (13 April 88 – 12 April 03).

## List of Figures

**Fig. 1:** Histograms of Microsoft daily returns vs. random numbers from normal distribution with same mean and std. deviation; illustrating fat tails and peakedness.

**Fig. 2:** QQ plot of Dax 30 daily and weekly logarithmic returns (01 Jan 91 – 31 Dec 00).

**Fig. 3:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] with Pareto (**[illustration not visible in this excerpt]** 1.5) data and Pareto (**[illustration not visible in this excerpt]** 1.9) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 4:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] with Pareto (**[illustration not visible in this excerpt]** 1.5) data and Pareto (**[illustration not visible in this excerpt]** 1.9) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 5:** Asymptotic behaviour of [illustration not visible in this excerpt] and [illustration not visible in this excerpt] for [illustration not visible in this excerpt] rule vs. percentage rule where [illustration not visible in this excerpt]0.22 with Pareto (**[illustration not visible in this excerpt]** 1.7) data ([illustration not visible in this excerpt]500, [illustration not visible in this excerpt]50, [illustration not visible in this excerpt] 2,000).

**Fig. 6:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] with Fréchet (**[illustration not visible in this excerpt]** 1.5) data and Fréchet (**[illustration not visible in this excerpt]** 1.9) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 7:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] with Fréchet (**[illustration not visible in this excerpt]** 1.5) data and Fréchet (**[illustration not visible in this excerpt]** 1.9) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 8:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] estimates with stable Paretian (**[illustration not visible in this excerpt]** 1.5) data, stable Paretian (**[illustration not visible in this excerpt]** 1.7) data, and stable Paretian (**[illustration not visible in this excerpt]** 1.9) data; ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Fig. 9:** [illustration not visible in this excerpt] and [illustration not visible in this excerpt] estimates with stable (**[illustration not visible in this excerpt]** 1.5) data, stable (**[illustration not visible in this excerpt]** 1.7) data, and stable (**[illustration not visible in this excerpt]** 1.9) data; ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Fig. 10:** [illustration not visible in this excerpt]/[illustration not visible in this excerpt], [illustration not visible in this excerpt]/[illustration not visible in this excerpt], [illustration not visible in this excerpt]/[illustration not visible in this excerpt] with Pareto (**[illustration not visible in this excerpt]** 1.7) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 11:** [illustration not visible in this excerpt]/[illustration not visible in this excerpt] , [illustration not visible in this excerpt]/[illustration not visible in this excerpt], [illustration not visible in this excerpt]/[illustration not visible in this excerpt] with Fréchet (**[illustration not visible in this excerpt]** 1.7) data; mean estimates +/- standard deviation of [illustration not visible in this excerpt]500 replications ([illustration not visible in this excerpt] 4,000).

**Fig. 12:** [illustration not visible in this excerpt]/[illustration not visible in this excerpt], [illustration not visible in this excerpt]/[illustration not visible in this excerpt], [illustration not visible in this excerpt]/[illustration not visible in this excerpt] estimates with stable Paretian (**[illustration not visible in this excerpt]** 1.7) data; ([illustration not visible in this excerpt]1 replication, [illustration not visible in this excerpt] 4,000).

**Fig. 13:** Histograms of daily stock index returns (Dax 30, FTSE 100, S&P 500); daily equity returns (Elf Aquitaine, Microsoft); logarithmic scale.

**Fig. 14:** Histograms of daily equity returns (Deutsche Bank, Coca Cola, Volkswagen); logarithmic scale.

**Fig. 15:** [illustration not visible in this excerpt] estimates of upper and lower tail index for daily returns of Dax 30 and FTSE 100 stock price indices (13 April 88 – 12 April 03).

**Fig. 16:** [illustration not visible in this excerpt]/[illustration not visible in this excerpt] estimates of upper and lower tail index for daily returns of Coca Cola, Deutsche Bank, Elf Aquitaine, and Volkswagen.

**Fig. 17:** [illustration not visible in this excerpt] estimates of upper and lower tail index for weekly returns of Dax 30 and S&P 500 stock price indices (13 April 88 – 12 April 03).

**Fig. 18:** [illustration not visible in this excerpt]/[illustration not visible in this excerpt] estimates of upper and lower tail index for weekly returns of Elf Aquitaine, Deutsche Bank, Coca Cola, and Volkswagen.

## 1. Introduction: The Empirical Distribution of Stock Returns

### 1.1 Stock returns, volatility, and asset allocation

It has long been observed that, when making investment decisions, individuals look not only at their expected profit or rate of return but also at the perceived risk inherent in the asset. Financial market theory has captured investors' risk aversion for over 50 years, building on the seminal work of Markowitz (1952, pp. 77-79), and risk aversion has been an indispensable element of financial models ever since (Schmid et al. (*yns*), p. 1).

The perception of asset risk is closely entwined with the probability of extremal returns. The likelihood of extremal events is reflected in the distribution of the random variable underlying the return-generating process, and especially in the shape of the tails: Where the probability of extremal returns is high, the tails of the distribution are rather ‘fat’ or ‘heavy’, whereas one speaks of ‘light’ tails when extremal returns occur very rarely.

An important field of application for inference about the tail shape is the estimation of value at risk (VaR; for a definition, see Harris et al. 2001, p. 717), a concept for assessing the downside risk of portfolio values that is closely related to the shape of the lower tail (Danielsson et al. 2000, p. 15). VaR estimates are in turn used to derive an optimal asset allocation. VaR calculation has traditionally been based on normally distributed security returns, yet it has been shown that results are dramatically different when the underlying model is non-normal (Tokat et al. 2003, pp. 937-938; Ortobelli et al. (*yns*), pp. 1-2).
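The sensitivity of VaR to the distributional assumption can be illustrated with a minimal sketch. It assumes zero-mean returns and compares the normal law with the Cauchy law, which is the stable Paretian special case with index 1 and has closed-form quantiles. The function names and the calibration (both laws share the same interquartile range) are illustrative assumptions, not part of the cited studies.

```python
import math
from statistics import NormalDist


def var_normal(p, sigma=1.0):
    """VaR at tail probability p for zero-mean normal returns (as a positive loss)."""
    return -NormalDist(0.0, sigma).inv_cdf(p)


def var_cauchy(p, gamma):
    """VaR at tail probability p for a zero-centred Cauchy law with scale gamma."""
    return -gamma * math.tan(math.pi * (p - 0.5))


# Assumption for illustration: calibrate the Cauchy scale so that both laws
# have the same interquartile range, i.e. they agree in the centre.
sigma = 1.0
gamma = sigma * NormalDist().inv_cdf(0.75)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: normal VaR {var_normal(p, sigma):.2f}, "
          f"Cauchy VaR {var_cauchy(p, gamma):.2f}")
```

Although both models agree on the central 50% of the distribution, the heavy-tailed model assigns a far larger loss to the same small tail probability, and the gap widens as the probability shrinks.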

Thus, given an observed risk aversion of investors, it is clear that the distribution of stock returns – and especially the shape of the tails – has far-reaching implications for risk assessment, portfolio management, and asset pricing (Mittnik et al. 1999a, p. 236).

Yet, important as the question is, finance experts and statisticians still have considerable difficulty understanding extremal movements in stock prices (Longin 1996, p. 383).

Basically, there are two approaches to shed some light on this question:

- **Tail inference based on full parametric assumptions**. A natural way of gaining insight about the likelihood of extreme price movements is to first establish a distributional model that fits well with empirical stock return distributions and then to estimate the parameter governing the tail behaviour (tail index).

- **Letting the tails speak for themselves**. In this case, tail inference is made without modelling the centre of the distribution. Tail index estimation is based on extremal returns only. This method is based on Extreme Value Theory (EVT).

This paper discusses both approaches, with the stable Paretian or $\alpha$-stable distribution serving as a conceptual framework for the analysis. Tail inference is essentially focused on estimation of the index $\alpha$, which determines the shape of the tails.

Two questions shall be answered:

- Provided that stock returns actually follow a stable Paretian distribution, what is the best estimator for the index $\alpha$?

- Given that deviations from the stable Paretian model have frequently been observed, how can we make inference about the tail shape (and obtain an accurate estimate of $\alpha$) even if the stable model does not hold exactly?

It shall be found later that the EVT-based approach plays a crucial role in identifying suitable estimators of the tail index under relaxed distributional assumptions.

**Structure of remainder of text.** The remainder of this paper is structured as follows:

In section 1.2, some stylised facts of empirical stock return distributions are described. Section 2.1 analyses how these empirical characteristics have been captured in different return distribution models, following a historical timeline. The family of stable laws is introduced, along with theoretical and empirical findings on the goodness-of-fit.

Section 2.2 then shifts over to extremal returns. A short introduction to different ways of making inference about extremal returns is given, focused on parametric models and models based on EVT (which are most relevant here).

Section 3 provides the technical framework for the estimation of the tail index. Whilst section 3.1 gives necessary basics of EVT, section 3.2 introduces relevant technical background on the stable Paretian model.

Sections 4 and 5 are the central parts of this paper. Section 4 aims to give a comprehensive overview of estimators of the tail index, based on the definition of desirable properties (4.1) and estimation methodologies and prerequisites (4.2). Common estimators are described and evaluated, and previous theoretical and practical evidence on their performance is provided (4.3-4.8). To conclude and summarise the theoretical part, the question whether there is a ‘best’ estimator of the tail index is addressed in paragraph 4.9.

Section 5 provides the results of the empirical studies conducted. In the simulation part (5.1-5.2), the focus is on evaluation of estimator performance when sampling from a known distribution with given tail index, whilst paragraph 5.3 contains applications of estimators to empirical data. Here, the aim will be to give an indication of how fat-tailed empirical stock return distributions actually are – broken down into upper and lower tails as well as daily and weekly returns.

Section 6 concludes by summarising the main findings and providing a future outlook.

### 1.2 Stylised facts of stock return distributions

In empirical comparisons, stock returns (especially daily stock returns) have consistently exhibited several characteristic features, which are known as ‘ *stylised facts* ’:

**Dependence and volatility clustering.** Financial time series are usually not independent, but exhibit stochastic dependence. Even though the linear dependence between returns of subsequent days is negligible, there is considerable dependence of squared returns (Schmid et al. 2002, pp. 8-9).

However, when focusing exclusively on tail events, i.e. on observations beyond a given threshold, it can be observed that dependence decreases (Danielsson et al. 2000, pp. 6-7).

As for the correlation in the volatility of returns, conditional heteroskedasticity is a common feature, i.e. the volatility is not constant over time, but it varies (Krämer, pp. 14-15). Often a volatility clustering effect can be observed in all types of financial data, e.g. one day with an extremal stock return tends to be followed by another (Embrechts et al. 1999, p. 413). In this case, the volatility is conditional on itself.

These findings often lead to the conclusion that it is easier to predict volatility than to predict prices or returns (Schmid et al. 2002, p. 9).
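The contrast between negligible linear dependence in returns and marked dependence in squared returns can be reproduced with a small simulation. The sketch below assumes a GARCH(1,1) process with Gaussian innovations purely as a convenient generator of volatility clustering; the process, its parameter values, and the function names are illustrative assumptions and are not taken from the cited sources.

```python
import math
import random


def simulate_garch(n, omega=0.05, arch_coef=0.10, garch_coef=0.85, seed=1):
    """Simulate n returns from a GARCH(1,1) process with Gaussian innovations."""
    rng = random.Random(seed)
    var = omega / (1.0 - arch_coef - garch_coef)  # start at unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        # Today's squared return feeds into tomorrow's conditional variance
        var = omega + arch_coef * r * r + garch_coef * var
    return returns


def autocorr(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return num / denom


returns = simulate_garch(50_000)
acf_returns = autocorr(returns)
acf_squared = autocorr([r * r for r in returns])
print(f"lag-1 autocorrelation of returns:         {acf_returns:.3f}")
print(f"lag-1 autocorrelation of squared returns: {acf_squared:.3f}")
```

In such a process the returns themselves are serially uncorrelated, while their squares are positively autocorrelated, which is the pattern described above.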

But again, when focusing on extremal events, these appear to be almost “*randomly scattered*” (Danielsson et al. 2000, p. 6), i.e. the clustering effect is much less prominent. In order to measure the degree of clustering of extreme values in financial time series, the so-called *extremal index* $\theta$ is of central importance. The extremal index measures the degree of short-range dependence exhibited by extreme values (Ancona-Navarrete et al. 2000, p. 6). Smith et al. (1994), Weissman et al. (1998), and Embrechts et al. (1999, pp. 418-425) give a good overview of estimation techniques for the extremal index.

The apparent violation of the i.i.d. assumption for stock returns can cause severe problems concerning the validity and accuracy of inferential procedures.

**Leptokurtosis.** When plotting stock returns into a histogram and fitting a normal distribution (see Fig. 1-2), one would typically encounter two major fitting problems that can be summarised under the term ‘ *excessive kurtosis* ’ or ‘ *leptokurtosis* ’ vis-à-vis the normal distribution (see Mills 1995, p. 323 or Frahm 1998, p. 32).

- **Peakedness.** In comparison to the Gaussian curve, the empirical distribution is usually more peaked (cf. Fig. 1, rhs), i.e. the probability density in the very centre of the distribution is higher (Krämer (*yns*), pp. 10-11; Mandelbrot 1963, p. 395).

In order to measure the degree of peakedness, the following quantile-based estimator has been proposed (Trede 1999, p. 17; Schmid et al. (*yns*), p. 4):

illustration not visible in this excerpt

where [illustration not visible in this excerpt] represents the [illustration not visible in this excerpt]-quantile of the empirical distribution function.

This estimator has been standardised, such that a positive value points towards a distribution that is more peaked than the Gaussian.

- **Fat tails.** The histogram of empirical returns would typically also deviate from the normal distribution in that it is often thicker on the left and right tail (see Fig. 1, lhs). Financial data, including stock returns, typically exhibit fat tails, also called ‘ *heavy tails* ’, ‘ *long tails* ’, or ‘ *thick tails* ’ (Krämer (*yns*), p. 10).

This fact has very important implications for the measurement and management of risk in portfolios (Ortobelli et al. (*yns*), pp. 2-4; Tokat et al. 2003, p. 938). In a risk-averse environment, the amount of risky assets held by the investor depends on the probability of large deviations in asset prices (e.g. extremal stock returns). When the heaviness of tails is underestimated, this may well lead to an excessive investment in risky assets, triggering sub-optimal asset allocation. Therefore, any model for tail inference that fails to reflect heavy tails is bound to show poor performance when applied in practice.

Moreover, fat-tailedness may have important *technical* implications: As tails grow heavier, the second and even the first moment can fail to be finite (Bamberg et al. 2001, pp. 6-8). Even though in a financial context, there is little doubt that expected returns *do* exist, the question whether or not the second moment is finite has triggered controversial discussions (Jansen et al. 1991, p. 18). Whilst Mandelbrot (1963, p. 395) supports the infinite-variance hypothesis, other authors (e.g. Shiryaev 1999, p. 335) have questioned its validity.

There is no unanimous definition of when a distribution can actually be called ‘fat-tailed’. Bamberg et al. (2001, pp. 4-5) give an overview of different definitions, but at this point, it is sufficient to introduce another quantile-based estimator (Trede 1999, p. 18; Schmid et al. (*yns*), p. 4):

illustration not visible in this excerpt

Since the value 0 is taken on by the normal distribution (standardised estimator), positive values indicate heavy tails, whereas negative values point towards light tails.

A more analytical definition of heavy tails will be given in section 3.1, in the context of EVT.
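The remark that heavy tails can destroy moments can be made concrete with a short worked calculation. It assumes, for illustration only, that the tail of $|X|$ decays like a power law with exponent $\alpha > 0$ beyond some level $x_0 > 0$; the constant $C$ and the level $x_0$ are hypothetical symbols introduced here.

$$P(|X| > x) \approx C\,x^{-\alpha} \quad \text{for } x \ge x_0 .$$

For any order $p > 0$, the absolute moment can be written as an integral over the tail probability:

$$E|X|^p = \int_0^\infty p\,x^{p-1}\,P(|X| > x)\,dx .$$

The part of the integral below $x_0$ is always finite, so existence depends on the tail part alone:

$$\int_{x_0}^\infty p\,C\,x^{p-1-\alpha}\,dx < \infty \quad \Longleftrightarrow \quad p < \alpha .$$

Under this assumption, a tail exponent $\alpha < 2$ implies an infinite second moment, and $\alpha \le 1$ implies that even the first absolute moment is infinite.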

illustration not visible in this excerpt

Fig. 1: Histograms of Microsoft daily returns (*blue*) vs. random numbers from normal distribution (*grey*) with same mean and std. deviation; illustrating fat tails (*lhs*) and peakedness (*rhs*).

**Skewness.** Besides excessive kurtosis, it has sometimes been suggested that empirical stock return distributions show considerable skewness, such that inflexible symmetric models (such as the normal distribution) are inappropriate (Shiryaev 1999, p. 331).

For instance, Harris et al. (2001, p. 716) and Mills (1995, p. 323) find that UK stock returns (FTSE 100) exhibit skewness, Simkowitz et al. (1980, pp. 307-312) show the same for the US.

A quantile-based measure of skewness has been suggested by Schmid et al. (*yns*, p. 4):

illustration not visible in this excerpt

For symmetric distributions, this measure takes the value 0.

In this paper, skewness will implicitly be accounted for when discussing inference about the tail index. The shape parameters for the upper and lower tail will be derived separately, such that potential differences would suggest asymmetry.

**Daily returns vs. lower-frequency returns.** In contrast to daily returns, two of the above stylised facts are less clear-cut when moving to weekly or monthly returns:

- **Volatility clustering**. When sampling at weekly or monthly intervals, the volatility clustering effect is very small. Thus, the deviation from an i.i.d. process is not equally significant in this case (Paolella 2001, p. 1096).

- **Leptokurtosis**. When moving to lower-frequency data, the goodness-of-fit with the normal distribution tends to improve considerably (Schmid et al. 2002, p. 8). Peakedness and heavy tails are not as marked as in the case of daily returns.

Looking at the QQ plots in Fig. 2, it becomes clear immediately that for weekly returns, the normal distribution provides a much better fit in the tail region (rhs), whereas in case of daily returns, the excessive kurtosis is clearly visible (lhs).

illustration not visible in this excerpt

Fig. 2: QQ plot of Dax 30 daily (*lhs*) and weekly (*rhs*) logarithmic returns (01 Jan 91 – 31 Dec 00). Source: DataStream.

## 2. Models for Stock Return Distributions

### 2.1 An overview of full parametric return distribution models

The following section describes common models to characterise the distribution of logarithmic stock returns

$$r_t = \ln P_t - \ln P_{t-1} = \ln\!\left(\frac{P_t}{P_{t-1}}\right), \qquad t = 1, \dots, n,$$

based on $n + 1$ observations of the asset price $P_t$, $t = 0, \dots, n$. The log-returns can be sampled at varying intervals (e.g. daily, weekly, or monthly). According to the additivity property of log-returns (see Schmid et al. 2002, pp. 4-5), low-frequency returns may be obtained by adding up higher-frequency returns – a favourable feature:

$$r_t^{(k)} = \ln P_t - \ln P_{t-k} = \sum_{i=0}^{k-1} r_{t-i}$$
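The additivity property can be checked numerically. The following sketch uses a hypothetical series of six closing prices (five daily returns, i.e. one trading week); the price values are made up for illustration. It also shows that simple percentage returns do not share this property.

```python
import math

# Hypothetical closing prices for six consecutive trading days
prices = [100.0, 101.5, 99.8, 102.3, 103.0, 101.2]

# Daily log-returns: ln(P_t) - ln(P_{t-1})
daily_log = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Weekly log-return computed directly from the first and last price
weekly_log = math.log(prices[-1] / prices[0])

# Simple (percentage) returns for comparison
daily_simple = [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]
weekly_simple = prices[-1] / prices[0] - 1.0

print(f"sum of daily log-returns:    {sum(daily_log):.6f}")
print(f"weekly log-return:           {weekly_log:.6f}")
print(f"sum of daily simple returns: {sum(daily_simple):.6f}")
print(f"weekly simple return:        {weekly_simple:.6f}")
```

The daily log-returns sum to the weekly log-return up to floating-point rounding, whereas the sum of simple returns differs from the weekly simple return because simple returns compound multiplicatively.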

As this paper is concerned with the *unconditional* distribution of stock returns rather than conditional models or time series, the notation shall be simplified to [illustration not visible in this excerpt].

**The traditional normality assumption.** A random variable $X$ has a normal distribution if its PDF can be written as follows (Johnson et al. 1994, Volume 1, p. 80):

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \qquad x \in \mathbb{R},\ \sigma > 0.$$

The assumption that stock returns follow this model rests on the Central Limit Theorem (CLT), which states that the sum of independent and identically distributed (i.i.d.) random variables $X_1, \dots, X_n$ with finite variance approximately follows a normal distribution for large $n$ (Bomsdorf et al. 1999, p. 44). Since, for instance, weekly (logarithmic) returns can be regarded as the sum of the daily returns of the corresponding week, this reasoning appears sensible, provided that the i.i.d. assumption is fulfilled.
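The CLT argument for aggregated returns can be illustrated by simulation. The sketch below assumes i.i.d. daily returns from a Laplace distribution, chosen only because it is leptokurtic (excess kurtosis 3) yet has finite moments of all orders; it is not proposed as a return model, and all names are illustrative.

```python
import random


def laplace(rng):
    """Draw from a standard Laplace law as the difference of two exponentials."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def excess_kurtosis(xs):
    """Sample excess kurtosis (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(42)
daily = [laplace(rng) for _ in range(100_000)]

# Aggregate non-overlapping blocks of five daily returns into weekly returns
weekly = [sum(daily[i:i + 5]) for i in range(0, len(daily), 5)]

kurt_daily = excess_kurtosis(daily)
kurt_weekly = excess_kurtosis(weekly)
print(f"excess kurtosis, daily:  {kurt_daily:.2f}")
print(f"excess kurtosis, weekly: {kurt_weekly:.2f}")
```

With i.i.d. finite-variance increments, the excess kurtosis of a sum of five returns is one fifth of the daily value, so the weekly series is visibly closer to normality. This holds only under the stated assumptions; it does not carry over to increments with infinite variance.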

The hypothesis that stock returns follow a normal distribution used to be prevalent in statistics in the first half of the 20th century, based on early work by Bachelier in 1900, who claimed that movements in stock prices are independent and normal (*Brownian motion*). But even today, the Gaussian hypothesis is a central assumption underlying many financial models, such as the standard Capital Asset Pricing Model (CAPM) and the Black-Scholes option pricing model, in which equity prices are assumed to follow a geometric Brownian motion (Harris et al. 2001, p. 738). Moreover, the normal distribution is also the standard model underlying the calculation of VaR (Klüppelberg 2002, p. 1).

Whilst this assumption could be empirically sustained for weekly returns (Fama 1963, p. 420), it has since then been rejected on various occasions for daily and higher-frequency returns, e.g. via non-parametric inspection (Paolella 2001, p. 1095; Klüppelberg 2002, pp. 8-9). Harris et al. (2001, pp. 725-736) adopt a simulation approach to demonstrate that the Gaussian hypothesis is not a suitable working assumption.

The central problem is that the normal model, appealing though it may be in theory, is obviously not capable of capturing any of the stylised facts described in section 1.2 – neither dependency nor excessive kurtosis nor skewness.

As a result of the normality assumption being found insufficient for daily returns, researchers have proposed various other models to better reflect the stylised facts (Linden 2001, pp. 159-160). This has led to two conflicting schools, one relying on theoretically sound models (e.g. stable distributions, normal mixtures), the other working with distributions picked according to their empirical goodness-of-fit.

**Stable distributions.** The earliest evidence of non-normality of daily and higher-frequency returns was provided in the early 1960s by Mandelbrot (1963, p. 395) and Fama (1963, p. 421), see also Harris et al. (2001, p. 739).

Mandelbrot and Fama proposed the class of *stable distributions* for modelling stock returns, which had first been studied by Lévy in 1924 but had received little attention since. Within this class, the *stable Paretian* distribution was found capable of more accurately reflecting the stylised facts of empirical distributions. Even though there are other sub-classes of stable distributions, such as max-stable, min-stable, and multiplication-stable distributions (Mittnik et al. 1993, pp. 270-290 provide a good overview), those have been less extensively studied.

The stable Paretian model allows for modelling heavy tails as well as skewness (Ghose et al. 1995, pp. 227-228), yet nesting the normal distribution as a special case (Mittnik et al. 1999c, pp. 275-276). It is therefore more realistic and more flexible than the normal distribution (Mittnik et al. 1993, pp. 268-269).

There has been considerable empirical evidence in favour of stable laws (e.g. Dostoglou et al. 1999, pp. 57-58), and the stable model has generally been regarded as a visible improvement vis-à-vis the traditional normality assumption, being *“more adherent to the reality of the market”* (Ortobelli et al. (*yns*), p. 33).

The stable model is not specific to equity returns, but has been successfully applied to many sorts of asset prices, e.g. bonds and options (see for instance Dostoglou et al. 1999, pp. 58-60).

Besides an improved practical performance, the class of stable distributions also exhibits attractive theoretical properties:

- **Generalised CLT**. Mandelbrot (1963, pp. 399-401) and Fama (1963, pp. 424-425) show that stable laws are the only possible limiting distributions of suitably normalised sums of i.i.d. random variables. Economically, this means that a return over a certain period can be interpreted as the sum of many small asymptotically Paretian price changes (Fama 1963, p. 426; also Tokat et al. 2003, p. 939). This finding relies upon a generalised version of the CLT (GCLT) by Feller (1966) which does *not* rule out the normal distribution. Therefore, the stable family is compatible with many financial models originally built on the normality assumption, e.g. portfolio theory (Fielitz et al. 1983, p. 28).

- **Domains of attraction (DA)**. The stable model is applicable to sums of i.i.d. RVs even if the stable model does not hold exactly, which is a highly desirable property, given that realisations of stock returns cannot be assumed to follow an ideal theoretical distribution model (Mittnik et al. 1993, p. 265). Any distribution in the domain of attraction of a stable law will exhibit similar properties, allowing investors to base their decisions upon the idealised stable model. What is more, it can usually be checked whether or not the DA condition is met by just looking at the tails of a distribution (Mittnik et al. 1993, p. 265).

- **Stability under aggregation**. Moreover, stable Paretian distributions are invariant under addition, which is an important property for financial arbitrage theory (Hols et al. 1991, p. 295). More generally, stable Paretian distributions belong to their own domain of attraction, i.e. this class is robust with respect to *n*-fold convolution and scaling (Mittnik et al. 1993, p. 269). It is stable with respect to cross-portfolio as well as temporal aggregation. A necessary consequence is that the tail index $\alpha$ (also called ‘characteristic exponent’) stays the same regardless of the sampling interval (Tokat et al. 2003, p. 944; Deo 2002, p. 258).
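The aggregation property can be made tangible with a small simulation. The following is a minimal sketch, not part of the original text: it assumes a standard symmetric stable law (zero skewness and location, unit scale), draws variates with the Chambers-Mallows-Stuck transformation, and compares single draws with sums of ten draws rescaled by $10^{1/\alpha}$. The function names and the choice $\alpha = 1.5$ are purely illustrative.

```python
import math
import random


def sym_stable(alpha, rng):
    """One draw from a standard symmetric alpha-stable law (0 < alpha <= 2).

    Uses the Chambers-Mallows-Stuck transformation of a uniform angle and a
    unit exponential variate; skewness and location are fixed at zero.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1 / alpha)
            * (math.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))


def median_abs(values):
    """Sample median of absolute values, a scale measure that needs no moments."""
    ordered = sorted(abs(v) for v in values)
    return ordered[len(ordered) // 2]


def rescaled_sum(alpha, n_terms, rng):
    """Sum of n_terms i.i.d. draws, rescaled by n_terms**(1/alpha)."""
    total = sum(sym_stable(alpha, rng) for _ in range(n_terms))
    return total / n_terms ** (1 / alpha)


rng = random.Random(42)
alpha = 1.5  # illustrative value only
single = [sym_stable(alpha, rng) for _ in range(20000)]
summed = [rescaled_sum(alpha, 10, rng) for _ in range(20000)]
# If the law is stable, both samples share one distribution, so the two
# scale measures should be close to each other.
print(median_abs(single), median_abs(summed))
```

A moment-free scale measure is used on purpose, since the sample variance is not informative when $\alpha < 2$.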

So, all in all, stable distributions are found to be a flexible, theoretically sound model for describing stock returns.

On the other hand, empirical case studies have identified several inconsistencies with the stable distribution, which have triggered a search for alternative models:

- **Finite variance**. Stable Paretian distributions only possess finite moments of order less than $\alpha$ for $0 < \alpha < 2$. This implies an infinite variance except for the case of the normal distribution ($\alpha = 2$). The infinite-variance property has been questioned more than once for stock returns. Jansen et al. (1991, p. 24) conclude from empirical research that stock returns exhibit finite first and second moments, but infinite higher moments. This contradicts the stable Paretian notion.

- **Volatility clustering not reflected**. The stable Paretian model is based upon an i.i.d. process. Even though Mittnik et al. (1993, p. 266) state that, due to the stability property, i.i.d. assumptions may be relaxed, this turns out to be an issue when looking at volatility clusters: the model is theoretically not capable of displaying the volatility clustering effect described in section 1.2 (Groenendijk et al. 1995, pp. 253-254). This is even more important when considering that conditional volatility *alone* can be an explanatory factor for fat tails. Ghose et al. (1995, p. 225) ascribe the phenomenon of heavy tails to volatility clustering, which leads them to conclude that ARCH-type models (with stable Paretian innovations) are more suitable for modelling financial data (also see Paolella 2001, pp. 1108-1110). This drawback can adversely affect inference, e.g. by limiting the applicability of testing procedures (Paolella 2001, p. 1098). According to Mittnik et al. (2000, p. 390), this is the most serious argument against the use of stable Paretian models in finance.

- **Stability property violated**. Deo (2002, p. 258) points out that the theoretically attractive invariance of the characteristic exponent with respect to the sampling interval is inconsistent with empirical evidence. Many studies find that when the sampling interval is lengthened (e.g. when shifting from daily to weekly returns), the estimate of $\alpha$ increases (Mittnik et al. 1993, p. 269), which violates the temporal aggregation criterion. This critique also fits with the observation that the normal distribution ($\alpha = 2$) tends to provide a better fit for low-frequency return data, whereas daily and higher-frequency returns show heavy tails and infinite variance (cf. Fig. 1-2). Officer (1972, p. 811) concludes that the stable Paretian distribution is probably not a suitable model for stock returns. Hsu et al. (1974, pp. 110-113) suggest an alternative model with non-stationary parameters.

Apart from inconsistencies between theoretical properties and empirical findings, there are some practical obstacles that impair the usefulness of the stable Paretian model in practice:

- **Slow rate of convergence**. In practice, the rate of convergence towards a stable law may be very slow, i.e. very large sample sizes are needed for the sum of i.i.d. RVs to converge towards a stable distribution. This reduces the attractiveness of working under DA conditions in practice (Mittnik et al. 1993, p. 270).

- **No closed-form density function**. Stable distributions are difficult to handle (Mittnik et al. 1999c, p. 276). There is no closed-form expression for the PDF except for some special cases, which makes practical application more difficult. For example, ML estimation of the tail index requires a time-consuming approximation of the PDF prior to the estimation procedure (see section 4.6.1).

- **Infinite variance property**. Besides being called into question per se, the infinite variance property also triggers practical problems. The non-existence of a second moment leads to difficulties in inferential procedures, e.g. computing confidence intervals or testing (Jammalamadaka 2003, p. 4; Shiryaev 1999, p. 335).

It is for those reasons that the stable Paretian model is not without controversy in empirical finance (Ghose et al. 1995, p. 226).

Nevertheless, this paper relies on the stable Paretian model when estimating the tail index $\alpha$, while bearing in mind its limitations. It will also be investigated in detail under what conditions the heaviness of the tail can be estimated without requiring that the above model hold *exactly*.

A more extensive technical description of stable laws in general and the stable Paretian model in particular is provided in section 3.2.

**Other return distribution models.** After the stable model had been ‘en vogue’ for some time, disappointing results of empirical studies and the complexity of the family made some financial economists search for alternative distribution models (Linden 2001, p. 159). Various proposals have been made, some based on normative reasoning, others driven by empirical fit (Harris et al. 2001, pp. 715-716). A brief overview is provided.

**Mixtures of distributions.** Mixtures of distributions are a widespread approach towards capturing the stylised facts of stock return distributions. Various mixture models have been proposed, the most notable of which is the mixture of normal distributions. McLachlan et al. (2000) provide a good introduction to finite mixture models.

The economic intuition behind mixtures is that stock returns are essentially driven by various independent influences (e.g. different groups of investors or types of information), which are captured in the components of the mixture model.

- **Normal mixtures**. Given the complexity of $\alpha$-stable distributions, Trede (1999, pp. 19-20) suggests that mixtures of normal distributions are an attractive alternative, capable of displaying leptokurtosis and also skewness. Technically, a normal mixture is the weighted sum of [illustration not visible in this excerpt] [illustration not visible in this excerpt] normally distributed RVs with different variance and possibly different mean (for a formal definition, see McLachlan et al. 2000, pp. 6-7). The attractiveness of normal mixtures vis-à-vis $\alpha$-stable models is based on the fact that normal mixtures do not require the stability conjecture to hold (Fielitz et al. 1983, pp. 34-35). Moreover, they are more easily applicable in practice thanks to an algorithm that readily computes the ML estimates of the [illustration not visible in this excerpt] relevant parameters (EM algorithm, see Flury 1997, pp. 656-659). It remains an open question whether or not normal mixtures outperform non-normal stable laws (Schmid et al. (*yns*), p. 7; Fielitz et al. 1983, pp. 34-35), yet there is some evidence that they may constitute a pragmatic alternative.

- **Other mixtures**. Other mixtures proposed include the Laplace mixture distribution (Linden 2001, pp. 160-162) and mixtures of non-normal stable distributions (Fielitz et al. 1983, pp. 32-34).
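To illustrate why normal mixtures can display leptokurtosis, the following sketch computes the kurtosis of a zero-mean normal scale mixture in closed form. It is an illustration under stated assumptions rather than a model taken from the cited literature: all component means are set to zero, and the weights and standard deviations are made-up values.

```python
def scale_mixture_kurtosis(weights, sigmas):
    """Kurtosis of a zero-mean normal scale mixture.

    For component standard deviations sigma_i with weights p_i, the second
    and fourth moments are sum(p_i * sigma_i**2) and 3 * sum(p_i * sigma_i**4).
    """
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    m2 = sum(p * s ** 2 for p, s in zip(weights, sigmas))
    m4 = 3.0 * sum(p * s ** 4 for p, s in zip(weights, sigmas))
    return m4 / m2 ** 2


# A single normal component has kurtosis 3.
print(scale_mixture_kurtosis([1.0], [1.0]))
# Hypothetical mixture: 90% 'calm' regime, 10% regime with tripled volatility.
print(scale_mixture_kurtosis([0.9, 0.1], [1.0, 3.0]))
```

For the hypothetical two-component example the kurtosis is 27/3.24, i.e. roughly 8.3, well above the normal value of 3. All moments of such a mixture remain finite, however, which distinguishes it from a non-normal stable law.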

**Other distributions.** Alternative models suggested include the following distributions:

- Student-*t* distribution (e.g. Johnson et al. 1994, Vol. 2, pp. 362-374; Shiryaev 1999, p. 334). Some authors, such as Blattberg et al. (1974, pp. 263-277), have argued that the *t*-distribution is more suitable for daily returns than the stable model.

- Log-normal distribution (e.g. Hartung 1998).

- Tukey’s *g* and *h* distributions (e.g. Mills 1995, pp. 325-331).

### 2.2 Basic approaches to inference about extremal returns

Whilst section 2.1 was focused on characterising models capturing the stylised facts of the *whole* (unconditional) return distribution, this section describes different approaches towards describing the focal point of interest – the *tail* of the distribution.

Basically, there are five possibilities:

1. Use past empirical extremal realisations *only* to predict the likelihood of extremal events (and thus tail thickness) in the future (non-parametric historical simulation).

2. Have a normative model for the entire distribution (cf. 2.1) and derive tail thickness by fitting the whole empirical distribution (homoskedastic parametric models).

3. Account for stochastic volatility, all else being equal.

4. *Let the tails speak for themselves*, i.e. concentrate on deriving a normative model for the tail of the distribution, which is fitted using tail observations exclusively (models based on EVT).

5. Combine the aforementioned methods.

**Non-parametric historical simulation (HS).** Historical simulation methods rely on forecasting the distribution of future returns with just the help of an empirical distribution function (McNeil et al. 2000, p. 273). They are easy to implement and require few assumptions.

However, this approach is not suitable as a stand-alone method to draw conclusions about extremal returns:

- Extremal observations – by their nature – are very rare, such that massive amounts of data are needed or else the estimator will suffer from high variance.

- Extrapolation beyond empirical observations is not feasible, yet this is *the* crucial point in tail estimation (Danielsson et al. 2000, pp. 11-12).
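The extrapolation problem can be seen in a few lines. The sketch below is a hypothetical illustration (the function name and the loss figures are invented): a purely empirical estimate of an exceedance probability is the share of past observations above a level, and it is necessarily zero for any level beyond the largest observed loss.

```python
def hs_exceedance_prob(losses, level):
    """Historical-simulation estimate of P(loss > level): the share of past
    losses that exceeded the level."""
    if not losses:
        raise ValueError("need at least one observation")
    return sum(1 for x in losses if x > level) / len(losses)


# Hypothetical daily losses in percent (made-up numbers for illustration only).
losses = [0.3, 1.2, 0.1, 2.5, 0.7, 4.1, 0.9, 1.8, 0.2, 3.0]
print(hs_exceedance_prob(losses, 2.0))  # 3 of 10 observations exceed 2.0 -> 0.3
print(hs_exceedance_prob(losses, 5.0))  # beyond the sample maximum -> 0.0
```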

Mills (1995, pp. 324-331) delivers an example of how HS can be applied to stock return distributions. Yet in this paper, the approach is not pursued any further.

**Parametric homoskedastic models.** All models introduced in section 2.1 contain full parametric assumptions that can be used to describe extremal returns as well.

For instance, assuming that unconditional stock returns follow a stable Paretian distribution, the model parameters can be estimated from a set of empirical data, e.g. using the ML method. The estimate obtained for the parameter $\alpha$ describes the heaviness of the tails and is therefore a measure of the risk of extremal returns.

In contrast to HS, this method allows for inference beyond the range of observations, yet there are two crucial drawbacks:

- None of the above-mentioned unconditional models incorporates stochastic volatility, even though this is a characteristic feature of stock returns. This means that the tail estimates obtained cannot be based on the current volatility underlying short-term market fluctuations (McNeil et al. 2000, p. 274). This approach is therefore not suitable for gaining insight about the likelihood of short-term extremal losses, conditional upon current market volatility, over the next few days (p. 272).

- Under full parametric assumptions, central observations serve to draw inference about the tails. This can lead to estimation biases (Shiryaev 1999, p. 336), let alone that fitting the entire distribution is unnecessarily complicated if one is interested in tail behaviour only.

Full parametric homoskedastic methods are applied in this paper in the course of tail index estimation. The effects triggered by volatility clustering are thus not explicitly accounted for.

**Parametric models with stochastic volatility.** Methods incorporating the current volatility background include ARCH and GARCH and related models (for an overview, see Danielsson et al. 2000, pp. 13-14). This approach is set to tackle the first weakness of the unconditional models described in the previous paragraph.

An appealing property of these models is that the implied unconditional return distributions exhibit fat tails, reflecting a crucial stylised fact (Rachev et al. 2000, p. 4). Yet they do *not* belong to the DA of stable distributions; rather, the heavy-tailedness is caused by stochastic volatility effects (Ghose et al. 1995, pp. 226-227).

While the basic GARCH model works with normal innovations (McNeil et al. 2000, p. 273), there are more advanced models with non-normal stable residuals. A good description of so-called ‘stable GARCH’ processes can be found in Mittnik et al. (2000, pp. 392-404) and Rachev et al. (2000, pp. 275-293).

Whilst standard GARCH models perform poorly in tail inference (Danielsson et al. 2000, p. 13), conditional volatility models with stable Paretian innovations are shown to satisfactorily reflect the stylised facts of return distributions, even when using smaller samples (Mittnik et al. 2000, p. 411).

Since this paper is focused on the stable Paretian hypothesis, the conditional volatility approach will rather remain in the background. However, even when acknowledging that stochastic volatility approaches reflect the current state of the art in stock return models, Paretian tail inference stays important when it comes to innovations: *“…if one wishes to interpret the error term as a random variable representing the sum of many external effects which cannot be realistically captured by the model, the stable Paretian is the only valid candidate”* (Paolella 2001, p. 1095).

**Models based on Extreme Value Theory.** An EVT-based approach avoids modelling the entire distribution. Instead, it strives to make accurate predictions about extremal events by modelling only the tails of the distribution, i.e. only the sub-sample of extremal observations that provides information about extremal events is included in the tail inference procedure (Klüppelberg 2002, p. 3).

This approach shows several striking advantages:

- The tail estimate cannot, in any case, be distorted by central observations (Danielsson et al. 2002, p. 3). In a seminal paper, DuMouchel (1983, pp. 1023-1028) showed that the estimate of the tail index $\alpha$ of a stable law can be severely disrupted by deviations from the ideal model. His suggestion was therefore to *“let the tails speak for themselves”* (p. 1025). Doing so, it is possible to work under asymptotic conditions, such that the whole procedure is more flexible and robust. It is not necessary to be strictly committed to a specific model as presented in section 2.1 (Hols et al. 1991, p. 287).

- Another important aspect is that, by using EVT, one avoids modelling the entire distribution, which is inefficient and time-consuming (Paolella 2001, p. 1096).

Notwithstanding these aspects, EVT models do *not* usually capture stochastic volatility (McNeil et al. 2000, p. 274).

The central problem that comes with tail-only methods is to determine the right number [illustration not visible in this excerpt] of upper order observations to be included in the sub-sample (Embrechts et al. 1999, p. 326). On the one hand, the number of observations included should not be too small, so as to limit the standard deviation of the tail estimator; on the other hand, including too large a share of the overall sample would lead to an estimation bias, since central observations get included. So in the case of finite samples, one is confronted with a *bias-variance trade-off* (for details, refer to section 4.1).
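The trade-off can be illustrated with the Hill estimator, which is treated in section 4.3.3. The following is a minimal sketch under stated assumptions, not the implementation used later in the thesis: the variable `k` stands for the number of upper order statistics, the data are simulated from an exact Pareto law with a hypothetical tail index of 2, and a constant shift is added so that only the far tail is Pareto-like.

```python
import math
import random


def hill_alpha(sample, k):
    """Hill estimate of the tail index from the k largest observations.

    The mean log-excess over the (k+1)-th largest value estimates 1/alpha.
    """
    ordered = sorted(sample, reverse=True)
    if not 1 <= k < len(ordered) or ordered[k] <= 0:
        raise ValueError("need 1 <= k < n and a positive threshold")
    threshold = ordered[k]
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(1)
alpha_true = 2.0  # hypothetical tail index
# Shifted Pareto data: same tail index, but the shift distorts the central part.
sample = [rng.paretovariate(alpha_true) + 2.0 for _ in range(20000)]
for k in (20, 200, 2000, 10000):
    print(k, round(hill_alpha(sample, k), 2))
```

With very few order statistics the estimate typically fluctuates strongly from sample to sample, whereas with a large `k` the shifted central observations pull the estimate away from the true value.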

Besides imposing an exact $\alpha$-stable model, tail inference based on EVT shall be the second focus of this paper.

**Combinations.** Several combinations of the above methods have been suggested:

- **(1, 3, 4)**. One notable example is provided in McNeil et al. (2000, p. 274). A GARCH approach is combined with HS (for the central part of the distribution of the innovations) and EVT (serving the purpose of tail inference for residuals, which are closer to being i.i.d. than raw data). Capturing the stylised facts of fat tails and conditional volatility, this is a fairly sophisticated model.

- **(1, 4)**. A semi-parametric combination of HS with EVT-based tail inference can be found in Danielsson et al. (2000, pp. 18-23).

- **(2, 3)**. Several combinations of conditional heteroskedasticity models (GARCH or modified GARCH) with $\alpha$-stable innovations have already been mentioned. The same has been investigated for exponential or Student-*t* residuals (Mittnik et al. 2000, p. 390).

These combinations will not be investigated any further in this paper.

## 3. Technical Background for Tail Inference

### 3.1 Extreme Value Theory (EVT)

This section provides an introduction to the necessary basics of EVT.

Most generally, EVT serves to draw inference about the likelihood of extremal events in the absence of a closed distributional model for the sampling distribution (Hall et al. 1997, p. 1311). It is especially useful when extrapolating beyond the range of available data (*“predicting the unpredictable”*, Embrechts et al. 1999, p. VII).

EVT has been applied in various fields, such as hydrology (Reiss et al. 1997, pp. 233-244), insurance (McNeil 1996, pp. 5-18), and environmental issues (Reiss et al. 1997, pp. 257-264).

In the context of stock returns, one makes use of EVT to meet DuMouchel’s (1983) paradigm of ‘letting the tails speak for themselves’, estimating the tail index from extreme observations exclusively (Danielsson et al. 2002, p. 2).

The first paragraph of this section sets out the central result of EVT, the *Fisher-Tippett theorem*. This is followed by a description of how this result can be used to make tail inference using upper order statistics. The third paragraph establishes a modified approach, based on excesses over a high threshold.

**Theorem by Fisher/Tippett and Extreme Value Distributions (EVD).** The central finding one can derive from EVT is that the form of the asymptotic distribution of extreme returns is not dependent on the return-generating process (Longin 1996, p. 384):

Consider a stationary sequence [illustration not visible in this excerpt] of i.i.d. RVs with CDF [illustration not visible in this excerpt]. EVT is concerned with the limiting behaviour of the order statistic

illustration not visible in this excerpt

The theory can be equivalently extended to a greater number of upper order statistics (Embrechts et al. 1999, pp. 196-204).

The central result of classical EVT is called the *Fisher-Tippett theorem* or *Extremal Types theorem* (Embrechts et al. 1999, p. 121; Jansen et al. 1991, p. 19), an upper-order analogue to the well-known Central Limit Theorem (CLT, see Bomsdorf 2000, pp. 127-128):

**Theorem 3.A (Fisher-Tippett theorem).** If there exist suitable normalising constants [illustration not visible in this excerpt], [illustration not visible in this excerpt], such that the CDF of the normalised maxima weakly converges to a non-degenerate CDF [illustration not visible in this excerpt], i.e.

illustration not visible in this excerpt

then [illustration not visible in this excerpt] belongs to the *maximum domain of attraction* (MDA) of one of three types of *extreme value distributions* (EVD):

- **Type I:** Gumbel distribution

illustration not visible in this excerpt

- **Type II:** Fréchet distribution

illustration not visible in this excerpt

- **Type III:** Weibull distribution

illustration not visible in this excerpt

It is crucial to recognise that [illustration not visible in this excerpt] converges to one of these distributions, *irrespective* of the actual distribution of the underlying RV [illustration not visible in this excerpt] (Danielsson et al. 2002, p. 6).

A sketch of the proof can be found in Embrechts et al. (1999, p. 122) and Frahm (1998, p. 26). □
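Since the displayed formulas are not reproduced in this excerpt, the following block restates the maximum, the convergence condition and the three limit laws in a notation that is common in the EVT literature. The symbols $M_n$, $c_n$, $d_n$, $H$, $\Lambda$, $\Phi_\alpha$ and $\Psi_\alpha$ are an assumption and may differ from those used in the original illustrations.

```latex
\begin{align*}
M_n &= \max(X_1, \dots, X_n), \\
\lim_{n \to \infty} P\!\left(\frac{M_n - d_n}{c_n} \le x\right) &= H(x),
  \qquad c_n > 0,\ d_n \in \mathbb{R}, \\[4pt]
\text{Type I (Gumbel):} \quad \Lambda(x) &= \exp\{-e^{-x}\},
  \qquad x \in \mathbb{R}, \\
\text{Type II (Fr\'echet):} \quad \Phi_\alpha(x) &=
  \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0, \end{cases}
  \qquad \alpha > 0, \\
\text{Type III (Weibull):} \quad \Psi_\alpha(x) &=
  \begin{cases} \exp\{-(-x)^{\alpha}\}, & x \le 0, \\ 1, & x > 0, \end{cases}
  \qquad \alpha > 0.
\end{align*}
```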

Since the index $\alpha$ governs the tail behaviour of the underlying distribution function [illustration not visible in this excerpt] (Caers et al. 1999, p. 193), it is called the *tail index* (Jansen et al. 1991, p. 19).

**Definition 3.B (Extreme value index).** Let the index [illustration not visible in this excerpt] denote the inverse of the tail index:

illustration not visible in this excerpt

[illustration not visible in this excerpt] is also known as *extreme value index*. □

**Definition 3.C (GEV).** Having introduced this parameter, the three types of EVD can be captured in just one representation (*Jenkinson-von Mises representation*), which yields the *generalised extreme value distribution* (GEV) (Embrechts et al. 1999, p. 152):

illustration not visible in this excerpt

Positive values of the extreme value index correspond to the Fréchet case, negative values correspond to the Weibull case, and where the index equals zero, the Gumbel distribution applies. □
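The way the single representation nests the three types can be checked numerically. The sketch below assumes the standardised Jenkinson-von Mises form $\exp\{-(1+\xi x)^{-1/\xi}\}$, with the Gumbel law as the limiting case for a zero index; the name `xi` for the extreme value index and the numerical values are illustrative choices, not notation taken from the original.

```python
import math


def gev_cdf(x, xi):
    """CDF of the standardised GEV in the Jenkinson-von Mises form.

    xi > 0: Frechet type, xi < 0: Weibull type, xi == 0: Gumbel type.
    """
    if xi == 0.0:
        return math.exp(-math.exp(-x))
    z = 1.0 + xi * x
    if z <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) the CDF is 0,
        # above the upper endpoint (xi < 0) it is 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-z ** (-1.0 / xi))


alpha = 3.0  # hypothetical tail index
y = 2.0
# With xi = 1/alpha, the GEV evaluated at alpha*(y - 1) coincides with the
# Frechet CDF exp(-y**(-alpha)) evaluated at y.
print(gev_cdf(alpha * (y - 1.0), 1.0 / alpha), math.exp(-y ** -alpha))
```

The two printed values agree up to rounding, which reflects that the Fréchet law and the GEV with a positive index differ only by a change of location and scale.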

Even though no detailed knowledge about [illustration not visible in this excerpt] is required, it is essential to decide which limit law applies, or put differently, whether the extreme value index is positive or negative.

What about dependency? It should be noted that the above results apply to i.i.d. maxima in the first place. However, remedies have been developed for situations in which maxima exhibit a dependence structure (Klüppelberg 2002, p. 8). Moreover, it has been shown by Berman in 1964 that the results do *not* change if variables are correlated, provided that the sum of squared correlation coefficients remains finite (Longin 1996, p. 387).

Methodologically, in case of dependency, the normalising constant [illustration not visible in this excerpt] has to be modified by a constant factor [illustration not visible in this excerpt], the extremal index, whereas the tail index [illustration not visible in this excerpt] remains unaffected (Hols et al. 1991, pp. 289-290). For instance, the Fréchet type limit law is modified to:

illustration not visible in this excerpt

Given the frequent observation that larger values tend to occur in clusters, [illustration not visible in this excerpt] is a measure of the degree of clustering of extremes (Ancona-Navarrete et al. 2000, p. 6). Various methods have been developed to estimate the extremal index, the simplest of which are the blocks and runs estimators (see Weissman et al. 1998). A good overview is given by Smith et al. (1994).

Since this problem can essentially be handled independently from tail index estimation in that most results can easily be transferred, estimation procedures for [illustration not visible in this excerpt] can be discussed autonomously (Hols et al. 1991, p. 290).

**Modelling heavy tails with upper order statistics.** How can the stylised facts of empirical stock return distributions be modelled in the context of EVT? Obviously, it has to be identified which limit law corresponds to the phenomenon of heavy tails. Put differently, one asks which (heavy- or thin-tailed) distributions are in the MDA of which of the three types of EVD.

Following Jansen et al. (1991, p. 19), four conditions are defined in order to identify the correct type:

- **Condition 1**. If [illustration not visible in this excerpt] and [illustration not visible in this excerpt]1 for all [illustration not visible in this excerpt], then

illustration not visible in this excerpt

is finite for all [illustration not visible in this excerpt]. This implies that for distributions belonging to the maximum domain of attraction of the Gumbel (type I) distribution, all moments are necessarily finite. The right tails [illustration not visible in this excerpt] decline exponentially.

- **Condition 2.** If [illustration not visible in this excerpt] then [illustration not visible in this excerpt]1 for all [illustration not visible in this excerpt], and [illustration not visible in this excerpt] is finite for [illustration not visible in this excerpt] and infinite for [illustration not visible in this excerpt]. This means that in the Fréchet (type II) case, when weighed by tail probabilities, moments of order [illustration not visible in this excerpt] do not decay rapidly enough to exist. This is the precise definition of fat tails as employed in the context of EVT. One says that the right tails [illustration not visible in this excerpt] decline by a power (*power tails*); they are of Pareto type (Embrechts et al. 1999, p. 133):

illustration not visible in this excerpt

- **Condition 3.** If [illustration not visible in this excerpt] then [illustration not visible in this excerpt] has a finite upper endpoint.

Since the assumption of a finite upper endpoint is not applicable in the context of financial return distributions, the Weibull distribution can be ruled out for stock returns.

How to distinguish between conditions 1 and 2? In order for the Fréchet-type limit law to apply, the following condition is sufficient (Jansen et al. 1991, p. 19):

- **Condition 4.** If [illustration not visible in this excerpt] has *no* finite upper endpoint, and for each [illustration not visible in this excerpt] 0 and some [illustration not visible in this excerpt] 0

illustration not visible in this excerpt

then the tail [illustration not visible in this excerpt] *varies regularly at infinity* and [illustration not visible in this excerpt].
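Regular variation is commonly stated as the requirement that the ratio of tail probabilities $(1 - F(tx)) / (1 - F(t))$ converges to $x^{-\alpha}$ as $t \to \infty$; this formulation is assumed here, since the displayed condition is not reproduced in the excerpt. The sketch compares a power tail (standard Cauchy, i.e. a stable law with $\alpha = 1$) with the thin tail of the standard normal distribution. The function names are illustrative.

```python
import math


def cauchy_tail(x):
    """P(X > x) for a standard Cauchy variable (stable with alpha = 1), x > 0."""
    return math.atan(1.0 / x) / math.pi


def normal_tail(x):
    """P(X > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_ratio(tail, t, x):
    """Ratio (1 - F(t*x)) / (1 - F(t)) from the regular-variation condition."""
    return tail(t * x) / tail(t)


x = 2.0
for t in (2.0, 4.0, 8.0):
    print(t, tail_ratio(cauchy_tail, t, x), tail_ratio(normal_tail, t, x))
```

For the Cauchy tail the ratio approaches $2^{-1} = 0.5$ as the level grows, whereas for the normal tail it collapses towards zero, in line with the Fréchet versus Gumbel distinction drawn in the text.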

Regular variation at infinity has the following implication (Jansen et al. 1991, p. 19):

**Theorem 3.D.** If the tail [illustration not visible in this excerpt] varies regularly at infinity, then the maxima [illustration not visible in this excerpt] from [illustration not visible in this excerpt] or any finite convolution of [illustration not visible in this excerpt] follow the same limit law. □

Provided that the alleged infinite-variance property actually holds, extremal stock returns asymptotically follow a Fréchet distribution. It can indeed be shown that most of the distribution models relevant here satisfy condition 4, i.e. their tails decline by a power and not all moments are finite. This holds true for stable Paretian laws, but also for Student-*t* distributions and ARCH-type processes (Longin 1996, p. 387; Jansen et al. 1991, p. 20). All distributions that exhibit fat (upper) tails in the sense of EVT are nested within the Fréchet case. On the other hand, normal distributions as well as normal mixtures lead to a limit law of Gumbel type.

As for the stable Paretian type, condition 2 is clearly in accordance with $\alpha$-stable distributions. Moreover, the tail index coincides with the index $\alpha$ characterising the shape of the limiting distribution (Kearns et al. 1997, p. 171; Longin 1996, p. 387). It can be shown analytically that, for $0 < \alpha < 2$, the stable Paretian distribution belongs to the MDA of the Fréchet distribution (Frahm 1998, p. 38).

As one might expect from the discussion of suitable return distributions, there is strong empirical support in favour of the Fréchet hypothesis. Longin (1996, pp. 394-400) demonstrates empirically that the distributions of minima and maxima of a selection of the most actively traded American stocks follow a Fréchet distribution.

In accordance with the assumption of a stable Paretian distribution for stock returns, it will henceforth be assumed that the type II limit law (Fréchet case) is applicable to extremal stock returns. This is central to the construction of tail index estimators based on upper order statistics (section 4.3).

**Tail convergence and the Generalised Pareto distribution (GPD).** Apart from analysing the weak convergence of maxima, there is another branch of EVT concerned with the distribution of excesses over a high threshold [illustration not visible in this excerpt]. The so-called *PoT* (*peaks over threshold*) method was first used by hydrologists in the 1970s, and has since then been extended to other fields, such as finance (Embrechts et al. 1999, p. 366).

The appeal of the PoT method is based upon the finding that for sufficiently high levels [illustration not visible in this excerpt], the conditional distribution of the RV [illustration not visible in this excerpt] above [illustration not visible in this excerpt] converges to a *generalised Pareto distribution* (GPD) – regardless of the shape of the underlying distribution of [illustration not visible in this excerpt] (Caers et al. 1999, pp. 191-192). In other words, if a high enough threshold is chosen, the data above this threshold will exhibit Generalised Pareto behaviour. This result was first obtained by Pickands (1975, pp. 119-126). It is called the *Pickands-Balkema-de Haan theorem* (McNeil 1996, p. 7):

**Theorem 3.E (Pickands-Balkema-de Haan theorem).** The distribution function of excesses over [illustration not visible in this excerpt] is given by

illustration not visible in this excerpt

where [illustration not visible in this excerpt] is the (finite or infinite) upper endpoint of [illustration not visible in this excerpt].

One can find a positive measurable function [illustration not visible in this excerpt] such that:

illustration not visible in this excerpt

I.e. the limiting distribution for the distribution of excesses over the threshold [illustration not visible in this excerpt], as the threshold converges to the right endpoint, is given by the GPD (Embrechts et al. 1999, p. 162)

illustration not visible in this excerpt

provided that [illustration not visible in this excerpt] is in the MDA of one of the three types of EVD. □

The parameter [illustration not visible in this excerpt] of the GPD – corresponding to the extreme value index of the GEV – characterises the shape of the tail, whilst [illustration not visible in this excerpt] is the scale parameter. The GPD nests three special cases of limit excess functions (Klüppelberg 2002, pp. 10-11).
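For an exact Pareto tail the GPD is not merely the limit but holds at every threshold, which makes for a compact numerical check. The sketch below assumes the usual parametrisation $1 - (1 + \xi y / \beta)^{-1/\xi}$ with the exponential law as the zero-shape case; the names `xi`, `beta` and `u`, as well as the numerical values, are illustrative and not taken from the original formulas.

```python
import math


def gpd_cdf(y, xi, beta):
    """CDF of the generalised Pareto distribution for an excess y >= 0.

    xi is the shape parameter, beta > 0 the scale parameter.
    """
    if y < 0 or beta <= 0:
        raise ValueError("need y >= 0 and beta > 0")
    if xi == 0.0:
        return 1.0 - math.exp(-y / beta)
    z = 1.0 + xi * y / beta
    if z <= 0.0:
        return 1.0  # beyond the finite upper endpoint (only possible if xi < 0)
    return 1.0 - z ** (-1.0 / xi)


def pareto_excess_cdf(y, u, alpha):
    """P(X - u <= y | X > u) for an exact Pareto tail P(X > x) = x**(-alpha), x >= 1."""
    return 1.0 - ((u + y) / u) ** (-alpha)


alpha, u, y = 3.0, 10.0, 4.0  # hypothetical values
# The excess distribution equals a GPD with shape 1/alpha and scale u/alpha.
print(pareto_excess_cdf(y, u, alpha), gpd_cdf(y, 1.0 / alpha, u / alpha))
```

The example also shows the correspondence between the shape parameter and the inverse of the tail index, and that the scale parameter grows with the threshold.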

**Definition 3.F (PoT model).** Based on theorem 3.E, a standard formulation for the PoT model is (Embrechts et al. 1999, p. 366):

- The points of exceedance over a high threshold [illustration not visible in this excerpt] follow a Poisson process.

- Excesses are independent and follow a GPD.

- Exceedance times and excesses are independent of each other. □

The threshold-based approach can be applied to non-i.i.d. data as well (Smith 1987, p. 1177), which is clearly advantageous when dealing with stock return data.

How can this model be used to estimate the tail index? Given that the shape parameter [illustration not visible in this excerpt] corresponds to the inverse of the tail index [illustration not visible in this excerpt], and under the assumption that [illustration not visible in this excerpt]1, one can re-write the GPD as follows (Marohn 1999, p. 414):

illustration not visible in this excerpt

In contrast to other estimation approaches, the tail index estimate is not based upon the relative price changes [illustration not visible in this excerpt] of the return-generating process itself, but based on empirical excesses [illustration not visible in this excerpt] (Frahm 1998, p. 70). Estimates of [illustration not visible in this excerpt] can be derived using empirical values of [illustration not visible in this excerpt], a topic that will be addressed in section 4.4. Different methods will be introduced, such as ML and moment-based estimators (Klüppelberg 2002, p. 11).

However, similar to the problem of choosing the number of upper order statistics to be included in the analysis, the choice of the threshold is problematic (Embrechts et al. 1999, p. 355): If [illustration not visible in this excerpt] were chosen very high, few exceedances would occur, leading to a high estimator variance. On the other hand, if [illustration not visible in this excerpt] were reduced, observations from the centre of the distribution, for which the GPD approximation is poor, would enter the sample, and the estimator would be prone to bias. Thus, the choice of [illustration not visible in this excerpt] once again involves a bias-variance trade-off.
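The variance side of this trade-off can be illustrated with a minimal sketch. It rests on assumptions that are not part of the original text: the data are simulated from an exact Pareto law with tail index 1.7, the helper names are hypothetical, and the estimator used is the ML estimator for this special case (where the log-transformed relative excesses are exponentially distributed) rather than a full GPD fit. Because the simulated data are exactly Paretian, no threshold-induced bias arises here; only the loss of exceedances is visible.

```python
import math
import random

def pareto_sample(alpha, n, rng):
    """Draw n exact Pareto observations with P(X > x) = x**(-alpha), x >= 1."""
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def excesses(data, u):
    """Excesses over the threshold u (only observations above u count)."""
    return [x - u for x in data if x > u]

def tail_index_from_excesses(exc, u):
    """ML estimate of alpha, valid for exact Pareto data only:
    log(1 + y/u) is then exponentially distributed with rate alpha."""
    return len(exc) / sum(math.log1p(y / u) for y in exc)

rng = random.Random(42)
data = pareto_sample(alpha=1.7, n=5000, rng=rng)  # assumed tail index

summary = []
for u in (1.0, 2.0, 5.0, 10.0):
    exc = excesses(data, u)
    alpha_hat = tail_index_from_excesses(exc, u)
    std_err = alpha_hat / math.sqrt(len(exc))  # asymptotic standard error
    summary.append((u, len(exc), alpha_hat, std_err))
    print(f"u={u:5.1f}  exceedances={len(exc):5d}  "
          f"alpha_hat={alpha_hat:.3f}  std_err={std_err:.3f}")
```

As the threshold rises, the number of exceedances shrinks and the approximate standard error grows. With real return data, lowering the threshold would additionally introduce bias, which this idealised setting cannot show.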

### 3.2 The stable Paretian model

This section gives a *technical* introduction to stable laws and especially the stable Paretian model. In contrast to section 2.1, the hypothesis of stable-distributed stock returns shall not be questioned at this point.

The first paragraph of this section defines stable laws in general. The stable Paretian model in particular is explained in the second paragraph. In the third paragraph, a precise definition of the domain of attraction of a stable Paretian law can be found. Finally, the fourth paragraph focuses on the tail of the stable Paretian law, explaining the central role of the tail index [illustration not visible in this excerpt].

**Introduction to stable laws.** The families of stable distributions were first studied by Lévy in 1924 (Dostoglou et al. 1999, pp. 57-58). It was found that there are several kinds of stability – and thus alternative schemes that can be called ‘stable’. Therefore, any definition of stability must necessarily be general in the first place:

**Definition 3.G (Stability).** Let [illustration not visible in this excerpt] denote [illustration not visible in this excerpt] RVs, where [illustration not visible in this excerpt] is the log-return on a stock in period [illustration not visible in this excerpt]. Then the general stable probabilistic scheme can be written as

illustration not visible in this excerpt

with [illustration not visible in this excerpt], [illustration not visible in this excerpt], and ‘[illustration not visible in this excerpt]’ standing for either summation, multiplication, minimum, or maximum. The number [illustration not visible in this excerpt] can either be deterministic or random. For each of these procedures, a different type of stable law is produced (for an overview, see Mittnik et al. 1993, pp. 266-267). □

As can be seen above, all distributions that obey a stable law are by definition invariant under [illustration not visible in this excerpt]-fold application of the respective operation (convolution in the case of summation), i.e. the distribution type is preserved under summation, multiplication, etc.

The stability property implies that all stable distributions have *one* characteristic parameter (shape parameter) governing the main properties of the distribution – an advantage that can be exploited for statistical inference (Rachev et al. 2000, p. 2).

Compared with the sum-stable law, other categories, such as min-stable or max-stable distributions, have been applied less extensively in finance (Rachev et al. 2000, p. 25).

In this paper, the focus shall be on the *summation-stable* or *stable Paretian* distribution.

**Key properties of stable Paretian distributions.** The stable Paretian distribution is by far the most common stable law:

**Definition 3.H (Stable Paretian model).** A RV [illustration not visible in this excerpt] with distribution function [illustration not visible in this excerpt] follows an [illustration not visible in this excerpt]-stable or stable Paretian law if the type of distribution is preserved under summation and transformation of [illustration not visible in this excerpt] independent copies (Shiryaev 1999, p. 190; DuMouchel 1973, p. 948):

illustration not visible in this excerpt

[illustration not visible in this excerpt] [illustration not visible in this excerpt]

[illustration not visible in this excerpt] is called *strictly stable* if [illustration not visible in this excerpt]0. □

For strictly stable [illustration not visible in this excerpt], the following relationship holds (Müller 1975, p. 227):

illustration not visible in this excerpt

A stable CDF is symmetric if [illustration not visible in this excerpt]. Any symmetric stable CDF is strictly stable (Rachev et al. 2000, p. 27).

The above definition essentially says that if any [illustration not visible in this excerpt] is presumed to follow a stable Paretian distribution, then the same holds for the scale- and location-normalised sum [illustration not visible in this excerpt] (Mittnik et al. 1993, p. 271).

**Definition 3.I (Characteristic function).** Unfortunately, there is in general no closed-form expression for the PDF of a stable Paretian distribution^{2}, so it is usually described via its *characteristic function* (CF) [illustration not visible in this excerpt] (Mittnik et al. 1993, p. 271):

illustration not visible in this excerpt

with [illustration not visible in this excerpt]1 for [illustration not visible in this excerpt]0, [illustration not visible in this excerpt]0 if [illustration not visible in this excerpt]0, and [illustration not visible in this excerpt]-1 for [illustration not visible in this excerpt]0. □

The CF is the ‘creating’ function behind the PDF and CDF (Müller 1975, pp. 67-68). Its general form for a RV [illustration not visible in this excerpt] can be written as follows (Frahm 1998, p. 84):

illustration not visible in this excerpt

Deriving the PDF from the CF is rather time-consuming and complicated. In order to alleviate this problem, Fama et al. (1968, pp. 820-823) present tables of the CDF for different values of [illustration not visible in this excerpt], computed via series expansions. Several other methods have been employed, the most commonly used of which are the Fast Fourier Transform (FFT) and the Direct Numerical Integration (DNI) method (see 4.6.1). Whilst the latter method is rather complicated, FFT algorithms are more easily implemented in practice, as demonstrated by Mittnik et al. (1999a, pp. 238-239). A brief summary of the FFT method can be found in Mittnik et al. (1999c, pp. 277-278).
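To give an impression of what numerical inversion of the CF involves, the following is a deliberately naive sketch. It is not the FFT or DNI algorithm of the cited papers. It assumes a symmetric stable law with unit scale and zero location, whose CF is taken to be exp(-|t|^α), and it applies a plain trapezoidal rule to the Fourier inversion integral. The function name and the truncation rule are illustrative assumptions, and the approach is only reasonable for tail indices between 1 and 2 and moderate arguments.

```python
import math

def sym_stable_pdf(x, alpha, steps=20000):
    """PDF of a symmetric stable law with CF exp(-|t|**alpha), obtained as
    (1/pi) * integral_0^inf exp(-t**alpha) * cos(x*t) dt (trapezoidal rule)."""
    # Truncate where the integrand envelope exp(-t**alpha) is about 1e-12.
    upper = 27.6 ** (1.0 / alpha)
    h = upper / steps
    # End points enter the trapezoidal rule with weight 1/2.
    total = 0.5 * (1.0 + math.exp(-upper ** alpha) * math.cos(x * upper))
    for k in range(1, steps):
        t = k * h
        total += math.exp(-t ** alpha) * math.cos(x * t)
    return total * h / math.pi

for a in (1.0, 1.5, 2.0):
    print(f"alpha={a}: f(0)={sym_stable_pdf(0.0, a):.6f}, "
          f"f(2)={sym_stable_pdf(2.0, a):.6f}")
```

For a tail index of 1 and 2 the output can be compared with the Cauchy density and with the normal density with variance 2, respectively. For large arguments the integrand oscillates quickly, which is one reason why more refined algorithms are used in practice.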

As for the parameters of the stable Paretian distribution: [illustration not visible in this excerpt] is the skewness index. If it is positive, the PDF is skewed to the right; if it is negative, it is skewed to the left. For [illustration not visible in this excerpt]1 ([illustration not visible in this excerpt]-1), it is totally right-skewed (left-skewed). For simplification, it is sometimes assumed that the distribution of stock returns is symmetric, i.e. [illustration not visible in this excerpt] 0.

The parameter [illustration not visible in this excerpt] is the parameter of scale. Finally, [illustration not visible in this excerpt] is the location parameter.

The fourth paragraph is devoted to the characterisation of the parameter [illustration not visible in this excerpt], which, due to its central role, deserves most of the attention.

**Definition 3.J (Standard-stable distribution).** For reasons of simplification, the stable Paretian distribution is frequently denoted by (Rachev et al. 2000, p. 27):

illustration not visible in this excerpt

When deriving estimators for [illustration not visible in this excerpt], it is sometimes assumed that one is dealing with a symmetric [illustration not visible in this excerpt]-stable RV (*standard-stable distribution*):

illustration not visible in this excerpt

Where the analysis is focused on assessment of the heaviness of tails, this appears to be a suitable simplification only as long as upside and downside risks need not be treated separately. □

It should be noted that in case of shifted modes ([illustration not visible in this excerpt]0), tail inference can theoretically become more complex (Fofack et al. 1999, pp. 41-42): In the extremal case of [illustration not visible in this excerpt]-1, the lower tail is non-Paretian, i.e. it decays faster than a power.

The CDF of a standard-stable distribution can be approximated via the Bergström expansion as follows, especially when high quantiles are targeted (Frahm 1998, p. 84):

illustration not visible in this excerpt

**The domain of attraction of stable Paretian laws.** Apart from the exact definition of the [illustration not visible in this excerpt]-stable distribution, it is essential to define when a distribution can be assumed to exhibit properties that are (asymptotically) similar to those of the exact law. Against the background of empirical studies, it is therefore crucial to be clear about the domain of attraction (DA) of this family.

**Definition 3.K (DA).** The CDF [illustration not visible in this excerpt] is said to be in the DA of a stable Paretian distribution [illustration not visible in this excerpt] if for any sequence [illustration not visible in this excerpt] of [illustration not visible in this excerpt] RVs with common CDF [illustration not visible in this excerpt], there are sequences of constants [illustration not visible in this excerpt] and [illustration not visible in this excerpt], such that

illustration not visible in this excerpt

with [illustration not visible in this excerpt] being an [illustration not visible in this excerpt]-distributed RV (Mittnik et al. 1993, pp. 271-272; Rachev et al. 2000, pp. 27-28). □

From definitions 3.H and 3.K, it follows immediately that [illustration not visible in this excerpt] belongs to its own domain of attraction. Provided that the CDF of [illustration not visible in this excerpt] belongs to the DA of [illustration not visible in this excerpt], the normalised sum [illustration not visible in this excerpt] converges to a stable Paretian law.

Since, by definition, the DA criterion is asymptotic, it is equally important to know what sample sizes are required to make sure the DA criterion can be employed. DuMouchel (1983, pp. 1023-1026) shows that the rate of convergence can be very slow and many observations are needed to make use of the DA criterion, especially when [illustration not visible in this excerpt] is close to 2.

A more detailed account of the prerequisites for convergence towards a non-normal stable law can be found in Davis (1983, pp. 263-268).

**Tail behaviour and the role of the shape parameter.** The reason why sum-stable distributions are frequently called ‘stable Paretian’ is that it has been known since 1924 (work by Lévy) that these distributions typically exhibit Pareto-like tails (Mandelbrot 1963, pp. 398-399). More precisely, this property can be written as follows (Fofack et al. 1999, pp. 39-40):

**Theorem 3.L (Tail behaviour).** Let [illustration not visible in this excerpt] be a non-normal (0[illustration not visible in this excerpt]2) stable RV with scale parameter [illustration not visible in this excerpt]1, location parameter [illustration not visible in this excerpt]0, and skewness parameter [illustration not visible in this excerpt]-1. Then

illustration not visible in this excerpt

where

illustration not visible in this excerpt

That is, if the above conditions hold, then the tails of an [illustration not visible in this excerpt]-stable distribution are asymptotically Paretian. □
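The Cauchy law is a convenient special case for checking this statement numerically, because its tail probability is available in closed form. The sketch below assumes a standard Cauchy RV (tail index 1, symmetric, unit scale) and uses the well-known limiting constant 1/π for that case. Both the function names and the choice of evaluation points are illustrative.

```python
import math

def cauchy_tail(x):
    """Exact upper-tail probability P(X > x) of a standard Cauchy RV."""
    return 0.5 - math.atan(x) / math.pi

def pareto_tail(x, alpha=1.0, const=1.0 / math.pi):
    """Paretian approximation const * x**(-alpha) of the upper tail."""
    return const * x ** (-alpha)

# Ratio of exact tail to Paretian approximation; it should approach 1.
ratios = {x: cauchy_tail(x) / pareto_tail(x) for x in (10.0, 100.0, 1000.0)}
for x, ratio in ratios.items():
    print(f"x={x:7.1f}  exact/approx={ratio:.6f}")
```

The ratio moves towards 1 as the argument grows, which is exactly what 'asymptotically Paretian' means: the approximation is a statement about the far tail, not about the centre of the distribution.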

As seen in section 3.1, the tails of distributions in the [illustration not visible in this excerpt] decay by a power, i.e. not all moments are necessarily finite. For the class of stable distributions, this results in the following condition (Khindanova et al. 2001, p. 1229; Blattberg et al. 1974, p. 247):

**Theorem 3.M (Existence of finite moments).** The [illustration not visible in this excerpt]-th absolute moment

illustration not visible in this excerpt

is finite if [illustration not visible in this excerpt] or [illustration not visible in this excerpt]2, otherwise infinite. □

Fama (1963, p. 421) points out that the non-existence of a finite second moment, i.e. infinite variance, has very important implications. Because it renders the sample variance a *“meaningless measure of dispersion”*, statistical tools based on a finite-variance assumption (e.g. OLS regression) are not applicable under the non-Gaussian stable hypothesis.
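The practical meaning of an infinite variance can be made visible by simulation. The following sketch is an illustration under stated assumptions, not a method taken from the sources cited above: it draws symmetric stable variates with unit scale via the Chambers-Mallows-Stuck transformation (for a CF of the form exp(-|t|^α)) and tracks the sample variance as the sample grows. The tail index of 1.5 and the seed are arbitrary choices.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """One symmetric stable variate (unit scale, zero location) generated
    with the Chambers-Mallows-Stuck transformation."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit exponential
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(1)
gauss_like = [symmetric_stable(2.0, rng) for _ in range(20000)]  # normal case
heavy = [symmetric_stable(1.5, rng) for _ in range(20000)]       # infinite variance

for n in (200, 2000, 20000):
    print(f"n={n:6d}  var(alpha=2)={sample_variance(gauss_like[:n]):8.3f}  "
          f"var(alpha=1.5)={sample_variance(heavy[:n]):10.3f}")
```

In the normal case the sample variance settles near 2, the variance implied by this parameterisation. For a tail index of 1.5 it does not converge to any value but is driven by the largest observations in the sample.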

As demonstrated by the above theorems 3.L and 3.M, the characteristic exponent (tail index, index of stability) [illustration not visible in this excerpt] is the crucial parameter underlying the likelihood of extremal returns, for it governs the *shape of the tail* of the distribution, and therefore its *degree of leptokurtosis* (Fama 1963, p. 422). In other words, the tail index determines how much of the total probability mass is allocated to the tails (Fielitz et al. 1983, p. 28). The smaller the tail index, the heavier the tails. At the same time, it also governs the peakedness of the distribution (Simkowitz et al. 1980, p. 306).

In practice, one would assume that for stable distributed stock returns, [illustration not visible in this excerpt] (Khindanova et al. 2001, p. 1229), i.e. infinite variance but finite mean, in accordance with Mandelbrot and Fama (1963, p. 422). One reason is that values for [illustration not visible in this excerpt] outside this range typically do not find empirical support; the other is that, according to theorem 3.M, expected returns would cease to exist if [illustration not visible in this excerpt]1.

For the cases limiting the range of possible tail indices ([illustration not visible in this excerpt]1 and [illustration not visible in this excerpt]2), the stable family of distributions nests two other models that possess a closed-form PDF^{3}:

In the marginal case of [illustration not visible in this excerpt]2 and [illustration not visible in this excerpt]0, the stable distribution coincides with the normal distribution (Khindanova et al. 2001, p. 1229). The CF is reduced to (Fama 1963, p. 422):

illustration not visible in this excerpt

This is the only case in which all moments are finite.

In the case of [illustration not visible in this excerpt]1 and [illustration not visible in this excerpt]0, the stable distribution coincides with the Cauchy distribution, for which neither a finite mean nor a finite variance exists (Fielitz et al. 1983, pp. 28-29).
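Both special cases can be verified directly on the level of the CF. The sketch below assumes one common parameterisation of the stable CF (the form often labelled S1 in the literature), which may differ in detail from the formula in definition 3.I that is not visible here. The function name and argument names are illustrative.

```python
import cmath
import math

def stable_cf(t, alpha, beta=0.0, scale=1.0, loc=0.0):
    """Characteristic function of a stable law in an assumed
    S1-type parameterisation (not necessarily the one of definition 3.I)."""
    if t == 0:
        return complex(1.0, 0.0)
    sign = 1.0 if t > 0 else -1.0
    if alpha == 1.0:
        # Special form for alpha = 1 (logarithmic skewness term).
        log_cf = (1j * loc * t - scale * abs(t)
                  * (1 + 1j * beta * (2 / math.pi) * sign * math.log(abs(t))))
    else:
        log_cf = (1j * loc * t - abs(scale * t) ** alpha
                  * (1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)))
    return cmath.exp(log_cf)

t = 0.7
normal_cf = math.exp(-t ** 2)   # CF of a normal law with mean 0, variance 2
cauchy_cf = math.exp(-abs(t))   # CF of a standard Cauchy law
print(abs(stable_cf(t, 2.0) - normal_cf))   # close to 0
print(abs(stable_cf(t, 1.0) - cauchy_cf))   # close to 0
```

Under this parameterisation, a tail index of 2 with unit scale yields a normal law with variance 2 rather than 1, which is worth keeping in mind when comparing scale estimates across models.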

It has become clear that, in the context of stock returns, the stable index [illustration not visible in this excerpt] is a measure of variation and risk of equity prices. In economic terms, the tail index has even been interpreted as a measure of market efficiency (Shiryaev 1999, pp. 335-336). Even though the latter idea shall not be pursued any further, it nevertheless underlines the enormous importance of [illustration not visible in this excerpt].

Estimating this parameter is therefore a crucial task, which is dealt with in the following sections.

## 4. Estimation of the Stable Paretian Index [illustration not visible in this excerpt]

The following sections are devoted to the estimation of the tail index [illustration not visible in this excerpt], whose meaning and importance have been shown. In the remainder of this paper, two crucial questions shall be tackled:

- Which estimator should be used, given a distributional assumption?

- Given a suitable estimator, what estimates does it yield? Can [illustration not visible in this excerpt] be presumed to be somewhere near 2 or rather smaller?

Sections 4, 5.1, and 5.2 address the first question. After describing and evaluating estimators on the basis of a literature review (section 4), the results of simulation studies for selected estimators are presented in sections 5.1-5.2.

In section 5.3, the second question is dealt with by estimating [illustration not visible in this excerpt] from empirical stock return data.

### 4.1 Desirable properties of an estimator and evaluation criteria

Before looking at specific estimators, one should be clear about the criteria by which they are assessed:

**Bias-variance trade-off.** Traditionally, three desirable properties have been used for the evaluation of an estimator [illustration not visible in this excerpt] (Bomsdorf 2000, p. 160):

- **Bias.** An estimator should be unbiased or asymptotically unbiased:

[illustration not visible in this excerpt] or

illustration not visible in this excerpt

- **Efficiency.** Amongst all unbiased estimators of [illustration not visible in this excerpt] for a given sample size [illustration not visible in this excerpt], an efficient estimator exhibits the smallest variance.

- **Consistency.** As the sample size tends towards infinity, the estimator [illustration not visible in this excerpt] shall not deviate from the true value of [illustration not visible in this excerpt], or more precisely: The probability that [illustration not visible in this excerpt] deviates by more than [illustration not visible in this excerpt] from the unknown parameter [illustration not visible in this excerpt] shall approach 0 as [illustration not visible in this excerpt] tends towards infinity:

illustration not visible in this excerpt

A sufficient condition for consistency is that the estimator is at least asymptotically unbiased and that, at the same time, the estimator variance approaches 0 as [illustration not visible in this excerpt] tends towards infinity, i.e. [illustration not visible in this excerpt].

Often, one finds that consistency is imposed as a minimum condition for suitable estimators. Further conditions would require the estimator to exhibit good small sample properties (good rate of convergence) and be (asymptotically) normally distributed for purposes of statistical inference.
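The criteria above can be checked by Monte Carlo simulation. The following sketch is an illustration under assumptions that are not part of the original text: the data are exact Pareto with a tail index of 1.7, and the estimator examined is the ML estimator for that idealised case. It reports bias, variance, and mean squared error (MSE), which satisfy the identity MSE = bias² + variance.

```python
import math
import random

def pareto_ml(sample):
    """ML estimator of the tail index for exact Pareto data on [1, inf)."""
    return len(sample) / sum(math.log(x) for x in sample)

def evaluate(alpha, n, reps, rng):
    """Monte Carlo bias, variance and MSE of pareto_ml for sample size n."""
    estimates = []
    for _ in range(reps):
        sample = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        estimates.append(pareto_ml(sample))
    mean = sum(estimates) / reps
    bias = mean - alpha
    variance = sum((e - mean) ** 2 for e in estimates) / reps
    mse = sum((e - alpha) ** 2 for e in estimates) / reps
    return bias, variance, mse

rng = random.Random(0)
results = {n: evaluate(1.7, n, 500, rng) for n in (20, 200, 1000)}
for n, (bias, variance, mse) in results.items():
    print(f"n={n:5d}  bias={bias:+.4f}  variance={variance:.4f}  mse={mse:.4f}")
```

In this setting the estimator is biased in small samples but consistent: both bias and variance shrink as the sample size increases, and so does the MSE.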

In the context of tail index estimation and EVT, however, one is frequently confronted with a phenomenon that makes evaluation rather complicated: the bias-variance trade-off (Wagner et al. 2002, pp. 2-3). When proceeding as suggested above – first find minimum bias estimators and then select the one with the smallest variance – one would typically obtain a sub-optimal result in terms of MSE – for a minimal bias may well lead to an overly high standard deviation (Peng 1998, p. 107; Alves 2001, p. 201). The issue of finding an optimal (or non-dominated) combination between variance and bias becomes relevant at two stages of the estimation process:

**[...]**

^{1} Not to be confused with the ‘tail index’ or the ‘extreme value index’.

^{2} There are three exceptions which are mentioned in the fourth paragraph.

^{3} Furthermore, for a tail index of 0.5 and a skewness index of -1, the stable law corresponds to a Pearson-type distribution (Bates et al. (*yns*), p. 1). Yet this case is out of bounds for financial modelling.