# Statistical Techniques for Bankruptcy Prediction

Master's Thesis 2005 100 Pages

## Excerpt

## CONTENTS

Important Symbols and Abbreviations

1. Introduction

2. Bankruptcy Prediction as a Classification Problem

2.1 Bankruptcy Prediction Models

2.2 Structural vs. Reduced Models and Explanatory Variables

2.3 Collinearity Issues

2.4 Sampling Considerations

2.5 Misclassification Costs

2.6 Measures for Model Performance

2.6.1 General

2.6.2 Performance Measures for Models with Probabilistic Output

2.6.3 Rank Correlation Measures

3. Discriminant Analysis

3.1 Discriminant Analysis as a Classification Technique

3.2 Bayesian Approach

3.2.1 Class membership as posterior probability

3.2.2 Assumption of multivariate normality

3.2.3 Distributions other than multivariate normal

3.3 Discriminant Functions Approach

3.3.1 General

3.3.2 Measures for importance of explanatory variables

3.3.3 Testing for mean differences

3.3.4 Importance and significance of discrimination functions

3.4 Stepwise Variable Selection

3.5 Sampling Considerations

3.6 Misclassification Costs

3.7 Strengths vs. Weaknesses

3.8 Applications for Bankruptcy Prediction

4. Conditional Probability Models

4.1 General

4.2 Microeconomic Derivation

4.3 Model Estimation

4.4 Link to Discriminant Analysis

4.5 Significance Testing

4.5.1 Nested models and hypothesis testing

4.5.2 Test for overall significance of logit coefficients (omnibus test)

4.5.3 Wald test for linear restrictions and t-tests

4.5.4 Lagrange multiplier test

4.5.5 Raftery test

4.5.6 Confidence intervals

4.6 Goodness-of-Fit (GOF)

4.6.1 Mean loglikelihood

4.6.2 Saturated model and deviance

4.6.3 Categorical independents: Pearson and Deviance GOF

4.6.4 Hosmer-Lemeshow GOF test

4.6.5 Box-Tidwell nonlinearity test

4.6.6 Quasi-R² goodness-of-fit measures

4.6.7 Overdispersion

4.7 Variable Selection

4.8 Sampling Considerations

4.9 Extensions

4.9.1 Interaction and quadratic terms

4.9.2 Multinomial models

4.9.3 Mixed logit model

4.10 Collinearity Issues

4.11 Misclassification Costs

4.12 Strengths vs. Weaknesses

4.13 Applications for Bankruptcy Prediction

5. Survival Analysis

6. CUSUM Charts

7. Artificial Neural Networks

8. Some Other Techniques

9. Bankruptcy Prediction Models in Germany

10. Bankruptcy Prediction in Ukraine

11. Summary and Conclusions

Zusammenfassung (Summary in German)

References

## Important Symbols and Abbreviations

[Figure not included in this excerpt]

## 1. Introduction

Over the past three decades, bankruptcy prediction has become a matter of ever-rising academic interest and intensive research. This is due to the academic appeal of the problem, combined with its importance in practical applications. The practical importance of bankruptcy prediction models has recently grown even further with the “Basle-II” regulations, which were elaborated by the Basle Committee on Banking Supervision to enhance the stability of the international financial system. These regulations oblige financial institutions and banks to estimate the probability of default of their obligors.

There exists some fundamental economic theory on which to base bankruptcy prediction models, but it typically relies on the stock market prices of the companies under consideration^{1}. These prices are, however, only available for large publicly listed companies. Models for private firms are therefore empirical in nature and have to rely on rigorous statistical analysis of all available information for such firms. In 95% of cases, this information is limited to accounting information from the financial statements. Large databases of financial statements (e.g. Compustat in the USA) are maintained and often available for research purposes.

Accounting information is particularly important for bankruptcy prediction models in emerging markets. This is because the capital markets in these countries are often underdeveloped and illiquid and don’t deliver sufficient stock market data, even for public/listed companies, for structural models to be applied.

The accounting information is normally summarized in so-called financial ratios. Such ratios (e.g. the leverage ratio, calculated as Debt to Total Assets of a company^{2}) have a long tradition in accounting analysis. Many of these ratios are believed to reflect the financial health of a company and to be related to bankruptcy. However, these beliefs are often very vague (e.g. leverage above 70% might provoke a bankruptcy) and subjective. Quantitative bankruptcy prediction models objectify these beliefs in that they apply statistical techniques to the accounting data.

This work deals with the statistical underpinnings of bankruptcy prediction models based on financial ratios. The statistical theory behind two popular approaches, discriminant analysis and conditional probability models, is discussed in detail. Other approaches are discussed briefly in the context of empirical application to bankruptcy prediction.

The work is organized as follows.

Section 2 formulates the bankruptcy prediction problem in terms of mathematical and statistical theory. General issues, common to all statistical models, are also discussed in this section. Among these issues are: selection of explanatory variables (section 2.2), collinearity in explanatory variables (section 2.3), sampling considerations (section 2.4), and allowance for misclassification costs (section 2.5). Special attention is given to the performance measures which evaluate the prediction accuracy of models (section 2.6).

Section 3 discusses in great detail the technique of multivariate discriminant analysis, which was historically the first statistical technique to be applied to bankruptcy prediction. Both the Bayesian (section 3.2) and the discriminant-function (section 3.3) approaches to discriminant analysis are discussed first. Subsequently, the issues of stepwise variable selection, sampling techniques and misclassification costs are addressed (sections 3.4-3.6). Finally, subsection 3.8 gives a review of empirical studies and journal papers which used this technique for bankruptcy prediction.

Section 4 is devoted to conditional probability models, with an emphasis on dichotomous logit models. Since the mid 1990s, this technique has become the new standard for bankruptcy prediction. After the microeconomic motivation (section 4.2), model estimation via the maximum likelihood method is discussed in section 4.3. Special attention is paid to significance testing and goodness-of-fit measures (sections 4.5 and 4.6). Stepwise variable selection, sampling considerations and misclassification costs are addressed in sections 4.7, 4.8, and 4.11 respectively. Finally, subsection 4.13 gives a review of empirical studies and journal papers which applied this technique to bankruptcy prediction.

Section 5 discusses the promising technique of survival analysis. This technique has grown rapidly in popularity over the last few years and has the potential to become the new standard for bankruptcy prediction. The section also offers a brief overview of empirical studies which used this technique.

Section 6 discusses the technique of statistical control charts. Control charts (in particular CUSUM) have been applied as a dynamic extension of discriminant analysis for bankruptcy prediction. Although this technique should be considered experimental in the bankruptcy prediction context, it has some intuitive appeal.

Section 7 discusses briefly the application of artificial neural networks. This technique has enjoyed great popularity in bankruptcy prediction during the 1990s. However, its performance is rather controversial and the technique is not purely statistical. This is caused by the lack of significance testing theory, although some variations of neural networks seem to overcome this gap.

Section 8 discusses, for the sake of completeness, some alternative techniques which have been applied sporadically to bankruptcy prediction.

The empirical aspects in the above sections are based on international studies coming predominantly from the English-speaking world. Section 9 offers an extensive overview of empirical bankruptcy prediction research in Germany. Section 10 discusses issues related to bankruptcy prediction modeling in the emerging market of Ukraine.

Finally, section 11 summarizes the issues discussed in this work and also offers a short comparison of the statistical techniques with respect to their suitability for bankruptcy prediction.

To improve readability, in the predominantly technical sections (section 2, section 3 excl. subsection 3.8, section 4 excl. subsection 4.13), the passages immediately related to bankruptcy prediction have a box border, similar to that around this paragraph. An attempt was made to write these passages so that they are understandable even without the technical context. An interested reader might confine himself or herself to reading these passages in the specified sections in order to concentrate on the relevance to bankruptcy prediction while omitting technical details.

## 2. Bankruptcy Prediction as a Classification Problem

### 2.1 Bankruptcy Prediction Models

Classification can be defined as assigning a group membership to an observation on the basis of this observation’s attributes.

In a classification context, the researcher is confronted with a set of observations (cases). Each observation consists of a number of explanatory (independent) variables (attributes) as well as one dependent variable, which expresses the class (group) membership. The explanatory variables can be nominal, ordinal or cardinal. The dependent group variable can be either ordinal or cardinal, with its states (levels, alternatives) identifying the observation’s group membership. Let $x_i$ signify the vector of independent variables for observation $i$, $i = 1, \dots, N$. Let $y_i \in \{1, \dots, K\}$ denote the state of the group membership variable for observation $i$, where $K$ stands for the number of groups (classes). A classification model means in this context a model which would assign $y_i$ on the basis of $x_i$.

[Equation not included in this excerpt]^{3}

The advanced classification models currently in use would, however, first determine the class probabilities, i.e. $P(y_i = k \mid x_i)$, for all classes $k = 1, \dots, K$. The class membership can then easily be established as the class with the highest probability.

The bankruptcy prediction via bankruptcy probabilities should be preferred to direct classification as “bankrupt” and “solvent”. This is because the probabilities approach enables rankings of companies by their financial health - which is much more useful in business applications than simple solvent vs. bankrupt classification. Also, intuitively, bankruptcy can’t be excluded even for very solvent companies; it is just very unlikely to happen to such companies, which will be measured by a small bankruptcy probability.

The classification model will typically be estimated on a training (estimation, in-sample) dataset (sample) and, possibly, will also be validated on a validation (holdout, out-of-sample) dataset. An alternative to such holdout testing is the so-called jackknife method, which doesn’t require a holdout dataset: here, all observations but one are used to estimate the classification model, which is then used to classify the remaining observation; the procedure is repeated for each of the other observations in order to establish the true prediction accuracy.

Validation is desirable because many models are subject to the overfitting problem, i.e. the models show high classification performance in the estimation set but low prediction performance for new, unseen data. Jackknife validation is desirable if the original data is scarce and would not suffice for a holdout dataset.
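The jackknife procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the nearest-centroid classifier stands in for whatever classification model is actually being validated, and the six observations (leverage, profitability) are hypothetical.

```python
from statistics import mean

def fit_centroids(X, y):
    """Toy stand-in model: one centroid (mean vector) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def jackknife_accuracy(X, y):
    """Leave each observation out once, refit, and classify the left-out case."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_centroids(X_train, y_train)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

# Hypothetical data: (leverage, profitability); 1 = bankrupt, 0 = solvent
X = [[0.90, -0.10], [0.80, -0.05], [0.85, 0.00],
     [0.30, 0.10], [0.40, 0.15], [0.35, 0.08]]
y = [1, 1, 1, 0, 0, 0]
print(jackknife_accuracy(X, y))
```

Every observation is classified by a model that never saw it, so the resulting hit rate is an out-of-sample estimate even though no separate holdout dataset was set aside.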

### 2.2 Structural vs. Reduced Models and Explanatory Variables

Generally, if there exists some fundamental theory as to which variables explain and influence the dependent variable, the model should be estimated with all these variables. Models accounting for such fundamental information are called structural.

Such fundamental knowledge is however often not available, and the fundamental explanatory variables are often not observable. In such cases, the researcher would typically only know that some variables are potentially related to the dependent variable. The number of such potentially related variables will often be high. In these circumstances, for reasons of parsimony, the researcher would have to make a selection from these candidate variables. This results in so-called reduced models.

In some classification approaches (e.g. discriminant analysis), the explanatory variables can be selected automatically from a predefined initial set of variables. If the initial set contains just a few variables, variables can be selected by full enumeration over all possible combinations, maximizing some overall goodness-of-fit measure.

Otherwise, full enumeration becomes prohibitive and so-called stepwise procedures can be used. These procedures sequentially apply significance tests to determine the significance and relative importance of variables to be included or excluded one by one. Stepwise algorithms are heuristic in nature and don’t guarantee that the maximal goodness-of-fit measure will be achieved. It is also important to keep in mind that stepwise procedures capitalize on chance as they pick and choose variables, and thus the actual significance levels are worse than the values reported.

Management and economic science have so far not produced a generally accepted fundamental theory of exactly which factors cause a company to fail and how exactly this happens.

Some models have applied a version of the Black-Scholes option pricing theory to bankruptcy prediction. This structural approach, however, requires information about the market prices of the companies’ stocks, which is only available for public companies listed on stock exchanges. Also, such structural models often show only modest prediction accuracy, reflecting the fact that the underlying theory is only a simplification of reality.

Bankruptcy prediction for private (non-listed) companies relies on financial (accounting) ratios from the companies’ financial reports (statements). The accounting ratios are considered to be related to bankruptcy and are often characterized as symptoms of impending bankruptcy. The popular ratios include:

- Profitability Ratios (the two popular choices are: Net Income to Equity, and EBIT^{4} to Total Assets)

- Leverage Ratios (e.g. Total Debt to Total Assets)

- Liquidity Ratios (e.g. Cash to Total Assets)

- Activity Ratios (e.g. Accounts Receivable to Sales)

- Growth Ratios (e.g. Sales Growth)

- Size Measures (the popular choices are Sales and Total Assets)

The above are only examples. There are some 100-200 conceivable financial ratios, built from different combinations of items and entries, which come from financial statements.

There is no established theoretical framework which would state how exactly financial ratios and bankruptcy are interrelated. Thus, the models constructed for private firms will typically be reduced ones. Also, for reasons of parsimony, the ultimate models rarely include more than 10 explanatory variables. For this reason, the ratios are normally selected on the basis of stepwise procedures. Selection on the basis of previous studies or economic-theory considerations has also been widely used.

### 2.3 Collinearity Issues

Collinearity among explanatory variables, i.e. strong linear interrelationships among these variables, is a common problematic phenomenon in a classification context. Various classification methods differ in how sensitive they are to collinearity^{5}. Normally, collinearity only becomes dangerous if the so-called variance inflation factor^{6} (VIF) exceeds 10; a VIF under 4 is considered unproblematic. The VIF is calculated for each independent variable $j$ as $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from the regression of that independent variable on the other independent variables. Some other sources state that collinearity becomes problematic if two independent variables have a correlation coefficient exceeding 70%.
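As a rough illustration of the VIF diagnostic, the sketch below computes the factors for a handful of ratios without any external library. It relies on the known identity that the VIF of variable j equals the j-th diagonal element of the inverse correlation matrix, which gives the same value as 1 / (1 - R_j^2) from the auxiliary regression. The function names and the ratio values are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return one VIF per variable; `columns` is a list of data columns."""
    n = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(n)] for i in range(n)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(n)]

# Hypothetical ratios for eight firms
leverage = [0.55, 0.62, 0.48, 0.71, 0.66, 0.59, 0.80, 0.45]
net_income_to_equity = [0.12, 0.08, 0.15, 0.02, 0.05, 0.10, -0.04, 0.16]
sales_growth = [0.03, 0.10, -0.02, 0.07, 0.01, 0.05, 0.04, 0.08]
print(vif([leverage, net_income_to_equity, sales_growth]))
```

With only two explanatory variables the factor reduces to 1 / (1 - r^2), so a pairwise correlation of 70% corresponds to a VIF of about 1.96, well below the thresholds of 4 and 10 quoted above.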

Generally, collinearity results in unstable (inefficient) parameter estimates. Moreover, collinearity can result in parameter estimates with a theoretically wrong sign, which can cause difficulties with the interpretation of these parameters (Falkenstein (2000), p. 29). Stepwise variable selection also works poorly when collinearity problems persist. Also, collinearity can result in additional model instability: the model performance deteriorates in new, unseen datasets, in particular if the correlation structure changes.

The diagnostics and remedies against collinearity are the same for all classification methods as for a simple ordinary least squares regression.

There are several simple ways to counteract collinearity:

- Independent variables may be replaced by deviations from their means. Other transformations may also help. This should be preferred to other methods as no information gets lost in this case^{7}.

- If two independent variables are strongly correlated, one of them may be excluded to avoid collinearity.

- Correlated variables may be orthogonalized, which can be done by replacing an independent variable with the residual from its regression on the other independent variables. This applies especially to collinearity emerging from interaction terms^{8}.

- Finally, the sample size may have to be increased to counteract variance inflation.

If these methods don’t work or cannot be applied, some advanced techniques also exist to counteract collinearity: Principal Component Analysis, Ridge regression, Partial Least Squares, Gram-Schmidt orthogonalization^{9} etc.

Collinearity is a frequent phenomenon among financial ratios used for bankruptcy prediction, as many of them include common terms in the numerator or denominator. This can be seen e.g. in two widely used profitability ratios: Net Income to Total Equity and EBIT to Total Assets. These are not identical but correlated, as Net Income is a part of EBIT and Equity is a part of Total Assets. If both are included in a model simultaneously, one of the ratio parameters would almost surely turn out to be positively related to bankruptcy, whereas basic theory says profitability decreases the likelihood of bankruptcy.

Different methods have been used to counter collinearity. In one approach (Falkenstein (2000)), some 30 ratios were grouped into a few ratio categories such as liquidity, profitability, leverage, activity, growth etc. Then, only one or two ratios from each ratio category were allowed to enter the model. Poppe (2000) used a hierarchical clustering analysis to identify ratio clusters, and took one ratio from each cluster. Another (rather technical) method widely used is to control for multicollinearity during model construction via stepwise methods. This approach is integrated into software packages like SPSS and SAS. The use of principal component analysis in bankruptcy prediction context was rather restricted, as parameter estimates relate in this case to the PCA factors, which are difficult to interpret.

It is important to remember that multicollinearity is problematic primarily if the purpose of modeling is *explanation*. The *prediction* performance often doesn’t suffer from moderate collinearity^{10}.

In the context of bankruptcy prediction on the basis of financial ratios, explanation is normally not in the foreground. This is because most models are reduced ones and aim at empirical prediction rather than causal modeling. This mitigates the effects of multicollinearity in the bankruptcy prediction context.

### 2.4 Sampling Considerations

Sampling considerations are of extraordinary importance in a classification context. Only rarely will the researcher be able to analyze the complete dataset (i.e. the population). More often, the researcher will deal with a partial dataset (i.e. a sample). Often, too, the complete population might be computationally too large to estimate a model with. The way in which the sample is picked from the population affects the model to a great extent. Random sampling, in which observations are drawn randomly from the population, is the method of first choice, as the sample then possesses the same properties as the population. However, randomly sampled datasets are often not available.

Large databases of financial reports (e.g. the Compustat database in the USA) are available for public companies and are based on published information. These can be seen as representing the complete population of public companies.

Private companies however don’t have to publish their financial reports. Some databases exist for them as well, but these normally belong to banks or rating agencies (e.g. Moody's Credit Research Database) and include only information on debtors of one or several banks. Often, these databases can’t be seen as randomly sampled from the total population of private firms. This is because the credit portfolios of different banks might vary in terms of the default rates of their debtors, but also in terms of industry, size etc.

Samples stratified by class (choice-based samples), in which a fixed number of observations is available for each class, are a possible alternative. Models estimated on stratified samples may however need to be adjusted before they are applied for prediction.

Even if random sampling is possible, in some circumstances the sample may be artificially stratified. This is in particular the case when the probabilities of class membership differ considerably across classes. In many classification models (e.g. logit regression, see section 4.8), the goodness of prediction depends not on the overall sample size, but rather on the sample size of the least frequent class.

Bankruptcy prediction is a typical classification problem where the probabilities of class membership differ greatly. The annual failure rate in the population (i.e. the economy as a whole) in most cases doesn’t exceed 10%, meaning that at worst only 10% of observations will belong to the “insolvent” class, the remaining 90% being “solvent”. Under such conditions, the researcher may choose to use all available observations for “insolvent” companies, but only some 30% of the observations for the “solvent” class. This will only slightly reduce classification accuracy, but greatly reduce data acquisition demands and computational costs.
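The effect of such class-based subsampling on the class proportions can be shown with a small sketch. The population of 1,000 firms with a 10% failure rate, the function name `undersample_solvent`, and the fixed random seed are all assumptions made for illustration.

```python
import random

def undersample_solvent(labels, keep_share, seed=0):
    """Keep every insolvent case (label 1) and a random share of solvent cases (label 0).

    Returns the sorted indices of the observations retained in the sample.
    """
    rng = random.Random(seed)
    insolvent = [i for i, y in enumerate(labels) if y == 1]
    solvent = [i for i, y in enumerate(labels) if y == 0]
    kept_solvent = rng.sample(solvent, round(keep_share * len(solvent)))
    return sorted(insolvent + kept_solvent)

# Hypothetical population: 100 insolvent and 900 solvent firms (10% failure rate)
labels = [1] * 100 + [0] * 900
sample_idx = undersample_solvent(labels, keep_share=0.30)

population_rate = sum(labels) / len(labels)
sample_rate = sum(labels[i] for i in sample_idx) / len(sample_idx)
print(len(sample_idx), population_rate, round(sample_rate, 3))
```

In this setting the sample shrinks from 1,000 to 370 observations while the share of insolvent firms rises from 10% to roughly 27%. This shift in class proportions is the reason why a model estimated on a stratified sample may need to be adjusted before it is used for prediction.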

Another sampling technique sometimes used in economic research is matched-pairs sampling (also known as the case-control technique). With this technique, the researcher would first obtain a small number of observations for the first, less frequent class. He or she would then choose the same number of observations from the second class, matching them pair-wise by some criteria to the observations of the first class. The observations are then pooled together and used to estimate the classification model. The proportions of classes in this final sample will be distorted, and this should be neutralized by adjustments to the model estimation. Case-control sampling provides more robust, albeit biased, parameter estimates when the number of observations from the less frequent class is small.

Small samples with matched-pairs techniques were initially quite common in bankruptcy prediction research. Many earlier studies included as few as 30 matched pairs of observations (e.g. the seminal Altman (1968)).

The observations are normally matched on the basis of major risk factors: industry and company size. In doing so, the size and industry can’t be used as explanatory variables. In this way, and taking into consideration modest sample sizes, the researcher can concentrate on the classification power of other factors. More specifically, the variance of remaining parameters may be considerably reduced.

When applying the matched-pairs methodology, it is important to choose the non-failed companies randomly (using only the matching criteria). This requirement was often violated, as researchers often matched failed companies to highly solvent and successful counterparts^{11}. This led to the exclusion of intermediate-level companies, which eventually resulted in exaggerated prediction accuracy rates reported by the researchers^{12}.

One additional sampling concern in the context of bankruptcy prediction is the so-called survivorship bias. This bias arises because many failing companies stop submitting or publishing their financial statements 1 or 2 years prior to the bankruptcy declaration. The prediction models would normally just omit such observations because of “missing data”. This results in a situation where successful “survivors” are overrepresented in the estimation sample, possibly leading to biased parameter estimates. A similar bias arises when the database used for estimation contains only companies which have already survived for a number of years^{13}. This would distort the predictions of such a model, especially for newly founded companies.

When a model pertains to prediction rather than to pure classification, it is important to use only those explanatory variables which are really available before the group membership becomes known.

As most financial reports are published 6 months or more after the balance sheet closing date, and refer to business transactions taking place before this closing date, a typical bankruptcy prediction model will use the data from the year $t$, when it becomes available in the year [not included in this excerpt], to predict bankruptcies in years [not included in this excerpt] etc.

### 2.5 Misclassification Costs

In many applications, the purpose of classification is to minimize the overall misclassification costs rather than misclassification rates. In this work, misclassification costs will be denoted as $C(\hat{k} \mid k)$ for selecting the group $\hat{k}$ when the true group is $k$. Misclassification costs can be integrated into most classification models. For more than two groups, however, taking misclassification costs into account becomes rather complicated.

If misclassification costs vary greatly across classes, they should by all means be integrated into the models. An often cited example is the prediction of heart attacks on the basis of some risk factors in medical research. If a healthy person is misclassified as endangered, this would have far less serious consequences than an endangered patient being misclassified as healthy.

Bankruptcy prediction would be a typical example for highly different misclassification costs. E.g., in a banking loan context, classifying a bankrupt company as solvent would lead to the total or partial loss of principal and interest payments on the loans provided to such company; classifying a solvent company as bankrupt would only mean missing the profit from a successful loan payoff. From this point of view, taking into account misclassification costs might seem indispensable in the bankruptcy prediction context.

Misclassification costs should however always be analyzed along with prior probabilities (priors), i.e. class proportions in the population. The danger of misclassification is not so serious, even with high misclassification costs, if the corresponding event is unlikely.

Bankruptcies are much less probable than successful payoffs. Their misclassification costs are higher, as described above. In fact, the two effects offset each other to a great degree. This and only this justifies, in part, completely omitting misclassification costs (along with priors) from bankruptcy prediction models (Altman (1977)). An additional argument in favor of such an omission is that misclassification costs in the bankruptcy prediction context are highly subjective: they can differ depending on the economic purpose of classification.
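The offsetting of priors and costs can be illustrated with the standard minimum-expected-cost rule from decision theory. The rule is brought in here as an assumption, since this excerpt does not state it explicitly: a firm is classified as bankrupt when the likelihood ratio of its attributes (bankrupt density over solvent density) exceeds the threshold computed below. The 5% prior and the 19:1 cost ratio are hypothetical numbers chosen so that the two effects cancel exactly.

```python
def likelihood_ratio_threshold(prior_bankrupt, cost_missed_bankruptcy, cost_false_alarm):
    """Threshold on f_bankrupt(x) / f_solvent(x) above which 'bankrupt' is the cheaper call.

    cost_missed_bankruptcy: cost of classifying a bankrupt firm as solvent (type I error)
    cost_false_alarm:       cost of classifying a solvent firm as bankrupt (type II error)
    """
    prior_solvent = 1.0 - prior_bankrupt
    return (prior_solvent * cost_false_alarm) / (prior_bankrupt * cost_missed_bankruptcy)

# Hypothetical case: bankruptcies are rare (5%) but 19 times as costly to miss
with_priors_and_costs = likelihood_ratio_threshold(0.05, 19.0, 1.0)

# Neglecting both: equal priors and equal costs
neglecting_both = likelihood_ratio_threshold(0.50, 1.0, 1.0)

print(with_priors_and_costs, neglecting_both)
```

Both thresholds are equal to 1 up to floating-point rounding, so in this stylized case the classification rule is the same whether priors and costs are both modeled or both omitted. With other numbers the cancellation is only partial, which matches the qualification "to a great degree" in the text.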

### 2.6 Measures for Model Performance

#### 2.6.1 General

The prediction accuracy of classification models can be measured by some common statistics, independently of the methods used to build and estimate the models. This makes it possible to compare different models. We will only look at model validation in the dichotomous case, i.e. when $K = 2$, with $y_i = 0$ for one class and $y_i = 1$ for the other.

The simplest way to evaluate the model performance is the so-called contingency table: a table with columns corresponding to actual group membership, rows corresponding to predicted group membership and cells corresponding to the number of times the respective combination occurred.

In bankruptcy prediction context, the contingency table would typically look like this:

Table 1: Contingency table for Bankruptcy Prediction

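| | Actual: bankrupt | Actual: solvent |
|---|---|---|
| Predicted: bankrupt | True positives (TP) | False positives (FP), Type II error |
| Predicted: solvent | False negatives (FN), Type I error | True negatives (TN) |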

Type I error (false negatives, FN) would mean classifying a failed company as solvent, and Type II error (false positives, FP) would mean classifying a solvent company as failed. The correctly classified solvent companies are true negatives (TN), and the correctly predicted bankruptcies are true positives (TP).

Various performance measures derived from the contingency table can be used to compare two models. E.g. the hit ratio (HR), calculated as the sum of true positives and true negatives divided by the total number of cases $N$, has often been used:

$$HR = \frac{TP + TN}{N} \qquad (2.1)$$

Such performance measures should always be evaluated in comparison with the null model (baseline, random model). The null model is a naïve approach, which would assign all observations to the most frequent class. An appropriate measure is e.g. the proportional reduction in error, also called the lambda-p statistic, which is calculated as^{14}:

[Equation not included in this excerpt]

(2.2)

where $N_0$ and $N_1$ are the total numbers of observations from groups 0 and 1 respectively. Another appropriate measure is Press’s Q statistic (Hair et al. (1998), p. 270):

[Equation not included in this excerpt]

(2.3)

Under the null hypothesis (that the model is not better than chance) this statistic is chi-square distributed with 1 degree of freedom.
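The three measures can be computed directly from the four cells of the contingency table. Equations (2.2) and (2.3) are not reproduced in this excerpt, so the sketch below assumes commonly cited textbook forms: lambda-p as the relative reduction in the error count against a null model that assigns every case to the larger class, and Press's Q as (N - nK)^2 / (N(K - 1)) with n correctly classified cases and K groups. The counts are hypothetical.

```python
def hit_ratio(tp, tn, fp, fn):
    """Share of correctly classified cases, as in equation (2.1)."""
    return (tp + tn) / (tp + tn + fp + fn)

def lambda_p(tp, tn, fp, fn):
    """Proportional reduction in error relative to the null model (assumed form)."""
    n_bankrupt = tp + fn          # actual bankrupt cases
    n_solvent = tn + fp           # actual solvent cases
    errors_null = min(n_bankrupt, n_solvent)  # null model misses the smaller class
    errors_model = fp + fn
    return (errors_null - errors_model) / errors_null

def press_q(tp, tn, fp, fn, n_groups=2):
    """Press's Q (assumed form); compare with a chi-square with 1 degree of freedom."""
    n = tp + tn + fp + fn
    correct = tp + tn
    return (n - correct * n_groups) ** 2 / (n * (n_groups - 1))

# Hypothetical results for 1,000 firms, 50 of which actually failed
tp, fn, fp, tn = 30, 20, 40, 910

print(hit_ratio(tp, tn, fp, fn))   # 0.94
print(lambda_p(tp, tn, fp, fn))    # -0.2
print(press_q(tp, tn, fp, fn))     # 774.4
```

The example shows why the comparison with the null model matters: a hit ratio of 94% looks impressive, yet the null model reaches 95% by calling every firm solvent, and the negative lambda-p reflects exactly that.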

#### 2.6.2 Performance Measures for Models with Probabilistic Output

The majority of classification models used nowadays output not the class membership itself, but ordinal scores of class membership. In most models these scores will be expressed by the probabilities of class membership^{15}, $P(y_i = 0 \mid x_i)$ and $P(y_i = 1 \mid x_i)$. To translate these probabilities into a class membership forecast, the researcher would assume a certain cutoff threshold $P^*$ and assign $\hat{y}_i = 1$ if $P(y_i = 1 \mid x_i) > P^*$, and $\hat{y}_i = 0$ otherwise.

The choice of an optimal cutoff $P^*$ is however arbitrary to a great extent, as it depends on such subjective circumstances as misclassification costs and prior probabilities.

Besides misclassification costs and priors, some other circumstances can influence the optimal cutoff as well. Continuing our example of bank loans, it is possible that a bank would only dispose of a fixed overall money amount it can lend to borrowers (this is the case of so-called credit rationing). The bank will then just choose best borrowers, so that the fixed amount is used up. The best borrowers can be determined on the basis of bankruptcy probabilities, but the cutoff point for these probabilities is determined by the fixed credit amount, not by misclassification costs and prior probabilities.

This goes along with the error tradeoff: if *P*^{*} is decreased, the type I error also decreases, but only at the cost of an increasing type II error. This tradeoff between type I and type II errors at different cutoff values can be integrated into two constructs widely used in practice: ROC and CAP curves.

The ROC (relative/receiver operating characteristic) curve is constructed by plotting, for each possible cutoff point, the percentage of true positives (y-axis) against the percentage of false positives (x-axis).

[Figure not included in this excerpt]

Figure 1: ROC Curve for Bankruptcy Prediction

The ROC curve depicts (on the x-axis) the percentage of non-defaults which will inevitably be classified as defaults (false positives) against (on the y-axis) the percentage of defaults correctly predicted (true positives). Source: Stein (2002). FP=false positives, FN=false negatives, TP=true positives, TN=true negatives.

In a bankruptcy prediction context, the ROC curve (Figure 1) answers the question: “What percentage of non-defaulters would a model have to exclude (i.e. classify as defaulters) in order to exclude a specific percentage of defaulters?” Each point of the ROC curve thus corresponds to some cutoff value used to classify the companies as either bankrupt or solvent.

A null (baseline) model^{16} would correspond on the ROC graph to the diagonal from the lower-left to the upper-right corner. Generally, the higher the ROC curve, the better the classification. The area under the ROC curve can be used to summarize prediction performance over all possible cutoff points. An area of 1 corresponds to a perfect prediction model (one that captures all defaulters without any false positives). An area of 0.5 corresponds to a random model. As can be shown (Engelmann (2003)), the area under the ROC curve also has a very simple interpretation: it equals the probability that the score of a randomly chosen defaulter is lower than the score of a randomly chosen non-defaulter.
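The equivalence between the area under the ROC curve and the pairwise ranking probability can be checked on a toy sample. The sketch below is illustrative only; the data are made up and the function names are hypothetical. Note one assumption about orientation: the sketch ranks firms by predicted default probability (higher means riskier), so the pairwise statement becomes "the defaulter has the higher default probability", which mirrors the statement above for scores where lower means riskier.

```python
def roc_points(default_probs, defaulted):
    """Return (false positive rate, true positive rate) pairs, one per cutoff."""
    n_pos = sum(defaulted)
    n_neg = len(defaulted) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(default_probs), reverse=True):
        tp = sum(1 for p, y in zip(default_probs, defaulted) if p >= cutoff and y == 1)
        fp = sum(1 for p, y in zip(default_probs, defaulted) if p >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(default_probs, defaulted):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count one half."""
    pos = [p for p, y in zip(default_probs, defaulted) if y == 1]
    neg = [p for p, y in zip(default_probs, defaulted) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted default probabilities and observed outcomes (1 = defaulted).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
defaulted = [1, 1, 0, 1, 0, 0, 1, 0]
auc = area_under(roc_points(probs, defaulted))
print(auc, pairwise_auc(probs, defaulted))  # 0.75 0.75
```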

A CAP curve (Cumulative Accuracy Profile) is similar to the ROC curve but is constructed by plotting, for each possible cutoff point, *the sum of* false positives and true positives against true positives. The CAP curve answers the question: “What percentage of all companies would a model have to exclude (i.e. classify as defaulters) in order to exclude a specific percentage of defaulters?” The difference between ROC and CAP curves is negligible for small percentages of defaults. CAP plots are more common in applied business and finance prediction models, whereas ROC curves have been applied more intensively in statistics and medicine.

The area under the CAP curve is also often used to evaluate and compare models. In this case, a special measure called the Accuracy Ratio (AR) is calculated from the area under the CAP curve minus 0.5, i.e. the area between the model's CAP curve and that of a random model, expressed relative to the corresponding area for a perfect model (Sobehart et al. (2000)). The measure is bounded between 0 and 1 and expresses the dominance of a given model over a random model. The standard deviation of the accuracy ratio, denoted *σ*_{AR}, can be estimated analytically (Engelmann et al. (2003), Escott (2001), p. 24) or via resampling methods (Sobehart et al. (2000)). The estimator *σ̂*_{AR} can then be used to construct confidence intervals for AR. More importantly, with the help of this variance estimator it can be tested statistically whether one model is significantly better than another in terms of AR^{17}.
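A minimal sketch of the CAP construction and the accuracy ratio follows. It assumes the normalized definition (area between the model's CAP and the random diagonal, divided by the same area for a perfect model), that firms are ranked by predicted default probability with the riskiest first, and that there are no tied probabilities. Data and function names are hypothetical.

```python
def cap_points(default_probs, defaulted):
    """CAP curve: share of all firms excluded (x) vs. share of defaulters excluded (y)."""
    ranked = sorted(zip(default_probs, defaulted), key=lambda t: -t[0])  # riskiest first
    n, n_def = len(ranked), sum(defaulted)
    points, caught = [(0.0, 0.0)], 0
    for i, (_, y) in enumerate(ranked, start=1):
        caught += y
        points.append((i / n, caught / n_def))
    return points


def accuracy_ratio(default_probs, defaulted):
    """Area between model CAP and random CAP, relative to that of a perfect model."""
    pts = cap_points(default_probs, defaulted)
    area_model = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    default_rate = sum(defaulted) / len(defaulted)
    area_perfect = 1.0 - default_rate / 2  # a perfect CAP reaches 1 at x = default rate
    return (area_model - 0.5) / (area_perfect - 0.5)


# Hypothetical predicted default probabilities and observed outcomes (1 = defaulted).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
defaulted = [1, 1, 0, 1, 0, 0, 1, 0]
ar = accuracy_ratio(probs, defaulted)
print(ar)  # 0.5
```

For this sample the area under the ROC curve is 0.75, so the result agrees with the relation AR = 2 · AUC − 1 known from the literature on these two measures.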

#### 2.6.3 Rank Correlation Measures

On closer inspection, ROC and CAP curves and the areas below them are a popularization and visualization of what is known in statistics as association or rank correlation measures. These measures express goodness-of-fit in terms of ordering discrepancies between actual classes and predicted probabilities. They use the concepts of concordant and discordant pairs. A pair of observations with different actual classes is said to be concordant (discordant) if the observation with the larger actual class has the lower (higher) predicted probability of the class membership. Let there be a total of [formula not included in this excerpt] pairs with different values of the actual dependent variable, [formula not included in this excerpt] of them being concordant and [formula not included in this excerpt] discordant; the remaining [formula not included in this excerpt] pairs are thus tied (i.e. have equal predicted probabilities).

[formula not included in this excerpt] being given, four popular rank-correlation statistics are^{18}:

[Equation not included in this excerpt]

(2.4)

[Equation not included in this excerpt]

The higher the above statistics, the more accurate the classification. The statistics differ in the details of how exactly they take into account concordant, discordant and tied pairs. The above *c* statistic corresponds to the area under the ROC curve when there are no ties. The last three statistics are all bounded between -1 (perfect negative rank correlation) and +1 (perfect positive rank correlation). All these measures can be tested for their significance.
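Because equation (2.4) is missing from this excerpt, the sketch below assumes that the four statistics are those reported by the SAS LOGISTIC procedure mentioned in footnote 18: *c*, Somers' D, Goodman-Kruskal gamma and Kendall's tau-a. The symbols (`t` for the number of pairs with different classes, `nc` and `nd` for concordant and discordant pairs, `n` for the number of observations) and the data are assumptions of the sketch. As a further assumption about orientation, the event is coded as 1 and a pair counts as concordant when the event observation has the higher predicted event probability.

```python
def rank_correlation_stats(event_probs, actual):
    """Concordance-based association measures for a binary outcome."""
    events = [p for p, y in zip(event_probs, actual) if y == 1]
    non_events = [p for p, y in zip(event_probs, actual) if y == 0]
    t = len(events) * len(non_events)  # pairs with different actual classes
    nc = sum(1 for a in events for b in non_events if a > b)  # concordant pairs
    nd = sum(1 for a in events for b in non_events if a < b)  # discordant pairs
    n = len(actual)
    return {
        "c": (nc + 0.5 * (t - nc - nd)) / t,  # tied pairs count one half
        "somers_d": (nc - nd) / t,
        "gamma": (nc - nd) / (nc + nd),       # ignores tied pairs
        "tau_a": (nc - nd) / (0.5 * n * (n - 1)),
    }


# Hypothetical sample with 7 concordant, 1 discordant and 1 tied pair.
probs = [0.9, 0.6, 0.7, 0.7, 0.4, 0.3]
actual = [1, 1, 1, 0, 0, 0]
stats = rank_correlation_stats(probs, actual)
print({name: round(value, 3) for name, value in stats.items()})
```

The denominators show how the measures differ: gamma drops tied pairs altogether, Somers' D keeps them in the denominator, and tau-a divides by all possible pairs of observations, including those with equal actual classes.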

The classical Spearman’s rank correlation statistic can be used as well. However, the above concordance-based statistics are more intuitive, and, due to distributional properties, are easier to test for significance^{19}.

## 3. Discriminant Analysis

### 3.1 Discriminant Analysis as a Classification Technique

(Multivariate) Discriminant Analysis (DA, MDA) is a classification technique which exploits the assumption of a particular distribution for the explanatory variables *x*_{i}, conditional on the group membership *y*_{i}. The distribution assumed is almost always the multivariate normal distribution. Discriminant analysis was first derived and applied in 1936 by the British researcher Fisher in terms of maximal between-groups separation (section 3.3). Later, a link between DA and Bayesian statistics became evident (section 3.2). We will investigate the two approaches, starting with the latter.

Discriminant analysis can easily be formulated for multiple groups ([formula not included in this excerpt]). However, discriminant analysis always treats the class variable, even an ordinal one, as if it were nominal. This is an important shortcoming of multi-group discriminant analysis, as it can result in information loss.

Multi-group (multilevel, multinomial) DA has not been widely used for bankruptcy prediction^{20}. Typically, in this context, three, four or more levels (states) would denote increasing or decreasing financial health of a company, e.g.: bankrupt - insolvent - solvent - highly solvent. As the class variable is clearly ordinal in this case, multinomial discriminant analysis would fail to capture this ordering.

### 3.2 Bayesian Approach

#### 3.2.1 Class membership as posterior probability

In order to deliver predictions, DA makes use of distributional assumptions for the vector *x*, conditioned on the realization of class membership *y* = *k*. Let *f*(*x* | *k*) denote the conditional multivariate density of this distribution (the observation index *i* is omitted for simplicity).

Then, by Bayes rule:

[Equation not included in this excerpt]

(3.1)

[formula not included in this excerpt] denotes the so-called prior probability of the class variable outcome *k*. This is the probability which can be assumed without any knowledge of *x*. This is in contrast to the posterior probability Pr(*k* | *x*), which accounts for the information contained in *x*.

In the context of bankruptcy prediction, the prior probability would typically denote the overall default rate in the population, and the posterior probability the expected default probability of a particular company.

The observation is classified into the group having the maximum posterior probability, as calculated in (3.1):

[Equation not included in this excerpt]

(3.2)

This is tantamount to maximizing the expression [formula not included in this excerpt], as the denominator in (3.1) is identical across all classes.
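The classification rule can be illustrated numerically. The sketch assumes the standard form of Bayes' rule, in which the posterior is the class-conditional density times the prior, divided by the sum of these products over all classes; the exact notation of (3.1) and (3.2) is not available in this excerpt. The density values, priors and function names are hypothetical.

```python
def posterior(densities, priors):
    """Posterior class probabilities from class-conditional densities f(x|k) and priors."""
    joint = {k: densities[k] * priors[k] for k in priors}  # numerator of Bayes' rule
    evidence = sum(joint.values())                          # identical for all classes
    return {k: v / evidence for k, v in joint.items()}


def classify(densities, priors):
    """Pick the class with the largest f(x|k) * Pr(k); the denominator is not needed."""
    return max(priors, key=lambda k: densities[k] * priors[k])


# Hypothetical numbers: a 5% default rate in the population and the values of the
# two class-conditional densities evaluated at one company's vector x.
priors = {"bankrupt": 0.05, "solvent": 0.95}
densities = {"bankrupt": 0.40, "solvent": 0.01}
post = posterior(densities, priors)
print(round(post["bankrupt"], 3), classify(densities, priors))  # 0.678 bankrupt
```

Although only 5% of companies default a priori, the observed *x* is so much more likely under the bankrupt class that the posterior default probability rises to about 68%.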

In the special case of independent explanatory variables, the multivariate density *f*(*x* | *k*) can be expressed as a product of marginal densities, and the posterior probability becomes:

[Equation not included in this excerpt]

where *P* denotes the number of explanatory variables, and [formula not included in this excerpt] stands for the univariate density of the explanatory variable *l*, conditioned on *y* = *k*.

This case of independent explanatory variables is often called the Naïve Bayes classifier. Although quite simple to estimate, this classifier often yields very good results.

Naïve Bayes classifiers are, however, not readily suitable for bankruptcy prediction on the basis of financial ratios, because the explanatory variables are typically highly correlated with each other. To overcome these correlations, techniques such as Principal Component Analysis can be used to preprocess the data. It is important to remember, however, that usually only linear dependencies can be eliminated this way, possibly leaving nonlinear dependencies among the explanatory variables.
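A minimal sketch of the Naïve Bayes posterior follows. It assumes, purely for illustration, that each marginal density is univariate normal; the text above does not prescribe a particular marginal distribution. The two "financial ratios", their class-specific means and standard deviations, and the priors are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Univariate normal density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def naive_bayes_posterior(x, params, priors):
    """Posterior Pr(k | x) when the P explanatory variables are independent given k.

    params[k] is a list of (mean, std) pairs, one per explanatory variable.
    """
    joint = {}
    for k, prior in priors.items():
        density = 1.0
        for value, (mean, std) in zip(x, params[k]):
            density *= normal_pdf(value, mean, std)  # product of marginal densities
        joint[k] = prior * density
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


# Hypothetical ratios: (return on assets, debt ratio) with class-specific parameters.
params = {
    "bankrupt": [(-0.1, 0.2), (0.9, 0.2)],
    "solvent": [(0.1, 0.2), (0.5, 0.2)],
}
priors = {"bankrupt": 0.05, "solvent": 0.95}
post = naive_bayes_posterior((-0.15, 0.95), params, priors)
print(round(post["bankrupt"], 2))
```

If the two ratios were in fact strongly correlated, multiplying their marginal densities would count the same evidence twice, which is the weakness described above.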

#### 3.2.2 Assumption of multivariate normality

We now return to the general case of (3.1). The conditional density *f*(*x* | *k*) is in most cases unknown and must be estimated from data. In most cases, an assumption of multivariate normal distribution is made:

[Equation not included in this excerpt]^{21}

**[...]**

^{1} Most structural models rely on a variant of Black-Scholes option pricing theory. In these models, equity is interpreted as a call option on the assets of a company.

^{2} See further examples of financial ratios in section 2.2.

^{3} The terms “solvent” and “healthy”, as well as the terms “insolvent”, “failed”, “distressed” and “bankrupt” are used in this work as synonyms, if not otherwise specified.

^{4} Earnings before Interest and Taxes.

^{5} Conditional probability models (section 4), such as logit regression, are very sensitive to collinearity issues.

^{6} The concept of variance inflation factor was introduced by Belsley et al. (1980).

^{7} See http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter3/statalog3.htm

^{8} See http://seamonkey.ed.asu.edu/~alex/computer/sas/collinear_orthogonalization.htm

^{9} See http://seamonkey.ed.asu.edu/~alex/computer/sas/collinear_polynomial.html

^{10} See http://www.math.su.se/~rolfs/Publikationer/Collinearity.pdf

^{11} See Peel and Peel (1987) for a detailed discussion.

^{12} See Zmijewski (1984) for further criticism of matched-pairs sampling. See also http://www.rotman.utoronto.ca/bicpapers/pdf/03-05.pdf

^{13} As noted in Falkenstein et al. (2000), p. 49, even the popular Compustat database is subject to this bias.

^{14} See http://www2.chass.ncsu.edu/garson/pa765/logistic.htm

^{15} Models with probabilistic output include: discriminant analysis (section 3), conditional probability (section 4), and survival models (section 5). In contrast to this, control charts (section 6), artificial neural networks (section 7), and decision trees (section 8) normally immediately assign a class membership.

^{16} A null model will assign all observations to the most frequent class.

^{17} The test statistic for this test is calculated as [formula not included in this excerpt], where the lower index signifies the model (1 or 2) and [formula not included in this excerpt] is the estimator of the covariance between the two accuracy ratios. See Engelmann (2003) for details. Under the null hypothesis of equal accuracy ratios, *T* is chi-square distributed with 1 degree of freedom.

^{18} These are offered by the SAS package in the LOGISTIC procedure. See http://v8doc.sas.com/sashtml/stat/chap39/sect28.htm

^{19} See a discussion of Kendall's tau statistic: http://www.blackwellpublishing.com/specialarticles/jcn_10_715.pdf

^{20} Leker (1993) is a German study which did apply multinomial DA. See section 9.

^{21} Box's M test or Levene's test can both be used to test the assumption of homogeneity of the covariance matrices. Individual elements of the covariance matrices can be compared as well: the rule of thumb is that

## Details

- Pages: 100
- Year: 2005
- ISBN (eBook): 9783656965916
- ISBN (Book): 9783656965923
- File size: 792 KB
- Language: English
- Catalog Number: v299765
- Institution / College: European University Viadrina Frankfurt (Oder)
- Grade: 1,0
- Tags: insolvency prediction, bankruptcy prediction, logit, probit, conditional probabilities, discriminant analysis, survival analysis, control charts, artificial neural networks, recursive partitioning, collinearity, sampling, misclassification costs, variable selection, significance testing, goodness of fit