# Survival trees - a new method in innovation theory

A successful introduction of a method commonly used in survival analysis into the field of innovation diffusion theory

Diploma Thesis 2004 101 Pages

## Excerpt

## Contents

1 Introduction

1.1 Context of Thesis

1.2 Contribution of Thesis

1.3 Structure & Internal Pattern of Thesis

2 Modelling Censored Event Data in the Context of Innovation Adoption- and Diffusion Theory

2.1 Analysing and Forecasting Innovation Diffusion by Dynamic Micro Models

2.2 General Concepts and Terminologies

2.3 Statistical Framework

2.4 Classical Methods for the Analysis of Event History Data

2.4.1 Non-Parametric Methods

2.4.2 Parametric Methods

2.4.3 Semi-Parametric Methods

3 Presentation and Analysis of the Survival Tree Method

3.1 Review of CART™

3.2 Principal Framework & Mechanics of Survival Trees

3.3 Splitting, Pruning, Tree-Selection & Alternative Proposals for Survival Trees

3.3.1 Splitting - Growing the Saturated Tree

3.3.2 Pruning - Generation of Optimal Subtree Sequence

3.3.3 Final Tree Selection

3.3.4 Alternative Approaches

3.4 Final Assessment

3.4.1 Assessment of the Splitting, Pruning and Tree Selection Proposals

3.4.2 Merits & Deficiencies of the Survival Tree Method

4 The use of Survival Trees to Forecast Innovation Diffusion

4.1 Applicability of Available Software

4.2 Data Description & Handling

4.3 Implementation

4.4 Results

4.5 Discussion

5 Summary

6 Appendix

6.1 Classification and Regression Tree for E-purchase Adoption

6.2 Cross Table for Sector and Country Coverage

6.3 Variable Description and Handling

6.4 R-Syntax for Survival Tree

6.5 Data Output for E-purchase Survival Tree

6.6 Saturated Survival Tree for E-purchase Adoption

6.7 Original Survival Tree for E-purchase

7 Indices

7.1 Index of Abbreviations

7.2 Index of Symbols

7.3 Index of Synonyms

7.4 Index of Tables

7.5 Index of Figures

8 Bibliography

## 1 Introduction

### 1.1 Context of Thesis

“It is almost universally accepted that technological change and other kinds of innovations are the most important sources of productivity and increased material welfare - and that this has been so for centuries”.^{1}

At the corporate level, it is now widely recognized that implementing and maintaining successful innovation management is a key contribution to competitiveness and future growth. For this reason, there is great interest in understanding the processes of innovation and its subsequent diffusion in order to formulate appropriate policies.

Over the last decades, researchers in management and marketing science have contributed greatly to the development of adoption- and diffusion theory by suggesting analytical models for describing and forecasting the diffusion of an innovation in a social system. The main reason for this has been the perceived high failure rate of new products and the consequent need to improve the related management and marketing decisions.

Why firms do not adopt a new technology immediately after its commercialisation (i.e. why diffusion is a time-intensive process) can be traced to different theories of innovation diffusion advocated in the literature. According to early epidemic theories of inter-firm diffusion,^{2} diffusion is a disequilibrium process resulting from information asymmetries between potential adopters.^{3} In contrast to epidemic models, contemporary approaches to technology diffusion dismiss the spread of information as the key explanatory variable of innovation diffusion.^{4} Rather, these models generally assume that firms behave optimally (i.e. are profit maximizers) and that information on the technological and economic characteristics of the innovation is perfect. Within this equilibrium approach, three categories of models have been developed in the literature: rank or probit models, stock or game-theoretic models, and order effect models.

In rank or probit models^{5} potential adopters of technology have different inherent characteristics and as a result obtain different gross returns from its use.

The essence of stock effect models is that the benefit to the marginal adopter from acquisition decreases as the number of previous adopters increases.^{6}

Order effect models are similar to rank effect models in that the gross return of a firm adopting a new technology depends upon its position in the order of adoption, with firms higher in the order of adoption achieving a greater return than those lower down.^{7}

Despite the continuing progress of contemporary approaches, the main impetus underlying diffusion research is still the epidemic Bass model^{8}. This model, which subsumes the majority of other models derived from it or developed independently, addresses the market in the aggregate. The typical variable measured is the number of adopters who purchase the product by a certain time *t*. The emphasis is on the total market response rather than on the individual adopter. The individual characteristics of potential adopters and their impact on the decision process remain wholly unexamined. The analysis centres not on the individual who decides whether to adopt or reject an innovation, but on the distribution of adoption decisions over time as a function of marketing variables.^{9} These models cannot explain why a particular individual adopts or rejects an innovation at a specific point in time. Consequently, these models achieve no adequate aggregation of individual adoption decisions. Although the specific managerial implications that these models give should not be questioned in general, they remain limited by the aggregate perspective which they take.

In fact, diffusion theory faces a constant dilemma between disaggregate and aggregate diffusion modelling. Although it is unquestionable that the diffusion process is built upon individual adoption decisions, the conviction that diffusion models should therefore be built upon individual decisions has not yet fully taken hold.

One reason lies undoubtedly in the substantial modelling obstacles that theory has faced so far in trying to pursue this.

Most models that shed light on individual adoption behaviour are static in nature, thereby failing to capture the inherent dynamics of the diffusion process, which makes plausible aggregation nearly impossible. *This dilemma has forced an explicit distinction between adoption- and diffusion theory. Although this distinction is often taken to frame the sort of analysis that is performed, it is forced by the inability of most diffusion models to persuasively incorporate the naturally inherent individual perspective.*

By recognizing that the diffusion process is built upon individual adoption decisions, adoption theory should be recognized and modelled as the foundation of diffusion theory rather than as a theory that differs from diffusion theory in concept and content. The implication is that diffusion models that take the individual perspective simultaneously perform an adoption analysis.

Moreover, diffusion models based on individual adoption decisions offer an opportunity to study the actual pattern of social communication and its impact on product perceptions, preferences and ultimate adoption. Nonetheless, first attempts to establish the diffusion process on the basis of individual adoption decisions faced severe problems in achieving aggregation.^{10} Only the study by Chatterjee and Eliashberg (1989) provided encouraging empirical evidence for a useful aggregation of individual adoption decisions.^{11} Indeed, it has been recognized only recently that the dilemma described above can be solved.

*So-called event history data is able to capture the dynamics of the diffusion process while, simultaneously, the individual perspective (micro level) can be preserved.*

Eventually, with the introduction of hazard models^{12} into diffusion theory, various micro models were found that could deal effectively with event history data and thus allowed individual heterogeneity among adopters to be considered by incorporating covariate effects into diffusion models. By now, most models that have emerged in the widely applied field of event history analysis have been applied to diffusion theory, too.^{13} It should be said, however, that these applications have taken place only recently, so the use of event history data is still novel in diffusion theory.

The main reason for this may lie in the extent of data collection necessary to perform an analysis. Especially in economic theory, where the necessity for event history data is not obvious, this may prove a serious obstacle: keeping track of each individual and his or her adoption decision is undoubtedly a more challenging task than simply taking the aggregate approach. Fortunately, as technological possibilities have grown, so has the applicability of event history models.

### 1.2 Contribution of Thesis

With the extension of the non-parametric classification and regression tree method (CART™)^{14} to the analysis of censored event data, we are now given the opportunity to move research forward by examining the usefulness and applicability of that method for the analysis and forecast of innovation diffusion. The development of the so-called “survival trees” was highly motivated by the need to develop meaningful prognosis rules in medical science.^{15} As will be shown later, there are a number of essential parallels between survival analysis in medical science and diffusion analysis in economics. New methods emerging in that field are therefore likely to prove applicable in adoption- and diffusion theory (ADT), too.

As the CART™ method itself is still new to economic theory, it should not be surprising that no known application of survival trees has taken place in an economic context so far. Indeed, even for the CART™ method only two applications in an economic context are known.^{16} Both methods, CART™ and survival trees, have been developed in the area of medical science and seem to spread only slowly to other scientific areas. Economists and other non-medical scientists alike will have to be persuaded of the new insights that these methods offer. As for survival trees, this thesis is the first attempt to do this.

The method offers additional insights into causal relations that traditional methods fail to give and can therefore represent a powerful contribution to modern diffusion theory. Its interpretational power makes it likely that this method will meet widespread acceptance.

### 1.3 Structure & Internal Pattern of Thesis

I want to briefly put into words the structure of the thesis that is already summarized in the table of contents. I believe this will make it easier to understand and more coherent. Additionally, I find it important that the reader is aware of the internal pattern underlying this thesis. By this I mean simple decisions on formatting and terminology.

Let us start with the structure: In the course of the thesis, the survival tree method will be introduced within the context of ADT. For this reason, I will provide arguments in favour of dynamic micro models as a means to analyse and forecast innovation diffusion (section 2.1).

As event history data enables us to do this, I will set up the common concepts and ideas of event history data modelling as well as the classical methods from this area, all within the context of ADT (2.2, 2.3, and 2.4). This is done to build an understanding of the interpretation and functionality of the event history patterns within the ADT context and is considered essential for understanding the survival tree method and its usefulness in forecasting innovation diffusion.

Survival trees have been derived from CART™ and consequently both methods share essential conceptual features. After a general introduction to the CART™ methodology (3.1) and a first introduction to the area of survival trees (3.2), I will attempt to classify the proposals that have been made for the construction of survival trees into three building blocks that are commonly used in the construction of CART™ (3.3).

Subsequently, the various proposals that have come up in the construction of survival trees will be evaluated, and the merits as well as the deficiencies of the method will be discussed (3.4).

I will describe in detail the software applications available for survival tree calculations to facilitate future work on them (4.1). The data to which the method will be applied is presented and the way the data was handled is documented (4.2) before I state which of the various options was taken (4.3).

Analysing the results, we will see whether the method can offer new insights into ADT and whether the previously discussed merits & deficiencies of the method hold true or might have to be reconsidered in the discussed context (4.4).

Eventually, I will discuss the central question about the usefulness of the method to forecast innovation diffusion. I will try to relate the method’s results and their implications to economic practice. Other related issues and thoughts will be discussed as well (4.5). Finally, the main patterns and findings of the thesis will be summarized (5).

Let me now explain the internal pattern of the thesis relating to measures that were taken to ease functionality and readability of the thesis.

The problem of inconsistent terminology is particularly apparent in event history analyses. If we take, for instance, the denomination “event history data”, we can easily find at least five other denominations, all used interchangeably, which may sometimes hamper understanding substantially. I will thus name these cases when they appear and say explicitly which of the various denominations I will use. Additionally, I have developed an index of synonyms in section 7.3 to prevent any confusion.

Other confusion is likely to be caused by the various denominations in ADT. No definite rule can be established as to whether one should use adoption theory or diffusion theory for a specific field under investigation. In this thesis, I claim that these two areas belong essentially together. I will therefore make no distinction between these two areas using the single denomination adoption- and diffusion theory (ADT) throughout this thesis.

Besides, there is no generally agreed structure in the area as to what model belongs to what class of models and so on. The classification of models into micro and macro, static and dynamic models is by no means generally agreed and was adapted from Litfin (2000).

For easier readability and in order to emphasize sentences that I consider vital, I will format the respective text bold or *italic*. Words representing important issues are formatted bold to enable easier localization.^{17} Italic formatting is used for sentences *that I consider vital for overall understanding*.

I have noticed that the literature on survival trees picked up momentum especially during 2003 and 2004. This made it difficult to incorporate all new literature in the thesis, as it was published while this thesis was being written. Yet, I believe I have included all literature published until the end of November 2004.

Sometimes, I will sum up findings or provide a brief outlook at the very beginning of a section. I do this to make sure one does not lose track of the findings and is always aware of why a certain section was written.

## 2 Modelling Censored Event Data in the Context of Innovation Adoption- and Diffusion Theory

In virtually every area of the social sciences, there is great interest in events and their causes. Criminologists study crimes, arrests, convictions, and incarcerations. Medical sociologists are concerned with hospitalizations, visits to a physician and psychotic episodes.^{18} As a field of economics, innovation theory investigates and tries to predict the effects of innovations on society. Hereby, the adoption decisions of the members of society play the decisive role.

In each of the above mentioned examples, an event consists of some qualitative change that occurs at a specific point in time. Because events are defined in terms of change over time, it has become increasingly acknowledged that the best way to study events and their causes is to collect event history data.^{19} In its simplest form, event history is a “longitudinal record of when events happen to a sample of individuals or collectivities”^{20}.

In this chapter, I will provide reasons why innovation diffusion analysis and forecast should be performed on the basis of dynamic micro models. These models can be established only on the basis of event history data. As all models from the area of event history analysis are either directly or indirectly based on the hazard rate framework, I will establish this framework to ease understanding of the upcoming presentation of the various parametric, semi-parametric and non-parametric models.

For the upcoming introduction of the survival trees, *it is important to understand the conceptual parallels between diffusion theory and survival analysis*. These parallels allow us to use models coming from the area of survival analysis for ADT.

### 2.1 Analysing and Forecasting Innovation Diffusion by Dynamic Micro Models

“An innovation is an idea, practice or object that is perceived as new by an individual or another unit of adoption”^{21}. Commonly speaking, innovation diffusion theory addresses how new ideas, products and social practices spread within society or from one society to another. Adoption theory, in turn, analyzes the process of innovation adoption by an individual. Both theories aim to identify explanatory variables that drive and determine the respective process. The adoption process of each individual can differ in starting point and duration. In this way, adoption decisions of members of social systems are spread across time. *Consequently, adoption theory forms the foundation of innovation diffusion theory and is thus part of it.*

While, by definition, adoption theory is mainly concerned with the exploration of the determinants of adoption, the diffusion theory focuses on the aggregate analysis of all adoption decisions of the members of a social system.

However, by recognizing that the diffusion process is built upon individual adoption decisions, adoption theory should be recognized and modelled as the foundation of diffusion theory rather than as a theory that differs from diffusion theory in concept and content. For this reason, I will make no explicit distinction between these two theories, which I claim belong together.^{22}

The diffusion of an innovation has traditionally been defined as the process by which “an innovation is communicated through certain channels over time among the members of a social system”^{23}. This definition, with its reference to innovation, communication (and the respective communication channels), time and the members of a social system names the four key components widely recognized as driving innovation diffusion. Although the diffusion process is undoubtedly a dynamic process, the majority of the models that have emerged in diffusion theory could only insufficiently capture this essential feature.^{24} Empirical research for analysis and forecast of the diffusion process is still dominated by aggregate diffusion models that mostly envisage capturing the influence of marketing variables on the success of an innovation.

These approaches are convenient in practical terms but they raise the following question: Can a genuine diffusion model be constructed by aggregating demand from consumers who behave in the neoclassical way? That is, assume that consumers are smart and are not just carriers of information. They therefore maximize some objective function such as expected utility or benefit from the product, taking into account the uncertainty associated with their understanding of its attributes, its price, pressure from other adopters to adopt it and their budget. Because the decision to adopt is individual-specific, not all potential adopters have the same probability of adopting the innovation in a given time period. Is it possible to develop the adoption curve at the aggregate market level, given the heterogeneity among potential adopters in terms of adopting the innovation at any time *t*?^{25}

In fact, aggregate models cannot explain why an individual adopts or rejects an innovation at a specific point in time. As a result, these models achieve no adequate aggregation of individual adoption decisions. Analysis and forecast of adoption procedures by means of these models is hardly convincing. While attempts have been made to disaggregate adopters by categorizing them ex post into a scheme, they could not eliminate the shortcomings of the underlying assumption of adopter homogeneity.^{26}

The general scheme used for adopter classification is that of Rogers. Rogers divided individual responses to technology into five ideal categories: innovators, early adopters, early majority, late majority, and laggards.^{27} According to him, the main concern of the innovation diffusion research is how innovations are adopted and why innovations are adopted at different rates. Furthermore, he identified five characteristics of innovations that help to explain differences in adoption rates: relative advantage, compatibility, complexity, trialability, and observability. His work has become fundamental to innovation diffusion research and has been documented and quoted in many papers and books.

Although a wide variety of innovations and diffusion processes have been investigated, one research finding keeps recurring. If the cumulative adoption time path or temporal pattern of the diffusion process is plotted, the resulting distribution can generally be described as taking the form of an s-shaped (sigmoid) curve.^{28} The observed regularity in the diffusion process results from the fact that initially only few members of the social system adopt the innovation in each time period. In subsequent time periods, however, an increasing number of adoptions per period occurs as the diffusion process begins to unfold more fully. Finally, the trajectory of the diffusion curve slows down and begins to level off, ultimately reaching an upper asymptote. At this point diffusion is complete.^{29}
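The S-shaped pattern described above can be reproduced with a small simulation. The following is a minimal sketch of a discrete-time version of the aggregate Bass model mentioned in the introduction; the market potential `m` and the coefficients `p` (innovation) and `q` (imitation) are purely hypothetical values chosen for illustration, not estimates from this thesis.

```python
def bass_cumulative(m, p, q, periods):
    """Discrete-time Bass diffusion: return cumulative adopters per period."""
    cumulative = []
    adopted = 0.0
    for _ in range(periods):
        # New adopters in this period: the innovation effect p plus the
        # imitation effect q scaled by the share that has already adopted,
        # applied to those who have not adopted yet.
        new = (p + q * adopted / m) * (m - adopted)
        adopted += new
        cumulative.append(adopted)
    return cumulative


# Hypothetical parameters (assumptions for illustration only).
curve = bass_cumulative(m=1000, p=0.03, q=0.38, periods=30)

# Adoptions per period: few at first, then rising, then levelling off.
per_period = [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]
peak = per_period.index(max(per_period))
```

The cumulative series in `curve` rises slowly, accelerates up to the period stored in `peak`, and then flattens towards the upper asymptote `m`, which is the sigmoid shape discussed in the text.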

*In entrepreneurial reality, information about the process of diffusion is crucial to the success of new product marketing.* If this information is provided on the aggregate level, however, marketing implications are limited. A company will not know whom to target to drive the diffusion process forward. These shortcomings may have led to an unquantifiable waste of resources, as companies are likely to have targeted late adopters in the early stages of the innovation's market launch and vice versa. *A tool that can identify crucial target groups at every stage of the diffusion process is seen to be of utmost importance in marketing. So far, there is no method that is capable of providing this insight.*

Besides, the one-sided reliance on aggregate models may have led to a great number of incorrect diffusion prognoses. The most prominent example of a forecast that proved (ex post) wide of the mark and was based on an aggregate model is described in a diffusion study by Berndt and Altobelli (1991)^{30}. Other wrong forecasts may further demonstrate the insufficient predictive power of these diffusion models.^{31}

In practice, companies need information about target clients and the factors that drive their decisions; something aggregate models cannot provide. This growing recognition has materialized in a mounting demand for rapid integration of micro models to identify and analyse the adoption and diffusion process. Next to the widely used macro models, these micro models can contribute decisively to the analysis and forecast of adoption behaviour and the resulting diffusion process.

Even though the adoption behaviour is nothing but the disaggregated form of the diffusion process, the areas of adoption theory and diffusion theory have been largely separated so far. In fact, not all micro models can be used to analyse and forecast innovation diffusion.

*Generally, all micro models consider the heterogeneity of individuals and allow for the integration of covariates*. *There is only one type of model, however, that can adequately model censored event data in order to capture the dynamics of the diffusion process. Thus, I claim that only dynamic micro models can be used to forecast innovation diffusion adequately.*

To illustrate this, a comparison between a static and a dynamic micro model will be used.^{32}

If the focus of analysis is on finding out whether a specific individual adopts or rejects an innovation at a specific point in time and what explanatory variables can be identified, then logistic regression is often employed.^{33} This method explains one dependent dichotomous variable through a number of independent variables. Within the framework of ADT, the dichotomous variable can be labelled “adoption of innovation” and “rejection of innovation” always with respect to one specific point in time.^{34} Independent variables could be all sorts of individual characteristics. In ADT one often differentiates between product-, adopter- and environment specific independent variables.^{35}

For logistic regression, the usual restrictive assumptions known from linear regression have to be made.^{36} A violation of these premises can lead to distorted and inefficient estimates of the regression coefficients and eventually to invalid statistical inferences. Here, empirical research is still severely limited by the existence of multicollinearity and autocorrelation between the independent variables. Generally speaking, logistic regression establishes a functional relation between the probability that an event takes place (i.e. an individual adopts the innovation) and a number of predetermined explanatory variables (i.e. independent variables).
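For orientation, the functional relation of a logistic regression is usually written in the following standard form. The symbols $x_1, \dots, x_k$ (explanatory variables) and $\beta_0, \dots, \beta_k$ (regression coefficients) are notation assumed here for illustration and are not taken from the thesis:

$$
P(\text{adoption} \mid x_1, \dots, x_k) = \frac{1}{1 + \exp\bigl(-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\bigr)}
$$

Note that time does not appear anywhere in this expression; the probability refers to the state observed at one fixed point in time.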

In contrast to linear regression, the observable dependent variable in this case is not metric but dichotomous.^{37} Logistic regression quantifies and thus identifies the factors driving or preventing individual innovation adoption. The heterogeneity of individuals is taken into account and revealed. Nevertheless, the characteristics of the process itself are not considered at all. With the help of logistic regression, only the result of the adoption process can be revealed. All individuals who have adopted the innovation between the market launch and the end of the observation period are classified as adopters. Individuals who have not yet adopted or will never adopt the innovation are accordingly classified as non-adopters. There is no differentiation with respect to the specific point in time of adoption or the future possibility of adoption. *Logistic regression ignores time and thus merely gives a snapshot of adoption behaviour and the diffusion process.*

No valid conclusions can be drawn concerning future market potential, for instance. Despite this, there is no question that the method can establish elementary relations between adoption decisions and their determinants. Nevertheless, in logistic regression the duration between market launch and adoption is not taken into account. There is no difference between those individuals who adopt the innovation shortly after market launch and those who adopt shortly before the observation period ends.

Yet, it appears only natural that, on average, “early adopters” exhibit a higher likelihood of adoption than “late adopters”. Neglecting this information reduces the accuracy and inferential power of static estimations.^{38} Besides, it is the time-related observation of the adoption process, in particular, that enables predictions about future adoption behaviour and thus innovation diffusion. A solution could be the integration of a time-to-adoption independent variable, but then one could only consider the individuals who have already adopted the innovation within the observation period. As for the individuals who have not adopted in the period, no time-to-adoption duration can be ascertained, as we do not know when and whether they will adopt the innovation after the observation period ends. These observations are “censored”. Censored data can simply be ignored and filtered out of the analysis, but this leads to distorted estimates, which is why this approach should be abandoned in the presence of censored data. It is here that the so-called event history models come in.
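The distortion caused by dropping censored observations can be made concrete with a small, entirely hypothetical example. The adoption times and the end of the observation period below are invented for illustration; in practice the true adoption times of censored individuals are of course unknown.

```python
# Hypothetical true adoption times (months after market launch) for eight firms.
true_times = [2, 4, 6, 8, 10, 12, 14, 16]
observation_end = 9  # assumed end of the observation period

# What the analyst actually sees: a duration plus an indicator of whether
# adoption was observed within the observation period.
observed = [
    (min(t, observation_end), t <= observation_end)
    for t in true_times
]

adopter_durations = [d for d, adopted in observed if adopted]
censored_count = sum(1 for _, adopted in observed if not adopted)

# Mean time to adoption when censored cases are simply filtered out.
naive_mean = sum(adopter_durations) / len(adopter_durations)
# Mean of the true adoption times (not available to the analyst in practice).
true_mean = sum(true_times) / len(true_times)

print(naive_mean, true_mean, censored_count)
```

Because only the early adopters remain once censored cases are removed, the naive mean time to adoption is considerably shorter than the true one in this constructed example.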

### 2.2 General Concepts and Terminologies

The general purpose of the analysis of event history data is to explain why certain individuals are at a higher risk of experiencing the event(s) of interest than others.^{39} In general, this can be accomplished by using special types of methods which, depending on the field in which they are applied, are called failure-time models, lifetime models, survival models, transition rate models, response-time models or hazard models.^{40} It should be noted, however, that the origin of event history data modelling lies in the area of medical science.^{41} For this reason, and because of the continuing dominance of survival analysis within the area of event history data modelling, it is not surprising that all of the models that will be introduced shortly have been developed in this area and thus carry the respective denominations.

In hazard models the risk of experiencing an event within a short time interval is regressed on a set of covariates.^{42} Two special features distinguish hazard rate models from other types of regression models: They make it possible to include censored observations in the analysis and to use time-varying explanatory variables. Censoring is, in fact, a form of partially missing information: On the one hand, it is known that the event did not occur during a given period of time, but on the other hand, the time at which the event occurred is unknown. Time-varying covariates may change their value during the observation period. The ability to include covariates that may change their value in the regression makes it a truly dynamic analysis.
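As a sketch of what “the risk of experiencing an event within a short time interval” means formally, the hazard rate is commonly defined as follows. The notation ($T$ for the random time of the event, $x(t)$ for the possibly time-varying covariates, $h_0$ and $\beta$ for a baseline hazard and coefficients) is assumed here for illustration; the second expression is the proportional hazards form, which is only one common way of regressing the hazard on covariates:

$$
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
h(t \mid x(t)) = h_0(t)\,\exp\bigl(\beta^{\top} x(t)\bigr)
$$

The conditioning on $T \ge t$ is what allows censored observations to contribute: an individual counts towards the risk for as long as it is known that the event has not yet occurred.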

In order to understand the nature of event history data and the purpose of event history analysis, it is important to understand the following four elementary concepts: state, event, duration, and risk period. These concepts are illustrated below using first an example from the analysis of unemployment histories.^{43}

The first step in event history analysis is to define the relevant states which can be distinguished. The states are the categories of the dependent variable, the dynamics of which we want to explain. At every particular point in time, each person occupies exactly one state. In the analysis of unemployment histories, four states are generally distinguished: employment, part-time employment, re-training, and unemployment. The set of possible states is sometimes called the state space.

An event is a transition from one state to another, that is, from an origin state to a destination state. In this context, a possible event is “first employment”, which can be defined as the transition from the origin state, unemployed, to the destination state, employed. Other possible events are: taking a part-time employment or a job re- training. It is important to note that the states which are distinguished determine the definition of possible events. If only the states employment and unemployment were distinguished, none of the above mentioned events could have been defined. In that case, the only events that could have been defined would be becoming employed or unemployed.

Another important concept is the risk period. Clearly, not all persons can experience each of the events under study at every point in time. To be able to experience a particular event, one must occupy the origin state defining the event, that is, one must be at risk of the event concerned. The period during which someone is at risk of a particular event, or exposed to a particular event, is called the risk period. For example, someone can only become unemployed if he or she was employed before. A closely related concept is the risk set. The risk set at a particular point in time is formed by all subjects who are at risk of experiencing the event at that point in time.
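The risk set can be illustrated with a minimal Python sketch. It assumes a hypothetical list of spells, each recording when a person enters the origin state and when he or she leaves it (through the event or through the end of observation). The names `Spell` and `risk_set` and all values are illustrative and do not come from the thesis.

```python
from dataclasses import dataclass

@dataclass
class Spell:
    person: str
    start: float  # time at which the person enters the origin state
    end: float    # time at which the person leaves it (event or end of observation)

def risk_set(spells, t):
    """Return the persons who are at risk at time t.

    Assumed convention: a person is at risk on the interval (start, end],
    so someone whose event occurs exactly at t still belongs to the risk set.
    """
    return [s.person for s in spells if s.start < t <= s.end]

spells = [
    Spell("A", 0.0, 5.0),
    Spell("B", 2.0, 9.0),
    Spell("C", 6.0, 8.0),
]

at_risk_at_4 = risk_set(spells, 4.0)
at_risk_at_7 = risk_set(spells, 7.0)
```

Under these assumptions, persons A and B form the risk set at time 4, whereas at time 7 person A has already left the origin state and person C has entered it.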

Using these concepts, event history analysis can be defined “as the analysis of the duration of the non-occurrence of an event during the risk period”^{44}. This duration is usually labelled by the term episode^{45}. When the event of interest is “first employment”, the analysis concerns the duration of non-occurrence of a first employment. In practice, as will be shown below, *the dependent variable in event history models is not duration or time itself but a rate*.

Therefore, event history analysis can also be defined as the analysis of rates of occurrence of the event during the risk period. In the first employment example, an event history model concerns a person’s employment rate during the period that he or she is in the state of never having been employed.

A strong point of hazard models is that one can use time-varying covariates. These are covariates that may change their value over time. Examples of interesting time-varying covariates in the employment history example are an individual’s financial status or health status. As a matter of fact, the time variable itself and interactions between time and time-constant covariates are time-varying covariates as well.

We now have to fit the concepts described above into the area of ADT:

In ADT one generally distinguishes between two states: “adoption” and “non-adoption” of an innovation. The event is the adoption of an innovation, which can be defined as the transition from the origin state, non-adoption, to the destination state, adoption. This event pattern is called a “single non-repeatable event”, where the term “single” reflects that the origin state, non-adoption, can only be left by one type of event, and the term “non-repeatable” indicates that the event can occur only once. Models that have been developed for this type of event pattern will be called single risk models^{46}. The duration measures the time until an individual adopts an innovation. Logically, an individual does not necessarily have to adopt within the observation period or beyond it. Individuals who do not adopt within the observation period, and of whom we do not know when or whether at all they will adopt afterwards, produce censored data. I will describe this phenomenon later within the current context.
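A minimal Python sketch shows how such single-event adoption data with censoring is commonly encoded as a pair of duration and event indicator. The firm names, adoption times and the function `encode_spell` are hypothetical and serve only as an illustration; the observation period is assumed to be (0, T], measured from market launch.

```python
def encode_spell(adoption_time, T):
    """Encode one sample member as (duration, event).

    adoption_time: time of adoption measured from market launch,
                   or None if no adoption is known.
    T:             end of the observation period (0, T].
    """
    if adoption_time is not None and adoption_time <= T:
        return (adoption_time, 1)  # adoption observed: the duration is known exactly
    return (T, 0)                  # censored: only "no adoption up to T" is known

T = 5.0
raw = {"firm_a": 1.5, "firm_b": 4.0, "firm_c": None, "firm_d": 7.2}
encoded = {name: encode_spell(t, T) for name, t in raw.items()}
```

In this sketch `firm_c` (never observed to adopt) and `firm_d` (adopting only after the observation period) are both recorded with duration `T` and event indicator 0, which is exactly the partial information described above.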

By and large, this is the sort of event history pattern that is known from the area of survival analysis. In both fields, we observe the duration that lies between some predefined point in time and one single (absorbing) event. In most cases, survival analysis deals with the investigation of the duration between the beginning of treatment or hospitalization and the death of an individual. Ironically enough, both the adoption decision and the death of an individual are single non-repeatable events.

As both processes are equal in terms of their general event pattern, survival models represent natural candidates for modelling and analyzing adoption and diffusion processes. It should thus come as no surprise that all models that will be introduced come from the area of survival analysis. In effect, the vast majority of models in survival analysis have been developed to model this type of event pattern. There are, indeed, alternative concepts, some of which may also be used in the context of ADT. I want to introduce these briefly for a more complete introduction.

Sometimes, it may prove necessary or is simply wanted to distinguish between different types of events or risks. In the analysis of death rates, one may, for example, want to distinguish between different causes of deaths. In ADT a distinction between various causes of adoption decisions is equally conceivable.

When there is more than one possible destination state, individuals may experience different types of events. The standard method for dealing with such situations is the use of multiple risk or competing risk models.^{47}

Most events studied in the social sciences are repeatable, and most event history data contains information on repeatable events for each individual. This is in contrast to medical research and to ADT, where the event of greatest interest is death or adoption, respectively. Examples of repeatable events are job changes, having children, arrests, or promotions. In an economic context, the investigation of (repeated) product buying decisions may prove interesting. Often events are not only repeatable but also of different types, that is, we have a multiple state situation.

When people can move through a sequence of states, events can be characterized not only by their destination state, as in competing risk models, but also by their origin state. An example is, once again, an individual’s employment history: an individual can move through the states of employment, unemployment, and out of the labour force. In that case six different kinds of transitions can be distinguished which differ with regard to their origin and destination states.
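The count of six transitions follows from pairing each of the three origin states with each of the two remaining destination states. A short Python sketch (the variable names are illustrative) enumerates them:

```python
from itertools import permutations

states = ["employment", "unemployment", "out of the labour force"]

# Every ordered pair of distinct states is one (origin, destination) transition.
transitions = list(permutations(states, 2))

for origin, destination in transitions:
    print(f"{origin} -> {destination}")
```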

Hazard models for analyzing data on repeatable events and multiple-state data are special cases of the general family of multivariate hazard models. Another application of multivariate hazard models is the analysis of dependent or clustered observations.^{48} Examples are the occupational careers of spouses, the educational careers of brothers, and the mortality of children in the same family. Hazard rate models can be easily generalized to situations in which there are several origin and destination states and in which there may be more than one event per observational unit.^{49}

After this general overview of other event history concepts, it is important to stress again that, in the course of this thesis, I will exclusively introduce and apply models for the analysis of single non-repeatable events (single risk models). Moreover, the integration of time-varying explanatory variables will not be considered.

*Therefore, I will model the adoption- and diffusion process as having one origin state, non-adoption, and one final non-repeatable event, adoption. Hereby, I will analyse the impact that time-constant explanatory variables have on the dynamics of this process.*

### 2.3 Statistical Framework

Let me now explain the statistical framework of event history analysis that is essential in understanding hazard models regardless of the specific concept chosen.

As such, hazard models have already been introduced into ADT.^{50} Moreover, these models have been used, in an economic context, to analyse and forecast business and firm survival (failure).^{51} Hazard models do not analyse a snapshot of adoption behaviour at a single point in time; instead, they observe behaviour over time and thereby take the process character of adoption into account. For this purpose, one needs to know of each individual not only whether an event has taken place but also the duration until the event occurred.

The duration is put into a functional relationship with explanatory variables, which can reflect both an individual’s subjective perception of an innovation^{52} and the individual characteristics of the decision-makers. In contrast to logistic regression, this approach makes it possible to ascertain the adoption probability not only at one specific point in time; more importantly, these probabilities can be determined for each individual *at any point in time*. This enables a more realistic forecast of adoption behaviour. Finally, by aggregating the individual probabilities, the macro level can be established, thereby illustrating the diffusion process over time.
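The aggregation step can be sketched in Python under a deliberately strong simplifying assumption: each individual has a constant hazard rate, so that the individual adoption probability by time t is 1 − exp(−hazard · t). The hazard values and function names below are hypothetical and are not taken from the thesis.

```python
import math

def adoption_probability(hazard, t):
    """P(adoption by time t) for a constant hazard rate: 1 - exp(-hazard * t)."""
    return 1.0 - math.exp(-hazard * t)

def diffusion_curve(hazards, times):
    """Expected share of adopters at each time: the mean of the individual probabilities."""
    return [
        sum(adoption_probability(h, t) for h in hazards) / len(hazards)
        for t in times
    ]

hazards = [0.05, 0.20, 0.50]  # hypothetical individual adoption rates per period
times = [0, 1, 2, 5, 10]
curve = diffusion_curve(hazards, times)

for t, share in zip(times, curve):
    print(f"t = {t:>2}: expected share of adopters = {share:.3f}")
```

The resulting sequence starts at zero and rises towards one, which is the macro-level diffusion curve implied by the individual-level probabilities in this toy setting.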

The process under study (i.e. the adoption process) starts with the market placing of the innovation and ends with the adoption of a sample member at time *t _{i}*. The duration of an episode is represented by a random non-negative continuous variable *t _{i}* for the *ith* sample member. This implies that the time-to-event duration is interpreted as the realisation of a random process.^{53}

As said, the time-to-event duration *t _{i}* depends on a number of explanatory variables. These are combined in the vector *X _{i}*. The duration of an episode *t _{i}* (*t _{i}* ≥ 0) follows a specific distribution that is represented by the distribution function *F*(*t _{i}*). The respective density function is given by *f*(*t _{i}*). The observation period has the length (0, *T*].

The following relation between the density function and the cumulative distribution function can be established:

illustration not visible in this excerpt

and under the assumption that the density function is continuous:

illustration not visible in this excerpt

The survivor function *S(t _{i})* represents the *probability that the ith member experiences (i.e. “survives”) the point in time t _{i}, which is equivalent to the probability that the member has not yet adopted the innovation at this point.*
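For orientation, the standard textbook relations between distribution function, density function and survivor function can be written as follows. The displayed equations are not visible in this excerpt, so the notation (in particular $T_i$ for the random duration) is an assumption and may differ from the author's own equations.

$$
F(t_i) = P(T_i \le t_i) = \int_0^{t_i} f(u)\,du, \qquad
f(t_i) = \frac{dF(t_i)}{dt_i}, \qquad
S(t_i) = P(T_i > t_i) = 1 - F(t_i) = \int_{t_i}^{\infty} f(u)\,du .
$$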

Depending on the assumed distribution of *t _{i}* across all members of the sample, there exists a number of differing survivor functions, which all share one feature: all survivor functions fall monotonically as time proceeds. Translated into the adoption context, this means that the probability of no adoption decreases and the probability of adoption increases with time. Furthermore, the survival probability is *1* for a duration of *0* and *0* for an infinite duration. Between these two extremes, however, the course of the function differs, and explanatory variables can have either an accelerating or a delaying effect on the survival probability. When time is measured continuously, we obtain the following relation:

illustration not visible in this excerpt

The aim of the hazard rate is to quantify the conditional probability (i.e. the risk/hazard) that the event “adoption” takes place for the *ith* member at time *t*. As time is a continuous variable, the probability that the event occurs at exactly one point in time is *0*. For this reason, not a point in time but a very small time interval *(t _{i}; t _{i} + Δt _{i})* is observed. The hazard rate function completely describes the probability distribution of the time until an event.

Furthermore, the condition is imposed that no adoption took place prior to that time interval; otherwise, the member would no longer be at risk of adoption. To prevent the hazard rate from being inflated by the length of the time interval, two measures are taken: first, only a small time interval is considered, and second, the probability is divided by the length of the time interval *Δt _{i}*.^{56}

*Hence, the hazard rate can be interpreted as the limiting value of the conditional probability, per unit of time, that the adoption takes place within the time interval (t _{i}; t _{i} + Δt _{i}), under the condition that no adoption took place prior to the beginning of the time interval and that the vector X _{i} is given*.
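In symbols, this verbal definition corresponds to the usual textbook form of the hazard rate. The equation is not visible in this excerpt, so the notation below (with $T_i$ as an assumed symbol for the random duration and $h$ for the hazard rate) is a sketch and not necessarily the author's exact formulation.

$$
h(t_i \mid X_i) = \lim_{\Delta t_i \to 0}
\frac{P\left(t_i \le T_i < t_i + \Delta t_i \mid T_i \ge t_i,\ X_i\right)}{\Delta t_i}
$$

Because of the division by $\Delta t_i$, the hazard rate is a rate per unit of time rather than a probability: it is non-negative but may exceed 1.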

Note that, in contrast to the survivor function, which focuses on non-adoption, the hazard rate focuses on adoption, that is, on the event occurring. Thus, in some sense, the hazard function can be considered as giving the opposite side of the information given by the survivor function. That is, the higher *S(t)* is for a given *t*, the smaller is *h(t)*, and vice versa.^{57}

If the *ith* member “survives” the point in time *t _{i}*, then the hazard rate approximately indicates how the probability of the event taking place develops from then on. Hazard rates can follow very different courses over time; the only restriction is that they are non-negative. Choosing an alternative formulation for the density function reveals its similarity to the hazard rate:^{58}

illustration not visible in this excerpt

The only difference between equations (2-5) (“hazard rate”) and (2-6) (“density function”) lies in the conditioning: while in equation (2-6) the probability depends merely on the vector *X _{i}*, equation (2-5) additionally imposes the condition that the adoption has not taken place before *t _{i}*.
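The effect of the additional condition can be made explicit with the definition of conditional probability. The following step uses standard notation with $T_i$ as an assumed symbol for the random duration; it is a sketch of the usual argument, not a reproduction of equations (2-5) and (2-6).

$$
P\left(t_i \le T_i < t_i + \Delta t_i \mid T_i \ge t_i,\ X_i\right)
= \frac{P\left(t_i \le T_i < t_i + \Delta t_i \mid X_i\right)}{P\left(T_i \ge t_i \mid X_i\right)}
$$

The numerator is the unconditional probability that underlies the density function, and the denominator is the probability of not having adopted before $t_i$. Conditioning on survival therefore rescales the density by the share of members still at risk.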

The hazard rate (2-5), the density function (2-1), and the survivor function (2-3) all constitute equivalent ways of describing the continuous probability distribution of the random variable *t _{i}* as a function of *X _{i}*. The relations between these functions can be derived from the above equations as follows:^{59}

illustration not visible in this excerpt
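Since the equations are not visible in this excerpt, the standard textbook relations between the three functions are restated here as a sketch. The notation is assumed ($h$ for the hazard rate, $f$ for the density, $S$ for the survivor function) and may differ from the author's numbered equations.

$$
h(t_i \mid X_i) = \frac{f(t_i \mid X_i)}{S(t_i \mid X_i)} = -\frac{d \ln S(t_i \mid X_i)}{dt_i}, \qquad
S(t_i \mid X_i) = \exp\left(-\int_0^{t_i} h(u \mid X_i)\,du\right),
$$

$$
f(t_i \mid X_i) = h(t_i \mid X_i)\,\exp\left(-\int_0^{t_i} h(u \mid X_i)\,du\right).
$$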

Although the process under study is fully described by any one of these functions, a distinction between them is useful because each function centres on a different aspect. The hazard rate can be interpreted as the “risk” that an adoption takes place within the observed time interval under the condition that no adoption has yet taken place.^{60} The survivor function, in turn, provides information about the probability that the sample member survives that time period (i.e. that no adoption takes place within the observed duration). For each member of the sample this information exists at any point in time within the observation period.
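The equivalence of the three functions can be checked numerically. The following Python sketch assumes, purely for illustration, a Weibull-distributed duration with hypothetical parameters and no covariates. It verifies that the hazard rate equals the small-interval conditional probability divided by the interval length, and that the survivor function can be recovered from the integrated hazard.

```python
import math

SHAPE, SCALE = 1.5, 4.0  # hypothetical Weibull parameters, chosen for illustration only

def survivor(t):
    """S(t): probability that no adoption has taken place up to time t."""
    return math.exp(-((t / SCALE) ** SHAPE))

def density(t):
    """f(t): density of the duration."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survivor(t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survivor(t)

def interval_hazard(t, dt=1e-6):
    """P(event in [t, t + dt) | no event before t), divided by the interval length dt."""
    return (survivor(t) - survivor(t + dt)) / (survivor(t) * dt)

def cumulative_hazard(t, steps=100_000):
    """Midpoint-rule approximation of the integral of h over (0, t]."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width

t = 3.0
print("h(t) from f/S:           ", hazard(t))
print("h(t) from small interval:", interval_hazard(t))
print("S(t) directly:           ", survivor(t))
print("S(t) from exp(-int h):   ", math.exp(-cumulative_hazard(t)))
```

The two values in each pair agree up to numerical error, which illustrates that knowing any one of the three functions is enough to recover the other two.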

**[...]**

^{1} Charles Edquist (1997).

^{2} The economy-wide degree of diffusion can be decomposed into two elements: inter-firm diffusion and intra-firm diffusion. Inter-firm diffusion describes a firm’s first use of a new technology. Intra-firm diffusion, on the other hand, has not been researched much so far and describes the increasing intensity of technology diffusion. For literature on inter- and intra-firm diffusion see Griliches (1957), Mansfield (1968), Bass (1969) and Hollenstein, Wörter (2004), respectively.

^{3} Baptista (2000).

^{4} Gourlay, Pentecost (2000).

^{5} Ireland, Stoneman (1986).

^{6} Reinganum (1981a, 1981b, 1989), Quirmbach (1986).

^{7} Gourlay, Pentecost (2000), p. 3.

^{8} Bass (1969).

^{9} Albers (1998), p. 13, Kühnapfel (1995), p. 121.

^{10} Hiebert (1974), Stoneman (1981), Feder, O'Mara (1982), Jensen (1982).

^{11} Mahajan, Muller, Bass (1990).

^{12} Kalbfleisch, Prentice (1980), Cox, Oakes (1984).

^{13} Reinganum (1982), Hannan, McDowell (1984, 1987, 1990), Sinha, Chandrashekaran (1992), Gönül, Srinivasan (1993), Caudill et al. (1995), Gourlay, Pentecost (2000), Litfin (2000).

^{14} Breiman et al. (1984).

^{15} Gordon, Olshen (1985).

^{16} Haughton, Oulabi (1993), Köllinger, Schade (2004).

^{17} Bold was used for the authors of the various proposals in 3.3. because their names stand exemplarily for the method they developed.

^{18} Allison (1984) p. 9.

^{19} Alternatively, data is collected as cross-sectional or panel data. For a comparison of these approaches with event history data collection see Blossfeld, Rohwer (2002), pp. 4-6.

^{20} Allison (1984) p.9.

^{21} Rogers (1995), p. 11.

^{22} We will refer to “adoption- diffusion theory” (ADT).

^{23} Rogers (1995), p. 5.

^{24} Litfin (2000), p. 21.

^{25} Following Mahajan, Muller, Bass (1990), p. 6.

^{26} Mahajan, Muller, Bass (1990), p.6.

^{27} See Rogers (1983), pp. 244-245 for a detailed description of the 5 adopter categories.

^{28} The original diffusion research was done as early as 1903 by the French sociologist Gabriel Tarde who plotted the original S-shaped diffusion curve.

^{29} Although the diffusion pattern of most innovations can be described in terms of a general S-shaped curve, the exact form of each curve, including the scope and the asymptote, may differ.

^{30} Berndt, Altobelli (1991) investigated the failure of BTX screens in Germany.

^{31} Litfin (2000), p. 25.

^{32} Other approaches are of game-theoretical (Reinganum, 1983) or econometric nature (Eliashberg, 1990; Jensen, 1982).

^{33} Köllinger, Schade (2004).

^{34} “Rejection” should not be understood as being definite but rather with respect to one specific point in time (i.e. at point *t _{1}* the innovation may still be rejected but then be adopted at *t _{2}*).

^{35} Litfin (2000) p. 25.

^{36} Menard (1995) p.4 ff.

^{37} For a detailed description see Aldrich, Nelson (1984).

^{38} Allison (1995), p.4.

^{39} Andersen, Keiding (2001), p. 4956.

^{40} I will predominantly use the denomination “hazard rate models”.

^{41} Cox (1995), p.4.

^{42} In the context of survival analysis and event history data, the problem of unobserved heterogeneity (also called selectivity or frailty) has received a great deal of attention. I will not discuss this topic; see Andersen, Keiding (2001), pp. 4956-4962 for more details.

^{43} For an example of this type of analysis see Heckman, Borjas (1980).

^{44} Andersen, Keiding (2001), p. 4957.

^{45} Also called spell or waiting time. One should not be confused by the expression “the duration of an episode”; “episode” should be seen as a purely technical term. I will predominantly use simply “duration”.

^{46} This intuitive denomination has been chosen although no previous use of it could be found. I believe this will make the distinction between models much easier.

^{47} See Kalbfleisch, Prentice (1980) for an extensive overview of these concepts.

^{48} Andersen, Keiding (2001), p. 4960.

^{49} See Andersen, Keiding (2001), p.4961 for a general description of the generalization to be performed.

^{50} Reinganum (1982), Caudill et al. (1995), Hannan, McDowell (1984, 1987), Gourlay, Pentecost (2000), Litfin (2000).

^{51} Audretsch, Mahmood (1995), Audretsch (1991), Honjo (2000), Kaufman, Wang (2001), Mata et al. (1995).

^{52} Litfin (2000).

^{53} Logistic regressions and hazard rate models are therefore both stochastic techniques.

^{54} Also called “survival function”.

^{55} As the hazard rate is applied in various scientific fields, there exist various terminologies for it, e.g. transition rate, intensity rate, mortality rate (see Blossfeld, Hamerle, Mayer (1986), p. 31).

^{56} Allison (1995).

^{57} The interpretational power of the hazard rate as opposed to the survivor function is thus stronger in adoption and diffusion theory than in survival analysis.

^{58} See Allison (1995), p. 16, Kleinbaum (1995) p. 11.

^{59} See Allison (1995), p.16.

^{60} In this way, larger hazards are directly related to shorter survival or earlier adoption.