Homepage > Katalog > Mathematik - Statistik

Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance

Causal Inference in Survival analysis

Forschungsarbeit, 2009

154 Seiten, Note: A

Irene Hudson (Autor:in)

Leseprobe

Table of Contents

Summary 1

1 Introduction

The HIP study 6

2.1

The HIP breast cancer program

Notation, assumptions, basic concepts

3.1

Notation

3.2

Assumptions

3.3

Basic

Concepts

3.3.1 Compliance Types

3.3.2 Causal Effects

3.3.3 Compliers Average Causal Effect

(CACE)

Model Structure and Likelihood

4.1

Compliance

types

distributions

4.2

The Structure of the joint distribution

4.2.1 The joint distribution of T, C, D and Z

4.2.2 The joint distribution of the observable variables

4.3

The

likelihood

function

4.3.1 The idealised likelihood function

4.3.2 Likelihood function for complete data

4.3.3 Likelihood function for incomplete data

Simplification of model parameter

EE 37

EM algorithm

6.1

The Expectation step

6.2

The Maximisation step

Model Specifications

7.1.

Specific non-ignorable PPO survival models

7.2

Parameter of interest

T in specific models

7.2.1

Parameter

T in the NIGN-WW model

7.2.2

Parameter

T in the NIGN-LL model

Computational issues

8.1

Maximisation of the Likelihood Function using the EM algorithm 51

8.1.1 Posterior Probabilities of Compliance type U

8.2

Structures of the Expectation function Q (

T, T

)

8.3

Maximisation of the Expectation function Q (

T, T

) 60

8.4

Estimation of standard errors

8.4.1 The standard error for the estimated model parameters 61

8.4.2 The observed information matric (OIM)

Application

9.1

Pre-processing of the HIP breast cancer data

9.2

HIP data analysis: via non-ignorable censoring mechanism -

PPO-survival models

9.2.1 Model specification for the HIP trial

9.2.2 Estimation of causal effects: CACE(t) and ACE(t) 73

9.2.3 Estimation of relevant survival probabilities 80

9.3

HIP data analysis: via the Frangakis-Rubin method

9.3.1 Frangakis-Rubin (F-R) method

9.3.2 Estimation of relevant probabilities 86

9.3.3 Estimation of causal effects: CACE(t) and ACE(t) 87

9.4

Model comparisons 91

9.4.1 Non-ignorable PPO-survival model vs F-R method

9.4.2 The NIGN-WW model vs IGN-W model

9.5

Results of the analysis via the non-ignorable PPO-survival model 105

9.5.1 Estimated CACE(t) for the HIP

data

105

9.5.2 Estimated CACE(t) for the HIP

data

105

Conclusions and Discussions

107

References 112

Appendix B

118

Appendix C

137

Appendix D

145

Appendix E

148

Likelihood Method for Randomized

Time-to-Event Studies

with All-or-None Compliance

Zhaojing Gong, Irene Hudson, Patrick Graham

SUMMARY

Estimating causal effects in clinical trials often suffers from treatment

non-compliance and missing outcomes. In time-to-event studies, it is more

complicated because of censoring, the mechanism of which may be non-

ignorable. While new estimators have recently been proposed to account

for non-compliance and missing outcomes, few papers have specifically con-

sidered time-to-event outcomes, where even the intention-to-treat (ITT)

estimator is potentially biased for estimating causal effects of assigned treat-

ment. In this paper we develop a likelihood based method for randomized

clinical trials (RCTs) with non-compliance for time-to-event data and adapt

the EM algorithm according to derive the maximum likelihood estimators

(MLEs).

In addition, we give formulations of the average causal effect

(ACE) and compliers average causal effect (CACE) to suit survival anal-

ysis. To illustrate the likelihood-based method (EM algorithm), a breast

cancer trial data (Baker, 1998) was re-analyzed using a model, which as-

sumes that the failure times and censored times both follow Weibull and

Lognormal distributions, respectively (termed the NIGN-WW model and

NIGN-LL model).

keywords: Causal inference; Noncompliance; Survival analysis; Maximum

likelihood; EM algorithm; Weibull distribution; HIP breast cancer trial; CACE;

ITT analysis.

INTRODUCTION

Estimating causal effect of interventions is often required for empirical studies

in medicine. Randomized clinical trials (RCTs) are only generally approach for

causal inference, but it often suffers from noncompliance. Noncompliance is a

common complication in the analysis of randomized clinical trials (RCTs). It can

take many forms (Baker, 1997), such as switching from the assigned treatment to

another treatment, or not attending all scheduled visits to receive treatment. The

main feature of RCTs with noncompliance is that not every patient in each group

of the RCT receive the same treatment. All-or-none compliance is a special type of

noncompliance, in which switching occurs immediately after randomization, and

every patient either does or does not comply with their assigned treatment com-

pletely, no partial compliance is assumed. All-or-none compliance is of particular

interest because it greatly simplifies modeling and has received more attention

over the last decade. (Imbens & Rubin, 1997; Baker, 1998; Frangakis & Rubin,

1999; Hirano & Imbens, 2000; Jo, 2002; Loeys & Goetghebeur, 2003; Levy et al.,

2004; Peng et al., 2004; O'Malley & Normand, 2005; White, 2005; Gong et al.,

2005).

Besides noncompliance, censoring is another problem has to be dealt with in

many randomized clinical trials, since in medicine studies, the outcomes of interest

most likely are time of event of interest. This kind of data is well known as time-

to-event data, also called survival data. It is a special type of data collated in

long-term follow-up studies, in which the outcome of interest is a time of a specific

event, usually treated as "failure time "or "survival time ".

Time-to-event data often appears in the areas of medicine and health study

(survival time), as well as in economics and engineer industry fields (failure time),

usually time to death or the manifestation of a symptom or occurrence of a dis-

ease. Subjects who may not experience the event of interest during the period of

study (observation), or may withdraw from the clinical trial for some reasons are

considered censored. In any one of these cases mentioned above, subjects´survival

time (time of event) will not be observed, this is given in terminology as censoring.

The standard method of analysis of the time-to-event data is so-called Survival

Analysis, in which censoring is the main problem need to be faced.

The standard analysis of a RCT with noncompliance is an intention-to-treat

(ITT) analysis (Angrist et al., 1996; Imbens & Rubin, 1997), which compares out-

comes of subjects by assigned treatment, ignoring the actual treatments received.

ITT only provides an estimator in terms of the causal effect of assigned treatment

(denoted by Z, of an outcome of interest). It has been denoted by ACE in terms

of Average Causal Effect (Holland, 1986; Graham, 2000, 2001) or IT T (Imbens

& Rubin, 1997; Loeys & Goetghebeur, 2003; O'Malley & Normand, 2005). ITT

analysis is a very plausible method for causal inference (Holland, 1986; Pearl,

2001; Greenland, 2004; Rubin, 2005), as it avoids those biases that associated

with estimating the causal effect "as-treated "(comparing outcomes of subjects

by received treatment) or causal effect "per protocol "(comparing outcomes of

subjects who comply with their assigned treatment by assigned treatment) (?).

ITT works well only when the assigned treatment and received treatment are

identical for all patients in the trial, at least if most patients satisfy this as-

sumption, otherwise, a biased estimator of causal effect may be produced (Loeys

& Goetghebeur, 2003; Hirano & Imbens, 2000; White, 2005; Baker & Kramer,

2005).

Amongst an extensive body of the literature on noncompliance, most papers

are address noncompliance as missing covariates. Some recently published papers

consider noncompliance with missing outcomes together (Baker et al., 2000; Hi-

rano & Imbens, 2000; Barnard et al., 2003; Peng et al., 2004; Mealli et al., 2004;

O'Malley & Normand, 2005; Baker & Kramer, 2005), however, they are all on

binary or dichotomous outcomes or normal distributions data, rather than the

time-to-event data.

For time-to-event data, to address the noncompliance and missing outcomes at

the same time is a big challenge for standard survival analysis. As in general, sur-

vival analysis follows the assumption of Non-informative censoring (Kalbfleisch

& Prentice, 2002; Klein, 1997; Cox & Oakes, 1984), which assumes that censoring

mechanism is ignorable. Implying that subjects censored at time t do not con-

tribute to the likelihood function. But when compliance is imperfect in a clinical

trial, its censoring mechanism is more than likely to be non-ignorable (Baker, 1994,

1998). This means that the conventional methods of survival analysis can not be

applied simply for analysis those time-to-event data from randomized clinical trial

with noncompliance.

Therefore, to develop a new approach for analyzing causal effects for time-to-

event data from RCTs with noncompliance and missing outcomes, is necessary

and vital (Fleming & Lin, 2000; Fisher, 1999).

In survival study area, to the date, we only find a few of papers are related

to noncompliance issue (Robins & Tsiatis, 1991; Baker, 1994, 1998; Frangakis &

Rubin, 1999; Loeys & Goetghebeur, 2003; Herring & Lipsitz, 2004; Gong et al.,

2005). But none of them are focused on general likelihood method with considering

noncompliance and and non-ignorable simultaneously.

Robins & Tsiatis (1991) focus on structural failure time model; Baker (1994,

1998) are based on cause-specific hazards; Frangakis & Rubin (1999) is based

on the Kaplan-Meier estimator (Cox & Oakes, 1984; Klein, 1997; Kalbfleisch &

Prentice, 2002). Herring & Lipsitz (2004) Loeys & Goetghebeur (2003) and Gong

et al. (2005) are based on Cox model Cox & Oakes (1984). Some of them, such as

Loeys & Goetghebeur (2003); Gong et al. (2005) are still under the assumption

of non-informative censoring.

In this paper, we propose a new approach based on the consideration of non-

compliance and missing outcomes together, which is like the circumstance consid-

ered in Frangakis & Rubin (1999), and focus on likelihood function rather than

using survival estimators; as uncompleted covariate is included, EM algorithm is

adapted, to the author's knowledge, this is the first study to date to develop like-

lihood based (EM) approach t survival data RCT's accommodating both missing

outcomes and noncompliance simultaneously.

To illustrate the plausible of proposed likelihood based approach, two specific

Parametric Potential-Outcome (PPO) Survival models, namely the NIGN-WW

and the NIGN-LL models, have been given and applied for re-analysis a time-to-

event data (breast cancer data)from randomized clinical trial conducted by Health

Insurance Plan (HIP) of Greater New York during the 1960s. For comparison,

the method introduced in Frangakis & Rubin (1999), which can be viewed as a

nonparametric method for imperfect compliance survival data, is also implement

on computer and applied for the HIP data.

The rest of the paper has a structure as following: Section 2 gives the back-

ground of the HIP data; Section 3 defines notation used in this paper and as well

as assumptions required for our PPO-survival model plus some basic concepts

which are important for establishing our PPO-survival model; Section 4 provides

the likelihood structure; the PPO-survival model parameter is specified in Section

5 and Section 6 is about the EM algorithm. Seall results of re-analysis of HIP

data with different methods are given in Section 9. Conclusion and discussion are

given in Section 10. All related equation of computations are given in Appendix,

as well as tables and figures.

The HIP study

2.1

HIP breast cancer program

The breast cancer data used in this paper was collected from an breast screening

study in USA conducted by the Health Insurance Plan (HIP) of Greater New

York during the 1960s.Participants were those women who enrolled in the HIP for

one year or more when the program started in 1963 with asymptomatic of breast

cancer and ages between 40 and 64.

In total 60696 eligible women were included in this program, and all were

randomly assigned into either the "screening "or "control "group. Women in

the "screening "group were offered free annual breast examinations four times.

These examinations consisted of film mammography (cephalocaudal and later

views of each breast); a clinical examination of the breast by a physician (usually

a surgeon), an interview for demographic, other background information and a

health history. While the women in "control "group only received their usual

health care. All participants in this study were followed up in the long-term up

to 18 years from their date of entry. Mortality due to breast cancer was the

main outcome variable in the HIP study, and was used to evaluate the efficacy

of the screening program (Shapiro et al., 1988; Baker, 1998). Therefore the event

of interest was death from breast cancer and the outcome of interest was the

observed time (in years) for each individual from entry to death by breast cancer

or loss to follow-up. For those women diagnosed with breast cancer in the study

there was no loss to follow up.

Approximately one-third of women who were assigned to the

"screening

"group refused the offer of a screening examination. Our analysis assumes no

women in the "control "group received "screening " this assumption has been

given by Frangakis & Rubin (1999); Loeys & Goetghebeur (2003) in terms of No

access of treatment for the control group.

We also assume "All-or-none compliance "meaning that we assume women

who took up the offer of screening, complied fully with the screening programme

protocol by attending (at least one of) all scheduled screening visits whereas

women who initially declined to participate in the programme are assumed not to

have attended any screening visits.

Previous research on the HIP breast cancer data (Shapiro, 1977; Shapiro et al.,

1982, 1988) was mostly based on cumulative breast cancer's mortality rate and

survival rate over particular fixed follow-up durations rather than on estimating

survival or hazard functions. In addition, with the exception of Baker (1998)

Richardson and Wells (1995?).

The previous analysis of the HIP by Shapiro (1977); Shapiro et al. (1982, 1988)

have ignored compliance information and estimated the intention-to-treat (ITT)

effect of assigned treatment.

Non compliance presents a challenge to traditional statistical methods for the

analysis of RCTs because neither an analysis which ignores non-compliers, nor an

analysis which focuses on treatment actually received can provide [a precise and

reliable answer] valid inferences for on the effect of mammography on breast cancer

mortality. In the former case, the observed outcomes for the intervention group

are a mixture of outcomes for women receiving and not receiving the intervention

In the latter case, the two groups are not necessarily comparable, because the self-

selection of treatment for the intervention group means that treatment received

is not randomized and consequently treatment groups may differ on background

factors related to breast cancer and mortality risk.

The summary statistics of HIP data is given in Table 2.

able

Primary

notation.

oten

tial

and

Observ

able

data

erm

Data

for

sub

ject

Assigned

treatmen

terv

tion

con

trol)

Compliance

complier

nev

er-tak

er,

and

alw

ys-tak

er)

oten

tial

data

Observ

able

Data

Receiv

treatmen

(1)

(0)]

obs

)+(

(0)

ailure

time

(1)

(0)]

obs

)+(

(0)

Censored

time

(1)

(0)]

obs

)+(

(0)

Censored

indicator

(1)

(0)]

obs

)+(

(0)

Observ

time

min

),C

))

)+(

(0)

opulation

parameters

arameter

Notation

Subp

opulation

del

parameter

,uz

u,z

)(

)

Surviv

probabilit

)(

)

Prop

ortion

All

Prop

ortion

(Z,

)

All

Data

summaries

Statistics

Notation

Subsample

Sample

size

(Z,

)

,d,

)

Mean

observ

time

¯ Y

(Z,

)

Prop

ortion

¯ R

(Z,

)

,d,

)

able

Summary

statistics:

HIP

data.

otal

Screen

vited

Con

trol

Screen

receiv

Screen

refused

Screen

receiv

Sample

size

60,696

30,131

30,565

20,147

9,984

40,549

patien

2,325

1,171

1,154

835

336

1490

Deaths

786

372

414

247

125

539

Mean

observ

time

¯ Y

14.051

14.1578

13.9457

14.905

12.6499

13.6266

Prop

ortion

¯ R

0.0129

0.0123

0.0135

0.0123

0.0125

0.0133

Prop

ortion

¯ Z

0.4964

0.2462

Prop

ortion

¯ D

0.3319

0.6686

ccurs;

randomly

assigned

the

screening

group;

receiv

the

screening

examination.

Notation, assumptions and basic concepts

3.1

Notation

We develop our model under the same randomization conditions considered in

Frangakis and Rubin (1999). First, a sample of N individuals is randomly sam-

pled from a large population of individuals all of whom are eligible to participate

in the study, and could therefore be assigned to any of the possible treatment

levels. Under this set-up it meaningful to consider contrasts between outcomes

under alternative treatment assignments for the population from which the study

sample is assumed to have been obtained. A randomized controlled trial is con-

structed by randomly allocating each of the N sampled individuals to a treatment

group. Our aim is to make inferences about the population causal effects of treat-

ment assignment, and of treatment received on the basis of the data from the N

individuals enrolled in the study.

We adopt a potential outcomes model (Neyman 1923, Rubin, 1978) for sur-

vival times under alternative treatment assignments.

Let Z

= z denote the

assigned treatment (z = 1, for invitation of screen and z = 0 for control); and

let T

(1) denote the failure time for the ith individual in the population if as-

signed treatment 1 (for example, invitation for screening) and let T

(0) denote the

failure time for that same individual if assigned to treatment 0 (not invited for

screening). Similarly let C

(1) and C

(0) denote potential censoring times corre-

sponding, respectively to treatment assignments 1 and 0. The censoring times are

the times at which an individual will be censored. As in Imbens & Rubin (1997),

our model also requires indicators for the treatment received under the alterna-

tive treatment assignments,. Thus D(1) denotes treatment actually received if

assigned to treatment, 1, and D(0) the treatment actually received if assigned to

treatment 0. In the potential outcomes models observable data is represented as

particular elements of the potential outcome vectors, and the elements of these

vectors actually observed, reflects the treatment assignment. Thus if Z

= 1, the

observable treatment received indicator is D

) = D

(1), the observable failure

time is T

) = T

(1) and the observable censoring time is C

) = C

(1), and

similarly if Z

= 0. Note that although we refer to T

) and C

) as 'observ-

able', at most one of these variables is fully observed. If an individual is, in fact,

censored then C

) is observed and T

) is only partially observed in the sense

that it is known that T

) > C

). On the other hand if the failure time for

the ith individual is observed the censoring time is only partially observed and

known to be greater than the survival time. While it is standard in much applied

work on survival analysis to structure the analysis in terms of the observable min-

imum of survival and censoring time and a censoring indicator, such a set-up is

valid only under the assumption of ignorable censoring,which is equivalent to the

assumption of the independence of failure time and censored time given in section

3.2 in Klein (1997).

Because our analysis allows for non-ignorable censoring we work directly with

the more fundamental failure time and censoring time variables.

More details of notation used in this paper are given in Table 1.

3.2

Assumptions

Before giving the structure of our PPO-survival model, we list all assumption

required first which most follow those assumptions given in Frangakis & Rubin

(1999).

1. The stable unit treatment value assumption (SUTVA) (Rubin, 1986; Imbens

& Rubin, 1997; Peng et al., 2004): this assumes that the potential outcomes

of each individual are unrelated to the treatment status of other individuals.

2. Random assignment: this assumes that each individual has been assigned

randomly to a treatment group.

3. Exclusion restriction: this assumes that the assigned treatment has an effect

on the outcome only through its effect on the received treatment.

4. Monotonicity: the actual received treatment D

(z) is a monotone function

with respect to the potential treatment levels, 1 for being in the treatment

group and 0 for being in the control group.

5. Latent ignorability: this assumes that the potential outcomes (of failure

time) and the associated potential non-responses (censoring time) are inde-

pendent at each level of the latent compliance types.

In the literature, the latent ignorability assumption for general data from RCTs

has been expressed as follows:

P r(Y

, R

, Z

) = P r(Y

, Z

)P r(R

, Z

);

(3.1)

where Y

and R

represent the potential outcome variable and the response indi-

cator for the ith individual, respectively; the notation for U

and Z

are the same

as in Frangakis & Rubin (1999), Mealli et al. (2004) and O'Malley & Normand

(2005). U

denotes the compliance type, and Z

indicates the assigned treatment.

The analogous assumption for time-to-event data is:

P r(T

, C

, Z

) = P r(T

, Z

)P r(C

, Z

(3.2)

where T

and C

denote the potential outcome of failure time and censoring

time, respectively. This assumes that in RCTs with time-to-event outcomes, the

potential failure and censoring times are independent conditional on the latent

compliance type. We regard equation (3.2) as the time-to-event version of the

latent ignorabilty assumption given by Frangakis & Rubin (1999). Note that in

this chapter, we use the bold symbols, such as T

and C

, to denote potential

outcome vectors.

3.3

Basic concepts

For convenience of understanding the model structure given in Section 4, we in-

troduce a few basic concepts in the following.

3.3.1

Compliance types

In literature, a variety of names that have been used to describe individuals' com-

pliance behaviour, such as compliance type (Imbens & Rubin, 1997), compliance

state (O'Malley & Normand, 2005), or principle strata (Frangakis & Rubin, 2002;

White, 2005). This is shown that individuals' compliance behaviour has drawn

much attention from many authors, as doing noncompliance analysis, individuals'

compliance behaviour in RCTs is a very important issue that has to be concerned.

In this paper we follow the work of Imbens & Rubin (1997) and refer to this new

variable as the compliance type and denote it by U.

The definition of compliance type given by Imbens & Rubin (1997) is as follows:

1. Compliers (c) are those individuals who receive the treatment when they

are assigned to the treated group. They would not receive any treatment

when assigned to the control group. That is, compliers always comply with

their assigned treatment Z

2. Always-takers (a) are those individuals who always receive the treatment

regardless which groups they are assigned to. That is, always-takers only

comply with their assigned treatment when Z

= 1.

3. Never-takers (n) are individuals who would never receive the treatment re-

gardless which group they are assigned to. That is, never-takers only comply

with their assigned treatment when Z

= 0;

4. Defiers (d) are individuals who receive the treatment when they are assigned

to the control group and would not receive the treatment when they are

assigned to the treatment group. That is, defiers never comply with their

assigned treatment Z

= z no matter whatever value Z

has.

Using the symbol of potential received treatment D

= [D

(1), D

(0)], these

four compliance types can be written mathematically as follows,

(i.e. subject i is a complier)

if D

(1) = 1 and D

(0) = 0;

(i.e. subject i is a never-taker)

if D

(1) = 0 and D

(0) = 0;

(i.e. subject i is an always-taker)

if D

(1) = 1 and D

(0) = 1;

(i.e. subject i is a defier)

if D

(1) = 0 and D

(0) = 1.

(3.3)

This compliance type variable, U

, allows us to classify a population into four

subpopulations by the four categories of individuals' compliance behaviour, that

is compliers (U

= c), always-takers (U

= a), never-takers (U

= n) and defiers

= d).

In the literature, the fourth compliance type, defier (U

= d), is often omitted,

since there is no reason to suppose that individuals who agree to participate in

a trial will necessarily always counter their experimenters. To avoid this extreme

situation, a very common assumption named Monotonicity (Angrist et al., 1996;

Imbens & Rubin, 1997; Frangakis & Rubin, 1999) is given, which supposes that

the component of the received treatment D

is a monotone function with respect

to Z

, that is D

(1) and D

(0) have a relationship as follows,

(1)

(0).

(3.4)

Instead of using the terminology of Monotonicity, some authors directly as-

sume, however, that there are strictly no defiers in the study (Baker, 1998; Baker

& Kramer, 2005).

3.3.2

Causal Effects

Using the potential outcomes notation it is straightforward to define causal ef-

fects of treatment assignment, for example T

(1)

- T

(0) is the the causal effect

of treatment assignment for the i

individual. Within the potential outcomes

framework it is conventional to define causal effects contrasts between population

averages under alternative treatment assignments (Rubin, 1978, Imbens & Rubin,

1997, Hirano et al, 2000). However for failure time data the full survival curve

is potentially of interest and we therefore define causal effects in a time depen-

dent manner, contrasting population survival curves under alternative treatment

assignments.

Let S(a, t) denote the population survival curve under assignment to treatment

a. This is the survival curve, defined as Pr(T (a) > t), that would be observed

if all members of the population were assigned to treatment a and followed until

failure. Here the probability notation, Pr() can be interpreted as the population

proportion or as a relative frequency under repeated sampling from the population.

For any pair of treatment levels labeled, for convenience, 0 and 1, , an average

causal effect of treatment assignment at time t, can be defined as

ACE(t) = S(1, t)

- S(0, t)

(3.5)

Thus the population survival curves induce a population - level causal effect

curve. This time varying notion of causal effect can be related to time-independent

definitions of average causal effect by considering, for, any time t, the indicator

variables I(a, t) defined to equal 1, if T (a) > t and to equal 0, otherwise. For

the ithindividual, the causal effect of treatment at time, t, is then ICE

(t) =

(1, t)

- I

(0, t) and the average causal effect is the average of these individual

effects effect at time t, can then be written

ACE(t) = E (I(1, t)

- I(0, t))

= S(1, t)

- S(0, t)

(3.6)

3.3.3

Compliers average causal effect

Because the treatment received by always takers and never takers is not affected

by treatment assignment the subpopulation average causal effect functions for

these groups do not reflect the causal effect of treatment. On the other hand, the

sub-population average causal effect functions for compliers and defiers are of of

interest because these effects may reflect the effect of treatment. For example the

Complier Average Causal Effect (CACE):

CACE = S

(1, t)

- S

(0, t)

(3.7)

contrasts survival under alternative treatment assignments for a sub-population

know to comply with these assignments. It is therefore more reasonable to at-

tribute any effect of assigned treatment in this sub-population to the causal effect

of treatment,than is the case for the full population. Nevertheless, Hirano et al

(2000) that such causal attributions, remain assumptions, the plausibility of which

is increased by study design strategies such as blinding and double blinding.

Implicit in the above definitions is the Stable-Unit-Value Assumption (SUTVA)

(Rubin, 1986 comment ), which holds that outcomes for any given individual are

not affected by the treatment assignment or outcomes of other individuals. This

seems reasonable in screening trials such as HIP and in many other studies of

interventions delivered at the level of the individual. When SUTVA is violated

the vector of potential outcomes must be expanded to accommodate, not only the

outcomes corresponding to each individual's potential exposure but also combi-

nations of potential exposures for other individuals (Infectious disease reference

here?) Definition of causal effects is then correspondingly more difficult.

Model Structure and Likelihood

4.1

Compliance type's distribution

Noncompliance is the main issue we have to consider in our PPO-survival approach

and it is often described by compliance type variable, which is defined by the vector

of potential received treatment D

= [D

(1), D

(0)] (equation (3.3)). By following

the property of conditional probability, P r(D

, ), the conditional distribution

of vector D

given the assigned treatment Z

, (where is the associated parameter

for this distribution) is broken down in equation (4.1).

P r(D

; ) = P r(D

mis

, D

obs

; )

= P r(D

mis

obs

, Z

;

)P r(D

obs

;

(4.1)

where

and

represent the parameters associated with these two conditional

distributions of D

mis

and D

obs

respectively. This is because D

can also be ex-

pressed as [D

mis

, D

obs

], where D

mis

and D

obs

denote the unobservable and ob-

servable components of potential received treatment vector D

. Note that when

is a binary variable, we can write D

mis

= D

- Z

) and D

obs

= D

On the other hand, by the definition of compliance type U

, given the actual

received treatment D

obs

= D

), learning the value of D

mis

is equivalent to

learning the value of U

. This implies that the conditional probability distribution

of the compliance type U

given D

obs

= D

) and Z

, P r(U

), Z

, ;

), can

be obtained from P r(D

mis

obs

, Z

; ), the conditional probability distribution of

mis

, the unobservable component of the potential received treatment vector D

given the actual received treatment D

obs

and assigned treatment Z

. In other

words:

P r(U

obs

, Z

;

)

P r(D

mis

obs

, Z

;

);

(4.2)

where

indicates distributional equivalence: any probability on one side of

the equation can be obtained from the distribution on the other side. For example,

P r(U

= c

obs

= 1, Z

= 1,

) = P r(D

mis

= 0

obs

= 1, Z

= 1,

After substituting equation (4.2) into the second line in equation (4.1), we

have:

P r(D

, )

P r(U

obs

, Z

;

)P r(D

obs

;

(4.3)

where

denotes the parameter corresponding to the distribution of U

condi-

tional on the actual received treatment D

obs

and assigned treatment Z

, and

is the parameter associated with the distribution of D

obs

given Z

. This implies

that P r(D

, ) can be obtained from the product of P r(U

obs

, Z

;

) and

P r(D

obs

;

). In other words, P r(D

, ), the conditional distribution of D

(the potential received treatment vector) given assigned treatment Z

, can also

be obtained from the product of two conditional distributions, P r(U

obs

, Z

;

)

and P r(D

obs

;

), the former is the conditional distribution of U

given the

actual received treatment D

obs

and assigned treatment Z

, and the latter is the

conditional distribution of D

obs

given Z

(equation (4.3)).

4.2

The structure of the joint distribution

4.2.1

The joint distribution of T, C, D and Z

Suppose a RCT includes N individuals, each component of the potential outcomes

vector, including failure time T

= [T

(1), T

(0)], censoring time C

= [C

(1), C

(0)]

and the potential received treatment D

= [D

(1), D

(0)], can then be written as

a vector of length N, as follows:

T(1) = [T

(1), . . . , T

(1)] , the failure time if assigned to the treatment group;

T(0) = [T

(0), . . . , T

(0)], the failure time if assigned to the control group;

C(1) = [C

(1), . . . , C

(1)], the censoring time if assigned to the treatment group;

C(0) = [C

(0), . . . , C

(0)], the censoring time if assigned to the control group;

D(1) = [D

(1), . . . , D

(1)], the received treatment if assigned to the treatment

group;

D(0) = [D

(0), . . . , D

(0)], the received treatment if assigned to the control

group;

We also assume that all related variables are independent and identically dis-

tributed (i.i.d.) random variables, which include failure time T, censoring time

C, potential received treatment D and assigned treatment Z. Their joint distri-

bution can then be expressed as a product of the joint distribution for N single

observations, that is:

P r(T(1), T(0), C(1), C(0), D(1), D(0), Z; )

P r(T

(1), T

(0), C

(1), C

(0), D

(1), D

(0), Z

; ),

(4.4)

where denotes the model parameter vector.

We suppose that the model parameter can be broken down into three parts,

namely = (

, , ), where is the parameter associated with the joint dis-

tribution of failure and censoring times (outcomes), is the parameter for the

distribution of the vector of potential received treatment D

(post-treatment co-

variates), and is the parameter corresponding to the distribution of assigned

treatment Z

(pre-treatment covariates). According to the rule of conditional dis-

tribution, the joint probability of all the aforementioned variables, corresponding

to the ith individual, P r(T

(1), T

(0), C

(1), C

(0), D

(1), D

(0), Z

; ), can thus

be expressed as a product of three distributions:

· The joint distribution of failure time T

= [T

(1), T

(0)] and censoring time

= [C

(1), C

(0)] given potential received treatment D

= [D

(1), D

(0)]

and assigned treatment Z

, P r(T

(1), T

(0), C

(1), C

(0)

(1), D

(0), Z

;

· The distribution of potential received treatment D

= [D

(1), D

(0)] given

assigned treatment Z

, P r(D

(1), D

(0)

; ).

· The distribution of assigned treatment Z

, P r(Z

, ).

Note that

, and denote the parameters associated with these above three

distributions, respectively. We then have:

P r(T

(1), T

(0), C

(1), C

(0), D

(1), D

(0), Z

; ) =

P r(T

(1), T

(0), C

(1), C

(0)

(1), D

(0), Z

;

)P r(D

(1), D

(0)

; )P r(Z

; ).

(4.5)

Using the notation of potential-outcome vectors, equation (4.5) can thus be ex-

pressed as follows:

P r(T

, D

, Z

; ) =

P r(T

, C

, Z

;

)P r(D

; )P r(Z

; );

(4.6)

for = (

, , ), the model parameter vector defined earlier.

In Section 4.1, we derived equation (4.2), which allows us to replace P r(D

; )

with P r(U

obs

, Z

; )P r(D

obs

;

). The joint distribution of all the aforemen-

tioned variables given in equation (4.4) can now be rewritten as follows:

P r(T

, D

, Z

; )

P r(T

, C

, D

;

)P r(U

, D

obs

; )P r(D

obs

;

)P r(Z

; ),

= P r(T

, C

, D

;

)P r(U

, D

obs

; )P r(D

obs

, Z

; );

(4.7)

where

indicates equivalence in distribution.

The last equality holds in equation (4.7) is because

P r(D

obs

, Z

; ) = P r(D

obs

)P r(Z

; ).

(4.8)

Therefore, these two parameters

and , have been replaced with the parameter

of . The advantage of using to replace

and is that it simplifies notation of

the parameters involved in the model. Recall that the main parameter of interest

in our approach is

, the parameter associated with the joint distribution of failure

time and censoring time (survival outcomes).

In addition, the vector of received treatment D

can be viewed as being

equivalent to compliance type U

. This is because, on one hand, U

= u (for

{c, a, n, d}) is determined by D

= [D

(1) = d

, D

(0) = d

] (for d

, d

{0, 1}) (equation (3.3)), and, on the other hand, if we know the compliance type

= u, the values of vector D

= [D

(1), D

(0)] are then also known. So we write

= u)

= [D

(1), D

(0)] = [d

, d

]) where

denotes equivalence.

Let U

replace D

. We have:

P r(T

, C

, D

, Z

; )

P r(T

, C

, U

, Z

; );

(4.9)

then equation (4.7) is equivalent to

P r(T

, U

, Z

; )

P r(T

, C

, U

;

)P r(U

, D

obs

; )P r(D

obs

, Z

; ).

(4.10)

Furthermore, it is useful to impose some additional structure on the model.

This is achieved by partitioning the parameter vector as

= (

T C

)

where

denotes transpose. We also require that the marginal bivariate failure

time distribution depends only on

and that the marginal bivariate censoring

time distribution depends only on

. Hence the

T C

parameters act as association

parameters. This latter condition is only required to complete the parametrisation

of the joint distribution of the potential failure and censoring times.

By following the assumption of latent ignorability, equation (3.2), which gives

the joint distribution of failure and censoring times conditional on U

and Z

denoted by P r(T

, C

, U

;

), can be separated as a product of two distributions

of the failure time and the censoring time (conditional on compliance type U

only

because of randomisation). Specifically:

P r(T

, C

, Z

;

) = P r(T

, Z

;

)P r(C

, Z

;

(4.11)

This means we can break down

into = (

)

. In other words, under the

latent ignorability assumption, the third part of

, namely

T C

, has been omitted.

Consequently, the model parameter can then be broken down as =

{

, }, where the subcomponent parameters of are detailed below:

corresponds to the parametrisation of the distribution of failure time,

corresponds to the parametrisation of the distribution of censoring time,

corresponds to the parametrisation of the conditional distribution of compliance

type U

, and

corresponds to the parametrisation of the joint distribution of (D

obs

, Z

Equation (4.11) can also be expressed as the following equivalence relation-

ships:

Pr(T

(1), C

(1)

, Z

= 1;

) =Pr(T

(1)

, Z

= 1;

T 1

)Pr(C

(1)

, Z

= 1;

)

Pr(T

(0), C

(0)

, Z

= 0;

) =Pr(T

(0)

, Z

= 0;

T 0

)Pr(C

(0)

, Z

= 0;

(4.12)

We propose a similar structuring of the marginal bivariate failure and cen-

soring time models by assuming

= (

T 1

T 0

T 10

) and

= (

C10

)

respectively. We then have:

Pr(T

(0)

;

) = Pr(T

(0)

;

T 0

)

(4.13)

Pr(T

(1)

;

) = Pr(T

(1)

;

T 1

(4.14)

Pr(C

(0)

;

) = Pr(C

(0)

;

)

(4.15)

Pr(C

(1)

;

) = Pr(C

(1)

;

(4.16)

Note that the third components of

and

, denoted by

T 10

and

C10

, were

not involved. This can be interpreted as that an individual in a RCT can only be

assigned into one of treatment groups (Z

= 1 or 0). No one can be assigned into

both two treatment groups (Z

= 1 and 0). In other words, in practice no such a

situation exists which need parameter

T 10

C10

to describe. So we simplify

and

= (

T 1

T 0

) and

= (

Under randomisation, the assigned treatment Z is independent of all other

variables. This is denoted as Z

(T, C, D, U), where means independent. It

therefore follows that:

P r(T

, C

, U

;

) = P r(T

, C

;

(4.17)

Equation (4.17) represents a major simplification because it means that the

dependency of the potential survival (and censoring) times on assigned treatment

) does not have to be explicitly modelled. In non-randomised studies, this

dependency normally has to be modelled. Without additional assumptions, this

generally leads to problems with identifiability (Rubin, 1978; Graham, 2000).

The joint distribution of all potential entities for the ith observation can then

be expressed as follows:

P r(T

, D

, Z

; )

P r(T

;

)P r(C

|, U

;

)P r(U

obs

;

)pr(D

obs

, Z

; );

(4.18)

where the model parameter is =

{

, }.

Our primary focus is on

. In particular, we wish to focus on (

T 1

T 0

the parameters of the marginal failure time distributions. Given estimates of

these parameters and given our specific distributional assumptions, the estimated

potential survivor functions (under the alternative assigned treatment) and the

estimators of ACE(t) and CACE(t) can then be obtained. Here ACE(t) and

CACE(t) are the forms of average causal effect (ACE) and complier average causal

effect (CACE) (Angrist et al., 1996; Imbens & Rubin, 1997; Yau & Little, 2001;

Peng et al., 2004; Mattei & Mealli, 2007), respectively, for time-to-event studies.

Note that they both are functions of time t.

4.2.2

The joint distribution of the observable variables

For deriving the formulation of the likelihood function, we need to address the

construct of the joint distribution of observed variables first, as it is well known

that the likelihood function of the model parameter , denoted by L

(), is a

function of , for fixed observed variables, and the function is proportional to the

joint distribution of all observed variables.

In time-to-event studies, however, failure time T

and censoring time C

cannot

be fully observed jointly. It is convenient to present a construct of the joint distri-

bution of all observable variables (denoted by P r(T

obs

, C

obs

, D

obs

, Z; )), rather

than the joint distribution of all observed variables, as the former can be ob-

tained directly by integrating P r(T

, C

, D

, Z

; )) with respect to the unobserv-

able/missing variables, which are T

mis

= T

- Z

), C

mis

= C

- Z

) and

mis

= D

- Z

) when Z

is a binary. Therefore:

P r(T

obs

, C

obs

, D

obs

, Z

; )

P r(T

, C

, D

, Z

; )dD

mis

(4.19)

where P r(T

, C

, D

, Z

; ) is given in equation (4.18).

Because of equation (4.18), equation (4.19) can then be expressed as follows

P r(T

obs

, C

obs

, D

obs

, Z

; )

P r(T

;

)P r(C

;

)P r(U

obs

;

)P r(D

obs

, Z

; )dU

mis

(4.20)

where dD

mis

is replaced by dU

because, given D

obs

and Z

, we have D

mis

Recall that D

obs

= D

), this implies that for the ith individual, his/her assigned

treatment Z

and actual received treatment D

obs

are always observed.

Table 3: Compliance types U

= u given observed values of Z

and D

)

Possible U

a or c

n or c

Under the assumptions of exclusion restriction and monotonicity, compliance

type U

has up to three categories, which are compliers (U

= c), never-takers

= n) and always-takers (U

= a) and shown in Table 3. Then integration in

equation (4.20), with respect to U

(compliance type), can be further replaced by

its summation. We then write:

P r(T

obs

, C

obs

, D

obs

, Z

; )

{c,n,a}

P r(T

;

)P r(C

;

)P r(U

obs

;

)P r(D

obs

, Z

; )dC

mis

= P r(D

obs

, Z

; )

{c,n,a}

P r(U

obs

;

)

P r(T

;

)dT

mis

P r(C

;

)dC

mis

= P r(D

obs

, Z

; )

{c,n,a}

P r(T

obs

;

)P r(C

obs

;

)P r(U

obs

;

(4.21)

where

denotes distributional equivalence.

4.3

The likelihood function

As stated in Section 4.2.2, the likelihood function with respect to the model pa-

rameter can be a function that is proportional to the joint distribution of all

observed variables (p9, Tanner (1993)). However, because in time-to-event stud-

ies for the ith individual, either failure time (T

obs

) or censoring time (C

obs

) can

be observed, but not both. This implies that P r(T

obs

, C

obs

, D

obs

, Z

; ) can only

be viewed as the joint distribution for the observable variables (equation (4.21))

rather than the joint distribution for the observed variables.

Hence, instead of the likelihood function, we formulate a function of the model

parameter that is proportional to the joint distribution of all observable vari-

ables, the so-called the idealised likelihood function with respect to the model

parameter denoted by IDL().

4.3.1

The idealised likelihood function

Given the joint distribution of the observable variables, P r(T

obs

, C

obs

, D

obs

, Z

|),

in equation (4.21), for the ith observation, the idealised likelihood function with

respect to the model parameter , denoted by IDL

(), has the following form:

IDL

()

P r(T

obs

, C

obs

, D

obs

, Z

; ).

(4.22)

On the other hand, by following equation (4.21), the product of four distribu-

tions, P r(T

obs

, Z

;

),P r(C

obs

, Z

;

), P r(U

obs

, Z

; ) and P r(D

obs

, Z

; )

can be viewed as a function of the model parameter = (

, ), and

it is proportional to the joint distribution of the idealised observed variables,

P r(T

obs

, C

obs

, D

obs

, Z

; ).

The idealised likelihood function with respect to the model parameter ,

(IDL

()), can then be obtained from the right hand side of equation (4.21),

so it has the formulation as follows:

IDL

()

{c,n,a}

P r(T

obs

, Z

;

)P r(C

obs

, Z

;

)P r(U

obs

, Z

;

)P r(D

obs

, Z

; )

=P r(D

obs

, Z

; )

{c,n,a}

P r(T

obs

, Z

;

)P r(C

obs

, Z

;

)P r(U

obs

, Z

;

(4.23)

where the second equation holds because the term P r(D

obs

, Z

; ) is not related

to the compliance covariate U

involved in the summation (equation (4.23)).

Note that in equation (4.23), assigned treatment Z

is involved in each term of

conditional distribution on the right-hand side. This is true because the observable

variables, namely T

obs

, C

obs

and D

obs

, all rely on the given value of Z

Recall that in our approach, the main parameter of interest is

, the param-

eter associated with the distribution of failure time T

. However, from equation

(4.23), we can see that because the three parameters,

and

, are all

involved in the summation over compliance type U

, the likelihood for these pa-

rameters does not separate into distinct components which can be maximised

separately. This implies that estimation of

cannot proceed independently. So

we write these three parameters as a row vector,

= (

), which are asso-

ciated with the distributions of failure time, censoring time and compliance type,

respectively. We then focus on deriving the idealised likelihood function for

denoted by IDL

(

) rather than IDL

(). Then IDL

(

) can be expressed as

follows:

IDL

(

)

{c,n,a}

P r(T

obs

, Z

;

)P r(C

obs

, Z

;

)P r(U

obs

, Z

;

(4.24)

4.3.2

Likelihood function for complete data

Using R

to denote the censoring indicator, which indicates the situation of ob-

served failure time or censoring time, (1 for the event occurring (observed failure

time), 0 for censoring (observed censoring time)). Therefore, once given the cen-

soring indicator R

for the ith individual, the idealised likelihood function IDL

(

)

can be specified to the likelihood function L

(

), which can be written as the fol-

lowing equation:

(

)

{c,n,a}

pr(T

obs

= t

, Z

;

)

×P r(C

obs

> t

, Z

;

)

×P r(U

obs

, Z

; )

for R

= 1, T

(z) = t,

{c,n,a}

P r(T

obs

> t

, Z

;

)

×pr(C

obs

= t

, Z

;

)

×P r(U

obs

, Z

;

)

for R

= 0, C

(z) = t.

(4.25)

where P r() and pr() denote the probability function and the probability density

function, respectively. However, in practice, the compliance type U

cannot be

observed fully. Hence the parameter

cannot be estimated through the form of

(

) given in equation (4.25). Dempster et al. (1977) proposed an approach for

MLE of incomplete data, named the EM-algorithm, which can be employed in

our case as U

is considered unknown and can be treated as incomplete data.

Briefly, the EM algorithm involves two steps: the E-step and the M-step,

which denote the expectation and maximisation steps, respectively (Dempster

et al., 1977). The E-step aims to obtain the conditional expectation of the log-

likelihood function for complete data log(L

(

)), denoted by Q(,

), given the

observed data T (z), C(z), D(z) and Z = z, and given the current parameter

values

. More details for the EM algorithm is given in Section 6.

In the EM-algorithm, the first step requires the explication of the likelihood

form for complete data denoted by L

(

). This depends on the assumption that

all compliance covariates are known. We then find the related likelihood form

for the ith observation, which can be written as follows (for known compliance

covariates U

= u):

(

) = L

(

, u)

pr(T

obs

= t

, Z

;

)

×P r(C

obs

> t

, Z

;

)

×P r(U

obs

, Z

; )

for R

= 1, T

(z) = t,

P r(T

obs

> t

, Z

;

)

×pr(C

obs

= t

, Z

;

)

×P r(U

obs

, Z

; )

for R

= 0, C

(z) = t.

(4.26)

Compliance type U

= u is used to describe an individual's compliance behaviour.

Therefore, we have redefined the parameter of interest,

T 1

= (

T,u1

) and

T 0

(

T,u0

) (for u

{c, a, n}). Recall that

= (

T 1

T 0

). More specifically, the

parameter of interest associated with the failure time distribution,

, is redefined

= (

T,c1

T,n1

T,a1

T,c0

T,n0

T,a0

). This is equivalent to

= (

T,uz

)

(for u

{a, c, n} and z {0, 1}). Here the subscripts indicate failure time (T )

with U = u for u

{a, c, n} and assigned treatment Z = z for z {0, 1}.

Similarly, we use

C,uz

for u

{a, c, n}, and z {0, 1} to denote the compo-

nents of

, the parameter associated with the censoring time distribution. In

other words, we have

= (

C,c1

C,n1

C,a1

C,c0

C,n0

C,a0

Hence, for the specific assigned treatment Z

= z (with known U

= u), equa-

tion (4.26) can be simplified as follows:

(

, u)

pr(T (z)

= t

, Z

;

T,uz

)

×P r(C(z)

> t

, Z

;

C,uz

)

×P r(U

(z), Z

;

u,zd

)

for R

= 1, T

(z) = t,

P r(T (z)

> t

, Z

;

T,uz

)

×pr(C(z)

= t

, Z

;

C,uz

)

×P r(U

(z), Z

;

u,zd

)

for R

= 0, C

(z) = t.

(4.27)

where R

is the observed indicator of censoring, which is 1 for the event of in-

terest and 0 for censoring. The parameter

u,zd

is associated with the condi-

tional probability of compliance, given Z

= z and D

(z) = d, and is defined by

u,zd

= P r(U

= u

= z, D

(z) = d) for u = (a, c, n) and z, d = 0, 1. Details

about the estimation of the parameter

u,zd

will be discussed in Section 8.

According to equation (4.27), an individual randomised to treatment Z

= 1

who dies or is censored at (observed) time t has a contribution to the likelihood

as follows:

{i:Z

=1}

(

, u)

pr(T (1)

= t

= u, Z

= 1;

T,u1

)

×P r(C(1)

> t

= u, Z

= 1;

C,u1

)

×P r(U

(1), Z

= 1;

u,1d

)

for R

= 1, T (1)

= t,

P r(T (1)

> t

= u, Z

= 1;

T,u1

)

×pr(C(1)

= t

= u, Z

= 1;

C,u1

)

×P r(U

(1), Z

= 1;

u,1d

)

for R

= 0, C(1)

= t.

(4.28)

for d = 0 or 1 and u

{c, a, n} .

Similarly, an individual randomised to treatment Z

= 0 who dies or is censored

at (observed) time t has a contribution to the likelihood for

as follows:

{i:Z

=0}

(

, u)

pr(T (0)

= t

= u, Z

= 0;

T,u0

)

×P r(C(0)

> t

= u, Z

= 0;

C,u0

)

×P r(U

(0), Z

= 0;

u,0d

)

for R

= 1, T (0)

= t,

P r(T (0)

> t

= u, Z

= 0;

T,u0

)

×pr(C(0)

= t

= u, Z

= 0;

C,u1

)

×P r(U

(0), Z

= 0;

u,0d

)

for R

= 0, C(0)

= t.

(4.29)

for d = 0 or 1 and u

{c, n, a} .

Recall from Table 3 that knowing the values of Z

= z and D

(z) = d means

that compliance type U

= u can be determined for up to two types. Hence, given

the value of z and d, equations (4.28) and (4.29) can then be expressed as the

following form:

{i:Z

=z,D

(z)=d}

(

; u)

pr(T (z)

= t

= u, Z

= z;

T,uz

)

×P r(C(z)

> t

= u, Z

= z;

C,uz

)

×P r(U

(z), Z

= z;

u,zd

)

for R

= 1, T (z)

= t,

P r(T (z)

> t

= u, Z

= z;

T,uz

)

×pr(C(z)

= t

= u, Z

= z;

C,uz

)

×P r(U

(z), Z

= z;

u,zd

)

for R

= 0, C(z)

= t.

(4.30)

for z and d = 0 or 1 and u

{c, a, n} . Note that equation (4.30) given above is

a likelihood function of our non-ignorable PPO-survival model which is analo-

gous to the one given in O'Malley & Normand (2005) model that is for normally

distributed outcomes.

For notational convenience, we introduce the following conventions:

(v, t;

T,uv

) = pr(T

(v) = t

= u,

T,uv

)

for v = 0 or 1,

(v, t;

T,uv

) = P r(T

(v) > t

= u,

T,uv

)

for v = 0 or 1,

(v, t;

C,uv

) = pr(C

(v) = t

= u,

C,uv

)

for v = 0 or 1,

(v, t;

C,uv

) = P r(C

(v) > t

= u,

C,uv

) for v = 0 or 1.

(4.31)

As mentioned before, P r() and pr() denote the probability and probability

density function, respectively and v indicates potential treatment levels.

Equation (4.30) can then be rewritten as:

{i:Z

=z,D

(z)=d}

(

; u)

= f

(z, t;

T,uz

(z, t;

C,uz

)

(z, t;

T,uz

(z, t;

C,uz

)

1-R

u,zd

(4.32)

Furthermore, we write:

u,zd

(

T,uz

C,uz

)

= f

(z, t;

T,uz

(z, t;

C,uz

)

(z, t;

T,uz

(z, t;

C,uz

)

1-R

(4.33)

Equation (4.30) can then be expressed more simply as:

{i:Z

=z,D

(z)=d}

(

; u) = F

u,zd

(

T,uz

C,uz

)

u,zd

= l

(

; u)l

(

; u)

(4.34)

Because of the independent and identically distributed (i.i.d.) assumption for

all observable variables (conditional on model parameters), the likelihood function

for complete data denoted by L

() = L(, u) can be expressed by the product

of the fully likelihood function associated with each individual. In other words,

we have:

(

) =

i=1

(

, u) =

i=1

u,zd

(

T,uz

C,uz

)

u,zd

i=1

(

; u)l

(

; u). (4.35)

Given observed Z

= z and D

(z) = d, the likelihood form of for complete

data, in which compliance type U

(for each individual) is presumed to be known,

can then be expressed as:

(

) = L(, u) =

(

, u)

{i:U

=c,Z

=1,D

(1)=1}

c,11

(

T,c1

C,c1

)

c,11

{i:U

=a,Z

=1,D

(1)=1}

a,11

(

T,a1

C,a

)

a,11

{i:U

=c,Z

=0,D(0)=0}

c,00

(

T,c0

C,c0

)

c,00

{i:U

=n,Z

=0,D

(0)=0}

n,00

(

T,n0

C,n0

)

n,00

{i:U

=n,Z

=1,D(0)=0}

n,10

(

T,n0

C,n0

)

n,10

{i:U

=a,Z

=0,D(0)=1}

a,01

(

T,a1

C,a1

)

a,01

(4.36)

4.3.3

The likelihood form for incomplete data

Since compliance type U

is unobservable, U

needs to be treated as a variable

with missing data. Hence, data associated with the case where noncompliance

occurs have to be viewed as incomplete data.

Let

A(z, d) denote the subset of subjects with specified values of Z and D;

more specifically,

A(z, d) = {i : Z

= z, and D

= d

}. The likelihood form for

incomplete data for the non-ignorable PPO-survival models can then be written

as shown below:

) =

{c,n,a}

, u) =

{c,n,a} i

(

, u)

A(1,1)

c,11

(

T,c1

C,c1

)

c,11

+ F

a,11

(

T,a1

C,a1

)

a,11

A(0,0)

c,00

(

T,c0

C,c0

)

c,00

+ F

n,00

(

T,n0

C,n0

)

n,00

A(1,0)

n,10

(

T,n0

C,n0

)

n,10

A(0,1)

a,01

(

T,a1

C,a1

)

a,01

(4.37)

The likelihood function of

for incomplete data, given in equation (4.37),

does not appear to be readily broken down into products of functions involving

only

and

, respectively. That is L(

) = L(

)L(

). That is the case where

censoring is not ignorable (Baker, 1997).

Simplification of model parameter

In equation (4.36) (on page 36), we observe that the parameter

has the form

= (

T,uz

C,uz

u,zd

) for u = (a, c, n) and z, d = (0, 1). However, the number of

model parameters can be further reduced by following the assumption of compound

exclusion restriction (Angrist et al., 1996; Frangakis & Rubin, 1999).

The compound exclusion restriction assumes that all potential-outcome vectors

can be affected by the received treatment D

(Z), rather than by the assigned

treatment Z

. This implies that the outcome of interest for an individual, as

long as his/her received treatment has the same value, will have the same value

when he/she is assigned into either treatment group (if possible in practice). For

convenience to express the assumption of exclusion restriction mathematically, we

use the notation T

= z, D

(z) = d) and C

= z, D

(z) = d) to denote the ith

individual's failure and censoring time where his/her assigned treatment Z

and

actual received treatment D

) are given by Z

= z and D

(z) = d, respectively.

Recall that in a two-arm RCT, we have z and d

{1, 0}. Hence the exclusion

restriction assumption can be expressed as follows:

= 1, D

(1) = d) = T

= 0, D

(0) = d)

for failure time,

= 1, D

(1) = d) = C

= 0, D

(0) = d)

for censoring time.

(5.1)

This also implies that individuals who are never-takers or always-takers would

have the same failure and censoring times probability distributions, whether they

are in the treated group or in the control group. In other words:

P r(T

, D

))

= 0, U

= u)

= P r(T

, D

))

= 1, U

= u)

for u = a or n;

P r(C

, D

))

= 0, U

= u)

= P r(C

, D

))

= 1, U

= u)

for u = a or n.

(5.2)

For convenience, in the remainder of this thesis, we use T

) and C

) to

denote the failure and censoring times corresponding to the assigned treatment for

the ith individual, whose assigned treatment is Z

= z and whose actual received

treatment is D

(z) = d. Equation (5.2) can then be re-written as:

P r(T

(0)

= 0, U

= u)

= P r(T

(1)

= 1, U

= u)

for u = a or n;

P r(C

(0)

= 0, U

= u)

= P r(C

(1)

= 1, U

= u)

for u = a or n.

(5.3)

Consequently, the parameters associated with the outcome distributions (fail-

ure and censoring), which correspond to the subgroup with U

= n and Z

= 1 or

0, and to subgroup with U

= a and Z

= 1 or 0, are equivalent. Hence, we have

the following relationships:

T,n1

T,n0

T,a1

T,a0

C,n1

C,n0

C,a1

T,a0

(5.4)

Therefore we can use

T,n

to represent both

T,n1

and

T,n0

, as they are identical

under the assumption of exclusion restriction. Similarly, the two parameters

T,a1

and

T,a0

are identical under the same assumption. So we can use

T,a

to denote

both

T,a1

and

T,a0

. By following the same logic, we write

C,n

C,n1

C,n0

and

C,a

C,a1

C,a0

Equation (5.4) implies that the parameters

T,uz

and

C,uz

, which are associated

with the failure and censoring times for the subgroup with U

= u and assigned

treatment Z

= z, include the following components:

T,uz

= (

T,c1

T,c0

T,n

T,a

)

C,uz

= (

C,c1

C,c0

C,n

C,a

(5.5)

Note that no parameter for the subgroup with U

= d and Z

= 1 or 0 exists,

because under the assumption of Monotonicity, no defiers exist.

Thereby, because we have assumed monotonicity, up to two categories exist

for compliance type U

= u (knowing the value of Z

and D

)) (Table 3).

The parameter

u,zd

= P r(U

= u

= z, D

(z) = d), which is defined as the

conditional probability of compliance type U

given the assigned treatment Z

= z

and actual received treatment D

obs

= d, can now satisfy the following equations:

c,11

a,11

= 1 for u

{c, a} when Z

= 1 and D

(1) = 1,

c,00

n,00

= 1 for u

{c, n} when Z

= 0 and D

(0) = 0,

n,10

= 1

for u

{n}

when Z

= 1 and D

(1) = 0,

a,01

= 1,

for u

{a}

when Z

= 0 and D

(0) = 1.

(5.6)

The other

u,zd

for u

{c, n, a} and z, d = 1 or 0 have zero values.

Now the model parameters of interest

can then be specified as = (, ),

where

= (

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

= (

c,11

c,00

(5.7)

In addition, using

to denote the probability of compliance type U

= u, i.e.

= P r(U

= u), and by following the theorem of total probability, we have:

P r(U

= u) =

{1,0} z{1,0}

P r(U

= u, D

(z) = d, Z

= z)

{1,0} v{1,0}

P r(U

= u

(z) = d, Z

= z)P r(D

(z) = d, Z

= z).

(5.8)

Recall that

u,zd

= P r(U

= u

(z) = d, Z

= z). Denote

= P r(D

(z) =

d, Z

= z) to correspond to the probability of individuals with values (Z

z, D

(z) = d)) in the population. Then equation (5.8) can be re-written as:

{1,0} z{1,0}

u,zd

(5.9)

By following this definition,

can then be estimated by

. Here

is the size of the subgroup in a trial where all individuals have Z

= z and

(z) = d, and N is the total number of individuals involved in the study trial.

Furthermore, we have P r(U

= u

= 1) = P r(U

= u

= 0) as randomisation

guarantees a balance between the treated (Z

= 1) and the control (Z

= 0)

groups. Therefore we can write

= P r(U

= u

= 1) = P r(U

= u

= 0).

Since the number of subjects who do or do not actually receive the treatment

in the treatment group (Z

= 1) are observable (denoted by N

and N

respec-

tively), we can then estimate

. If always-takers exist in a trial,

then the number of subjects N

in the control group who receive the treatment

is known. Then we have

. Also

= 1 because no defiers exist.

Then the probability of compliers can be estimated by

= 1

(O'Malley

& Normand, 2005). These can be expressed as:

;

= 1

(5.10)

Recall that individuals from the subgroup with Z

= 1 and D

(1) = 1 can

be written as i

A(1, 1). Their compliance type can be compliers, U

= c, or

always-takers, U

= a. Similarly, individuals from the subgroup with Z

= 0

and D

(0) = 0, denoted by i

A(0, 0), may either be compliers, U

= c, or

never-takers, U

= n. Then, following the definition of

u,zd

and by Table 3, we

have:

c,11

c,00

(5.11)

By following equation (4.35) and considering the relationships given in equa-

tion (5.11), the likelihood function L

(

) with respect to , can be replaced by

a product of l

(

) and l

(

). That is:

(

) = l

(

(5.12)

where

(

) =

i=1

u,zd

(

T,uz

C,uz

(

) =

i=1

u,zd

(5.13)

Therefore, to maximise L

(

) is equivalent to maximise l

(

) and l

(

), sepa-

rately.

Because of the relationships given equation (5.11), which implies that

c,11

and

c,00

can be calculated directly through the value of

for u

{c, n, a}, we then

write:

= (

(5.14)

Therefore, to estimate parameter

= (, ) is equivalent to estimating (,

As we stated before, the MLE of

can be estimated by equation (5.10). Hence

we need only to maximise l

(

) for obtaining the MLE of . The EM algorithm is

employed here because an unobservable variable, compliance type U

, is involved

in the likelihood function l

(

In other words, there is only the first part of parameter

, denoted by need

to be further estimated through the EM algorithm. That is:

= (

T,uz

C,uz

);

= (

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

(5.15)

The rest part of

, the parameter

can be regarded as a known parameter

when we estimate the parameter

. As a consequence, the parameter is also

known because of equation (5.11). For simplification of notation, in the remains

of this chapter we still use

rather than

to refer to the rest part of

EM algorithm

The EM algorithm is a generally applicable algorithm that provides an iterative

procedure for computing MLEs in such situations. Indeed, the EM algorithm is

a commonly used method of maximum likelihood to deal with incomplete data

(Laird & Louis, 1982; Baker, 1992; Meng & Rubin, 1991, 1993; Imbens & Rubin,

1997; O'Malley & Normand, 2005; Peng et al., 2004).

In general, the likelihood method requires the score equation to be solved.

Unfortunately, the likelihood function of

for incomplete data, as given in equa-

tion (4.37), cannot be formulated. This is because, at least for those individuals

assigned to the control group, their compliance type, U

, will never be observed.

The EM algorithm approaches this problem (of solving the incomplete data likeli-

hood score function) indirectly by proceeding iteratively in terms of the complete

data log likelihood function of

, denoted by log(L

(

)) (equation (5.12)).

6.1

The expectation step

According to Dempster et al. (1977), as introduced in Section 4.3.2, the first

step in the EM algorithm is to obtain the conditional expectation of log(L

(

)),

the Q-function Q(

). Now it is replaced by the conditional expectation of

log(l

(

)), the Q-function Q(,

). In other words:

) = E

{log(l

(

)|(T

, C

)

obs

, Z

= z, D

(z) = d), U

= u, ;

(6.1)

where u

{c, n, a} and l

(

) denotes the likelihood function with respect to

parameter

for the complete data that is given in equation (5.13). (T

, C

)

obs

indicates the observable part of survival outcomes (T

, C

), which is (T

obs

t, C

obs

> t) if the ith individual experienced the event at time t; or (T

obs

t, C

obs

= t) if the ith individual is censored at time t.

Note that equations (5.11) and (5.10) have shown that

is only determined by

the observed values of subgroup sizes, which are denoted by N

, N

, and N

This implies that for a certain dataset, L

(

) would be a constant because in the

dataset N

, N

, and N

are fixed. Hence, to maximise l

(

) is equivalent

to maximise L

(

|) = l

(

) = Kl

(

) when is known. (K refers to

any constant here). So we give the likelihood function with respect to parameter

as l

(

) = L

(

|), which has the same structure as L

(

), the likelihood

function of

given in equation (4.36), but is viewed as a known vector. The

advantage of using this form of the likelihood function l

(

) = L

(

|) will be

seen in Chapter 7 when we create an extended PPO-survival model for a more

complicated situation in time-to-event study.

As compliance type, U

, can be treated as a discrete random variable with

c, a, n (three states), then, by the definition of expectation of discrete random

variables, equation (6.1) can be expressed as follows:

(

)

{c,n,a}

P r(U

= u

|(T

, C

)

obs

, Z

= z, D

(z) = d;

)

{log(l(, u)}

(6.2)

where l(

, u) is another form of l

(

), the likelihood function with respect to

for complete data, we give its form as follows:

(

) = L(, u|),

(6.3)

because L

(

) = L(, u).

In addition, we can further assume that compliance type, U

, follows the

Bernoulli distribution, given observed data T

(z), C

(z), D

(z).

This is true

because of the assumptions of compound exclusion restriction and monotonicity,

given (Z

= z, D

(z) = d). Recall that compliance type has up to two compli-

ance categories for this development (refer to Table 3). More details about the

structure of the Q-function Q(

), the expectation function, are given Section

8.2.

6.2

The maximisation step

After obtaining the formulation of Q-function Q(

) given the current param-

eter values

, the second step of the EM algorithm is to find a new estimate,

k+1

, which maximizes the Q-function Q(

). This implies that the following

inequality is always satisfied:

k+1

)

Q(,

)

for all

(6.4)

where

is the parameter space of

Further information on the process of maximisation for the non-ignorable

PPO-survival model is given in Section 8.3.

Model specifications

For the purpose of showing how our non-ignorable models work, we focus mainly

on the Weibull model as the survival distribution. This is due to its special fea-

tures. Firstly, the Weibull distribution is flexible for many different patterns of

data. This can be seen from its varying types of hazard curve (Klein, 1997; Cox

& Oakes, 1984). Secondly, the Weibull distribution is associated with relatively

simple survival, hazards and probability density functions (p.19, Cox & Oakes

(1984)). This makes the Weibull a very popular parametric model in conventional

survival analysis. Furthermore, the Weibull distribution is the only distribution

which satisfies both the accelerated failure time (AFT) and the Cox proportional

hazard (PH) models (Cox & Oakes, 1984). The log-normal model is another plau-

sible parametric distribution used in conventional survival analysis. We also spec-

ify the log-normal distribution for the non-ignorable PPO-survival model (Section

7.1). In this thesis, our parametric potential-outcome (PPO) models are specified

in terms of the Weibull and the log-normal distributions. Other distributions are

the topic of future research.

7.1

Specific non-ignorable PPO-survival models

When accounting for a censoring time distribution, the non-ignorable survival

model can be specified in more than two forms. This is true, even when the

Weibull and the log-normal are the only pre-specified distributions. Note that

failure and censoring times may follow different types of distribution. For instance,

suppose that the failure time follows a Weibull distribution and its associated

censoring time has an underlying log-normal distribution. In this case, the non-

ignorable survival model has to be specified as a NIGN-WL model; namely a

non-ignorable survival Weibull log-normal model.

However, when failure and

censoring times follow the log-normal and the Weibull distribution, respectively,

we have the NIGN-LW model. In this thesis, we focus, however, on specific non-

ignorable survival models with the same distributions for both the failure and

censoring times. These are clearly denoted as the NIGN-WW model and the

NIGN-LL model. Table 4 gives the nomenclature for the non-ignorable survival

model specifications.

Table 4: Specification of the non-ignorable PPO-survival models.

Distribution

non-ignorable survival model

Weibull

NIGN-WW model

log-normal

NIGN-LL model

7.2

Parameter of interest

in the specific models

The parameter

= (

T,uz

C,uz

) is associated with failure and censoring time

distributions, respectively, corresponding to four subgroups each. In total,

in-

cludes 8

× 2 = 16 components. This is true, as both a Weibull and a log-normal

distribution include two model parameters: the scale parameter and the shape

parameter for the Weibull; the mean and the variance of the logarithms of

survival outcome for the log-normal.

7.2.1

Parameter

in the NIGN-WW model

Suppose failure time T

, corresponding to the baseline, follows a Weibull distribu-

tion with parameter

and

. By following Theorem 1 (proved in Appendix 10),

the parameter of most interest,

T,uz

, which is associated with the distribu-

tions of failure time and is specified in equation (5.15) (the first four parameters

corresponding to 4 possible subgroups), has the following relationships with the

parameters associated with the standard distribution for the baseline failure time,

and

, and the regression coefficients in AFT model, which are:

T,c1

= (

T,c1

);

where

T,c1

exp(

(c)

) and

T,c1

T,c0

= (

T,c0

);

where

T,c0

exp(

(c)

) and

T,c0

T,n

= (

T,n

);

where

T,n

and

T,n

T,a

= (

T,a

);

where

T,a

exp(

(a)

) and

T,c0

(7.1)

Similarly, we suppose that the censoring time C

, corresponding to the base-

line, also follows a Weibull distribution with parameters

and

. In this case,

C,uz

) can be specified as follows:

C,c1

= (

C,c1

);

where

C,c1

exp(

(c)

) and

C,c1

C,c0

= (

C,c0

);

where

C,c0

exp(

(c)

) and

C,c0

C,n

= (

C,n

);

where

C,n

and

C,n

C,a

= (

c,a

C,a

);

where

C,a

exp(

(a)

) and

C,c0

(7.2)

Consequently, the parameter

given in equations (5.15) for the NIGN-WW

model can be written as follows:

= (

T,uz

C,uz

= (

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

(7.3)

Here

and

denote the shape parameters of the Weibull models

associated with the failure and censoring times (which correspond to the baseline),

respectively. The number of components involved in is now reduced from 16

to 10. Note that equations (7.1) and (7.2) show that these scale parameters

(corresponding to the four subgroups), namely

T,c1

T,c0

T,n

T,a

, have been

linked to

( the scale parameter in the Weibull distribution, associated with

the failure time for the baseline), through

(c)

(a)

, the regression coefficients

in the AFT model. However, we use the scale parameters associated with the

four subgroups, rather than the AFT regression coefficients because at this stage,

the former clearly describe model parameters in which each subgroup (either for

failure time or censoring time) has its own scale parameter and these subgroups

share the same shape parameter. Recall that these four subgroups are as follows:

Subgroup 1 and Subgroup 2 consist of compliers who are all assigned to the treated

group (U

= c and Z

= 1) or the control group (U

= c and Z

= 0), respectively;

Subgroup 3 includes never-takers (U

= a) only; and Subgroup 4 includes always-

takers (U

= a).

7.2.2

Parameter

in the NIGN-LL model

As in the NIGN-WW model, the parameter

given in equation (5.15) for the

NIGN-LL model can then be specified as follows:

= (

T,uz

C,uz

= (

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

(7.4)

The parameters

and

, given in equations (7.4), denote the variance pa-

rameters of the logarithm of the failure time log(T

) (

(2)

) and the logarithm

of the censoring times log(C

) (

(2)

) associated with the baseline

and

By following Theorem 1,

given in Appendix 10, the parameters associated with the failure time distribution

satisfy the following equations:

T,c1

= (

T,c1

)

where

T,c1

(c)

and

T,c1

T,c0

= (

T,c0

)

where

T,c0

(c)

and

T,c0

T,n

= (

T,n

)

where

T,n

and

T,n

T,a

= (

T,a

)

where

T,a

(a)

and

T,c0

(7.5)

Parameters associated with the censoring time distribution have the following

relationships:

C,c1

= (

C,c1

)

where

C,c1

(c)

and

C,c1

C,c0

= (

C,c0

)

where

C,c0

(c)

and

C,c0

C,n

= (

v,n

C,n

)

where

C,n

and

C,n

C,a

= (

C,a

)

where

C,a

(a)

and

C,c0

(7.6)

Table 5 shows the parameters involved in the NIGN-WW and the NIGN-LL

models.

Table 5: Non-ignorable PPO-survival model parameter vectors.

Model name

model parameter vector

NIGN-WW

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

NIGN-LL

T,c1

T,c0

T,n

T,a

C,c1

C,c0

C,n

C,a

Computational issues

Since the compliance type variable, U

= u, is unobservable, those general MLE

methods are not easily implemented for the PPO-survival model. Alternatively,

as mentioned before (Section 6), the EM algorithm (Dempster et al., 1977) can be

employed for our developments. Since the posterior probabilities of compliance

type U

, P r(U

= u

|(T

, C

)

obs

, Z

= z, D

(z) = d;

), have been involved in the

Q-function, Q(

), (equation (6.2)), we need to formulate it first before giving

the specified Q-function Q(

8.1

Maximization of the likelihood function using the EM

algorithm

8.1.1

Posterior probabilities of compliance type U

For convenience of expression, we use P

, P

and P

to denote the related

posterior probabilities for non-censored R

= 1 associated with the following cases:

(1) Z

= 1, D

(1) = 1; (2) Z

= 0, D

(0) = 0; (3) Z

= 1, D

(1) = 0; and (4)

= 0, D

(0) = 1. Note that in the first two cases given above, compliance type

can possible be two categories, say compliers (U

= c) or always-takers (U

= a)

for case (1) and compliers (U

= c) or never-takers (U

= n) for case (2); while in

case (3), compliance type U

can only be never-takers (U

= n); and in case (4),

compliance type U

can only be always-takers (U

= a) (Table 3). This is because

of the monotonicity assumption, which excludes defiers.

So we define P

, P

and P

as follows:

= P r(U

= c

|(T

(1), C

(1))

obs

, Z

= 1, D

(1) = 1;

)

for R

= 1 ,

= P r(U

= c

|(T

(0), C

(0))

obs

, Z

= 0, D

(0) = 0;

)

for R

= 1 ,

= P r(U

= n

|(T

(1), C

(1))

obs

, Z

= 1, D

(1) = 0;

)

for R

= 1 ,

= P r(U

= a

|(T

(0), C

(0))

obs

, Z

= 0, D

(0) = 1;

)

for R

= 1 .

(8.1)

Similarly, we use Q

, Q

and Q

to denote the related posterior probabilities

for censored R

= 0 associated with the aforementioned four cases, they are defined

as the following:

= P r(U

= c

|(T

(1), C

(1))

obs

, Z

= 1, D

(1) = 1;

)

for R

= 0 ,

= P r(U

= c

|(T

(0), C

(0))

obs

, Z

= 0, D

(0) = 0;

)

for R

= 0,

= P r(U

= n

|(T

(1), C

(1))

obs

, Z

= 1, D

(1) = 0;

)

for R

= 0,

= P r(U

= a

|(T

(0), C

(0))

obs

, Z

= 0, D

(0) = 1;

)

for R

= 0 .

(8.2)

Note that when the event occurs (non-censored and denoted by R

= 1), the

observed data includes: failure time T

(z), censoring time (C

(z) > T

(z)), assigned

treatment Z

= z and actual received treatment D

(z), and when an individual is

censored (denoted by R

= 0), the observed data includes: censoring time C

(z),

censored failure time T

(z) > C

(z), assigned treatment Z

and actual received

treatment D

(z).

The formulations for the posterior probabilities corresponding to the last two

cases, namely Z

= 1, D

(1) = 0 ; and Z

= 0, D

(0) = 1, are as follows:

P r(U

= n

|(T

(1), C

(1))

obs

, Z

= 1, D

= 0;

) = 1;

for R

= 1;

P r(U

= a

|(T

(0), C

(0))

obs

, Z

= 0, D

= 1;

) = 1;

for R

= 1;

P r(U

= n

|(T

(1), C

(1))

obs

, Z

= 1, D

= 0;

) = 1;

for R

= 0;

P r(U

= a

|(T

(0), C

(0))

obs

, Z

= 0, D

= 1;

) = 1;

for R

= 0.

(8.3)

This is true, because no defiers exist and by the definition of compliance type

, we know that an individual is a never-taker if he/she is assigned to the treat-

ment group Z

= 1 but does not receive the treatment D

(1) = 0 or is an always-

taker if the individual is assigned to the control group Z

= 0 but for some reason

does receive the treatment.

Equation (8.3) gives the following equivalences:

= 1

for i

C(1, 0, 1),

= 1

for i

C(1, 0, 0),

= 1

for i

C(0, 1, 1),

= 1

for i

C(0, 1, 0).

(8.4)

where

C(z, d, r) denotes the subset of individuals, which is defined by the following:

C(z, d, r) = {i : Z

= z, D

= d, R

= r

} for all z, d, r {1, 0}. Now, we

formulate the posterior probability of compliance type U

for the first two cases,

i.e. Z

= 1 and D

(1) = 1, and Z

= 0 and D

(0) = 0. Since both failure and

censoring times are involved in the non-ignorable PPO-survival model, so too

are their posterior probabilities. As per Bayes' theorem, these can be written as

the formulae given in equations (8.5) and (8.6) (see next page). For the non-

ignorable PPO-survival model, P

, P

, Q

and Q

can then be written as the

equivalences shown in equation (8.7) on page 55.

(1)

(1))

obs

(1)

;

)

;

)

;

)

;

)

,if

(1)

(1))

obs

(1)

;

)

;

)

;

)

;

)

,if

(1)

(8.5)

for

{

;

and

(0)

(0))

obs

(0)

;

)

;

)

;

)

;

)

(0)

(0))

obs

(0)

;

)

;

)

;

)

;

)

(0)

(8.6)

for

{

;

)

;

)

;

)

;

for

,1)

;

)

;

)

;

)

;

for

,1)

;

)

;

)

;

)

;

for

,0)

;

)

;

)

;

)

;

for

,0)

(8.7)

For convenience, we use W

(u,

) to denote the posterior probability of com-

pliance type P r(U

= u

|(T

, C

)

obs

, Z

= z, D

(z) = d,

) given the observed data

(and parameter values) for the non-ignorable PPO-survival model. That is:

(u,

) = P r(U

= u

|(T

, C

)

obs

, Z

= z, D

(z) = d;

(8.8)

where (T

, C

)

obs

is same as given in equation (6.1), and indicates the observable

part of survival outcomes (T

, C

). (T

, C

)

obs

is equivalent to T

obs

= T

(z) given

= z, when R

= 1; (T

, C

)

obs

is equivalent to C

obs

= C

(z) given Z

= z, when

= 0; and

represents the given value of the relevant model parameters, the

components of which are:

(

T,c1

T,c0

T,n

T,a

, ,

C,c1

C,c0

C,n

C,a

)

(8.9)

for the NIGN-WW model and

(

T,c1

T,c0

T,n

T,a

, ,

C,c1

C,c0

C,n

C,a

)

(8.10)

for the NIGN-LL model.

The posterior probability W

(u,

) of the ith individual can then be written

as follows:

(u,

) =

)

=c}

- P

);

=a}

for i

C(1, 1, 1),

)

=c}

- P

);

=n}

for i

C(0, 0, 1),

)

=c}

- Q

);

=a}

for i

C(1, 1, 0),

)

=c}

- Q

);

=n}

for i

C(0, 0, 0),

= 1;

for i

C(1, 0, 1),

= 1;

for i

C(1, 0, 0),

= 1;

for i

C(0, 1, 1),

= 1;

for i

C(0, 1, 0);

(8.11)

where I

{} is an indicator function, and P

, P

, Q

and Q

denote the pos-

terior probabilities for the specific cases, given by equation (8.7) for the non-

ignorable PPO-survival model. Recall that

C(z, d, r) denotes the subset of indi-

viduals

C(z, d, r) = {i : Z

= z, D

= d, R

= r

} for all z, d, r {1, 0}.

8.2

Structures of the expectation function Q(

)

After defining the posterior probability, W

(u,

), which is given in equation

(8.11), the expectation equation, Q-function Q(

) specified in equation (6.2)

can then be re-written as:

(

) =

{c,n,a}

(u,

)

{log(l

(

)}

{c,n,a}

(u,

)

{log(l(, u)},

(8.12)

where l

(

) is given in equation (4.36).

After substituting equations (8.7), (8.11) and (4.36) into equation (8.12),

) can be specified as in equation (8.13) overleaf.

(

,1)

{

;

)

;

)

)(

)}

,0)

{

;

)

;

)

)(

)}

,1)

{

;

)

;

)

)(

)}

,0)

{

;

)

;

)

)(

)}

,1)

;

)

,0)

;

)

,1)

;

)

,0)

;

)

(8.13)

8.3

Maximization of the expectation function Q(

)

As stated in Section 6, the M-step aims to maximise the expectation function

given in equation (8.13). To realise this goal, the score equation (McLachlan &

Krishnan, 1997; Casella & Berger, 2002) is often used in the M-step. Specifically,

for the ith individual, we take the first partial derivative of Q

), with respect

to the parameter in equation (8.13) for the non-ignorable PPO model. Then

the i

score equation is:

obs

(

) =

(

) =

{c,n,a}

(u,

)

logl

(

, u).

(8.14)

As before, let S

(u,

) denote

logl

(, u). Then the score equation for the

individual can be expressed as:

obs

(

) =

(u,

(8.15)

All parameters can then be estimated simultaneously by solving:

obs

(

) = 0.

(8.16)

Let

(k+1)

denote the solution of equation (8.16). We then substitute

(k)

) with

(k+1)

and repeat the M-step until convergence.

The models proposed in this thesis use a built-in function in MATLAB (7.4.0,

R2007a) (Higham & Higham, 2000). This function, fminsearch, is based on

the Nelder-Mead method (Lagarias et al., 1998) for multidimensional uncon-

strained minimisation.

We use it to minimise the negative expectation func-

tion

-Q(,

(k)

), which is equivalent to maximising the expectation function

(k)

8.4

Estimation of the standard errors

8.4.1

The standard error for the estimated model parameter

The EM algorithm relies only on the convergence of iterative procedures, in which

no gradient matrices are derived. As such, the EM algorithm does not provide the

information matrix required for traditional calculation of the asymptotic variance-

covariance matrix of the MLEs, from either the expected information matrix or

the observed information matrix (OIM). Consequently, no standard errors of the

MLEs can be obtained, even after possible adaptation of the EM algorithm Louis

(1982), however, proposed an approach for calculating the observed information

matrix (OIM) when the EM algorithm is used to find MLEs for so-called incom-

plete data problems. Louis's approach (Louis, 1982) only requires computation

of the complete-data gradient vector or computation of the second derivative ma-

trix, (which is not associated with the incomplete-data likelihood function). After

obtaining the OIM, the covariance matrix can then be calculated easily. Only

the inverse of the OIM is then required. The standard error of the MLEs vectors

can thus be obtained easily, in that the standard error is the square root of the

diagonals of the covariance matrix, where the latter is the inverse of the observed

information matrix (OIM).

Louis's method

Let Data

f ull

and Data

obs

denote the complete (full) and incomplete (observed)

data, respectively. The former includes the observed time (T, C)

obs

, (which is

equivalent to (Y (z), R(z)) in survival analysis, where Y (z) = min(T (z), C(z))

and R(z) is the censoring indicator), assigned treatment Z = z, received treat-

Equation (3.2 ) in Louis (1982) is strictly not correct. We believe that the rule of a non-

exchangeable vector order in the product was ignored. Details of our correction is given in

Appendix 10.

ment D(z) and the compliance type U . Thereby Data

f ull

= ((T, C)

obs

, Z =

z, D(z), U ). As compliance type U is unobservable, we have Data

obs

= ((T, C)

obs

, Z =

z, D(z)). By following equation (3.3) in Louis (1982), the observed information

matrix (OIM), I

obs

= I(^

), which can be viewed as the matrix at the MLEs,

where matrix I can be expressed in two parts: the conditional expected full data

OIM, I

f ull

, and the expected information for the conditional distribution of the

complete data I

f ull

|obs

. This is represented as follows:

obs

= I(^

) = I

f ull

) - I

f ull

|obs

(8.17)

Under the independent and identically distributed (i.i.d) assumption, these

two matrices I

f ull

and I

f ull

|obs

can be specified as:

f ull

= E

i=1

logL

(

, u)|(T

, C

)

obs

, Z

= z, D

(z) = d, U

= u

} (8.18)

f ull

|obs

logl

(

)

logl

(

) |(T

, C

)

obs

, Z

= z, D

(z) = d, U

= u

i=j

logl

(

)|(T

, C

)

obs

, Z

= z, D

(z) = d, U

= u

logl

(

)|(T

, C

)

obs

, Z

= z, D

(z) = d, U

= u

(8.19)

where

denotes transpose. Consequently, the observed information matrix I

obs

can be expressed

, after correcting the classical method of Louis (1982) as:

obs

= I(^

)

i=1

logl

(

, u)|(T

, C

)

obs

, Z

= z, D

(z) = d

|=^

logl

(

, u)

logl

(, u)

|(T

, C

)

obs

, Z

= z, D

(z) = d

|=^

i=j

logl

(, u)

|(T

, C

)

obs

, Z

= z, D

(z) = d

logl

(, u)

|(T

, C

)

obs

, Z

= z, D

(z) = d

|=^

(8.20)

Equation (8.20) (also see equation (D-28) in Appendix10) guarantees that the OIM

is symmetric. (Details are given on page 63.) As the inverse of I

obs

is the asymptotic

variance-covariance matrix of the MLEs, ^

, this can be denoted as V(^) = I

-1

). Its

diagonal vector is the variance vector associated with each of the components of ^

and

is denoted by SE

. The standard error of ^

can then be calculated directly by taking

the square root of

, where

{V(^)}

(i = 1, 2,

· · · , p), for p parameters.

8.4.2

The observed information matrix (OIM)

Note that the first term in equation (8.17) is equivalent to the second order partial

derivative of Q-function Q(

), which can be written as

(k)

)

and

is a interest

parameter vector with length p. If we write

= (

, . . . ,

), the element at the ith

row and the jth column in the matrix

(k)

)

, the second order derivative matrix

of Q-function Q(

(k)

) with respect to the ith and jth components of the interest

parameter

and

, can be written as follows:

(k)

)

(k)

)

for i and j from 1 to p.

(8.21)

Since in the PPO-survival model, the interest parameter vector

, given in equation

This equation can be regarded as our correction of equation (3.2') of Louis (1982). Louis's

subscripts should specify that i = j rather than i < j.

(5.15), consists of the parameters associated with the distributions of failure and cen-

soring times for three subgroups. This leads to the fact of that majority of elements in

the matrix

(k)

)

are zeros. For instance, the second order partial derivative of the

Q-function,Q(

(k)

)

with respect to two scale parameters

T,c1

and

T,c0

(in Weibull PPO-survival model) is zero. That is:

(k)

)

(k)

)

= 0

(8.22)

As no an explicit relationship exists between

and

, which represent the scale pa-

rameter from two different subgroups here.

Even though in the same subgroup, however, with respect to the parameters asso-

ciated with the distributions of failure and censoring times,

(k)

)

can also be zero.

For example, let i = 1 and j = 5, in Weibull PPO-survival model we have

T,c1

and

C,c1

, where

(k)

)

= 0 always holds. This is because there is not an explicit

relationship between these two parameters

T,c1

and

C,c1

which are from two different

distributions. Recall that

T,c1

denotes the scale parameter of Weibull for failure time

in and

C,c1

denotes the scale parameter of Weibull for censoring time in subgroup1).

Therefore, in the PPO-survival model (including both Weibull and Log-normal

form), the first term of equation (8.20) can be simplified directly without knowing

the estimated value of parameter ^

. The simplified form of the matrix is given in Table

6 (page 65) where

and

denote any components of parameter

, for the specific

model, say the NIGN-WW or the NIGN-LL models,

is defined in Table 5 (on page

50). Note that Table 6 shows that in the matrix, [

(k)

)

]

10×10

, only few of elements

are not zero (denoted by

in the table) which can be calculated directly through the

standard likelihood function. In this thesis, the standard likelihood function were given

as either Weibull or log-normal form.

From Table 6, it is easy to see that [

(k)

)

]

10×10

, the second order partial deriva-

tive matrix of Q-function Q(

(k)

) is a symmetric matrix. In fact, the second term in

Table 6: The patten of second order partial derivative matrix of Q(

(k)

)

All these components of

are defined in Table 5 on page 50.

equation (8.20) also denotes a symmetric matrix as it is a summation of a product of

a vector and it's transpose; the third term in the same equation represents a symmet-

ric too, even though two different vectors were involved in the product term. This is

because that the later summation is taken over on i = j only, which means that those

single matrix that satisfied i = j was not included in the summation. Therefore, the

OIM, defined by equation (8.20) is really a symmetric matrix as their three components

are all symmetric matrices.

Application

This section gives demonstrations of using PPO-survival models via the NIGN-WW and

the NIGN-LL models which were used to re-analyse the HIP breast cancer data. Recall

that the background of this data was introduced in Section 2.

9.1

Pre-processing of the HIP breast cancer data

In the re-analysis of the HIP breast cancer data, two types of data subsets were used.

The original HIP data (Shapiro et al., 1988; Baker, 1998) includes 60,696 observa-

tions, in which the enrolment date and the last follow-up date for all individuals had

been recorded, as well as their survival status. The latter indicates whether an individ-

ual died of breast cancer during the observation period or not. Note that in the HIP

breast cancer trial, there were two examiners independently took the responsibility of

determining the course of each single case of death. For those individuals who died but

not due to breast cancer, they were treated as censored individuals in the trial (Shapiro

et al., 1988).

The observed time (from entry to death or last follow-up) was given in days in the

original HIP data. In this thesis, the survival status of an individual is the so-called

censoring indicator (denoted by R = 1 for death from breast cancer, 0 otherwise) and

observed times are counted in years, as in Baker (1998), which was obtained by the

following formula:

observed time (in years) =

days for entry to death or last follow-up

365

(9.1)

We note that 164 individuals in the original HIP data have the same date for their

enrolment and their last follow-up date. This implies that these individuals did not, in

fact, take part in the HIP study and thus should not be counted as "valid " individuals

in the HIP trial. Therefore, in this thesis, we focus on analyzing the subset of the HIP

data (60,696 - 160 = 60,532 women) rather than the original HIP data.

The HIP

dataset:

The first data subset studied follows Shapiro's report on the

HIP study (Shapiro et al., 1988), wherein the maximum observed time is fixed at 18

years. This is because Shapiro et al. (1988) purports that in the HIP study, the maxi-

mum follow-up period is 18 years. Thus, women whose observed times are greater than

18 years (in the original HIP data set) are all preset at 18. Their censoring indicators

are given as 0, as they can be viewed as censored at year 18 (after entry).

The HIP

dataset:

The second data subset is obtained by following the arguments

of Baker (1998). Baker mentions that:

"by seven years, the number of breast can-

cers in the control and screen groups have equalized, so presumably they represent all

breast cancers that could have benefitted from screening. " To reduce the variance of

estimators, Baker (1998) focused only on analyzing those individuals with breast cancer

diagnosed in their first seven years of follow-up. Baker also points out that the choice of

seven years (for the HIP study) was conservative; other authors have used earlier times

of so-called equalisation (Baker, 1998).

As with Baker (1998), we also chose to analyse the HIP data with seven years as

the maximum follow-up period. The benefit from the screening examination should

therefore be considered

"realised "

in seven rather than in eighteen years (Baker,

1998). Note that, as we are interested in investigating how the screening examination

impacts the longevity of women who are in general good health (rather than breast

cancer patients). Hence, unlike Baker (1998), in this thesis, we focus on analysing all

women in the HIP trial rather than those who had been diagnosed with breast cancer

during the observation period.

After truncating the observed time at 7 years, individuals whose observed times

were greater than seven years have been given 0 as their censored indicators value,

which shows that these individuals had not died at that time (7 years). Therefore, in

this HIP dataset, denoted by HIP

, the maximum observed time is 7 years; individuals

whose observed time are greater than 7 years (in the original HIP data) have R

= 0.

This indicates that individuals whose observed time has been truncated (in the HIP

data) did not die from breast cancer during the the follow-up period (7 years after

entry). Summary statistics for the three HIP datasets (the original HIP, HIP

and

HIP

) are given in Table 7 (on page 69).

As stated before, we focused on analysing the two aforementioned HIP data subsets,

namely HIP

and HIP

, with our NIGN-WW and NIGN-LL survival models. The

results are given in Section 9.2. Furthermore, for the purpose of comparison, these two

HIP data subsets are also re-analysed by the Frangakis-Rubin (F-R) model (Frangakis

& Rubin, 1999). To date, the F-R model has not yet been applied to any time-to-event

data in the literature. The results of the re-analysis of the HIP data subsets are given

in Section 9.3.

able

Summary

statistics

HIP

data

sets

Original

HIP

data

otal

Screen

vited

Con

trol

Screen

receiv

Screen

refused

Screen

receiv

Sample

size

60,696

30,131

30,565

20,147

9,984

40,549

Mean

observ

time

¯ Y

14.051

14.1578

13.9457

14.905

12.6499

13.6266

Prop

ortion

¯ R

0.0129

0.0123

0.0135

0.0123

0.0125

0.0133

Prop

ortion

screen

vited

¯ Z

0.4964

0.2462

Prop

ortion

screen

receiv

¯ D

0.3319

0.6686

HIP

data

otal

Screen

vited

Con

trol

Screen

receiv

Screen

refused

Screen

receiv

Sample

size

60,532

30,058

30,474

20,132

9,926

40,400

Mean

observ

time

¯ Y

13.9756

14.0816

13.8709

14.8112

12.6018

13.5591

Prop

ortion

¯ R

0.0126

0.0119

0.0134

0.0119

0.0130

Prop

ortion

screen

vited

¯ Z

0.4966

0.2457

Prop

ortion

screen

receiv

¯ D

0.3326

0.6698

HIP

data

otal

Screen

vited

Con

trol

Screen

receiv

Screen

refused

Screen

receiv

Sample

size

60,532

30,058

30,474

20,132

9,926

4,0400

Mean

observ

time

¯ Y

6.5503

6.5813

6.5197

6.7393

6.2607

6.4561

Prop

ortion

¯ R

0.0030

0.0021

0.0038

0.0020

0.0024

0.0035

Prop

ortion

screen

vited

¯ Z

0.4966

1.0000

0.2457

Prop

ortion

screen

receiv

¯ D

0.3326

0.6698

vitation

for

screening;

receipt

screening

and

death

from

breast

cancer.

9.2

HIP data analysis: via the non-ignorable censoring

mechanism PPO-survival models

For the HIP trial, it is assumed that whether a participant died from breast cancer or not

would not be related to any other participant's treatment status. We can then suppose

Assumption 1 (given in Section 3.2), the SUTVA assumption is satisfied. Assumption

2 (random assignment) clearly holds as all participants in the HIP trial were randomly

assigned to the screening or the control group (Shapiro et al., 1988). Some participants

assigned to the screening group refused to take the screening examination (Shapiro

et al., 1988; Baker, 1998). This implies that these individuals did not follow their

allocated randomised assigned treatment (Z

). Assumption 3, namely that of exclusion

restriction, can then be assumed. This means that the assigned treatment (Z

) affects

the outcome of interest only through the actual received treatment (D

)). We also

suppose Assumption 4, that of monotonicity, holds in the HIP trial, as it is more

acceptable to suppose that no defiers exist than to have defiers in a RCT (Angrist

et al., 1996; Frangakis & Rubin, 1999; Baker, 1998; Yau & Little, 2001; Peng et al.,

2004; O'Malley & Normand, 2005; Mattei & Mealli, 2007). Finally, we suppose that the

latent ignorability assumption holds in the HIP trial, i.e. that the potential failure time

and censoring time are independent, conditional on the latent compliance type. Hence,

all assumptions given in Section 3.2 are assumed satisfied in the HIP trial. In addition,

for all individuals in this trial, there are only two possible types of compliance behavior,

namely compliers or never-takers. Note that no woman assigned to the control group

could receive the screening examination, which implies that no always-takers exist in

the HIP breast screening trial. In other words, the assumption of no access to treatment

for the control (Loeys & Goetghebeur, 2003; Frangakis & Rubin, 1999) holds in the HIP

breast screening trial. However, it is not a necessary assumption for our PPO-survival

models (derived in Section 4).

9.2.1

Model specification for the HIP trial

Since no always-takers exist in the HIP trial, the number of compliance types in the

HIP data has thus been reduced to two, and the number of subgroups of interest in the

HIP data are thus reduced to three (Subgroup j ; j = 1, 2, 3), specified as follows:

Subgroup1 : compliers (U

= c) from the screening group (Z

= 1);

Subgroup2 : compliers (U

= c) from the control group (Z

= 0);

Subgroup3 : never-takers (U

= n) from either the screening (Z

= 1) or the control

= 0) group.

Consequently, for the HIP breast cancer data, the model parameter vector

, given

in equation (5.7), is simplified as follows:

= (, ); where

= (

T,c1

T,c0

T,n

C,c1

C,c0

C,n

and

= (

c,00

(9.2)

Recall that only

need to be estimated using the EM algorithm, is the interest of

model parameter.

Note that the parameters corresponding to the always-takers subgroup in equation

(5.7) are thus excluded in equation (9.2), because no always-takers exist in the HIP trial.

This can be denoted by

= 0, which indicates that the ith individual in the HIP trial,

has 0 (zero) probability of being an always-taker. Therefore, the parameter

c,11

, defined

by the first equation in (5.11), is fixed as 1 (because

= 0), i.e.

c,11

1. So for the

HIP data,

c,11

is clearly not a parameter that needs to be estimated. Furthermore, the

parameter

c,00

, defined by the second equation in (5.11) is equivalent to

, (

c,00

)

because

1 in the case of the HIP study.

Hence, for the NIGN-WW model, which assumes that both failure and censoring

time follow a Weibull distribution, the interest model parameter vector,

W W

, includes

eight elements, i.e.:

W W

= (

T,c1

T,c0

T,n

C,c1

C,c0

C,n

)

(9.3)

where

and

represent the scale parameters of three Weibull distributions for

failure time (T

) associated with the relevant subgroups (Subgroup1, Subgroup2 and

Subgroup3), with

as the common shape parameter for these Weibull distributions;

and

C,c1

C,c0

and

C,n

represent the scale parameters of three Weibull distributions

for censored time (C

) associated with the same three relevant subgroups (Subgroup1,

Subgroup2 and Subgroup3) and with

as the common shape parameter for these

distributions.

When the log-normal is assumed to be the distribution for failure and censoring

time , the interest NIGN-LL model parameter vector, denoted by

, is as follows:

= (

T,c1

T,c0

T,n

C,c1

C,c0

C,n

(9.4)

where

and

denote the mean of the logarithm of failure time, corresponding to

the three extant subgroups (Subgroup1, Subgroup2 and Subgroup3);

is their common

variance (of the logarithm of failure time); and

C,c1

C,c0

C,n

and

represent the

parameters corresponding to the logarithm of censoring time.

The model parameters corresponding to the specific non-ignorable PPO-survival

models (for the HIP trial) are given in Table 8.

Table 8: Non-ignorable PPO-survival model parameters for the HIP trial.

Model name

model parameter vector

NIGN-WW

(

T,c1

T,c0

T,n

C,c1

C,c0

C,n

)

NIGN-LL

(

T,c1

T,c0

T,n

C,c1

C,c0

C,n

)

The estimated values of the interest model parameter vectors

, associated with the

two non-ignorable survival models, namely the NIGN-WW and NIGN-LL models, are

given in Appendix10. Specifically, Tables B-1 and B-2 (on page 118) give analogous

estimates for the HIP

data; Tables B-3 and B-4 (on page 119), for the HIP

data.

Details on the iteration process used in the EM algorithm (Dempster et al., 1977) for

these two specific non-ignorable PPO-survival models are detailed in Appendix

, where

Tables B-5 (on page 120) and B-6 (on page 121) give the iteration of convergence in the

EM algorithm for the HIP

dataset, and Tables B-7 (on page 122) and B-8 (on page

123), for the HIP

dataset.

9.2.2

Estimation of causal effects: CACE(t) and ACE(t)

Table 9: Estimated CACE(t) and associated standard errors per model.

Models

NIGN-WW model

NIGN-LL model

Data

HIP

Years

CACE (SE

CACE

)

CACE ( SE

CACE

)

0.19E-4 (0.10E-4)

0.15E-4 (0.07E-4)

3.56E-4 (1.86E-4)

5.22E-4 (2.19E-4)

12.6E-4 (6.57E-4)

17.9E-4 (7.17E-4)

26.2E-4 (13.7E-4)

32.1E-4 (13.2E-4)

Data

HIP

Years

CACE (SE

CACE

)

CACE (SE

CACE

)

0.66E-4 (0.24E-4)

0.62E-4 (0.25E-4)

14.8E-4 (3.83E-4)

15.7E-4 (4.02E-4)

For causal inference in time-to-event studies, we focus on estimating the causal

effects in terms of CACE(t) and ACE(t).

Tables 9 (on page 73) and 10 (on page 74) give estimated

CACE(t) and ACE(t) for the two non-ignorable PPO survival models (the NIGN-WW

and the NIGN-LL model). Their associated standard errors are also given in Tables 9

and 10, which are calculated by following the formulation given in equations (C-15) and

(C-24).

Table 10: Estimated ACE(t) and associated standard errors per model.

Models

N IGN

- W W model NIGN - LL model

Data

HIP

Years

ACE ( SE

ACE

)

ACE (SE

ACE

)

0.13E-4 (0.07E-4)

0.10E-4 (0.05E-4)

2.38E-4 (1.25E-4)

3.49E-4 (1.47E-4)

8.41E-4 (4.40E-4)

11.6E-4 (4.80E-4)

17.5E-4 (9.19E-4)

21.5E-4 (8.81E-4)

Data

HIP

Years

ACE (SE

ACE

)

ACE (SE

ACE

)

0.44E-4 (0.16E-4)

0.41E-4 (0.16E-4)

9.90E-4 (2.56E-4)

10.5E-4 (2.69E-4)

Since CACE(t) and ACE(t) both are the functions of the MLE and appealing to

large-sample normality of the MLE, the 95% lower and upper confidence limits (LCL

and UCL) of CACE(t) are calculated easily as follows:

LCL

cace

(t) = CACE(t)

- 1.96SE

cace

(t)

U CL

cace

(t) = CACE(t) + 1.96SE

cace

(t).

(9.5)

Similarly, the 95% lower and the upper confidence limits (LCL and UCL) of ACE(t)

are calculated via the following formulae:

LCL

ace

(t) = ACE(t)

- 1.96SE

ace

(t)

U CL

ace

(t) = ACE(t) + 1.96SE

ace

(t).

(9.6)

The associated Z values of CACE(t) and ACE(t) are also given for all analyses. The

Z value is calculated by using the following equations:

- value

cace

(t) =

CACE(t)

- 0

cace

(t)

(9.7)

and

- value

ace

(t) =

ACE(t)

- 0

ace

(t)

;

(9.8)

where 0 is the assumed under the null mean value of CACE(t) or ACE(t). The p-

value derives from 1

- (Z - value) (one tail), where () denotes the standard normal

cumulative probability function.

Further details regarding the estimated ACE(t) and CACE(t), obtained via our

non-ignorable PPO-survival models for the HIP

and HIP

data sets are given in

Appendix10, where Tables B-10, B-11 give the estimated ACE(t) for the HIP

dataset

via the NIGN-WW and NIGN-LL model, respectively, and Tables B-13 and B-14 give

the estimated CACE(t) for the HIP

data set via the same two models. Tables B-17,

B-18, B-19 and B-20 are analogous for the analysis of the HIP

data set.

Based on these estimated values of ACE(t) and CACE(t), the curves of the estimated

CACE(t) and ACE(t) against survival time (years survival post entry) are plotted in

the following figures per data subset analysed. For the the HIP

data, Figures 1 and

2 (on page 76) show the estimated CACE(t) curves obtained using the NIGN-WW and

NIGN-LL model, respectively, and their associated ACE(t) curves are given in Figures 3

and 4 (on page 77); the corresponding curves of estimated CACE(t) and ACE(t) (with

the same names) for the HIP

data are given in Figures 5 and 6 (on page 78) and

Figures 7 and 8 (on page 79).

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ww

for HIP

LCL of CACE

nign-ww

for HIP

UCL of CACE

nign-ww

for HIP

Figure 1: Estimated CACE(t) curve using the NIGN-WW model with 95% con-

fidence interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ll

for HIP

LCL of CACE

nign-ll

for HIP

UCL of CACE

nign-ll

for HIP

Figure 2: Estimated CACE(t) curve using the NIGN-LL model with 95% confi-

dence interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

-2

-1

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ww

for HIP

LCL of ACE

nign-ww

for HIP

UCL of ACE

nign-ww

for HIP

Figure 3: Estimated ACE(t) curve using the NIGN-WW model with 95% confi-

dence interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

-2

-1

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ll

for HIP

LCL of ACE

nign-ll

for HIP

UCL of ACE

nign-ll

for HIP

Figure 4: Estimated ACE(t) curve using the NIGN-LL model with 95% confidence

interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the Lower

Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ww

for HIP

LCL of CACE

nign-ww

for HIP

UCL of CACE

nign-ww

for HIP

Figure 5: Estimated CACE(t) curve with 95% confidence interval: HIP

data

using the NIGN-WW model. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ll

for HIP

LCL of CACE

nign-ll

for HIP

UCL of CACE

nign-ll

for HIP

Figure 6: Estimated CACE(t) curve with 95% confidence interval: HIP

data

using the NIGN-LL model. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ww

for HIP

LCL of ACE

nign-ww

for HIP

UCL of ACE

nign-ww

for HIP

Figure 7: Estimated ACE(t) curve using the NIGN-WW model with 95% confi-

dence interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the

Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ll

for HIP

LCL of ACE

nign-ll

for HIP

UCL of ACE

nign-ll

for HIP

Figure 8: Estimated ACE(t) curve using the NIGN-LL model with 95% confidence

interval: HIP

data. The

dash-dot line

and the

dotted line

indicate the Lower

Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

9.2.3

Estimation of relevant survival probabilities

The key point distinguishing our non-ignorable PPO-survival models (the NIGN-WW

and NIGN-LL survival model) from conventional parametric survival models, is the fact

that an individual's compliance type is allowed for, as proposed in Chapter 5. Unlike

conventional parametric models, the NIGN-WW and the NIGN-LL models allow us to

estimate the survival probability for three afore-mentioned subgroups in the HIP trial,

namely Subgroup 1, Subgroup 2 and Subgroup 3 as defined in Section 9.2.1.

The estimated survival curves corresponding to these three subgroups (denoted by

(t), S

(t) and S

(t)) for the NIGN-WW and the NIGN-LL model application on the

HIP

and the HIP

subdata are given in Figure 9 (on page 82), where the top row

gives the estimated survival curves evaluated using the NIGN-LL model for the HIP

data (top left) and the HIP

data (top right). The figures located in the bottom

row of Figure 9 represent the corresponding estimated curves for the NIGN-LL model.

In addition, it is not difficult to obtain estimated survival probabilities from an ITT

analytic perspective via our PPO-survival model; Figure 10 (on page 83) shows the

relevant the ITT estimated survival curves. Equation (9.9) gives the formulations of

calculating the ITT survival probabilities S

(t) and S

(t).

(t) = (1

(t) +

(t);

(t) = (1

(t) +

(t).

(9.9)

All estimated values of the relevant survival probabilities, associated with Figure 9,

and Figure 10 are detailed in Tables B-21, B-22, B-25 and B-26 in Appendix10 for the

reader's perusal.

The survival curves plotted on the two graphs at the right-hand side of Figure 9 (on

page 82) correspond to the HIP

dataset via the NIGN-LL model (top graph) and the

NIGN-WW model (bottom graph), respectively. They show that Subgroup 1, which

consists of compliers, who are assigned to the screening (treated) group, has the highest

survival probability among the three subgroups; Subgroup 2 has the lowest survival

probability, in which individuals are all compliers and assigned to the control group;

while the estimated survival curve of Subgroup 3, which consists of all never-takers,

who are assigned to either the screening or the control groups, lies between the curves

of Subgroup 1 and Subgroup 2. These results can be interpreted thus: Subgroup 3

(never-takers) can be viewed as a healthy group, so it has a survival probability higher

than Subgroup 2. Note that it is reasonable to make this assumption as presumably

individuals in Subgroup 3 do not think a screening examination is necessary, for they

refuse to take the screening examination (when assigned to the screening group). The

estimated survival curves for the HIP

dataset also show that the NIGN-LL model and

the NIGN-WW model provide very simliar results.

Unlike the analysis of the HIP

dataset, for the HIP

dataset, the survival curves

estimated using the NIGN-LL model and the NIGN-WW model are slightly different.

The NIGN-WW model provides survival curves for Subgroups 2 and 3 that are closer

than does the NIGN-LL model. However, the estimated survival curve for Subgroup 3

still lies between these two survival curves for Subgroup 1 and 2 (the left-hand graph in

Figure 9). Therefore, the assumption mentioned above, that individuals in Subgroup 3

are all healthy -- or at least thought of themselves as healthy at that time -- is still valid.

Since this is a scenario of long term follow-up, individuals' health status, at enrolment,

may not affect their survival probability in later years. It is noteworthy that in the

analysis of the HIP

dataset, the survival curves of Subgroup 3 are closer to the survival

curves of Subgroup 2 than Subgroup 1. This also makes sense in that individuals, either

in Subgroup 2 or Subgroup 3, have not received a screening examination. It is thus not

surprising to see these two subgroups having similar estimated survival curves. Indeed,

the survival probability in later years is minimally related to the individual's health

status at enrolment.

It is important to note that an ordinary ITT analysis does not provide these nuances

nor this information ( Figure 9, on page 82 and Figure 10 on page 83).

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in NIGN-LL for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in NIGN-LL for HIP

(t) inNIGN-LL for HIP

(t) in NIGN-LL for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in NIGN-WW for HIP

(t) inNIGN-WW for HIP

(t) in NIGN-WW for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probabilty

(t) in NIGN-WW for HIP

Figure

Estimated

surviv

curv

using

the

NIGN-LL

del

(top

and

the

NIGN-WW

del

ottom

for

the

HIP

(left)

and

HIP

(righ

datasets.

The

solid

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

screening

group;

the

dash

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

con

trol

group;

the

dash-dot

line

corresp

onds

the

surviv

curv

for

Subgroup

nev

er-tak

ers

assigned

either

the

con

trol

the

screening

groups.

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in nign-ll for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in nign-ll for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in nign-ww for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in nign-ww for HIP

Figure

10:

The

ITT

estimated

surviv

curv

using

the

NIGN-LL

del

(top

and

the

NIGN-WW

del

ottom

for

the

HIP

(left)

and

HIP

(righ

datasets.

The

solid

line

corresp

onds

the

surviv

curv

for

the

group

assigned

for

screening

;

the

dash

line

corresp

onds

the

group

not

assigned

for

screening.

9.3

HIP data analysis: via the Frangakis-Rubin method

9.3.1

Frangakis-Rubin method

Let S

(t) denote the survival function for the subgroup with U

= u and Z

= z, and

let S

(t) denote the survival function associated with two treatment levels, z = 1 for

the treatment and 0 for the control. According to equation (5.1) given in Frangakis

& Rubin (1999), the survival functions corresponding to the assigned treatment group

= 1 (for i

{i : Z

= 1

}) and the control group Z

= 0 (for i

{i : Z

= 0

}) can be

re-written as follows:

(t) = (1

(t) +

(t)

(t) = (1

(t) +

(t);

(9.10)

where

is the probability of compliers in the population.

Under the assumption of exclusion restriction, we have S

(t) = S

(t). Using S

(t)

to replace S

(t) and S

(t) in equation (9.10), we then have a form of the Frangakis-

Rubin (F-R) model as follows:

(t) = (1

(t) +

(t);

(t) = (1

(t) +

(t).

(9.11)

Note that S

(t) and S

(t) can be estimated with the Kalan-Meier estimator directly.

Because compliance type U

= u is unobservable, S

(t) the survival function asso-

ciated with the subgroup with U

= c and Z

= 0, cannot be estimated directly with

the Kalan-Meier estimator through observation. Frangakis & Rubin (1999) provide a

formulation for evaluating this, which is given below:

(t) = exp

obs

(x)/N

- dN

(x)/N

- M

(x)/N

;

(9.12)

where N

and N

are the number of individuals randomly assigned to the treated and

the control group respectively, and N

obs

(x) and M

obs

(x) denote the stochastic processes

(for i = 1, . . . , n) and are defined as follows:

(x) := I(T

x, R

= 1);

(x) := I(T

x);

(9.13)

and

(x) =

(x)I(U

= n)I(Z

= 1);

(x) =

(x)I(Z

= 0);

(x) =

(x)I(U

= n)I(Z

= 1);

(x) =

(x)I(Z

= 0).

(9.14)

Given equations (9.13) and (9.14), we note that N

(x) and M

(x) indicate the

number of event occurrences or the number of survivors at or prior to the given time

t for individuals who are never-takers assigned to the treatment group; whereas N

(x)

and M

(x) denote the number of event occurrences or the number of survivors at the

given time t for individuals who are assigned to the control group.

The Frangakis-Rubin (F-R) method (Frangakis & Rubin, 1999) can then be re-

garded as a non-parametric form of the noncompliance and non-ignorable survival po-

tential outcomes model. This is true, because even though the F-R method does not, in

fact, conventionally evaluate CACE(t) and ACE(t), the survival function values, S

(t),

(t) and S

(t), which are required in the calculation of CACE(t) and ACE(t), can

all be obtained by using the F-R model via the Kaplan-Meier estimator and via equa-

tions (9.12) and (9.11). More specifically, S

(t) and S

(t)), the survival probabilities

corresponding to Subgroup 1 and Subgroup 3, can be estimated directly via the Kaplan-

Meier estimator, and S

(t), the survival probability corresponding to Subgroup 2, can

be calculated using equation (9.12) as was proposed by Frangakis & Rubin (1999). Then

(t) and S

(t), the survival probability corresponding to the assigned treatment Z

= 1

and the control Z

= 0 group, respectively, can be obtained directly via equation (9.11),

which was also proposed by Frangakis & Rubin (1999).

The two datasets from the HIP trial, (HIP

and HIP

), are analysed according to

the F-R method, using a programme written in MATLAB (Higham & Higham, 2000).

This code follows equations (9.12) and (9.14) as proposed by Frangakis & Rubin (1999).

9.3.2

Estimation of relevant survival probabilities

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

Figure 11: Estimated survival curves via the F-R for the HIP

(left) and for the

HIP

(right) data . The

solid line

corresponds to the survival curve for Subgroup

1: compliers assigned to the screening group; the

dash-dot line

corresponds to the

survival curve for Subgroup 2: compliers assigned to the control group; the

dotted

line

corresponds to the survival curve for the Subgroup 3: never-takers assigned

to either the control or the screening group.

Figure 11 shows the estimated survival curves obtained via the F-R method, cor-

responding to the three subgroups (S

(t), S

(t) and S

(t)) for the two HIP datasets,

namely the HIP

and HIP

subsets, where the left-hand graph corresponds to the

HIP

dataset, and the right-hand one to the HIP

dataset. Both of the graphs show

that Subgroup 1 has a higher survival probability than the other two subgroups. This

is similar to the results given in Section 9.2, which were obtained via the non-ignorable

PPO-survival model. However, we can also see that in Figure 11, the estimated survival

curves for Subgroup 3 are unlike those in Figure 9, especially for the HIP

dataset.

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

Figure 12: The ITT estimated survival curves via the F-R for the HIP

(left)

and for the HIP

(right) data . The

solid line

corresponds to the survival curve

for the assigned screening group; the

dash line

corresponds to the survival curve

for the assigned control group.

Figure 11 (left) (on page 86) shows that Subgroup 3 has a higher value than Subgroup

2 before 12 years, after which time (t = 12) Subgroup 3 has the lowest value among

the three subgroups. These results are comprehensible, because the F-R model is a

non-parametric based method, yet at the same time they are harder to interpret than

results obtained via the non-ignorable PPO-survival model (Section 9.2.1).

While Figure 12 gives the estimated survival curves obtained using the F-R method

in an ITT analysis, S

(t) and S

(t), correspond to the assigned treated group Z

= 1

(the screen invited group) and the control group Z

= 0, respectively. All estimated

relevant survival values, which are associated with Figures 11 and 12, are given in Tables

B-23, B-24, B-27 and B-28 in Appendix10.

9.3.3

Estimation of causal effects: CACE(t) and ACE(t)

The estimated values of CACE(t) (obtained by the F-R method) and the associated 95%

confidence intervals (obtained by the bootstrap method (Efron & Tibshirani, 1993)) are

given in Table 11 ( for the HIP

dataset) and Table 12 ( for the HIP

dataset) on page

89. Estimated ACE(t) values are given in Tables 13 and 14 (on page 90). Note that no

parameters are estimated using the F-R model, as this is a non-parametric model. The

formulae given at the end of Section 9.2.2, including equations (9.5), (9.7), (9.6) and

(9.8), are also used to obtain Tables 11, 12, 13, 14, which provide estimated CACE(t)

and ACE(t) values for the F-R method.

Related CACE(t) curves for the HIP

and the HIP

data, estimated by the F-R

method, are given in Figures 13 and 14 (on page 93).

In Tables 12 and 14, it is noteworthy that CACE(t) and ACE(t) for years 4 to 13

(inclusive), are significant in the F-R analyses of the HIP

. This result is similar to

that obtained for the HIP

data, where CACE(t) and ACE(t) for years 4 to 7 are highly

significant (Tables 11 and 13).

Note that the estimated CACE(t) at the first seven years for the HIP

and the

HIP

datasets (Table 12 and Table 11) are slight different. This is because of that

these two datasets are not same even though we only consider the first seven years: 1)

the censoring indicators for individuals whose observed time are seven years in these

two HIP datasets are different. 2) the datasets re-sampled for Bootstrap are high likely

not same. Recall that for the F-R method, the Bootstrap approach is employed for

obtaining the standard errors of the estimated CACE(t) and ACE(t).

Table 11: Estimated CACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

CACE

F R

cace

LCL

cace

U CL

cace

Zvalue

- value

0.57E

-04 1.39E-04 -2.14E-04 3.29E-04 0.4127

0.3399

3.03E

-04 2.43E-04 -1.74E-04 7.80E-04 1.2464

0.1063

5.35E

-04 3.31E-04 -1.14E-04 11.8E-04 1.6159

0.0531

11.0E

-04 4.44E-04

2.32E

-04

19.7E

-04 2.4836

0.0065

18.1E

-04 5.85E-04

6.58E

-04

29.5E

-04 3.0846

0.0010

28.0E

-04 7.08E-04

14.1E

-04

41.9E

-04 3.9505

0.0000

28.0E

-04 7.08E-04

14.1E

-04

41.9E

-04 3.9505

0.0000

HIP

: HIP data with follow-up of 7 years

Table 12: Estimated CACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

CACE

F R

cace

LCL

cace

U CL

cace

Zvalue

- value

0.57E

-04 1.39E-04 -2.14E-04 3.29E-04 0.4127

0.3399

3.03E

-04 2.43E-04 -1.74E-04 7.80E-04 1.2464

0.1063

5.35E

-04 3.31E-04 -1.14E-04 11.8E-04 1.6159

0.0531

11.0E

-04 4.44E-04

2.32E

-04

19.7E

-04 2.4836

0.0065

18.1E

-04 5.85E-04

6.58E

-04

29.5E

-04 3.0846

0.0010

27.7E

-04 7.13E-04

1.38E

-03

41.7E

-04 3.8930

0.0000

25.6E

-04 7.91E-04

1.01E

-03

41.1E

-04 3.2321

0.0006

21.1E

-04 8.68E-04

4.05E

-04

38.1E

-04 2.4272

0.0076

25.8E

-04 9.69E-04

6.80E

-04

44.8E

-04 2.6616

0.0039

21.7E

-04 10.5E-04

1.23E

-04

42.3E

-04 2.0772

0.0189

25.9E

-04 11.3E-04

3.68E

-04

48.0E

-04 2.2849

0.0112

22.2E

-04 12.2E-04 -1.64E-04 46.1E-04 1.8252

0.0340

22.1E

-04 13.0E-04 -3.26E-04 47.5E-04 1.7082

0.0438

15.6E

-04 13.8E-04 -11.3E-04 42.6E-04 1.1349

0.1282

16.5E

-04 15.1E-04 -13.0E-04 46.1E-04 1.0962

0.1365

15.8E

-04 16.3E-04 -16.1E-04 47.6E-04 0.9682

0.1665

17.7E

-04 17.1E-04 -15.8E-04 51.1E-04 1.0353

0.1502

31.8E

-04 18.2E-04 -3.96E-04 67.5E-04 1.7427

0.0407

HIP

: HIP data with follow-up 18 years

Table 13: Estimated ACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

ACE

F R

ace

LCL

ace

U CL

ace

Zvalue

- value

0.38E

-04 0.93E-04 -1.44E-04 2.20E-04 0.4126

0.3399

2.03E

-04 1.63E-04 -1.16E-04 5.23E-04 1.2464

0.1063

3.58E

-04 2.22E-04 -0.76E-04 7.93E-04 1.6160

0.0530

7.38E

-04 2.97E-04

1.56E

-04

13.2E

-04 2.4843

0.0065

12.1E

-04 3.92E-04

4.41E

-04

19.8E

-04 3.0855

0.0010

18.7E

-04 4.74E-04

9.44E

-04

28.0E

-04 3.9507

0.0000

18.7E

-04 4.74E-04

9.44E

-04

28.0E

-04 3.9507

0.0000

HIP

: HIP data with follow-up of 7 years

Table 14: Estimated ACE(t)(95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

ACE

F R

ace

LCL

ace

U CL

ace

Zvalue

- value

0.38E

-04 0.93E-04 -1.44E-04 2.20E-04 0.4126

0.3399

2.03E

-04 1.63E-04 -1.16E-04 5.23E-04 1.2464

0.1063

3.58E

-04 2.22E-04 -0.76E-04 7.93E-04 1.6160

0.0530

7.38E

-04 2.97E-04

1.56E

-04

1.32E

-03 2.4843

0.0065

12.1E

-04 3.92E-04

4.41E

-04

1.98E

-03 3.0855

0.0010

18.6E

-04 4.77E-04

9.23E

-04

2.79E

-03 3.8932

0.0000

17.1E

-04 5.30E-04

6.74E

-04

2.75E

-03 3.2317

0.0006

14.1E

-04 5.81E-04

2.71E

-04

2.55E

-03 2.4268

0.0076

17.3E

-04 6.49E-04

4.55E

-04

3.00E

-03 2.6614

0.0039

14.6E

-04 7.01E-04

0.82E

-04

2.83E

-03 2.0769

0.0189

17.3E

-04 7.58E-04

2.46E

-04

3.22E

-03 2.2844

0.0112

14.9E

-04 8.15E-04 -1.10E-04 3.08E-03 1.8249

0.0340

14.8E

-04 8.68E-04 -2.19E-04 3.18E-03 1.7081

0.0438

10.5E

-04 9.21E-04 -7.60E-04 2.85E-03 1.1350

0.1282

11.1E

-04 10.1E-04 -8.72E-04 3.09E-03 1.0963

0.1365

10.5E

-04 10.9E-04 -10.8E-04 3.19E-03 0.9683

0.1665

11.8E

-04 11.4E-04 -10.6E-04 3.42E-03 1.0354

0.1502

21.3E

-04 12.2E-04 -2.66E-04 4.52E-03 1.7426

0.0407

HIP

: HIP data with follow-up of 18 years

9.4

Model comparisons

In Sections 9.2

and 9.3, we analysed the HIP data using three models, namely the

NIGN-WW model, the IGN-W model and the F-R method. In this section, we focus

on comparing these three models using the results of analysis of both the HIP

and

HIP

datasets.

We first compare the non-ignorable PPO-survival model and the F-R method (Sec-

tion 9.4.1). Secondly, we compare the ignorable PPO-survival model and the non-

ignorable PPO-survival model (Section 9.4.2).

9.4.1

Non-ignorable PPO-survival model vs the F-R method

Since the non-ignorable PPO-survival model is developed under the same assumptions

as those for the F-R method, it is reasonable to expect that the results from these two

approaches would be comparable. However, it also would not be surprising to see some

differences between these results on the HIP data; as indeed, these two approaches derive

from different methodologies. Recall that the non-ignorable PPO-survival model is

derived from a parametric perspective through the maximum likelihood method, whilst

the F-R method is a nonparametric approach which uses the Kaplan-Meier estimator

(p.84, Klein (1997)).

The results from the NIGN-WW and the NIGN-LL model are comparable to those

obtained by the Frangakis-Rubin (F-R) method, as applied to the HIP

data (Figure

15, on page 94). This was also observed after an analogous analysis of the HIP

data

(Figure 17, on page 96).

Since the F-R method derives from a nonparametric perspective, it would not be

surprising to see, however, some fluctuation in the F-R specific plot of CACE(t) (shown

on the left hand side of Figure 17 for the HIP

dataset). Note, however, from the

left hand side plots (Figure 15), that little fluctuation is evident in the F-R specific

CACE(t) plot for the HIP

dataset. This is because indeed the period of follow-up is

shortened from 18 to 7 years.

In addition to the results mentioned above, we have shown that the non-ignorable

PPO-survival models provide more precise estimators than those obtained by the F-R

method (Figures 15, and 17), evident by the discrepancy in width of the associated

confidence intervals.

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

for HIP

LCL of CACE

for HIP

UCL of CACE

for HIP

Figure 13: Estimated CACE(t) using the F-R method (with 95% confidence in-

terval): HIP

data. The

dash-dot line

and the

dotted line

indicate the Lower

Confidence Limit(LCL) and the Upper Confidence Limit (UCL).

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

for HIP

LCL of CACE

for HIP

UCL of CACE

for HIP

Figure 14: Estimated CACE(t) using the F-R method (with 95% confidence in-

terval): HIP

data. The

dash-dot line

and the

dotted line

indicate the Lower

Confidence Limit (LCL) and the Upper Confidence Limit (UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

for HIP

LCL of CACE

for HIP

UCL of CACE

for HIP

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ww

for HIP

LCL of CACE

nign-ww

for HIP

UCL of CACE

nign-ww

for HIP

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ll

for HIP

LCL of CACE

nign-ll

for HIP

UCL of CACE

nign-ll

for HIP

Figure

15:

Estimated

)

curv

with

95%

confidence

terv

al:

HIP

data

using

the

F-R

metho

(left),

the

NIGN-

del

(middle)

and

the

NIGN-LL

del

(righ

t).

The

dash-dot

line

and

dotted

line

indicate

the

Confidence

Limit(LCL)

and

the

Upp

Confidence

Limit

(UCL).

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

ACE(t)

ACE

for HIP

UCL of ACE

for HIP

LCL of ACE

for HIP

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ww

for HIP

LCL of ACE

nign-ww

for HIP

UCL of ACE

nign-ww

for HIP

0.5

1.5

2.5

3.5

4.5

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ll

for HIP

LCL of ACE

nign-ll

for HIP

UCL of ACE

nign-ll

for HIP

Figure

16:

Estimated

)

curv

with

95%

confidence

terv

al:

HIP

data

using

the

F-R

metho

(left),

the

NIGN-

del

(middle)

and

the

NIGN-LL

del

(righ

t).

The

dash

line

and

dotted

line

indicate

the

Confidence

Limit(LCL)

and

the

Upp

Confidence

Limit

(UCL).

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

for HIP

LCL of CACE

for HIP

UCL of CACE

for HIP

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ww

for HIP

LCL of CACE

nign-ww

for HIP

UCL of CACE

nign-ww

for HIP

-2

-1

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ll

for HIP

LCL of CACE

nign-ll

for HIP

UCL of CACE

nign-ll

for HIP

Figure

17:

Estimated

)

curv

with

95%

confidence

terv

al:

HIP

data

using

the

F-R

metho

(left),

the

NIGN-WW

del

(middle)

and

the

NIGN-LL

del

(righ

t).

The

dash-dot

line

and

dotted

line

indicate

the

Confidence

Limit(LCL)

and

the

Upp

Confidence

Limit

(UCL).

-2

-1

x 10

-3

Survival time in years

ACE(t)

ACE

for HIP

UCL of ACE

for HIP

LCL of ACE

for HIP

-2

-1

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ww

for HIP

LCL of ACE

nign-ww

for HIP

UCL of ACE

nign-ww

for HIP

-2

-1

x 10

-3

Survival time in years

ACE(t)

ACE

nign-ll

for HIP

LCL of ACE

nign-ll

for HIP

UCL of ACE

nign-ll

for HIP

Figure

18:

Estimated

)

curv

with

95%

confidence

terv

al:

HIP

data

using

the

F-R

metho

(left),

the

NIGN-

del

(middle)

and

the

NIGN-LL

del

(righ

t).

The

dash

line

and

dotted

line

indicate

the

Confidence

Limit(LCL)

and

the

Upp

Confidence

Limit

(UCL).

9.4.2

The NIGN-WW model vs the IGN-W model

0.5

1.5

2.5

3.5

x 10

-3

Survival time in years

CACE(t)

CACE

nign-ll

for HIP

CACE

ign-w

for HIP

CACE

nign-ww

for HIP

Figure 19: Comparisons of estimated CACE(t) curves using three models for the

HIP

data. The

dash-dot line

shows the IGN-W model, the

dotted line

shows

the NIGN-LL model and the

solid line

shows the NIGN-WW mdoel.

Figure 19 gives three model-specific curves of CACE(t) for the HIP

data, including

the CACE(t) curve that estimated by the IGN-W model. It shows that the CACE(t)

curve corresponding to the IGN-W model is farther from the two curves associated with

the NIGN-WW model and the NIGN-LL model. This implies that the IGN-W model

may overestimate CACE(t) on the HIP

data. Hence, in the HIP

data, we need to

assume latent ignorability; in which case, HIP

should be regarded as the special case

of a non-ignorable censoring mechanism. Furthermore, Figure 19 shows that the two

specific models from the non-ignorable PPO-survival model provide very similar values

for CACE(t) at each given time (in years). This provides strong evidence that the non-

ignorable-survival model is a robust model, in that it is not sensitive to the assumed

distribution. Column 2 of Table 15 gives the estimated CACE

(for the HIP

data)

According the definition given by Imbens & Rubin (1997), in this case, CACE can be

expressed as CACE =

where

and

denote the mean values of observed failure

corresponding to the treated and the control groups among compliers (U

= c).

Table 15: Comparisons between CACE and CACE(t) in two models: HIP

Model

CACE

CACE(1)

CACE(5)

AIC =

-2ln(L()) + 2m

NIGN-WW

52.64

0.66E-04

14.8E-04

2.0949E+05

NIGN-LL

1312.8

0.62E-04

15.7E-04

2.8393E+05

with respect to failure time T, which can be expressed as

CACE = E(T

(1)

= 1, U

= c)

- E(T

(0)

= 0, U

= c).

The values of CACE obtained by using the NIGN-WW model and the NIGN-LL model

were 52.64 vs 1312.8. They are clearly showing that the estimator CACE strongly re-

lies on the assumed distribution; while Column 3 and 4 of Table 15 give the estimated

CACE(t)

for t = 1 and t = 5 on the HIP

dataset via the two aforementioned mod-

els, namely 0.66E

- 045 vs 0.62E - 04 (for t = 1) and 14.8E - 04 vs 15.7E - 04 (for

t = 5). These results provide evidence that inferences for the specific complier average

causal effect CACE(t) (for time-to-event data), are not as sensitive to distributional as-

sumptions as inference for CACE. Indeed, by comparing their AIC (Akaike Information

criterion

) for the HIP

data, the Weibull distribution (AIC

nign

-ww

= 2.0948E + 5)

fits better than the log-normal (AIC

nign

-ll

= 2.8392E + 5) distribution for the non-

ignorable PPO-survival model. But both models give similar results (Figures 5 and 6

on page 78).

On the other hand, estimated survival values for the HIP

and HIP

data, using

the IGN-W model (given on the right-hand graphs in Figure 20 on page 101 and Figure

22 on page 103), also show that the IGN-W model does not provide as reasonable results

as the non-ignorable PPO-survival models, (the NIGN-WW and the NIGN-LL model).

According to the estimated survival values for the three subgroups using the IGN-W

By the new definition which is given in equation (3.7)

A measure of the goodness fit of an estimated statistical model, AIC =

-2ln(L()) + 2m

where m is the number of parameters and ln(L()) is the maximum nature logarithm likelihood

function for the estimated model. The one with lowest AIC is the best model.

model, Subgroup 3 is the highest among the three subgroups, which is not consistent

with the results provided via the F-R method and the non-ignorable PPO-survival

models. This is hard to interpret reasonably.

In fact, both the NIGN-WW and the NIGN-LL models perform much better than

the IGN-W model (Figure 22 on page 103) and Figure 20 on page 101).

Figures 20 (on page 101) and 22 (on page 103) provide strong evidence to support

our argument, that in the case of both the HIP

and HIP

datasets, the NIGN-WW

model provides more precise results than does the IGN-W model. With respect to

the F-R and the NIGN-WW models, Subgroup 1 (individuals who took the screening

examination) has a higher survival probability than subgroups where individuals did not

take the screening examination. This shows a clear and significant treatment benefit of

breast screening.

100

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in NIGN-WW for HIP

(t) inNIGN-WW for HIP

(t) in NIGN-WW for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in IGN-W for HIP

Figure

20:

Estimated

surviv

curv

using

the

F-R

(left),

the

NIGN-WW

(middle)

and

the

IGN-W

(righ

dels

for

the

HIP

data.

The

solid

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

screening

group;

the

dash

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

con

trol

group;

the

dash-dot

line

corresp

onds

the

surviv

curv

for

the

Subgroup

nev

er-tak

ers

assigned

either

the

con

trol

the

screening

group.

101

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in F-R for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in nign-ww for HIP

0.98

0.982

0.984

0.986

0.988

0.99

0.992

0.994

0.996

0.998

Survival time in years

Survival probability

(t) in ign-w for HIP

Figure

21:

ITT

estimated

surviv

curv

using

the

F-R

(left),

the

NIGN-WW

(middle)

and

the

IGN-W

(righ

dels

for

the

HIP

data.

The

solid

line

corresp

onds

the

group

assigned

for

screening;

the

dash

line

corresp

onds

the

group

not

assigned

for

screening.

102

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probability

(t) in F-R for HIP

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probabilty

(t) in NIGN-WW for HIP

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probability

(t) in IGN-W for HIP

Figure

22:

Estimated

surviv

curv

using

the

F-R

(left),

the

NIGN-WW

(middle)

and

the

IGN-W

(righ

dels

for

the

HIP

data.

The

solid

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

screening

group;

the

dash-dot

line

corresp

onds

the

surviv

curv

for

Subgroup

compliers

assigned

the

con

trol

group;

the

dotted

line

corresp

onds

the

surviv

curv

for

Subgroup

nev

er-tak

ers

assigned

either

the

con

trol

the

screening

group.

103

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probability

(t) in F-R for HIP

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probability

(t) in nign-ww for HIP

0.995

0.9955

0.996

0.9965

0.997

0.9975

0.998

0.9985

0.999

0.9995

Survival time in years

Survival probability

(t) in ign-w for HIP

Figure

23:

ITT

estimated

surviv

curv

using

the

F-R

(left),

the

NIGN-WW

(middle)

and

the

IGN-W

(righ

dels

for

the

HIP

data.

The

solid

line

corresp

onds

the

group

assigned

for

screening;

the

dash

line

corresp

onds

the

group

not

assigned

for

screening.

104

9.5

Results of the analysis of the HIP data via the non-

ignorable PPO-survival model

Table 9 (on page 73) shows the estimated complier average causal effect CACE(t) and

Table 10 (on page 74) shows the estimated average causal effect ACE(t) obtained via

the NIGN-WW and the NIGN-LL models. (Further details are given in Tables B-19,

B-20, B-13 and B-14 in Appendix 10). From these tables, we can see that CACE(t) and

ACE(t) are completely positive at each given observed year. This implies that a woman

undertaking screening would have a higher probability of surviving breast cancer than

she would without having been screened.

9.5.1

Estimated CACE(t) for the HIP

data

More specifically, from the NIGN-WW model on the HIP

data, CACE(5) = 14.8E

-04

and SE

cace(5)

= 3.83E

- 04 (Table 9 on page 73 or Table B-19 on page 132). This indi-

cates that after five years, a woman has about 0.148% more probability (with 0.0383%

standard error) of survival than otherwise. This can also be interpreted as follows: out

of 10,000, women about 14.8 (

±3.83) women would be saved from breast cancer-related

death five years after screening.

The result from the NIGN-LL model is similar, in that CACE(5) = 15.7E

- 04

(SE

cace(5)

= 4.02E

- 04) (Table 9 on page 73 or Table B-20 on page 132). This is

equivalent to about 15.7 (

±4.02) women in 10,000 being saved from breast cancer-

related death five years after screening.

9.5.2

Estimated CACE(t) for the HIP

data

For the HIP

data, the results show that in 10,000 women, about 3.56 (

±1.86) women

would be saved from breast cancer-related death five years after screening, as the com-

plier average causal effect (CACE(t)) with its associated standard error estimated (via

the NIGN-WW model) is CACE(5) = 3.56E

- 04 (SE

cace(5)

= 1.86E

- 04) (Table 9

105

on page 73 or Table B-13 on page 128). On the other hand, the CACE(t) obtained

via the NIGN-LL model is CACE(5) = 5.22E

- 04 (SE

cace(5)

= 2.19E

- 04) (Table 9

on page 73 or Table B-14 on page 129), which is equivalent to stating that about 5.22

(

±2.19) women in 10,000 are being saved from breast cancer-related death five years

after screening.

Even though the estimated values of CACE(5) from these two specific non-ignorable

PPO-survival models are different, they all tend to support the conclusion that women

(over 40 years), who are screened would have a significantly reduced risk of breast

cancer-related death.

By comparing the estimated complier average causal effects at the first year (CACE(1))

in the HIP trial, it is easy to see that CACE(1) < CACE(5) (Table 9 on page 73) for

the two specific non-ignorable PPO-survival models. More details are given in Tables

B-19 , B-20, B-13 and B-14 in Appendix 10. These results indicate that out of 10,000

women, the number of women saved from breast cancer related-death at the first year

is significantly less than at the fifth year after screening. For the HIP

dataset, the

number of women saved from breast cancer related-death at the first year is 0.66(

±0.24)

vs 14.8(

±3.83) the number of women being saved at the fifth year (using the NIGN-WW

model); with the NIGN-LL model, it is 0.62(

±0.25) vs 15.7(±4.02).

This provides strong evidence to support, what is nowadays, accepted as com-

mon sense: the earlier breast cancer is detected, (and the earlier it is treated), the

greater a woman's chance of survival. This conclusion is consistent with that reported

by Shapiro et al. (1988). Mammography screening is now commonplace in the US

(www.radiologyinfo.org), Australia (www.health.gov.au) and New Zealand (www.nzbcf.org.nz).

However, the results for the HIP

and HIP

datasets (Table 9 on page 73) obtained

using the NIGN-WW and the NIGN-LL models are different. One needs to make a

decision as to which model is closer to the truth. As we stated in section 9.2.3, in a

scenario of long-term follow-up, individuals's health status at beginning may not affect

their survival probability in later years. Especially, Figure 11 (on page 86) shows that

106

around 12 years it appears a crossing of the survival curves corresponding to Subgroup

2 S

(t) and Subgroup 3 S

(t). This implies that for the HIP trial, the non-ignorable

PPO-Survival models, (e.g. the NIGN-WW and the NIGN-LL models) would perform

better in analysing a short term follow-up data (the HIP

) than a long-term follow-up

data (the HIP

Furthermore, by comparison with the F-R method, which gives CACE(5) = 18.1E

04 (SE

cace(5)

= 5.85E

- 04) for both the HIP

and HIP

datasets (Table 9 on page

73), we believe that the non-ignorable PPO-survival model is optimal for the HIP

data than the HIP

data, because the results of the HIP

data obtained by the non-

ignorable PPO-survival model, CACE(5) = 14.8E

- 04 (SE

cace(5)

= 3.83E

- 04) (in

the NIGN-WW model) and CACE(5) = 15.7E

- 04 (SE

cace(5)

= 4.02E

- 04) (in the

NIGN-LL model), are much more closer to those obtained via the F-R method than the

results of the HIP

data, which are CACE(5) = 3.56E

- 04 (SE

cace(5)

= 1.86E

- 04)

(in the NIGN-WW model) and CACE(5) = 5.22E

- 04 (SE

cace(5)

= 2.19E

- 04) (in

the NIGN-LL model).

Theoretically, the non-ignorable PPO-survival model and the F-R method are com-

parable, as they are both derived under the same assumptions. The difference between

them is only that the former is a parametric survival model, while the latter is a non-

parametric survival model. Table 16 (on page 108) shows the results of the HIP

data from the non-ignorable and F-R models. Clearly, the non-ignorable PPO-survival

models provide significantly smaller standard errors for CACE(t) compared to the F-R

method.

Conclusions and Discussion

This paper presents a likelihood-based approach to the problem of inference for causal

effects in randomized time-to-event studies with non-compliance and possibly non-

ignorable censoring. Our approach can be viewed as a parametric, likelihood based

107

Table 16: Results of the analysis of the HIP

data via the non-ignorable PPO sur-

vival model. Numbers indicate the number of increased survivals due to screening

per 10,000 (on HIP

data). The associated standard error is given in brackets.

Survival year

NIGN-WW model

NIGN-LL model

F-R method

0.66(

±0.24)

0.62(

±0.24)

0.57 (

±1.39)

14.8(

±3.83)

15.7(

±4.02)

18.1 (

±5.85)

alternative to Frangakis & Rubin (1999)'s non-parametric method of moments estima-

tor (section 5) and extends the likelihood methods of O'Malley & Normand (2005) to

the case of time to event studies, and also given a new definition of ACE and CACE

for time-to-event data, which is the extension of Peng et al. (2004), the latter is based

on general location models.

Likelihood theory guarantees consistency of point estimates, as is also the case for

Frangakis & Rubin (1999) moment estimates. In the context of normally distributed

outcomes, the simulation study of O'Malley & Normand (2005) suggests practical, finite

sample, advantages for likelihood based methods in terms of efficiency, mean-squared

error and confidence interval coverage, compared to moment estimation. In O'Malley

& Normand (2005)'s study these advantages persisted even when the distributional

assumptions of the likelihood construction were incorrect.

The fundamental idea underpinning this paper and previous work on causal infer-

ence under non-compliance and subsequent missing outcomes (Frangakis & Rubin, 1999;

O'Malley & Normand, 2005) is that adjustment for bias related to post-randomization

self-selection of treatment, can be achieved by the introduction of a compliance type

covariate which characterizes treatment received under all possible treatment assign-

ments. Censoring is assumed to be ignorable conditional on compliance type (latent

ignorability), but because this covariate is only partially observed the assumed censor-

ing mechanism is, in general, not ignorable given the observable data. This leads to a

more complex likelihood construction than in standard time to event analyzes because

108

the model for the censoring time does not drop out of the likelihood. As a consequence,

the censoring distribution needs to be explicitly modeled. While the assumption of

latent ignorability cannot be verified from observed data we agree with Frangakis &

Rubin (1999) that because compliance type may be associated with loss to follow-up

ignorability of the censoring mechanism conditional on compliance type will generally

be a more plausible assumption than unconditional ignorability. We note, however, that

in some randomized time to event studies with non-compliance, it will be reasonable to

ignore the censoring mechanism. For example in some studies, particularly those with

short follow-up time it is possible that censoring is caused solely by the conclusion of

the study. In which case the censoring time is effectively a known constant for each in-

dividual (end of study date minus entry date) and therefore does not enter the analysis

Cox & Oakes (1984); Klein (1997).

Our approach permits both estimation of the causal effect of treatment received and

the causal effect of treatment assignment. The former quantifies treatment efficacy in the

ideal case where all individuals assigned treatment actually comply with the treatment

programme. In the case of a drug trial this provides information on the biological effect

of the drug by contrasting outcomes for a group of compliers given the new treatment

with outcomes for that same group had they been give an standard treatment. On the

other hand the causal effect of assigned treatment quantifies the overall population effect

of introducing a new treatment regime or public health initiative, such as a screening

programme. The overall effectiveness of a such a programme will reflect both the efficacy

of the procedure and the proportion of the target population who take up the treatment

and these two concepts are combined in the paper. This is reflected in the definition

of the average causal effect of assigned treatment given in (3.8) under the compound

exclusion and monotonicity assumptions Frangakis & Rubin (1999) show that when

censoring time is associated with compliance type the usual intention to treat estimators,

based on comparison of Kaplan-Meier survival curves for the randomized and non-

randomized groups is inconsistent for the causal effect of assigned treatment. In contrast

109

their method of moments estimator and our likelihood based method provide consistent

estimates of this effect.

Although the issue of non-compliance with assigned treatment shifts the analysis

of randomized studies towards the analysis of observational studies, the randomization

of assigned treatment gives randomized studies a distinct advantage over observational

studies because it ensures that the distribution of compliance type can be assumed to

be the same in each treatment arm Together with plausible assumptions such as the ex-

clusion assumption this ensures that the observed study data are informative about the

distribution of compliance type The observational study counterpart to non-compliance

in randomized studies is the problem of confounding by unobserved covariates and be-

cause the observed study data are uninformative about unobserved covariates, causal

inference in the observational study setting can be highly sensitive to prior assump-

tions about the adequacy of control for confounding variables Greenland et al. (1999);

Greenland (2004).

Because the likelihood approach requires explicit modelling of both censoring time

and survival time distributions, one potential drawback to this approach is the potential

sensitivity of causal inferences to distributional assumptions.

In a specific application sensitivity can be explored by fitting the model under dif-

ferent distributional assumptions and the usual sorts of likelihood based model selection

criteria can be applied (AIC, BIC). The impact of distributional assumptions could be

lessened by developing a semi-parametric model in which both the survival time and

censoring time are modelled by Cox type proportional hazards models. As in this paper,

likelihood maximization for such a semi-parametric approach would be facilitated by an

E-M algorithm which exploits the conditional independence of the survival and censoring

time given the compliance type variable. Alternatively, moving to a Bayesian perspec-

tive when the likelihood construction in Section 4,is combined with a prior distribution

over the model parameters, posterior computation via a Gibbs sampler, alternating be-

tween imputation of unobserved compliance types and sampling of parameters from the

110

conditional posterior distribution, given the compliance states, is suggested. Cox type

models can be fitted within a Bayesian framework Gong et al. (2005) and therefore both

fully parametric and semi-parametric Bayesian modelling should prove feasible using the

general likelihood construction of section 4.

ACKNOWLEDGEMENT

The first author would like to thank the following organizations and persons: the

New Zealand Institute of Mathematics and its Application (NZIMA) for providing

her three year Ph.D. scholarship to do this research. the New Zealand Statistics

Association (NZSA) and the Department of Mathematics and Statistics for also

sponsoring her to attend relevant conferences.

Dr. Stuart G. Baker and his

assistant Diane Erwin, National Cancer Institute of USA, for kindly organising

the HIP data for this research.

111

References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal

effects using instrumental variables. Journal of the American Statistical Asso-

ciation, 91(434), 444455.

Baker, J. G., Rounds, J. B., & Zevon, M. A. (2000). A comparison of graded

response and rasch partial credit models with subjective well-being. Journal of

Educational and Behavioral Statistics, 25(3), 253270.

Baker, S. G. (1992). A simple method for computing the observed information

matrix when using the EM algorithm with categorical data. The American

Statistical Association, 1(1), 6367.

Baker, S. G. (1994). Gender, ethnicity, and homelessness - accounting for demo-

graphic diversity on the streets. American Behavioral Scientist, 37(4), 476504.

Baker, S. G. (1997). All-or-none compliance. In S. Kotz, C. Read, & D. Banks

(Eds.), The Encyclopedia of Statistical Sciences, volume 1 (pp. 134138). New

York: Wiley.

Baker, S. G. (1998).

Analysis of survival data from a randomised trial with

all-or-none compliance: Estimating the cost-effectiveness of a cancer screening

program. Journal of the American Statistical Association, 93(443), 929934.

Baker, S. G. & Kramer, B. S. (2005). Simple maximum likelihood estimates of

efficacy in randomised trials and before-and-after studies, with implications for

meta-analysis. Statistical Methods in Medical Research, 14, 349367.

Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal

stratification approach to broken randomised experiments: A case study of

112

School Choice vouchers in New York City. Journal of the American Statistical

Association, 98(462), 299311.

Casella, G. & Berger, R. L. (2002). Statistical inference. Australia ; Pacific Grove,

CA: Thomson Learning, 2nd edition.

Cox, D. & Oakes, D. (1984). Analysis of Survival Data. Monographs on Statistics

and Applied Probability. London; New York: Chapman and Hall.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from

incomplete data via EM algorithm. Journal of the Royal Statistical Society

Series B-Methodological, 39(1), 138.

Efron, B. & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York:

Chapman& Hall.

Fisher, L. D. (1999). Advances in clinical trials in the twentieth century. Annu.

Rev. Public Health, 20, 109124.

Fleming, T. & Lin, D. (2000). Surivival analysis in clinical trials: Past develop-

ments and future directions. Biometrics, 56, 971983.

Frangakis, C. E. & Rubin, D. B. (1999). Addressing complications of intention-to-

treat analysis in the combined presence of all-or-none treatment-noncompliance

and subsequent missing outcomes. Biometrika, 86(2), 365379.

Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference.

Biometrics, 58(1), 2129.

Gong, Z., Hudson, I., & Graham, P. (2005). A Bayesian Causal Cox Model (CCM)

for all-or-nothing compliance. In A. Francis, K. Matawie, A. Oshlack, & G.

113

Smyth (Eds.), 20th International Workshop in Statistical Modelling, Sydney,

Australia (pp. 227230).

Graham, P. (2000). Bayesian inference for a generalized population attributable

fraction: the impact of early vitamin A levels on chronic lung disease in very

low birth weight infants. Statistics in Medicine, 19, 937956.

Graham, P. (2001). Bayesian Causal Inference for Epidemiological and Clinical

Studies. Phd thesis, Otago, NZ.

Greenland, S. (2004). An overview of methods for causal inference from observa-

tional studies. In A. Gelman & X. L. Meng (Eds.), Applied Bayesian Modeling

and Causal Inference from Incomplete-Data Perspectives. John Wiley and Sons.

Greenland, S., Robins, J. M., & Pearl, J. (1999). Confounding and collapsibility

in causal inference. Statistical Science, 14(1), 2946.

Herring, A. H., I. J. G. & Lipsitz, S. (2004). Non-ignorable missing covariate data

in survival analysis: A case-study of an international breast cancer study group

trial. Application of Statistics, 53(Part 2), 293310.

Higham, D. & Higham, N. (2000). MATLAB Guide. SIAM.

Hirano, K. & Imbens, G. W. (2000). Assessing the effect of an influenza vaccine

in an encouragement design. Biometrics, 1, 6988.

Holland, P. (1986). Statistics and causal inference. Journal of the American

Statistical Association, 81(396), 945970.

Ibrahim, J. G., Chen, M. H., & Sinha, D. (2001). Bayesian Survival Analysis.

New York: Springer-Verlag.

114

Imbens, G. & Rubin, D. (1997). Bayesian inference for causal effects in randomised

experiments with noncompliance. The Annals of Statistics, 25(1), 305327.

Jo, B. (2002). Statistical power in randomised intervention studies with noncom-

pliance. Psychological Methods, 7(2), 178193.

Kalbfleisch, J. D. & Prentice, R. L. (2002). The Statistical Analysis of Failure

Time Data. New York: Wiley, 2nd edition.

Klein, J. P. (1997). Survival Analysis. Statistics for Biology and Health. New

York: Springer-Verlag, 1st edition.

Lagarias, J. G., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence

properties of the Nelder-Mead simplex method in low dimensions. Society for

Industrial and Applied Mathematics, 9(1), 112147.

Laird, N. M. & Louis, T. A. (1982). Approximate posterior distributions for

incomplete data. Journal of Royal Statistical Society , Series B, 44(2), 190

200.

Levy, D. E., O'Malley, A. J., & Normand, S. T. (2004). Covariate adjustment in

clinical trials with non-ignorable missing data and non-compliance. Statistics

in Medicine, 23, 23192339.

Loeys, T. & Goetghebeur, E. (2003). A causal proportional hazards estimator

for the effect of treatment actually received in a randomised trial with all-or-

nothing compliance. Biometrics, 59(1), 100105.

Louis, T. A. (1982). Finding the observed information matrix when using the EM

algorithm. Journal of the Royal Statistical Society, Series B., 44(2), 226233.

115

Mattei, A. & Mealli, F. (2007). Application of principal stratification approach

to the Faenza randomised experiment on breast self-examination. Biometrics,

63(June), 437446.

McLachlan, G. J. & Krishnan, T. (1997). The EM Algorithm and Extensions.

New York: Wiley.

Mealli, F., Imbens, G. W., Ferro, S., & Biggeri, A. (2004). Analyzing a randomised

trial on breast self-examination with noncompliance and missing outcomes. Bio-

statistics, 5(2), 207222.

Meng, X.-L. & Rubin, D. B. (1991). Using EM to obtain asymptotic variance-

covariance matrices: the SEM algorithm. Journal of the American Statistical

Association, 86(December), 899909.

Meng, X.-L. & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM

algorithm: a general framework. Biometrika, 80(2), 267278.

O'Malley, A. J. & Normand, S. T. (2005). Likelihood methods for treatment

noncompliance and subsequent nonresponse in randomised trials. Biometrics,

61(June), 325334.

Pearl, J. (2001). Causal inference in health sciences: a conceptual introduction.

Health Services and Outcomes Methodology, 2, 189220.

Peng, Y., Little, R. J. A., & Raghunathan, T. E. (2004). An extended general

location model for causal inferences from data subject to noncompliance and

missing values. Biometrics, 60(September), 598670.

Robins, J. M. & Tsiatis, A. A. (1991). Correcting for noncompliance in randomised

trials using rank preserving structural failure time models. Communications in

Statistics-Theory and Methods, 20(8), 26092631.

116

Rubin, D. (1978). Bayesian inference for causal effects: the role of randomisation.

The Annals of Statistics, 6(1), 3458.

Rubin, D. (1986). Which ifs have causal answers?

Journal of the American

Statistical Association, 81, 961962.

Rubin, D. (2005). Causal inference using potential outcomes: Design, modeling,

decisions. Journal of the American Statistical Association, 100(469), 322331.

Shapiro, S. (1977). Evidence on screening for breast cancer from a randomized

trial. Cancer, 39(June Supplement), 27722782.

Shapiro, S., Venet, W., Strax, P., & Venet, L. (1988). Periodic Screening for Breast

Cancer: The Health Insurance Plan Project and Its Sequelae, 1963-1986. The

Johns Hopkins Series in Contemporary Medicine and Public Health. Baltimore:

The Johns Hopkins University Press.

Shapiro, S., Venet, W., Strax, P., Venet, L., & Roeser, R. (1982). Ten to fourteen

year effect of screening on breast cancer mortality. Journal of the National

Cancer Institute, 69(No.2), 349355.

Tanner, M. A. (1993). Tools for statistical inference : methods for the exploration

of posterior distributions and likelihood functions. Springer series in statistics.

New York: Springer-Verlag, 2nd edition.

White, I. (2005). Uses and limitations of randomisation-based efficacy estimators.

Statistical Methods in Medical Research, 14, 327347.

Yau, L. H. Y. & Little, R. J. (2001). Inference for the complier-average causal

effect from longitudinal data subject to noncompliance and missing data, with

application to a job training assessment for the unemployed. Journal of the

American Statistical Association, 96(456), 12321244.

117

Appendix B

Tables

Table B-1: Estimated model parameter ^

in the NIGN-WW model for the HIP

data.

Method:

NIGN-WW

Data:

HIP

Estimates

LCL

UCL

T,c1

0.0057

0.0005

0.0047

0.0067

T,c0

0.0064

0.0006

0.0052

0.0076

T,n

0.0064

0.0006

0.0053

0.0075

1.8254

0.0613

1.7053

1.9456

C,c1

0.0622

0.0001

0.0620

0.0625

C,c0

0.0639

0.0002

0.0636

0.0642

C,n

0.0663

0.0003

0.0657

0.0669

3.1928

0.0006

3.1916

3.1940

HIP

: HIP data with follow-up of 18 years.

Table B-2: Estimated model parameter ^

in the NIGN-LL model for the HIP

data.

Method:

NIGN-LL

Data:

HIP

Estimates

LCL

UCL

T,c1

6.2597

0.1157

6.0331

6.4864

T,c0

6.1094

0.1134

5.8871

6.3317

T,n

6.1481

0.1233

5.9065

6.3898

1.5595

0.0479

1.4657

1.6533

C,c1

2.6173

0.0040

2.6094

2.6252

C,c0

2.6217

0.0041

2.6136

2.6298

C,n

2.3269

0.0055

2.3162

2.3376

0.5674

0.0000

0.5673

0.5674

HIP

: HIP data with follow-up of 18 years.

118

Table B-3: Estimated model parameter ^

in the NIGN-WW model for the HIP

data.

Method:

NIGN-WW

Data:

HIP

Estimates

LCL

UCL

T,c1

0.0059

0.0014

0.0031

0.0087

T,c0

0.0092

0.0020

0.0053

0.0131

T,n

0.0069

0.0016

0.0037

0.0101

1.9362

0.1389

1.6640

2.2084

C,c1

0.1438

0.0001

0.1436

0.1440

C,c0

0.1446

0.0001

0.1444

0.1448

C,n

0.1452

0.0001

0.1450

0.1454

10.3780

0.0004

10.3770

10.3790

HIP

: HIP data with follow-up of 7 years.

Table B-4: Estimated model parameter ^

in the NIGN-LL model for the HIP

data.

Method:

NIGN-LL

Data:

HIP

Estimates

LCL

UCL

T,c1

6.7587

0.3394

6.0935

7.4239

T,c0

6.2969

0.3073

5.6947

6.8992

T,n

6.6068

0.3393

5.9419

7.2718

1.6828

0.1134

1.4605

1.9052

C,c1

1.8900

0.0020

1.8861

1.8939

C,c0

1.8808

0.0017

1.8775

1.8840

C,n

1.7764

0.0034

1.7697

1.7831

0.2831

0.0000

0.2830

0.2832

HIP

: HIP data with follow-up of 7 years.

119

able

B-5:

Iterations

the

algorithm

using

the

NIGN-WW

del

for

the

HIP

data.

del:

NIGN-WW

Data:

HIP

Iter

(1)

Mflag

174.74

156.41

155.26

1.83

16.07

15.64

15.09

3.19

113.9300

174.75

156.20

155.52

1.83

16.07

15.64

15.09

3.19

0.2520

174.77

156.06

155.70

1.83

16.07

15.65

15.08

3.19

0.1870

174.77

155.96

155.82

1.83

16.07

15.65

15.08

3.19

0.1172

174.78

155.89

155.91

1.83

16.07

15.65

15.08

3.19

0.0887

174.78

155.85

155.96

1.83

16.07

15.65

15.08

3.19

0.0569

174.78

155.81

156.00

1.83

16.07

15.65

15.08

3.19

0.0390

174.78

155.79

156.03

1.83

16.07

15.65

15.08

3.19

0.0272

174.79

155.78

156.05

1.83

16.07

15.65

15.08

3.19

0.0173

174.78

155.77

156.06

1.83

16.07

15.65

15.08

3.19

0.0130

174.78

155.76

156.07

1.83

16.07

15.65

15.08

3.19

0.0094

174.78

155.76

156.07

1.83

16.07

15.65

15.08

3.19

0.0055

174.78

155.75

156.07

1.83

16.07

15.65

15.08

3.19

0.0052

174.78

155.75

156.08

1.83

16.07

15.65

15.08

3.19

0.0044

HIP

data

with

follo

w-up

ears.

Mfl

indicates

the

maxim

)

reac

hed

the

iteration.

represen

the

maxim

comp

onen

+1)

)

the

absolute

ector

difference.

120

able

B-6:

Iterations

algorithm

for

the

NIGN-LL

del

the

HIP

data

del:

NIGN-LL

Data:

HIP

Iter

(1)

Mflag

6.2597

6.1094

6.1481

1.5595

2.6173

2.6217

2.3269

0.5674

0.0008

HIP

the

HIP

data

with

follo

wing-up

ears.

Mfl

indicates

the

maxim

)

reac

hed

the

iteration.

represen

the

maxim

comp

onen

+1)

)

the

absolute

ector

difference.

121

able

B-7:

Iterations

the

algorithm

for

the

NIGN-WW

del

the

HIP

data

del:

NIGN-WW

Data:

HIP

Iter

(1)

Mflag

169.16

109.66

142.30

1.94

6.95

6.91

6.89

10.38

5984.8000

169.18

109.32

143.14

1.94

6.95

6.91

6.89

10.38

0.8389

169.19

109.10

143.67

1.94

6.95

6.91

6.89

10.38

0.5288

169.20

108.96

144.00

1.94

6.95

6.91

6.89

10.38

0.3367

169.21

108.88

144.23

1.94

6.95

6.91

6.89

10.38

0.2229

169.22

108.83

144.36

1.94

6.95

6.91

6.89

10.38

0.1329

169.22

108.80

144.45

1.94

6.95

6.91

6.89

10.38

0.0887

169.21

108.77

144.49

1.94

6.95

6.91

6.89

10.38

0.0405

169.22

108.76

144.53

1.94

6.95

6.91

6.89

10.38

0.0388

169.21

108.75

144.54

1.94

6.95

6.91

6.89

10.38

0.0166

169.22

108.75

144.56

1.94

6.95

6.91

6.89

10.38

0.0190

169.23

108.75

144.58

1.94

6.95

6.91

6.89

10.38

0.0162

169.22

108.74

144.58

1.94

6.95

6.91

6.89

10.38

0.0102

169.22

108.74

144.58

1.94

6.95

6.91

6.89

10.38

0.0045

HIP

data

with

follo

w-up

ears.

Mfl

indicates

the

maxim

)

reac

hed

the

iteration.

represen

the

maxim

comp

onen

+1)

)

the

absolute

ector

difference.

122

able

B-8:

Iterations

the

algorithm

for

the

NIGN-LL

del

the

HIP

data

del:

NIGN-LL

Data:

HIP

Iter

(1)

Mflag

8.22

7.56

8.02

2.18

1.89

1.88

1.80

0.28

1.9602

6.76

6.29

6.62

1.68

1.89

1.87

1.78

0.28

1.4641

6.76

6.29

6.61

1.68

1.89

1.88

1.78

0.28

0.0068

6.76

6.30

6.61

1.68

1.89

1.88

1.78

0.28

0.0041

HIP

data

with

follo

w-up

ears.

Mfl

indicates

the

maxim

)

reac

hed

the

iteration.

represen

the

maxim

comp

onen

+1)

)

the

absolute

ector

difference.

123

Table B-9: Estimated ACE(t)(95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

ACE

F R

ace

LCL

ace

U CL

ace

Zvalue

- value

0.38E

-04 0.93E-04 -1.44E-04 2.20E-04 0.4126

0.3399

2.03E

-04 1.63E-04 -1.16E-04 5.23E-04 1.2464

0.1063

3.58E

-04 2.22E-04 -0.76E-04 7.93E-04 1.6160

0.0530

7.38E

-04 2.97E-04

1.56E

-04

1.32E

-03 2.4843

0.0065

12.1E

-04 3.92E-04

4.41E

-04

1.98E

-03 3.0855

0.0010

18.6E

-04 4.77E-04

9.23E

-04

2.79E

-03 3.8932

0.0000

17.1E

-04 5.30E-04

6.74E

-04

2.75E

-03 3.2317

0.0006

14.1E

-04 5.81E-04

2.71E

-04

2.55E

-03 2.4268

0.0076

17.3E

-04 6.49E-04

4.55E

-04

3.00E

-03 2.6614

0.0039

14.6E

-04 7.01E-04

0.82E

-04

2.83E

-03 2.0769

0.0189

17.3E

-04 7.58E-04

2.46E

-04

3.22E

-03 2.2844

0.0112

14.9E

-04 8.15E-04 -1.10E-04 3.08E-03 1.8249

0.0340

14.8E

-04 8.68E-04 -2.19E-04 3.18E-03 1.7081

0.0438

10.5E

-04 9.21E-04 -7.60E-04 2.85E-03 1.1350

0.1282

11.1E

-04 10.1E-04 -8.72E-04 3.09E-03 1.0963

0.1365

10.5E

-04 10.9E-04 -10.8E-04 3.19E-03 0.9683

0.1665

11.8E

-04 11.4E-04 -10.6E-04 3.42E-03 1.0354

0.1502

21.3E

-04 12.2E-04 -2.66E-04 4.52E-03 1.7426

0.0407

HIP

: HIP data with follow-up of 18 years

124

Table B-10: Estimated ACE(t) (95% CI) using the NIGN-WW model for the

HIP

data.

Method:

NIGN-WW

Data:

HIP

Year

ACE

nign

-ww

ace

LCL

ace

U CL

ace

Zvalue

- value

0.13E

-04

0.07E

-04 -0.01E-04 0.26E-04 1.8643

0.0311

0.45E

-04

0.24E

-04 -0.02E-04 0.91E-04 1.8920

0.0292

0.94E

-04

0.49E

-04 -0.03E-04 1.91E-04 1.9030

0.0285

1.59E

-04

0.83E

-04 -0.04E-04 3.22E-04 1.9084

0.0282

2.38E

-04

1.25E

-04 -0.06E-04 4.83E-04 1.9111

0.0280

3.32E

-04

1.74E

-04 -0.08E-04 6.73E-04 1.9123

0.0279

4.40E

-04

2.30E

-04 -0.11E-04 8.91E-04 1.9127

0.0279

5.61E

-04

2.93E

-04 -0.14E-04 11.4E-04 1.9126

0.0279

6.95E

-04

3.63E

-04 -0.17E-04 14.1E-04 1.9121

0.0279

8.41E

-04

4.40E

-04 -0.21E-04 17.0E-04 1.9113

0.0280

10.0E

-04

5.23E

-04 -0.26E-04 20.3E-04 1.9104

0.0280

11.7E

-04

6.13E

-04 -0.31E-04 23.7E-04 1.9094

0.0281

13.5E

-04

7.09E

-04 -0.376E-04 27.4E-04 1.9083

0.0282

15.5E

-04

8.11E

-04 -0.43E-04 31.4E-04 1.9072

0.0282

17.5E

-04

9.19E

-04 -0.50E-04 35.5E-04 1.9061

0.0283

19.7E

-04

10.3E

-04 -0.57E-04 39.9E-04 1.9049

0.0284

21.9E

-04

11.5E

-04 -0.65E-04 44.5E-04 1.9037

0.0285

24.3E

-04

12.8E

-04 -0.73E-04 49.4E-04 1.9026

0.0285

HIP

: HIP data with follow-up of 18 years

125

Table B-11: Estimated ACE(t) (95% CI) using the NIGN-LL model for the HIP

data.

Method:

NIGN-LL

Data:

HIP

Year

ACE

nign

-ll

ace

LCL

ace

U CL

ace

Zvalue

- value

0.10E

-04

0.05E

-04 0.01E-04 0.19E-04 2.1060

0.0176

0.53E

-04

0.23E

-04 0.07E-04 0.98E-04 2.2561

0.0120

1.27E

-04

0.55E

-04 0.20E-04 2.34E-04 2.3214

0.0101

2.27E

-04

0.96E

-04 0.38E-04 4.16E-04 2.3576

0.0092

3.49E

-04

1.47E

-04 0.62E-04 6.37E-04 2.3804

0.0086

4.89E

-04

2.04E

-04 0.89E-04 8.89E-04 2.3958

0.0083

6.42E

-04

2.67E

-04 1.19E-04 11.7E-04 2.4068

0.0080

8.08E

-04

3.34E

-04 1.52E-04 14.6E-04 2.4150

0.0079

9.82E

-04

4.06E

-04 1.87E-04 17.8E-04 2.4213

0.0077

11.6E

-04

4.80E

-04 2.24E-04 21.1E-04 2.4262

0.0076

13.5E

-04

5.57E

-04 2.62E-04 24.5E-04 2.4302

0.0075

15.5E

-04

6.36E

-04 3.01E-04 27.9E-04 2.4334

0.0075

17.5E

-04

7.16E

-04 3.41E-04 31.5E-04 2.4361

0.0074

19.5E

-04

7.98E

-04 3.82E-04 35.1E-04 2.4384

0.0074

21.5E

-04

8.81E

-04 4.23E-04 38.8E-04 2.4404

0.0073

23.6E

-04

9.65E

-04 4.65E-04 42.5E-04 2.4421

0.0073

25.6E

-04

10.5E

-04 5.07E-04 46.2E-04 2.4435

0.0073

27.7E

-04

11.3E

-04 5.49E-04 49.9E-04 2.4448

0.0072

HIP

: HIP data with follow-up of 18 years.

126

Table B-12: Estimated CACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

CACE

F R

cace

LCL

cace

U CL

cace

Zvalue

- value

0.57E

-04 1.39E-04 -2.14E-04 3.29E-04 0.4127

0.3399

3.03E

-04 2.43E-04 -1.74E-04 7.80E-04 1.2464

0.1063

5.35E

-04 3.31E-04 -1.14E-04 11.8E-04 1.6159

0.0531

11.0E

-04 4.44E-04

2.32E

-04

19.7E

-04 2.4836

0.0065

18.1E

-04 5.85E-04

6.58E

-04

29.5E

-04 3.0846

0.0010

27.7E

-04 7.13E-04

1.38E

-03

41.7E

-04 3.8930

0.0000

25.6E

-04 7.91E-04

1.01E

-03

41.1E

-04 3.2321

0.0006

21.1E

-04 8.68E-04

4.05E

-04

38.1E

-04 2.4272

0.0076

25.8E

-04 9.69E-04

6.80E

-04

44.8E

-04 2.6616

0.0039

21.7E

-04 10.5E-04

1.23E

-04

42.3E

-04 2.0772

0.0189

25.9E

-04 11.3E-04

3.68E

-04

48.0E

-04 2.2849

0.0112

22.2E

-04 12.2E-04 -1.64E-04 46.1E-04 1.8252

0.0340

22.1E

-04 13.0E-04 -3.26E-04 47.5E-04 1.7082

0.0438

15.6E

-04 13.8E-04 -11.3E-04 42.6E-04 1.1349

0.1282

16.5E

-04 15.1E-04 -13.0E-04 46.1E-04 1.0962

0.1365

15.8E

-04 16.3E-04 -16.1E-04 47.6E-04 0.9682

0.1665

17.7E

-04 17.1E-04 -15.8E-04 51.1E-04 1.0353

0.1502

31.8E

-04 18.2E-04 -3.96E-04 67.5E-04 1.7427

0.0407

HIP

: HIP data with follow-up 18 years

127

Table B-13: Estimated CACE(t) (95% CI) using the NIGN-WW model for the

HIP

data.

Method:

NIGN-WW

Data:

HIP

Year

CACE

nign

-ww

cace

LCL

cace

U CL

cace

Zvalue

- value

0.19E

-04

0.10E

-04 -0.01E-04 0.39E-04 1.8643

0.0311

0.67E

-04

0.35E

-04 -0.02E-04 1.36E-04 1.8921

0.0292

1.40E

-04

0.74E

-04 -0.04E-04 2.85E-04 1.9031

0.0285

2.37E

-04

1.24E

-04 -0.06E-04 4.80E-04 1.9084

0.0282

3.56E

-04

1.86E

-04 -0.09E-04 7.21E-04 1.9111

0.0280

4.96E

-04

2.59E

-04 -0.12E-04 10.0E-04 1.9124

0.0279

6.57E

-04

3.43E

-04 -0.16E-04 13.3E-04 1.9128

0.0279

8.37E

-04

4.38E

-04 -0.21E-04 17.0E-04 1.9126

0.0279

10.4E

-04

5.42E

-04 -0.26E-04 21.0E-04 1.9121

0.0279

12.6E

-04

6.57E

-04 -0.32E-04 25.4E-04 1.9114

0.0280

14.9E

-04

7.81E

-04 -0.39E-04 30.2E-04 1.9105

0.0280

17.5E

-04

9.15E

-04 -0.46E-04 35.4E-04 1.9095

0.0281

20.2E

-04

10.6E

-04 -0.55E-04 40.9E-04 1.9084

0.0282

23.1E

-04

12.1E

-04 -0.64E-04 46.8E-04 1.9073

0.0282

26.2E

-04

13.7E

-04 -0.74E-04 53.0E-04 1.9061

0.0283

29.4E

-04

15.4E

-04 -0.85E-04 59.6E-04 1.9049

0.0284

32.8E

-04

17.2E

-04 -0.97E-04 66.5E-04 1.9038

0.0285

36.3E

-04

19.1E

-04 -1.09E-04 73.7E-04 1.9026

0.0285

HIP

: HIP data with follow-up of 18 years.

128

Table B-14: Estimated CACE(t) (95% CI) using the NIGN-LL models for the

HIP

data.

Method:

NIGN-LL

Data:

HIP

Year

CACE

nign

-ll

cace

LCL

cace

U CL

cace

Zvalue

- value

0.15E

-04

0.07E

-04 0.01E-04 0.29E-04 2.1061

0.0176

0.78E

-04

0.35E

-04 0.10E-04 1.47E-04 2.2562

0.0120

1.89E

-04

0.82E

-04 0.30E-04 3.49E-04 2.3215

0.0101

3.39E

-04

1.44E

-04 0.57E-04 6.21E-04 2.3577

0.0092

5.22E

-04

2.19E

-04 0.92E-04 9.51E-04 2.3805

0.0086

7.30E

-04

3.05E

-04 1.33E-04 13.3E-04 2.3959

0.0083

9.59E

-04

3.98E

-04 1.78E-04 17.4E-04 2.4069

0.0080

12.1E

-04

4.99E

-04 2.27E-04 21.8E-04 2.4151

0.0079

14.7E

-04

6.06E

-04 2.79E-04 26.5E-04 2.4214

0.0077

17.4E

-04

7.17E

-04 3.34E-04 31.4E-04 2.4263

0.0076

20.2E

-04

8.31E

-04 3.91E-04 36.5E-04 2.4303

0.0075

23.1E

-04

9.49E

-04 4.50E-04 41.7E-04 2.4336

0.0075

26.1E

-04

10.7E

-04 5.09E-04 47.0E-04 2.4363

0.0074

29.1E

-04

11.9E

-04 5.70E-04 52.4E-04 2.4386

0.0074

32.1E

-04

13.2E

-04 6.32E-04 57.9E-04 2.4405

0.0073

35.2E

-04

14.4E

-04 6.94E-04 63.4E-04 2.4422

0.0073

38.3E

-04

15.7E

-04 7.57E-04 68.9E-04 2.4437

0.0073

41.4E

-04

16.9E

-04 8.20E-04 74.5E-04 2.4449

0.0072

HIP

: HIP data with follow-up of 18 years.

129

Table B-15: Estimated ACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

ACE

F R

ace

LCL

ace

U CL

ace

Zvalue

- value

0.38E

-04 0.93E-04 -1.44E-04 2.20E-04 0.4126

0.3399

2.03E

-04 1.63E-04 -1.16E-04 5.23E-04 1.2464

0.1063

3.58E

-04 2.22E-04 -0.76E-04 7.93E-04 1.6160

0.0530

7.38E

-04 2.97E-04

1.56E

-04

13.2E

-04 2.4843

0.0065

12.1E

-04 3.92E-04

4.41E

-04

19.8E

-04 3.0855

0.0010

18.7E

-04 4.74E-04

9.44E

-04

28.0E

-04 3.9507

0.0000

18.7E

-04 4.74E-04

9.44E

-04

28.0E

-04 3.9507

0.0000

HIP

: HIP data with follow-up of 7 years

Table B-16: Estimated CACE(t) (95% CI) using the F-R method for the HIP

data.

Method:

F-R

Data:

HIP

Year

CACE

F R

cace

LCL

cace

U CL

cace

Zvalue

- value

0.57E

-04 1.39E-04 -2.14E-04 3.29E-04 0.4127

0.3399

3.03E

-04 2.43E-04 -1.74E-04 7.80E-04 1.2464

0.1063

5.35E

-04 3.31E-04 -1.14E-04 11.8E-04 1.6159

0.0531

11.0E

-04 4.44E-04

2.32E

-04

19.7E

-04 2.4836

0.0065

18.1E

-04 5.85E-04

6.58E

-04

29.5E

-04 3.0846

0.0010

28.0E

-04 7.08E-04

14.1E

-04

41.9E

-04 3.9505

0.0000

28.0E

-04 7.08E-04

14.1E

-04

41.9E

-04 3.9505

0.0000

HIP

: HIP data with follow-up of 7 years

130

Table B-17: Estimated ACE(t) (95% CI) using the NIGN-WW model for the

HIP

data.

Method:

NIGN-WW

Data:

HIP

Year

ACE

nign

-ww

ace

LCL

ace

U CL

ace

Zvalue

- value

0.44E

-05

0.16E

-04 0.13E-04 0.75E-05 2.7777

0.0027

1.68E

-04

0.51E

-04 0.69E-04 2.67E-04 3.3248

0.0004

3.69E

-04

1.02E

-04 1.69E-04 5.68E-04 3.6237

0.0001

6.43E

-04

1.70E

-04 3.10E-04 9.76E-04 3.7839

0.0001

9.90E

-04

2.56E

-04 4.87E-04 14.9E-04 3.8593

0.0001

14.1E

-04

3.63E

-04 6.97E-04 21.2E-04 3.8822

0.0001

19.0E

-04

4.89E

-04 9.36E-04 28.5E-04 3.8729

0.0001

HIP

: HIP data with follow-up of 7 years.

Table B-18: Estimated ACE(t) (95% CI) using the NIGN-WW model for the

HIP

data.

Method:

NIGN-LL

Data:

HIP

Year

ACE

nign

-ll

ace

LCL

ace

U CL

ace

Zvalue

- value

0.41E

-04

0.16E

-04 0.09E-04 0.74E-04 2.5238

0.0058

1.86E

-04

0.58E

-04 0.73E-04 2.99E-04 3.2280

0.0006

4.15E

-04

1.15E

-04 1.89E-04 6.41E-04 3.5969

0.0002

7.07E

-04

1.86E

-04 3.42E-04 10.7E-04 3.7917

0.0001

10.5E

-04

2.69E

-04 5.20E-04 15.8E-04 3.8924

0.0000

14.3E

-04

3.62E

-04 7.18E-04 21.4E-04 3.9413

0.0000

18.4E

-04

4.64E

-04 9.28E-04 27.5E-04 3.9612

0.0000

HIP

: HIP data with follow-up of 7 years

131

Table B-19: Estimated CACE(t) (95% CI) using the NIGN-WW model for the

HIP

data.

Method:

NIGN-WW

Data:

HIP

Year

CACE

nign

-ww

cace

LCL

cace

U CL

cace

Zvalue

- value

0.66E

-04

0.24E

-04 0.19E-04 1.12E-04 2.7779

0.0027

2.51E

-04

0.76E

-04 1.03E-04 3.99E-04 3.3251

0.0004

5.50E

-04

1.52E

-04 2.53E-04 8.48E-04 3.6241

0.0001

9.60E

-04

2.54E

-04 4.63E-04 14.6E-04 3.7843

0.0001

14.8E

-04

3.83E

-04 7.27E-04 22.3E-04 3.8597

0.0001

21.0E

-04

5.41E

-04 10.4E-04 31.6E-04 3.8827

0.0001

28.3E

-04

7.31E

-04 14.0E-04 42.6E-04 3.8734

0.0001

HIP

: HIP data with follow-up of 7 years

Table B-20: Estimated CACE(t) (95% CI) using the the NIGN-LL model for the

HIP

data.

Method:

NIGN-LL

Data:

HIP

Year

CACE

nign

-ll

cace

LCL

cace

U CL

cace

Zvalue

- value

0.62E

-04

0.25E

-04 0.14E-04 1.10E-04 2.5240

0.0058

2.78E

-04

0.86E

-04 1.09E-04 4.47E-04 3.2283

0.0006

6.19E

-04

1.72E

-04 2.82E-04 9.57E-04 3.5972

0.0002

10.6E

-04

2.78E

-04 5.10E-04 16.0E-04 3.7921

0.0001

15.7E

-04

4.02E

-04 7.77E-04 23.5E-04 3.8929

0.0000

21.3E

-04

5.41E

-04 10.7E-04 31.9E-04 3.9418

0.0000

27.4E

-04

6.92E

-04 13.9E-04 41.0E-04 3.9617

0.0000

HIP

: HIP data with follow-up of 7 years

132

Table B-21: Estimated survival values using the non-ignorable PPO survival mod-

els for the HIP

data.

Data:

HIP

Model:

NIGN-WW

NIGN-LL

Years

(t)

0.9999

1.0000

0.9997

0.9998

0.9997

0.9998

0.9994

0.9993

0.9995

0.9993

0.9994

0.9990

0.9988

0.9991

0.9988

0.9989

0.9985

0.9981

0.9986

0.9980

0.9982

0.9979

0.9974

0.9979

0.9972

0.9974

0.9972

0.9965

0.9966

0.9972

0.9962

0.9965

0.9964

0.9956

0.9963

0.9951

0.9955

0.9956

0.9945

0.9954

0.9939

0.9943

0.9946

0.9934

0.9944

0.9927

0.9932

0.9936

0.9921

0.9934

0.9913

0.9919

0.9925

0.9908

0.9922

0.9899

0.9906

0.9913

0.9893

0.9911

0.9885

0.9892

0.9901

0.9878

0.9899

0.9870

0.9878

0.9888

0.9861

0.9862

0.9886

0.9854

0.9863

0.9874

0.9844

0.9845

0.9873

0.9838

0.9848

0.9859

0.9826

0.9827

0.9860

0.9822

0.9832

0.9843

0.9807

0.9808

0.9846

0.9805

0.9817

Table B-22: Estimated survival values using the non-ignorable PPO survival mod-

els for the HIP

data.

Data:

HIP

Model:

NIGN-WW

NIGN-LL

Years

(t)

1.0000

0.9999

1.0000

0.9999

1.0000

0.9998

0.9996

0.9998

0.9996

0.9998

0.9996

0.9990

0.9994

0.9996

0.9990

0.9995

0.9993

0.9983

0.9990

0.9993

0.9982

0.9990

0.9989

0.9974

0.9985

0.9989

0.9973

0.9985

0.9984

0.9963

0.9979

0.9984

0.9963

0.9979

0.9951

0.9972

0.9979

0.9951

0.9972

133

Table B-23: Estimated survival values using the F-R method for the HIP

data.

Data:

HIP

Model:

F-R

Years

(t)

0.9999

0.9998

0.9997

0.9993

0.9996

0.9994

0.9989

0.9992

0.9991

0.9980

0.9985

0.9967

0.9977

0.9979

0.9951

0.9969

0.9971

0.9945

0.9959

0.9960

0.9939

0.9949

0.9950

0.9924

0.9941

0.9940

0.9918

0.9930

0.9932

0.9907

0.9916

0.9922

0.9900

0.9896

0.9910

0.9888

0.9879

0.9898

0.9883

0.9860

0.9887

0.9870

0.9841

0.9872

0.9857

0.9825

0.9857

0.9840

0.9816

0.9845

0.9813

0.9809

Table B-24: Estimated survival values using the F-R method for the HIP

data.

Data:

HIP

Model:

F-R

Years

(t)

0.9999

0.9998

0.9997

0.9993

0.9996

0.9994

0.9989

0.9992

0.9991

0.9980

0.9985

0.9967

0.9977

0.9979

0.9951

0.9972

0.9979

0.9951

0.9972

134

Table B-25: The ITT estimated survival values using the non-ignorable PPO

survival models for the HIP

data.

Data:

HIP

Model:

NIGN-WW

NIGN-LL

Years

(t)

0.9999

1.0000

0.9997

0.9998

0.9994

0.9993

0.9995

0.9994

0.9989

0.9988

0.9990

0.9988

0.9984

0.9981

0.9984

0.9981

0.9977

0.9974

0.9977

0.9972

0.9970

0.9965

0.9969

0.9963

0.9961

0.9956

0.9960

0.9952

0.9945

0.9951

0.9941

0.9942

0.9934

0.9940

0.9928

0.9931

0.9921

0.9929

0.9915

0.9919

0.9908

0.9917

0.9902

0.9907

0.9893

0.9905

0.9887

0.9893

0.9878

0.9892

0.9872

0.9879

0.9862

0.9879

0.9857

0.9864

0.9844

0.9865

0.9841

0.9848

0.9826

0.9851

0.9825

0.9832

0.9808

0.9837

0.9809

Table B-26: The ITT estimated survival values using the non-ignorable PPO

survival models for the HIP

data.

Data:

HIP

Model:

NIGN-WW

NIGN-LL

Years

(t)

1.0000

0.9999

1.0000

0.9999

0.9998

0.9996

0.9998

0.9996

0.9992

0.9996

0.9991

0.9992

0.9986

0.9992

0.9985

0.9988

0.9978

0.9988

0.9977

0.9983

0.9969

0.9982

0.9968

0.9977

0.9958

0.9977

0.9958

135

Table B-27: The ITT estimated survival values using the F-R method for the

HIP

data.

Data:

HIP

Model: F-R

Years

(t)

0.9999

0.9998

0.9996

0.9994

0.9990

0.9989

0.9982

0.9983

0.9970

0.9976

0.9957

0.9967

0.9950

0.9956

0.9942

0.9947

0.9930

0.9937

0.9922

0.9927

0.9910

0.9914

0.9899

0.9900

0.9885

0.9886

0.9875

0.9872

0.9861

0.9857

0.9846

0.9843

0.9832

0.9833

0.9812

Table B-28: The ITT estimated survival values using the F-R method for the

HIP

data.

Data:

HIP

Model: F-R

Years

(t)

0.9999

0.9998

0.9996

0.9994

0.9990

0.9989

0.9982

0.9983

0.9971

0.9977

0.9958

0.9977

0.9958

136

Appendix C

Calculations of standard error (SE) of

CACE(t) and ACE(t)

As noted in Section 3, estimating CACE(t) and ACE(t) rather than esti-

mating the model parameters

T,uz

may be significantly more valuable and more

interpretable. The standard errors corresponding to the estimated CACE(t) and

ACE(t) have not been derived to date, to the best of the author's knowledge.

We now consider the derivation of formulae for calculating the standard errors

of the estimated CACE(t) and ACE(t) via the estimated model parameters

T,uz

Recall that CACE(t) was defined as the difference in survival values between

compliers in the treated group and the control group

By following

the notation of the customary survival function (for subgroups), CACE(t) can

thus be re-written as follows:

CACE(t) = S

(t;

T,c1

)

- S

(t;

T,c0

(C-1)

According to the property of variance of linear combinations (Theorem 4.5.6,

p.171, Casella & Berger (2002)), the variance of CACE(t) can be easily expressed

as:

V ar(CACE(t)) = V ar(S

(t;

T,c1

)) + V ar(S

(t;

T,c0

))

- 2Cov(S

(t;

T,c1

), S

(t;

T,c0

)),

(C-2)

where S

(t;

T,cz

) (for z = 1 or 0) denotes the survival function at time t associ-

ated with the subgroup of individuals who have compliance type U

= c and who

137

have been assigned into the group of Z

= z (z = 1 for the treated group and

z = 0 for the control group).

Note that except for the exponential distribution, most common parametric

models for survival outcomes, such as the Weibull, Gamma, log-normal, and the

log-logistic models have two parameters (Klein, 1997; Ibrahim et al., 2001). Let

T,uz

= (

(1)

T,uz

(2)

T,uz

), and we treat V ar(S

(t;

T,uz

)) as the variance of a function

of two variables (

(1)

T,uz

(2)

T,uz

). Then the Cov(S

(t;

T,c1

), S

(t;

T,c0

)) has to be

viewed as the covariance function of two variables (

(1)

T,uz

(2)

T,uz

). As such, we deal

with the variance and covariance of functions of multiple random variables. The

method (Klein, 1997) (see below) can then be used to derive formulae for such

approximations.

The method

The method, also known as the method of statistical

differential (Klein, 1997), is based on a Taylor series expansion of a continuous

function of the MLEs of a vector of parameters (

(1)

T,uz

(2)

T,uz

) for u = c and v = 1

or 0. For convenience, let f (x, y) denote a function of two variables, e.g. x and y;

and suppose that at the point (x

, y

), the first partial derivative of f (x, y) with

respect to x and y exist. These are denoted by f

, y

) and f

, y

). Then the

first order Taylor series expansion at the point (x

, y

) has the form as follows:

f (x, y) = f (x

, y

) + f

, y

)(x

- x

) + f

, y

)(y

- y

(C-3)

If x and y are both random variables with mean values, ¯

x and ¯

y, then setting

= ¯

x and y

= ¯

y, allows equation (C-3) to be expressed as:

f (x, y) = f (¯

x, ¯

y) + f

(¯

x, ¯

y)(x

- ¯x) + f

(¯

x, ¯

y)(y

- ¯y).

(C-4)

138

Variance of the function of two variables

From equation (C-4), it is

straightforward to observe that the function of two random variables, f (x, y),

has the mean value f (¯

x, ¯

y), as the last two terms in equation (C-4) are zeros.

By following the properties of variance (Theorem 4.5.6, p.171 Casella & Berger

(2002)), we have:

V ar(f (x, y)) = f

(¯

x, ¯

V ar(x) + f

(¯

x, ¯

V ar(y)

+ 2f

(¯

x, ¯

y)f

(¯

x, ¯

y)Cov(x, y),

(C-5)

because variances for constants are zeros, we have V ar(f (¯

x, ¯

y)) = 0, V ar(¯

x) = 0

and V ar(¯

y) = 0.

By following equation (C-5), we can write:

V ar(S

(t;

T,uz

)) = V ar(S

(t;

(1)

T,uz

(2)

T,uz

))

(t;

(1)

T,uz

(2)

T,uz

)

(1)

T,uz

(

(1)

T,uz

(2)

T,uz

)

var(

(1)

T,uz

)

(t;

(1)

T,uz

(2)

T,uz

)

(2)

T,uz

(

(1)

T,uz

(2)

T,uz

)

var(

(2)

T,uz

)

(C-6)

(t;

(1)

T,uz

(2)

T,uz

)

(1)

T,uz

(

(1)

T,uz

(2)

T,uz

)

(t;

(1)

T,uz

(2)

T,uz

)

(2)

T,uz

(

(1)

T,uz

(2)

T,uz

)

cov(

(1)

T,uz

(2)

T,uz

The approximate formula of V ar(S

(v, t;

(1)

T,uz

(2)

T,uz

)), the variance of the survival

function, can then be expressed by equation (C-6).

Covariance of functions of two variables

Similarly, if h(u, v) denotes an-

other function of two random variables u and v, with mean values ¯

u and ¯

v, then the

function h(u, v) has a mean value h(¯

u, ¯

v).

By the definition of covariance, the covariance between f (x, y) and h(u, v) is then

139

the expectation of the product of two factors (f (x, y)

- f(¯x, ¯y)) and (h(u, v) - h(¯u, ¯v));

that is:

Cov(f (x, y), h(u, v)) = E

{(f(x, y) - f(¯x, ¯y))(h(u, v) - h(¯u, ¯v))}.

(C-7)

By first order Taylor series expansion, equation (C-7) can be expressed by first order

partial derivative values at the mean value points, i.e.:

Cov(f (x, y), h(u, v))

= E

(¯

x, ¯

y)(x

- ¯x) + f

(¯

x, ¯

y)(y

- ¯y))(h

(¯

u, ¯

v)(u

- ¯u) + h

(¯

u, ¯

v)(v

- ¯v))}.

(C-8)

Furthermore, the covariance between f (x, y) and h(u, v) can be simplified to

Cov(f (x, y), h(u, v)) = f

(¯

x, ¯

y)h

(¯

u, ¯

v)Cov(x, u) + f

(¯

x, ¯

y)h

(¯

u, ¯

v)Cov(y, u)

+ f

(¯

x, ¯

y)h

(¯

u, ¯

v)Cov(x, v) + f

(¯

x, ¯

y)h

(¯

u, ¯

v)Cov(y, v).

(C-9)

By following equation (C-9), the covariance between survival functions, say S

(t;

T,c1

)

and S

(t;

T,c0

), can then be expressed as follows:

Cov(S

(t;

T,c1

), S

(t;

T,c0

)) =

(t;

(1)

T,c1

(2)

T,c1

)

(1)

T,c1

(

(1)

T,c1

(2)

T,c1

)

(t;

(1)

T,c0

(2)

T,c0

)

(1)

T,c0

(

(1)

T,c0

(2)

T,c0

)

cov(

(1)

T,c1

(1)

T,c0

)

(t;

(1)

T,c1

(2)

T,c1

)

(2)

T,c1

(

(1)

T,c1

(2)

T,c1

)

(t;

(1)

T,c0

(2)

T,c0

)

(1)

T,c0

(

(1)

T,c0

(2)

T,c0

)

cov(

(2)

T,c1

(1)

T,c0

)

(t;

(1)

T,c1

(2)

T,c1

)

(1)

T,c1

(

(1)

T,c1

(2)

T,c1

)

(t;

(1)

T,c0

(2)

T,c0

)

(2)

T,c0

(

(1)

T,c0

(2)

T,c0

)

cov(

(1)

T,c1

(2)

T,c0

)

(t;

(1)

T,c1

(2)

T,c1

)

(2)

T,c1

(

(1)

T,c1

(2)

T,c1

)

(t;

(1)

T,c0

(2)

T,c0

)

(2)

T,c0

(

(1)

T,c0

(2)

T,c0

)

cov(

(2)

T,c1

(2)

T,c0

(C-10)

140

Equations (C-6) and (C-10) both can be simplified via vector operation as follows:

V ar (S

(t;

T,cz

))

(t;

T,cz

)

(1)

T,cz

(t;

T,cz

)

(2)

T,cz

(t;

T,cz

)

(1)

T,cz

(t;

T,cz

)

(2)

T,cz

(C-11)

and

Cov (S

(t;

T,c1

), S

(t;

T,c0

))

(t;

T,c1

)

(1)

T,c1

(t;

T,c1

)

(2)

T,c1

(t;

T,c0

)

(1)

T,c0

(t;

T,c0

)

(2)

T,c0

;

(C-12)

where

and

represent the covariance matrix of all relevant parameters. More

specifically,

and

can be written as:

var(

(1)

T,cz

)

cov(

(1)

T,cz

(2)

T,cz

)

cov(

(2)

T,cz

(1)

T,cz

)

var(

(2)

T,cz

)

(C-13)

and

cov(

(1)

T,c1

(1)

T,c0

)

cov(

(1)

T,c1

(2)

T,c0

)

cov(

(2)

T,c1

(1)

T,c0

)

cov(

(2)

T,c1

(2)

T,c0

)

(C-14)

Then the standard error of CACE(t) can be calculated by taking the square root

of the variance:

CACE(t)

V ar(CACE(t)).

(C-15)

Since we have ACE(t) =

CACE(t), the variance of ACE(t) can then be calculated

141

by the following equation:

V ar(ACE(t)) = V ar(

CACE(t))

= V ar (

(t;

T,c1

)

(t;

T,c0

))

= V ar (

(t;

T,c1

)) + V ar (

(t;

T,c0

))

- 2cov(

(t;

T,c1

(t;

T,c0

)).

(C-16)

Note that as

T,c1

= (

T,c1

) and

T,c0

= (

T,c0

), the two terms in the last

line of equation (C-16), namely

(t;

T,c1

) and

(t;

T,c0

), can both be viewed

as functions of three variables, which are (

T,c1

) (in

(t;

T,c1

)) and (

T,c0

) in

(t;

T,c0

By following the logic used to derive the formulations of the variance (equation

(C-11)) and covariance (equation (C-12)) of the survival function, (S

(t;

T,uz

) (for

z = 1 and 0)), the variance of

(t;

T,uz

), denoted by V ar(

(t;

T,uz

)), can then

be expressed as follows:

V ar (

(t;

T,uz

))

(t;

T,cz

(t;

T,cz

)

(1)

T,cz

(t;

T,cz

)

(2)

T,cz

(t;

T,cz

(t|

T,cz

)

(1)

T,cz

(t;

T,cz

)

(2)

T,cz

(C-17)

where

is given by:

var(

(1)

T,cz

)

cov(

(1)

T,cz

(2)

T,cz

)

cov(

(1)

T,cz

)

cov(

(2)

T,cz

(1)

T,cz

)

var(

(2)

T,cz

)

cov(

(2)

T,cz

)

cov(

(1)

T,cz

)

cov(

(2)

T,cz

)

var(

)

(C-18)

Thus the covariance matrix of

and

T,cz

and the covariance of these products cov(

(t),

(t))

142

can be written as:

Cov (

(t;

T,c1

(t;

T,c0

))

(t;

T,c1

)

(1)

T,c1

(t;

T,c1

)

(2)

T,c1

, S

(t)

(t;

T,c0

)

(1)

T,c0

(t;

T,c0

)

(2)

T,c0

, S

(t)

(C-19)

where

, the covariance matrix of relevant parameters, is given by:

cov(

(1)

T,c1

(1)

T,c0

)

cov(

(1)

T,c1

(2)

T,c0

)

cov(

(1)

T,c1

)

cov(

(2)

T,c1

(1)

T,c0

)

cov(

(2)

T,c1

(2)

T,c0

)

cov(

(2)

T,c1

)

cov(

(1)

T,c0

)

cov(

(2)

T,c0

)

var(

)

(C-20)

Therefore, (C-16) can be re-written as:

V ar(ACE(t))

(t;

T,c1

)

(1)

T,c1

(t;

T,c1

)

(2)

T,c1

, S

(t;

T,c1

)

(t;

T,c1

)

(1)

T,c1

(t;

T,c1

)

(2)

T,c1

, S

(t;

T,c1

)

(t;

T,c0

)

(1)

T,c0

(t;

T,c0

)

(2)

T,c0

, S

(t;

T,c0

)

(t|

T,c0

)

(1)

T,c0

(t;

T,c0

)

(2)

T,c0

, S

(t;

T,c0

)

-2

(t;

T,c1

)

(1)

T,c1

(t;

T,c1

)

(2)

T,c1

, S

(t;

T,c1

)

(t;

T,c0

)

(1)

T,c0

(t;

T,c0

)

(2)

T,c0

, S

(t;

T,c0

)

(C-21)

where

and

are given by equation (C-18) with z = 1 and 0, respectively.

i.e.:

var(

(1)

T,c1

)

cov(

(1)

T,c1

(2)

T,c1

) cov(

(1)

T,c1

)

cov(

(2)

T,c1

(1)

T,c1

)

var(

(2)

T,c1

)

cov(

(2)

T,c1

)

cov(

(1)

T,c1

)

cov(

(2)

T,c1

)

var(

)

(C-22)

and

var(

(1)

T,c0

)

cov(

(1)

T,c0

(2)

T,c0

) cov(

(1)

T,c0

)

cov(

(2)

T,c1

(1)

T,c0

)

var(

(2)

T,c0

)

cov(

(2)

T,c0

)

cov(

(1)

T,c0

)

cov(

(2)

T,c0

)

var(

)

(C-23)

143

Recall that

is given in (C-20). All these covariance matrices are then available

for our PPO-survival models. Consequently, it is straightforward to obtain the

standard error of ACE(t) by taking the square root of its variance V ar(ACE(t),

that is:

ACE(t)

V ar(ACE(t)).

(C-24)

144

Appendix D

Correction of equation (3.2') in Louis (1982)

The likelihood function can be expressed as the product of each individual's

likelihood function L

() =

i=1

(). After taking the logarithm of the com-

plete data likelihood L

(), the observed information matrix (OIM), I

obs

, (given

in equation (8.17) in Section 8.4.1) can be re-written as below:

obs

= I( ^

) = I

- I

where

= E

i=1

logL

()

|u {c, a, n},

= E

i=1

logL

()

j=1

logL

()

|u {c, a, n},

(D-25)

Note that the descriptive index with respect to the summation in the second term

in equation (D-25) above are not the same, (denoted by i and j) here. By dividing

the so-called index set, 1) i = j; 2) i = j, we can re-write the term, I

= E

+ E

which is equivalent to:

i=1

logL

()

j=1

logL

()

|u {a, c, n},

= E

+ E

where

logL

()

logL

()

|u {a, c, n},

i=j

logL

()

|u {a, c, n},

logL

()

|u {a, c, n},

(D-26)

145

Note that in the first term in equation (D-26), the summation is overall obser-

vations (for i = 1 to n); whereas in the second term, it is only a summation for

i = j.

Suppose that the parameter is a vector with p dimensions. Then

logL

()

denotes a row vector with p components, for different individuals, say the i

and j

, in general,

logL

() =

logL

(). Therefore, their corresponding

expected vectors are not the same, that is:

logL

()

|u {c, n, a},

= E

logL

()

|u {c, n, a},

Consequently, it is easy to produce:

logL

()

|u {c, n, a},

logL

()

|u {c, n, a},

= E

logL

()

|u {c, n, a},

logL

()

|u {c, n, a},

(D-27)

which implies that the product of E

logL

()

|u {c, n, a},

and

logL

()

|u {c, n, a},

are not exchangeable.

Hence, we give the corrected form of equation (3.2') in Louis (1982) as follows:

obs

= I( ^

) = I

- E

where

i=1

logL

()

|u {c, n, a},

logL

()

logL

()

|u {c, n, a},

i=j

logL

()

|u {c, n, a},

logL

()

|u {c, n, a},

(D-28)

Note that equation (D-28) guarantees that the OIM, I

obs

, is a p

× p symmetric,

146

but Louis's equation (3.2') in Louis (1982) cannot yield a symmetric matrix even

though the author had thought I

obs

should be symmetric (Louis, 1982). The

distinction between these two equations, namely Louis's equation (3.2') in Louis

(1982) and its corrected form equation (D-28) given above, are that the last term

in the former, given as:

i<j

logL

()

|u {c, n, a},

logL

()

|u {c, n, a},

is not symmetric as the summation is only for i < j rather than overall observa-

tions as in equation (D-28), where the last term is a symmetric matrix. This is

because that after taking summation with respect to all observations both for i

and j but excluding i = j,

i=j

logL

()

|u {c, n, a},

logL

()

|u {c, n, a},

is a symmetric matrix, even though matrix E

[logL

()]

[logL

()]

not symmetric for i = j.

In other words, as E

[logL

()]

is not a scalar, we cannot follow the

rule of ab + ba = 2ab. This is the key error in equation (3.2') of Louis (1982).

147

Appendix E

An AFT property with its proof

In this Appendix, we propose and prove Theorem 1, a useful property of

AFT for the Weibull or log-normal distribution only. However, it is not hard to

show that by following Theorem 1,it is always true in our PPO-survival models:

each subgroup shares the shape parameter (for Weibull model) or the covariance

parameter (for log-normal model).

Theorem 1 Suppose a failure time variable T satisfies an accelerated failure time

(AFT) model T = T

exp( X), where T

represents the failure time corresponding

to the baseline; X and denote covariate vector and its associated regression

coefficients, both of them are column vectors with the same length p.

1. If T

= t

Weib(

), t

follows a Weibull distribution with scale

parameter

and shape parameter

, the new variable T = t then fol-

lows a Weibull distribution with =

exp(

- X) and =

, namely

Weib(, ).

2. If T

= t

Logn(

), t

follows a log-normal distribution with two

parameters,

and

, the mean and variance of log(T

), the new variable

T = t then follows a log-normal distribution with =

+ X and =

i.e. T = t

Logn(, ).

Proof 1 By following the definition of the survival function with respect to the

148

failure time T , the survival function can be expressed as follows:

S(t) = P r(T > t) = P r(T

exp( X) > t);

T = T

exp( X);

= P r(T

> t exp(

- X));

exp( X) > 0;

= S

(t exp(

- X)); by the definition of S

(t).

(E-29)

In addition, by the definition of a probability density function (p.d.f.), we have

the following:

f (t) =

dS(t)

;

(by definition);

)

;

for t

= t exp(

- X)

= f

) exp(

- X).

(E-30)

= t

Weib(

), we have

(t) =

(

-1)

exp(

(E-31)

(t) = exp(

(E-32)

where f

(t) and S

(t) denote the probability density (p.d.f.) and survival func-

tions (S) with respect to T

By following equations (E-29) and (E-30), the survival function and the proba-

bility density function (p.d.f.), f

(t), with respect to the failure time T in a Weibull

model, can be expressed as:

(t) = P r(T > t) = S

(t exp(

- X));

= exp(

t exp(

- X)]

(E-33)

149

and the probability density function (p.d.f.), f

(t) as:

(t) = f

(t exp(

- X)) exp(- X);

(

t exp(

- X))

(

-1)

exp(

t exp(

- X)]

) exp(

- X);

exp(

- X)(

t exp(

- X))

(

-1)

exp(

t exp(

- X)]

). (E-34)

Let =

exp(

- X) and =

. Equations (E-35) and (E-35) can then be

re-written as:

(t) = P r(T > t) = exp(

t exp(

- X)]

)

= exp(

-(t)

(E-35)

and

(t) = =

exp(

- X)(

exp(

- X)t)

(

-1)

exp(

- X)t]

);

= (t)

(-1)

exp(

-(t)

(E-36)

Because equations (E-35) or (E-36) hold, we have T = t

Weib(, ) where

exp(

- X) and =

= t

Logn(

), we can write:

) =

exp[

(

log[t

)

]

(2)

1/2

= (

log[t

]

)/t

;

(E-37)

where () represents the standard normal probability density function, which can

be written as (x) =

exp(

); and

(t) = 1

- (

log[t]

);

(E-38)

150

where () is the cumulative distribution function of a standard normal variable,

which has the general form (x) =

(u)du. By following equations (E-29) and

(E-30), the survival function and the probability density function (p.d.f.) f

(t) with

respect the failure time T in a log-normal model can be expressed as follows:

(t) = P r(T > t) = S

(t exp(

- X));

= 1

- (

log[t exp(

- X)] -

);

- (

log[t]

- X -

);

(E-39)

and

(t) = f

(t exp(

- X)) exp(- X);

exp[

(

log[t exp(- X)]-

)

]

t exp(

- X)(2)

1/2

exp(

- X);

exp[

(

log[t]- X-

)

]

t exp(

- X)(2)

1/2

exp(

- X);

exp[

(

log[t]-( X+

)

]

t(2)

1/2

(E-40)

Using =

+ X and =

, equations (E-39) and (E-40) can be re-written

as follows:

(t) = 1

- (

log[t]

(E-41)

and

(t) =

exp[

(

log[t]-

)

]

t(2)

1/2

(E-42)

Therefore, T = T

exp( X) follows a log-normal distribution with the param-

eters and . Explicitly, T = t

Logn(, ) where =

+ X and =

151

Ende der Leseprobe aus 154 Seiten

Details

Titel: Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance
Untertitel: Causal Inference in Survival analysis
Hochschule: University of Canterbury (Department of Mathematics and Statistics)
Veranstaltung: Statistics
Note: A
Autoren: Irene Hudson (Autor:in)Zhaojing Gong (Autor:in)Patrick Graham (Autor:in)
Jahr: 2009
Seiten: 154
Katalognummer: V357300
ISBN (eBook): 9783668438620
ISBN (Buch): 9783668438637
Dateigröße: 1013 KB
Sprache: Englisch
Schlagworte: Causal inference, Noncompliance, Survival analysis, Maximum likelihood, EM algorithm, Weibull distribution, HIP breast cancer trial, CACE, ITT analysis

Arbeit zitieren: Irene Hudson (Autor:in)Zhaojing Gong (Autor:in)Patrick Graham (Autor:in), 2009, Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance, München, GRIN Verlag, https://www.grin.com/document/357300

Kommentare