Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance

Causal Inference in Survival analysis


Forschungsarbeit, 2009

154 Seiten, Note: A


Leseprobe


Table of Contents
Summary 1
1 Introduction
2
2
The HIP study 6
2.1
The HIP breast cancer program
6
3
Notation, assumptions, basic concepts
11
3.1
Notation
11
3.2
Assumptions
12
3.3
Basic
Concepts
14
3.3.1 Compliance Types
14
3.3.2 Causal Effects
16
3.3.3 Compliers Average Causal Effect
(CACE)
17
4
Model Structure and Likelihood
18
4.1
Compliance
types
distributions
18
4.2
The Structure of the joint distribution
20
4.2.1 The joint distribution of T, C, D and Z
20
4.2.2 The joint distribution of the observable variables
26
4.3
The
likelihood
function
28
4.3.1 The idealised likelihood function
28
4.3.2 Likelihood function for complete data
30
4.3.3 Likelihood function for incomplete data
36
5
Simplification of model parameter
EE 37
6
EM algorithm
42
6.1
The Expectation step
43
6.2
The Maximisation step
45
7
Model Specifications
45
7.1.
Specific non-ignorable PPO survival models
46
7.2
Parameter of interest
T in specific models
47
7.2.1
Parameter
T in the NIGN-WW model
47
7.2.2
Parameter
T in the NIGN-LL model
49

8
Computational issues
50
8.1
Maximisation of the Likelihood Function using the EM algorithm 51
8.1.1 Posterior Probabilities of Compliance type U
i
51
8.2
Structures of the Expectation function Q (
T, T
k
)
57
8.3
Maximisation of the Expectation function Q (
T, T
k
) 60
8.4
Estimation of standard errors
61
8.4.1 The standard error for the estimated model parameters 61
8.4.2 The observed information matric (OIM)
63
9
Application
65
9.1
Pre-processing of the HIP breast cancer data
65
9.2
HIP data analysis: via non-ignorable censoring mechanism -
PPO-survival models
70
9.2.1 Model specification for the HIP trial
71
9.2.2 Estimation of causal effects: CACE(t) and ACE(t) 73
9.2.3 Estimation of relevant survival probabilities 80
9.3
HIP data analysis: via the Frangakis-Rubin method
84
9.3.1 Frangakis-Rubin (F-R) method
84
9.3.2 Estimation of relevant probabilities 86
9.3.3 Estimation of causal effects: CACE(t) and ACE(t) 87
9.4
Model comparisons 91
9.4.1 Non-ignorable PPO-survival model vs F-R method
91
9.4.2 The NIGN-WW model vs IGN-W model
98
9.5
Results of the analysis via the non-ignorable PPO-survival model 105
9.5.1 Estimated CACE(t) for the HIP
7
data
105
9.5.2 Estimated CACE(t) for the HIP
8
data
105
10
Conclusions and Discussions
107
References 112
Appendix B
118
Appendix C
137
Appendix D
145
Appendix E
148

Likelihood Method for Randomized
Time-to-Event Studies
with All-or-None Compliance
Zhaojing Gong, Irene Hudson, Patrick Graham
SUMMARY
Estimating causal effects in clinical trials often suffers from treatment
non-compliance and missing outcomes. In time-to-event studies, it is more
complicated because of censoring, the mechanism of which may be non-
ignorable. While new estimators have recently been proposed to account
for non-compliance and missing outcomes, few papers have specifically con-
sidered time-to-event outcomes, where even the intention-to-treat (ITT)
estimator is potentially biased for estimating causal effects of assigned treat-
ment. In this paper we develop a likelihood based method for randomized
clinical trials (RCTs) with non-compliance for time-to-event data and adapt
the EM algorithm according to derive the maximum likelihood estimators
(MLEs).
In addition, we give formulations of the average causal effect
(ACE) and compliers average causal effect (CACE) to suit survival anal-
ysis. To illustrate the likelihood-based method (EM algorithm), a breast
1

cancer trial data (Baker, 1998) was re-analyzed using a model, which as-
sumes that the failure times and censored times both follow Weibull and
Lognormal distributions, respectively (termed the NIGN-WW model and
NIGN-LL model).
keywords: Causal inference; Noncompliance; Survival analysis; Maximum
likelihood; EM algorithm; Weibull distribution; HIP breast cancer trial; CACE;
ITT analysis.
1
INTRODUCTION
Estimating causal effect of interventions is often required for empirical studies
in medicine. Randomized clinical trials (RCTs) are only generally approach for
causal inference, but it often suffers from noncompliance. Noncompliance is a
common complication in the analysis of randomized clinical trials (RCTs). It can
take many forms (Baker, 1997), such as switching from the assigned treatment to
another treatment, or not attending all scheduled visits to receive treatment. The
main feature of RCTs with noncompliance is that not every patient in each group
of the RCT receive the same treatment. All-or-none compliance is a special type of
noncompliance, in which switching occurs immediately after randomization, and
every patient either does or does not comply with their assigned treatment com-
pletely, no partial compliance is assumed. All-or-none compliance is of particular
interest because it greatly simplifies modeling and has received more attention
over the last decade. (Imbens & Rubin, 1997; Baker, 1998; Frangakis & Rubin,
1999; Hirano & Imbens, 2000; Jo, 2002; Loeys & Goetghebeur, 2003; Levy et al.,
2004; Peng et al., 2004; O'Malley & Normand, 2005; White, 2005; Gong et al.,
2005).
2

Besides noncompliance, censoring is another problem has to be dealt with in
many randomized clinical trials, since in medicine studies, the outcomes of interest
most likely are time of event of interest. This kind of data is well known as time-
to-event data, also called survival data. It is a special type of data collated in
long-term follow-up studies, in which the outcome of interest is a time of a specific
event, usually treated as "failure time "or "survival time ".
Time-to-event data often appears in the areas of medicine and health study
(survival time), as well as in economics and engineer industry fields (failure time),
usually time to death or the manifestation of a symptom or occurrence of a dis-
ease. Subjects who may not experience the event of interest during the period of
study (observation), or may withdraw from the clinical trial for some reasons are
considered censored. In any one of these cases mentioned above, subjects´survival
time (time of event) will not be observed, this is given in terminology as censoring.
The standard method of analysis of the time-to-event data is so-called Survival
Analysis, in which censoring is the main problem need to be faced.
The standard analysis of a RCT with noncompliance is an intention-to-treat
(ITT) analysis (Angrist et al., 1996; Imbens & Rubin, 1997), which compares out-
comes of subjects by assigned treatment, ignoring the actual treatments received.
ITT only provides an estimator in terms of the causal effect of assigned treatment
(denoted by Z, of an outcome of interest). It has been denoted by ACE in terms
of Average Causal Effect (Holland, 1986; Graham, 2000, 2001) or IT T (Imbens
& Rubin, 1997; Loeys & Goetghebeur, 2003; O'Malley & Normand, 2005). ITT
analysis is a very plausible method for causal inference (Holland, 1986; Pearl,
2001; Greenland, 2004; Rubin, 2005), as it avoids those biases that associated
with estimating the causal effect "as-treated "(comparing outcomes of subjects
by received treatment) or causal effect "per protocol "(comparing outcomes of
3

subjects who comply with their assigned treatment by assigned treatment) (?).
ITT works well only when the assigned treatment and received treatment are
identical for all patients in the trial, at least if most patients satisfy this as-
sumption, otherwise, a biased estimator of causal effect may be produced (Loeys
& Goetghebeur, 2003; Hirano & Imbens, 2000; White, 2005; Baker & Kramer,
2005).
Amongst an extensive body of the literature on noncompliance, most papers
are address noncompliance as missing covariates. Some recently published papers
consider noncompliance with missing outcomes together (Baker et al., 2000; Hi-
rano & Imbens, 2000; Barnard et al., 2003; Peng et al., 2004; Mealli et al., 2004;
O'Malley & Normand, 2005; Baker & Kramer, 2005), however, they are all on
binary or dichotomous outcomes or normal distributions data, rather than the
time-to-event data.
For time-to-event data, to address the noncompliance and missing outcomes at
the same time is a big challenge for standard survival analysis. As in general, sur-
vival analysis follows the assumption of Non-informative censoring (Kalbfleisch
& Prentice, 2002; Klein, 1997; Cox & Oakes, 1984), which assumes that censoring
mechanism is ignorable. Implying that subjects censored at time t do not con-
tribute to the likelihood function. But when compliance is imperfect in a clinical
trial, its censoring mechanism is more than likely to be non-ignorable (Baker, 1994,
1998). This means that the conventional methods of survival analysis can not be
applied simply for analysis those time-to-event data from randomized clinical trial
with noncompliance.
Therefore, to develop a new approach for analyzing causal effects for time-to-
event data from RCTs with noncompliance and missing outcomes, is necessary
and vital (Fleming & Lin, 2000; Fisher, 1999).
4

In survival study area, to the date, we only find a few of papers are related
to noncompliance issue (Robins & Tsiatis, 1991; Baker, 1994, 1998; Frangakis &
Rubin, 1999; Loeys & Goetghebeur, 2003; Herring & Lipsitz, 2004; Gong et al.,
2005). But none of them are focused on general likelihood method with considering
noncompliance and and non-ignorable simultaneously.
Robins & Tsiatis (1991) focus on structural failure time model; Baker (1994,
1998) are based on cause-specific hazards; Frangakis & Rubin (1999) is based
on the Kaplan-Meier estimator (Cox & Oakes, 1984; Klein, 1997; Kalbfleisch &
Prentice, 2002). Herring & Lipsitz (2004) Loeys & Goetghebeur (2003) and Gong
et al. (2005) are based on Cox model Cox & Oakes (1984). Some of them, such as
Loeys & Goetghebeur (2003); Gong et al. (2005) are still under the assumption
of non-informative censoring.
In this paper, we propose a new approach based on the consideration of non-
compliance and missing outcomes together, which is like the circumstance consid-
ered in Frangakis & Rubin (1999), and focus on likelihood function rather than
using survival estimators; as uncompleted covariate is included, EM algorithm is
adapted, to the author's knowledge, this is the first study to date to develop like-
lihood based (EM) approach t survival data RCT's accommodating both missing
outcomes and noncompliance simultaneously.
To illustrate the plausible of proposed likelihood based approach, two specific
Parametric Potential-Outcome (PPO) Survival models, namely the NIGN-WW
and the NIGN-LL models, have been given and applied for re-analysis a time-to-
event data (breast cancer data)from randomized clinical trial conducted by Health
Insurance Plan (HIP) of Greater New York during the 1960s. For comparison,
the method introduced in Frangakis & Rubin (1999), which can be viewed as a
nonparametric method for imperfect compliance survival data, is also implement
5

on computer and applied for the HIP data.
The rest of the paper has a structure as following: Section 2 gives the back-
ground of the HIP data; Section 3 defines notation used in this paper and as well
as assumptions required for our PPO-survival model plus some basic concepts
which are important for establishing our PPO-survival model; Section 4 provides
the likelihood structure; the PPO-survival model parameter is specified in Section
5 and Section 6 is about the EM algorithm. Seall results of re-analysis of HIP
data with different methods are given in Section 9. Conclusion and discussion are
given in Section 10. All related equation of computations are given in Appendix,
as well as tables and figures.
2
The HIP study
2.1
HIP breast cancer program
The breast cancer data used in this paper was collected from an breast screening
study in USA conducted by the Health Insurance Plan (HIP) of Greater New
York during the 1960s.Participants were those women who enrolled in the HIP for
one year or more when the program started in 1963 with asymptomatic of breast
cancer and ages between 40 and 64.
In total 60696 eligible women were included in this program, and all were
randomly assigned into either the "screening "or "control "group. Women in
the "screening "group were offered free annual breast examinations four times.
These examinations consisted of film mammography (cephalocaudal and later
views of each breast); a clinical examination of the breast by a physician (usually
a surgeon), an interview for demographic, other background information and a
health history. While the women in "control "group only received their usual
6

health care. All participants in this study were followed up in the long-term up
to 18 years from their date of entry. Mortality due to breast cancer was the
main outcome variable in the HIP study, and was used to evaluate the efficacy
of the screening program (Shapiro et al., 1988; Baker, 1998). Therefore the event
of interest was death from breast cancer and the outcome of interest was the
observed time (in years) for each individual from entry to death by breast cancer
or loss to follow-up. For those women diagnosed with breast cancer in the study
there was no loss to follow up.
Approximately one-third of women who were assigned to the
"screening
"group refused the offer of a screening examination. Our analysis assumes no
women in the "control "group received "screening " this assumption has been
given by Frangakis & Rubin (1999); Loeys & Goetghebeur (2003) in terms of No
access of treatment for the control group.
We also assume "All-or-none compliance "meaning that we assume women
who took up the offer of screening, complied fully with the screening programme
protocol by attending (at least one of) all scheduled screening visits whereas
women who initially declined to participate in the programme are assumed not to
have attended any screening visits.
Previous research on the HIP breast cancer data (Shapiro, 1977; Shapiro et al.,
1982, 1988) was mostly based on cumulative breast cancer's mortality rate and
survival rate over particular fixed follow-up durations rather than on estimating
survival or hazard functions. In addition, with the exception of Baker (1998)
Richardson and Wells (1995?).
The previous analysis of the HIP by Shapiro (1977); Shapiro et al. (1982, 1988)
have ignored compliance information and estimated the intention-to-treat (ITT)
effect of assigned treatment.
7

Non compliance presents a challenge to traditional statistical methods for the
analysis of RCTs because neither an analysis which ignores non-compliers, nor an
analysis which focuses on treatment actually received can provide [a precise and
reliable answer] valid inferences for on the effect of mammography on breast cancer
mortality. In the former case, the observed outcomes for the intervention group
are a mixture of outcomes for women receiving and not receiving the intervention
In the latter case, the two groups are not necessarily comparable, because the self-
selection of treatment for the intervention group means that treatment received
is not randomized and consequently treatment groups may differ on background
factors related to breast cancer and mortality risk.
The summary statistics of HIP data is given in Table 2.
8

T
able
1:
Primary
notation.
P
oten
tial
and
Observ
able
data
T
erm
Data
for
sub
ject
i
Assigned
treatmen
t
Z
i
=
z
(1
=
in
terv
en
tion
,
0
=
con
trol)
Compliance
ty
p
e
U
i
=
u
(c
=
complier
,n
=
nev
er-tak
er,
and
a
=
alw
a
ys-tak
er)
P
oten
tial
data
Observ
able
Data
Receiv
ed
treatmen
t
D
i
=[
D
i
(1)
,D
i
(0)]
D
obs
i
=
zD
i
(1
)+(
1
-
z
)D
i
(0)
F
ailure
time
T
i
=[
T
i
(1)
,T
i
(0)]
T
obs
i
=
zT
i
(1
)+(
1
-
z
)T
i
(0)
Censored
time
C
i
=[
C
i
(1)
,C
i
(0)]
C
obs
i
=
zC
i
(1
)+(
1
-
z
)C
i
(0)
Censored
indicator
R
i
=[
R
i
(1)
,R
i
(0)]
R
obs
i
=
zR
i
(1
)+(
1
-
z
)R
i
(0)
Observ
ed
time
Y
i
(z
)=
min
(T
i
(z
),C
i
(z
))
Y
i
=
zY
i
(1
)+(
1
-
z
)Y
i
(0)
P
opulation
parameters
P
arameter
Notation
Subp
opulation
Mo
del
parameter
=(
T
,uz
,
C
,uz
,
u,z
d
)(
U,
Z
)
i
=(
u,
z
)
Surviv
al
probabilit
y
S
uz
(t
)(
U,
Z
)
i
=(
u,
z
)
Prop
ortion
U
i
=
u
u
All
Prop
ortion
(Z,
D
)
i
=(
z,
d
)
zd
All
Data
summaries
Statistics
Notation
Subsample
Sample
size
n
zd
r
(Z,
D
,R
)
i
=(
z
,d,
r
)
Mean
observ
ed
time
¯ Y
zd
(Z,
D
)
i
=(
z,
d
)
Prop
ortion
of
ev
en
t
¯ R
(Z,
D
,R
)
i
=(
z
,d,
r
)
9

T
able
2:
Summary
statistics:
HIP
data.
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
60,696
30,131
30,565
20,147
9,984
40,549
BC
patien
ts
2,325
1,171
1,154
835
336
1490
Deaths
of
BC
786
372
414
247
125
539
Mean
of
observ
ed
time
¯ Y
14.051
14.1578
13.9457
14.905
12.6499
13.6266
Prop
ortion
of
¯ R
=
1
0.0129
0.0123
0.0135
0.0123
0.0125
0.0133
Prop
ortion
of
¯ Z
=
1
0.4964
1
0
1
1
0.2462
Prop
ortion
of
¯ D
=
1
0.3319
0.6686
0
1
0
0
R
=
1
:
ev
en
t
o
ccurs;
Z
=
1
:
randomly
assigned
to
the
screening
group;
D
=
1
:
receiv
ed
the
screening
examination.
10

3
Notation, assumptions and basic concepts
3.1
Notation
We develop our model under the same randomization conditions considered in
Frangakis and Rubin (1999). First, a sample of N individuals is randomly sam-
pled from a large population of individuals all of whom are eligible to participate
in the study, and could therefore be assigned to any of the possible treatment
levels. Under this set-up it meaningful to consider contrasts between outcomes
under alternative treatment assignments for the population from which the study
sample is assumed to have been obtained. A randomized controlled trial is con-
structed by randomly allocating each of the N sampled individuals to a treatment
group. Our aim is to make inferences about the population causal effects of treat-
ment assignment, and of treatment received on the basis of the data from the N
individuals enrolled in the study.
We adopt a potential outcomes model (Neyman 1923, Rubin, 1978) for sur-
vival times under alternative treatment assignments.
Let Z
i
= z denote the
assigned treatment (z = 1, for invitation of screen and z = 0 for control); and
let T
i
(1) denote the failure time for the ith individual in the population if as-
signed treatment 1 (for example, invitation for screening) and let T
i
(0) denote the
failure time for that same individual if assigned to treatment 0 (not invited for
screening). Similarly let C
i
(1) and C
i
(0) denote potential censoring times corre-
sponding, respectively to treatment assignments 1 and 0. The censoring times are
the times at which an individual will be censored. As in Imbens & Rubin (1997),
our model also requires indicators for the treatment received under the alterna-
tive treatment assignments,. Thus D(1) denotes treatment actually received if
assigned to treatment, 1, and D(0) the treatment actually received if assigned to
11

treatment 0. In the potential outcomes models observable data is represented as
particular elements of the potential outcome vectors, and the elements of these
vectors actually observed, reflects the treatment assignment. Thus if Z
i
= 1, the
observable treatment received indicator is D
i
(Z
i
) = D
i
(1), the observable failure
time is T
i
(Z
i
) = T
i
(1) and the observable censoring time is C
i
(Z
i
) = C
i
(1), and
similarly if Z
i
= 0. Note that although we refer to T
i
(Z
i
) and C
i
(Z
i
) as 'observ-
able', at most one of these variables is fully observed. If an individual is, in fact,
censored then C
i
(Z
i
) is observed and T
i
(Z
i
) is only partially observed in the sense
that it is known that T
i
(Z
i
) > C
i
(Z
i
). On the other hand if the failure time for
the ith individual is observed the censoring time is only partially observed and
known to be greater than the survival time. While it is standard in much applied
work on survival analysis to structure the analysis in terms of the observable min-
imum of survival and censoring time and a censoring indicator, such a set-up is
valid only under the assumption of ignorable censoring,which is equivalent to the
assumption of the independence of failure time and censored time given in section
3.2 in Klein (1997).
Because our analysis allows for non-ignorable censoring we work directly with
the more fundamental failure time and censoring time variables.
More details of notation used in this paper are given in Table 1.
3.2
Assumptions
Before giving the structure of our PPO-survival model, we list all assumption
required first which most follow those assumptions given in Frangakis & Rubin
(1999).
1. The stable unit treatment value assumption (SUTVA) (Rubin, 1986; Imbens
& Rubin, 1997; Peng et al., 2004): this assumes that the potential outcomes
12

of each individual are unrelated to the treatment status of other individuals.
2. Random assignment: this assumes that each individual has been assigned
randomly to a treatment group.
3. Exclusion restriction: this assumes that the assigned treatment has an effect
on the outcome only through its effect on the received treatment.
4. Monotonicity: the actual received treatment D
i
(z) is a monotone function
with respect to the potential treatment levels, 1 for being in the treatment
group and 0 for being in the control group.
5. Latent ignorability: this assumes that the potential outcomes (of failure
time) and the associated potential non-responses (censoring time) are inde-
pendent at each level of the latent compliance types.
In the literature, the latent ignorability assumption for general data from RCTs
has been expressed as follows:
P r(Y
i
, R
i
|U
i
, Z
i
) = P r(Y
i
|U
i
, Z
i
)P r(R
i
|U
i
, Z
i
);
(3.1)
where Y
i
and R
i
represent the potential outcome variable and the response indi-
cator for the ith individual, respectively; the notation for U
i
and Z
i
are the same
as in Frangakis & Rubin (1999), Mealli et al. (2004) and O'Malley & Normand
(2005). U
i
denotes the compliance type, and Z
i
indicates the assigned treatment.
The analogous assumption for time-to-event data is:
P r(T
i
, C
i
|U
i
, Z
i
) = P r(T
i
|U
i
, Z
i
)P r(C
i
|U
i
, Z
i
),
(3.2)
13

where T
i
and C
i
denote the potential outcome of failure time and censoring
time, respectively. This assumes that in RCTs with time-to-event outcomes, the
potential failure and censoring times are independent conditional on the latent
compliance type. We regard equation (3.2) as the time-to-event version of the
latent ignorabilty assumption given by Frangakis & Rubin (1999). Note that in
this chapter, we use the bold symbols, such as T
i
and C
i
, to denote potential
outcome vectors.
3.3
Basic concepts
For convenience of understanding the model structure given in Section 4, we in-
troduce a few basic concepts in the following.
3.3.1
Compliance types
In literature, a variety of names that have been used to describe individuals' com-
pliance behaviour, such as compliance type (Imbens & Rubin, 1997), compliance
state (O'Malley & Normand, 2005), or principle strata (Frangakis & Rubin, 2002;
White, 2005). This is shown that individuals' compliance behaviour has drawn
much attention from many authors, as doing noncompliance analysis, individuals'
compliance behaviour in RCTs is a very important issue that has to be concerned.
In this paper we follow the work of Imbens & Rubin (1997) and refer to this new
variable as the compliance type and denote it by U.
The definition of compliance type given by Imbens & Rubin (1997) is as follows:
1. Compliers (c) are those individuals who receive the treatment when they
are assigned to the treated group. They would not receive any treatment
when assigned to the control group. That is, compliers always comply with
their assigned treatment Z
i
.
14

2. Always-takers (a) are those individuals who always receive the treatment
regardless which groups they are assigned to. That is, always-takers only
comply with their assigned treatment when Z
i
= 1.
3. Never-takers (n) are individuals who would never receive the treatment re-
gardless which group they are assigned to. That is, never-takers only comply
with their assigned treatment when Z
i
= 0;
4. Defiers (d) are individuals who receive the treatment when they are assigned
to the control group and would not receive the treatment when they are
assigned to the treatment group. That is, defiers never comply with their
assigned treatment Z
i
= z no matter whatever value Z
i
has.
Using the symbol of potential received treatment D
i
= [D
i
(1), D
i
(0)], these
four compliance types can be written mathematically as follows,
U
i
=
c,
(i.e. subject i is a complier)
if D
i
(1) = 1 and D
i
(0) = 0;
n,
(i.e. subject i is a never-taker)
if D
i
(1) = 0 and D
i
(0) = 0;
a,
(i.e. subject i is an always-taker)
if D
i
(1) = 1 and D
i
(0) = 1;
d,
(i.e. subject i is a defier)
if D
i
(1) = 0 and D
i
(0) = 1.
(3.3)
This compliance type variable, U
i
, allows us to classify a population into four
subpopulations by the four categories of individuals' compliance behaviour, that
is compliers (U
i
= c), always-takers (U
i
= a), never-takers (U
i
= n) and defiers
(U
i
= d).
In the literature, the fourth compliance type, defier (U
i
= d), is often omitted,
since there is no reason to suppose that individuals who agree to participate in
a trial will necessarily always counter their experimenters. To avoid this extreme
15

situation, a very common assumption named Monotonicity (Angrist et al., 1996;
Imbens & Rubin, 1997; Frangakis & Rubin, 1999) is given, which supposes that
the component of the received treatment D
i
is a monotone function with respect
to Z
i
, that is D
i
(1) and D
i
(0) have a relationship as follows,
D
i
(1)
D
i
(0).
(3.4)
Instead of using the terminology of Monotonicity, some authors directly as-
sume, however, that there are strictly no defiers in the study (Baker, 1998; Baker
& Kramer, 2005).
3.3.2
Causal Effects
Using the potential outcomes notation it is straightforward to define causal ef-
fects of treatment assignment, for example T
i
(1)
- T
i
(0) is the the causal effect
of treatment assignment for the i
th
individual. Within the potential outcomes
framework it is conventional to define causal effects contrasts between population
averages under alternative treatment assignments (Rubin, 1978, Imbens & Rubin,
1997, Hirano et al, 2000). However for failure time data the full survival curve
is potentially of interest and we therefore define causal effects in a time depen-
dent manner, contrasting population survival curves under alternative treatment
assignments.
Let S(a, t) denote the population survival curve under assignment to treatment
a. This is the survival curve, defined as Pr(T (a) > t), that would be observed
if all members of the population were assigned to treatment a and followed until
failure. Here the probability notation, Pr() can be interpreted as the population
proportion or as a relative frequency under repeated sampling from the population.
16

For any pair of treatment levels labeled, for convenience, 0 and 1, , an average
causal effect of treatment assignment at time t, can be defined as
ACE(t) = S(1, t)
- S(0, t)
(3.5)
Thus the population survival curves induce a population - level causal effect
curve. This time varying notion of causal effect can be related to time-independent
definitions of average causal effect by considering, for, any time t, the indicator
variables I(a, t) defined to equal 1, if T (a) > t and to equal 0, otherwise. For
the ithindividual, the causal effect of treatment at time, t, is then ICE
i
(t) =
I
i
(1, t)
- I
i
(0, t) and the average causal effect is the average of these individual
effects effect at time t, can then be written
ACE(t) = E (I(1, t)
- I(0, t))
= S(1, t)
- S(0, t)
(3.6)
3.3.3
Compliers average causal effect
Because the treatment received by always takers and never takers is not affected
by treatment assignment the subpopulation average causal effect functions for
these groups do not reflect the causal effect of treatment. On the other hand, the
sub-population average causal effect functions for compliers and defiers are of of
interest because these effects may reflect the effect of treatment. For example the
Complier Average Causal Effect (CACE):
CACE = S
c
(1, t)
- S
c
(0, t)
(3.7)
contrasts survival under alternative treatment assignments for a sub-population
17

know to comply with these assignments. It is therefore more reasonable to at-
tribute any effect of assigned treatment in this sub-population to the causal effect
of treatment,than is the case for the full population. Nevertheless, Hirano et al
(2000) that such causal attributions, remain assumptions, the plausibility of which
is increased by study design strategies such as blinding and double blinding.
Implicit in the above definitions is the Stable-Unit-Value Assumption (SUTVA)
(Rubin, 1986 comment ), which holds that outcomes for any given individual are
not affected by the treatment assignment or outcomes of other individuals. This
seems reasonable in screening trials such as HIP and in many other studies of
interventions delivered at the level of the individual. When SUTVA is violated
the vector of potential outcomes must be expanded to accommodate, not only the
outcomes corresponding to each individual's potential exposure but also combi-
nations of potential exposures for other individuals (Infectious disease reference
here?) Definition of causal effects is then correspondingly more difficult.
4
Model Structure and Likelihood
4.1
Compliance type's distribution
Noncompliance is the main issue we have to consider in our PPO-survival approach
and it is often described by compliance type variable, which is defined by the vector
of potential received treatment D
i
= [D
i
(1), D
i
(0)] (equation (3.3)). By following
the property of conditional probability, P r(D
i
|Z
i
, ), the conditional distribution
of vector D
i
given the assigned treatment Z
i
, (where is the associated parameter
18

for this distribution) is broken down in equation (4.1).
P r(D
i
|Z
i
; ) = P r(D
mis
i
, D
obs
i
|Z
i
; )
= P r(D
mis
i
|D
obs
i
, Z
i
;
zd
)P r(D
obs
i
|Z
i
;
z
),
(4.1)
where
zd
and
z
represent the parameters associated with these two conditional
distributions of D
mis
i
and D
obs
i
respectively. This is because D
i
can also be ex-
pressed as [D
mis
i
, D
obs
i
], where D
mis
i
and D
obs
i
denote the unobservable and ob-
servable components of potential received treatment vector D
i
. Note that when
Z
i
is a binary variable, we can write D
mis
i
= D
i
(1
- Z
i
) and D
obs
i
= D
i
(Z
i
).
On the other hand, by the definition of compliance type U
i
, given the actual
received treatment D
obs
i
= D
i
(Z
i
), learning the value of D
mis
i
is equivalent to
learning the value of U
i
. This implies that the conditional probability distribution
of the compliance type U
i
given D
obs
i
= D
i
(Z
i
) and Z
i
, P r(U
i
|D
i
(Z
i
), Z
i
, ;
), can
be obtained from P r(D
mis
i
|D
obs
i
, Z
i
; ), the conditional probability distribution of
D
mis
i
, the unobservable component of the potential received treatment vector D
i
,
given the actual received treatment D
obs
i
and assigned treatment Z
i
. In other
words:
P r(U
i
|D
obs
i
, Z
i
;
)
d
P r(D
mis
i
|D
obs
i
, Z
i
;
zd
);
(4.2)
where
d
indicates distributional equivalence: any probability on one side of
the equation can be obtained from the distribution on the other side. For example,
P r(U
i
= c
|D
obs
i
= 1, Z
i
= 1,
) = P r(D
mis
i
= 0
|D
obs
i
= 1, Z
i
= 1,
zd
).
After substituting equation (4.2) into the second line in equation (4.1), we
have:
P r(D
i
|Z
i
, )
d
P r(U
i
|D
obs
i
, Z
i
;
)P r(D
obs
i
|Z
i
;
z
),
(4.3)
where
denotes the parameter corresponding to the distribution of U
i
condi-
19

tional on the actual received treatment D
obs
i
and assigned treatment Z
i
, and
z
is the parameter associated with the distribution of D
obs
i
given Z
i
. This implies
that P r(D
i
|Z
i
, ) can be obtained from the product of P r(U
i
|D
obs
i
, Z
i
;
) and
P r(D
obs
i
|Z
i
;
z
). In other words, P r(D
i
|Z
i
, ), the conditional distribution of D
i
(the potential received treatment vector) given assigned treatment Z
i
, can also
be obtained from the product of two conditional distributions, P r(U
i
|D
obs
i
, Z
i
;
)
and P r(D
obs
i
|Z
i
;
z
), the former is the conditional distribution of U
i
given the
actual received treatment D
obs
i
and assigned treatment Z
i
, and the latter is the
conditional distribution of D
obs
i
given Z
i
(equation (4.3)).
4.2
The structure of the joint distribution
4.2.1
The joint distribution of T, C, D and Z
Suppose a RCT includes N individuals, each component of the potential outcomes
vector, including failure time T
i
= [T
i
(1), T
i
(0)], censoring time C
i
= [C
i
(1), C
i
(0)]
and the potential received treatment D
i
= [D
i
(1), D
i
(0)], can then be written as
a vector of length N, as follows:
T(1) = [T
1
(1), . . . , T
N
(1)] , the failure time if assigned to the treatment group;
T(0) = [T
1
(0), . . . , T
N
(0)], the failure time if assigned to the control group;
C(1) = [C
1
(1), . . . , C
N
(1)], the censoring time if assigned to the treatment group;
C(0) = [C
1
(0), . . . , C
N
(0)], the censoring time if assigned to the control group;
D(1) = [D
1
(1), . . . , D
N
(1)], the received treatment if assigned to the treatment
group;
D(0) = [D
1
(0), . . . , D
N
(0)], the received treatment if assigned to the control
group;
20

We also assume that all related variables are independent and identically dis-
tributed (i.i.d.) random variables, which include failure time T, censoring time
C, potential received treatment D and assigned treatment Z. Their joint distri-
bution can then be expressed as a product of the joint distribution for N single
observations, that is:
P r(T(1), T(0), C(1), C(0), D(1), D(0), Z; )
=
i
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ),
(4.4)
where denotes the model parameter vector.
We suppose that the model parameter can be broken down into three parts,
namely = (
, , ), where is the parameter associated with the joint dis-
tribution of failure and censoring times (outcomes), is the parameter for the
distribution of the vector of potential received treatment D
i
(post-treatment co-
variates), and is the parameter corresponding to the distribution of assigned
treatment Z
i
(pre-treatment covariates). According to the rule of conditional dis-
tribution, the joint probability of all the aforementioned variables, corresponding
to the ith individual, P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ), can thus
be expressed as a product of three distributions:
· The joint distribution of failure time T
i
= [T
i
(1), T
i
(0)] and censoring time
C
i
= [C
i
(1), C
i
(0)] given potential received treatment D
i
= [D
i
(1), D
i
(0)]
and assigned treatment Z
i
, P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0)
|D
i
(1), D
i
(0), Z
i
;
).
· The distribution of potential received treatment D
i
= [D
i
(1), D
i
(0)] given
assigned treatment Z
i
, P r(D
i
(1), D
i
(0)
|Z
i
; ).
· The distribution of assigned treatment Z
i
, P r(Z
i
, ).
21

Note that
, and denote the parameters associated with these above three
distributions, respectively. We then have:
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ) =
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0)
|D
i
(1), D
i
(0), Z
i
;
)P r(D
i
(1), D
i
(0)
|Z
i
; )P r(Z
i
; ).
(4.5)
Using the notation of potential-outcome vectors, equation (4.5) can thus be ex-
pressed as follows:
P r(T
i
,C
i
, D
i
, Z
i
; ) =
P r(T
i
, C
i
|D
i
, Z
i
;
)P r(D
i
|Z
i
; )P r(Z
i
; );
(4.6)
for = (
, , ), the model parameter vector defined earlier.
In Section 4.1, we derived equation (4.2), which allows us to replace P r(D
i
|Z
i
; )
with P r(U
i
|D
obs
i
, Z
i
; )P r(D
obs
i
|Z
i
;
z
). The joint distribution of all the aforemen-
tioned variables given in equation (4.4) can now be rewritten as follows:
P r(T
i
,C
i
, D
i
, Z
i
; )
d
P r(T
i
, C
i
|Z
i
, D
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
|Z
i
;
z
)P r(Z
i
; ),
= P r(T
i
, C
i
|Z
i
, D
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
, Z
i
; );
(4.7)
where
d
indicates equivalence in distribution.
The last equality holds in equation (4.7) is because
P r(D
obs
i
, Z
i
; ) = P r(D
obs
i
|Z
i
,
z
)P r(Z
i
; ).
(4.8)
Therefore, these two parameters
z
and , have been replaced with the parameter
of . The advantage of using to replace
z
and is that it simplifies notation of
22

the parameters involved in the model. Recall that the main parameter of interest
in our approach is
, the parameter associated with the joint distribution of failure
time and censoring time (survival outcomes).
In addition, the vector of received treatment D
i
can be viewed as being
equivalent to compliance type U
i
. This is because, on one hand, U
i
= u (for
u
{c, a, n, d}) is determined by D
i
= [D
i
(1) = d
1
, D
i
(0) = d
0
] (for d
0
, d
1
{0, 1}) (equation (3.3)), and, on the other hand, if we know the compliance type
U
i
= u, the values of vector D
i
= [D
i
(1), D
i
(0)] are then also known. So we write
(U
i
= u)
(D
i
= [D
i
(1), D
i
(0)] = [d
1
, d
0
]) where
denotes equivalence.
Let U
i
replace D
i
. We have:
P r(T
i
, C
i
, D
i
, Z
i
; )
d
P r(T
i
, C
i
, U
i
, Z
i
; );
(4.9)
then equation (4.7) is equivalent to
P r(T
i
,C
i
, U
i
, Z
i
; )
d
P r(T
i
, C
i
|Z
i
, U
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
, Z
i
; ).
(4.10)
Furthermore, it is useful to impose some additional structure on the model.
This is achieved by partitioning the parameter vector as
= (
T
,
C
,
T C
)
T
,
where
T
denotes transpose. We also require that the marginal bivariate failure
time distribution depends only on
T
and that the marginal bivariate censoring
time distribution depends only on
C
. Hence the
T C
parameters act as association
parameters. This latter condition is only required to complete the parametrisation
of the joint distribution of the potential failure and censoring times.
By following the assumption of latent ignorability, equation (3.2), which gives
the joint distribution of failure and censoring times conditional on U
i
and Z
i
,
23

denoted by P r(T
i
, C
i
|Z
i
, U
i
;
), can be separated as a product of two distributions
of the failure time and the censoring time (conditional on compliance type U
i
only
because of randomisation). Specifically:
P r(T
i
, C
i
|U
i
, Z
i
;
) = P r(T
i
|U
i
, Z
i
;
T
)P r(C
i
|U
i
, Z
i
;
C
),
(4.11)
This means we can break down
into = (
T
,
C
)
T
. In other words, under the
latent ignorability assumption, the third part of
, namely
T C
, has been omitted.
Consequently, the model parameter can then be broken down as =
{
T
,
C
,
, }, where the subcomponent parameters of are detailed below:
T
corresponds to the parametrisation of the distribution of failure time,
C
corresponds to the parametrisation of the distribution of censoring time,
corresponds to the parametrisation of the conditional distribution of compliance
type U
i
, and
corresponds to the parametrisation of the joint distribution of (D
obs
i
, Z
i
).
Equation (4.11) can also be expressed as the following equivalence relation-
ships:
Pr(T
i
(1), C
i
(1)
|U
i
, Z
i
= 1;
) =Pr(T
i
(1)
|U
i
, Z
i
= 1;
T 1
)Pr(C
i
(1)
|U
i
, Z
i
= 1;
C1
)
Pr(T
i
(0), C
i
(0)
|U
i
, Z
i
= 0;
) =Pr(T
i
(0)
|U
i
, Z
i
= 0;
T 0
)Pr(C
i
(0)
|U
i
, Z
i
= 0;
C0
).
(4.12)
We propose a similar structuring of the marginal bivariate failure and cen-
soring time models by assuming
T
= (
T 1
,
T 0
,
T 10
) and
C
= (
C1
,
C0
,
C10
)
24

respectively. We then have:
Pr(T
i
(0)
|U
i
;
T
) = Pr(T
i
(0)
|U
i
;
T 0
)
(4.13)
Pr(T
i
(1)
|U
i
;
T
) = Pr(T
i
(1)
|U
i
;
T 1
).
(4.14)
Pr(C
i
(0)
|U
i
;
C
) = Pr(C
i
(0)
|U
i
;
C0
)
(4.15)
Pr(C
i
(1)
|U
i
;
C
) = Pr(C
i
(1)
|U
i
;
C1
).
(4.16)
Note that the third components of
T
and
C
, denoted by
T 10
and
C10
, were
not involved. This can be interpreted as that an individual in a RCT can only be
assigned into one of treatment groups (Z
i
= 1 or 0). No one can be assigned into
both two treatment groups (Z
i
= 1 and 0). In other words, in practice no such a
situation exists which need parameter
T 10
or
C10
to describe. So we simplify
T
and
C
as
T
= (
T 1
,
T 0
) and
C
= (
C1
,
C0
).
Under randomisation, the assigned treatment Z is independent of all other
variables. This is denoted as Z
(T, C, D, U), where means independent. It
therefore follows that:
P r(T
i
, C
i
|Z
i
, U
i
;
) = P r(T
i
, C
i
|U
i
;
).
(4.17)
Equation (4.17) represents a major simplification because it means that the
dependency of the potential survival (and censoring) times on assigned treatment
(Z
i
) does not have to be explicitly modelled. In non-randomised studies, this
dependency normally has to be modelled. Without additional assumptions, this
generally leads to problems with identifiability (Rubin, 1978; Graham, 2000).
25

The joint distribution of all potential entities for the ith observation can then
be expressed as follows:
P r(T
i
,C
i
, D
i
, Z
i
; )
d
P r(T
i
|U
i
;
T
)P r(C
i
|, U
i
;
C
)P r(U
i
|D
obs
i
;
)pr(D
obs
i
, Z
i
; );
(4.18)
where the model parameter is =
{
T
,
C
,
, }.
Our primary focus is on
T
. In particular, we wish to focus on (
T 1
,
T 0
),
the parameters of the marginal failure time distributions. Given estimates of
these parameters and given our specific distributional assumptions, the estimated
potential survivor functions (under the alternative assigned treatment) and the
estimators of ACE(t) and CACE(t) can then be obtained. Here ACE(t) and
CACE(t) are the forms of average causal effect (ACE) and complier average causal
effect (CACE) (Angrist et al., 1996; Imbens & Rubin, 1997; Yau & Little, 2001;
Peng et al., 2004; Mattei & Mealli, 2007), respectively, for time-to-event studies.
Note that they both are functions of time t.
4.2.2
The joint distribution of the observable variables
For deriving the formulation of the likelihood function, we need to address the
construct of the joint distribution of observed variables first, as it is well known
that the likelihood function of the model parameter , denoted by L
F
(), is a
function of , for fixed observed variables, and the function is proportional to the
joint distribution of all observed variables.
In time-to-event studies, however, failure time T
i
and censoring time C
i
cannot
be fully observed jointly. It is convenient to present a construct of the joint distri-
bution of all observable variables (denoted by P r(T
obs
, C
obs
, D
obs
, Z; )), rather
than the joint distribution of all observed variables, as the former can be ob-
26

tained directly by integrating P r(T
i
, C
i
, D
i
, Z
i
; )) with respect to the unobserv-
able/missing variables, which are T
mis
i
= T
i
(1
- Z
i
), C
mis
i
= C
i
(1
- Z
i
) and
D
mis
i
= D
i
(1
- Z
i
) when Z
i
is a binary. Therefore:
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
=
P r(T
i
, C
i
, D
i
, Z
i
; )dD
mis
i
dC
mis
i
dT
mis
i
,
(4.19)
where P r(T
i
, C
i
, D
i
, Z
i
; ) is given in equation (4.18).
Because of equation (4.18), equation (4.19) can then be expressed as follows
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
d
P r(T
i
|U
i
;
T
)P r(C
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
)P r(D
obs
i
, Z
i
; )dU
i
dC
mis
i
dT
mis
i
,
(4.20)
where dD
mis
i
is replaced by dU
i
because, given D
obs
i
and Z
i
, we have D
mis
i
U
i
.
Recall that D
obs
i
= D
i
(Z
i
), this implies that for the ith individual, his/her assigned
treatment Z
i
and actual received treatment D
obs
i
are always observed.
Table 3: Compliance types U
i
= u given observed values of Z
i
and D
i
(Z
i
).
Z
i
D
i
(Z
i
)
Possible U
i
1
1
a or c
1
0
n
0
1
a
0
0
n or c
Under the assumptions of exclusion restriction and monotonicity, compliance
type U
i
has up to three categories, which are compliers (U
i
= c), never-takers
(U
i
= n) and always-takers (U
i
= a) and shown in Table 3. Then integration in
equation (4.20), with respect to U
i
(compliance type), can be further replaced by
27

its summation. We then write:
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
d
U
i
{c,n,a}
P r(T
i
|U
i
;
T
)P r(C
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
)P r(D
obs
i
, Z
i
; )dC
mis
i
dT
mis
i
,
= P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(U
i
|D
obs
i
;
)
P r(T
i
|U
i
;
T
)dT
mis
i
P r(C
i
|U
i
;
C
)dC
mis
i
,
= P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(T
obs
i
|U
i
;
T
)P r(C
obs
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
).
(4.21)
where
d
denotes distributional equivalence.
4.3
The likelihood function
As stated in Section 4.2.2, the likelihood function with respect to the model pa-
rameter can be a function that is proportional to the joint distribution of all
observed variables (p9, Tanner (1993)). However, because in time-to-event stud-
ies for the ith individual, either failure time (T
obs
i
) or censoring time (C
obs
i
) can
be observed, but not both. This implies that P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ) can only
be viewed as the joint distribution for the observable variables (equation (4.21))
rather than the joint distribution for the observed variables.
Hence, instead of the likelihood function, we formulate a function of the model
parameter that is proportional to the joint distribution of all observable vari-
ables, the so-called the idealised likelihood function with respect to the model
parameter denoted by IDL().
4.3.1
The idealised likelihood function
Given the joint distribution of the observable variables, P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
|),
in equation (4.21), for the ith observation, the idealised likelihood function with
28

respect to the model parameter , denoted by IDL
i
(), has the following form:
IDL
i
()
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ).
(4.22)
On the other hand, by following equation (4.21), the product of four distribu-
tions, P r(T
obs
i
|U
i
, Z
i
;
T
),P r(C
obs
i
|U
i
, Z
i
;
C
), P r(U
i
|D
obs
i
, Z
i
; ) and P r(D
obs
i
, Z
i
; )
can be viewed as a function of the model parameter = (
T
,
C
,
, ), and
it is proportional to the joint distribution of the idealised observed variables,
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ).
The idealised likelihood function with respect to the model parameter ,
(IDL
i
()), can then be obtained from the right hand side of equation (4.21),
so it has the formulation as follows:
IDL
i
()
=
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
)P r(D
obs
i
, Z
i
; )
=P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
),
(4.23)
where the second equation holds because the term P r(D
obs
i
, Z
i
; ) is not related
to the compliance covariate U
i
involved in the summation (equation (4.23)).
Note that in equation (4.23), assigned treatment Z
i
is involved in each term of
conditional distribution on the right-hand side. This is true because the observable
variables, namely T
obs
i
, C
obs
i
and D
obs
i
, all rely on the given value of Z
i
.
Recall that in our approach, the main parameter of interest is
T
, the param-
eter associated with the distribution of failure time T
i
. However, from equation
(4.23), we can see that because the three parameters,
T
,
C
and
, are all
involved in the summation over compliance type U
i
, the likelihood for these pa-
rameters does not separate into distinct components which can be maximised
29

separately. This implies that estimation of
T
cannot proceed independently. So
we write these three parameters as a row vector,
= (
T
,
C
,
), which are asso-
ciated with the distributions of failure time, censoring time and compliance type,
respectively. We then focus on deriving the idealised likelihood function for
,
denoted by IDL
i
(
) rather than IDL
i
(). Then IDL
i
(
) can be expressed as
follows:
IDL
i
(
)
=
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
).
(4.24)
4.3.2
Likelihood function for complete data
Using R
i
to denote the censoring indicator, which indicates the situation of ob-
served failure time or censoring time, (1 for the event occurring (observed failure
time), 0 for censoring (observed censoring time)). Therefore, once given the cen-
soring indicator R
i
for the ith individual, the idealised likelihood function IDL
i
(
)
can be specified to the likelihood function L
i
(
), which can be written as the fol-
lowing equation:
L
i
(
)
=
U
i
{c,n,a}
pr(T
obs
i
= t
|U
i
, Z
i
;
T
)
×P r(C
obs
i
> t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 1, T
i
(z) = t,
U
i
{c,n,a}
P r(T
obs
i
> t
|U
i
, Z
i
;
T
)
×pr(C
obs
i
= t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
;
)
for R
i
= 0, C
i
(z) = t.
(4.25)
30

where P r() and pr() denote the probability function and the probability density
function, respectively. However, in practice, the compliance type U
i
cannot be
observed fully. Hence the parameter
cannot be estimated through the form of
L
i
(
) given in equation (4.25). Dempster et al. (1977) proposed an approach for
MLE of incomplete data, named the EM-algorithm, which can be employed in
our case as U
i
is considered unknown and can be treated as incomplete data.
Briefly, the EM algorithm involves two steps: the E-step and the M-step,
which denote the expectation and maximisation steps, respectively (Dempster
et al., 1977). The E-step aims to obtain the conditional expectation of the log-
likelihood function for complete data log(L
F
(
)), denoted by Q(,
k
), given the
observed data T (z), C(z), D(z) and Z = z, and given the current parameter
values
k
. More details for the EM algorithm is given in Section 6.
In the EM-algorithm, the first step requires the explication of the likelihood
form for complete data denoted by L
F
(
). This depends on the assumption that
all compliance covariates are known. We then find the related likelihood form
for the ith observation, which can be written as follows (for known compliance
covariates U
i
= u):
L
F
i
(
) = L
i
(
, u)
=
pr(T
obs
i
= t
|U
i
, Z
i
;
T
)
×P r(C
obs
i
> t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 1, T
i
(z) = t,
P r(T
obs
i
> t
|U
i
, Z
i
;
T
)
×pr(C
obs
i
= t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 0, C
i
(z) = t.
(4.26)
Compliance type U
i
= u is used to describe an individual's compliance behaviour.
31

Therefore, we have redefined the parameter of interest,
T 1
= (
T,u1
) and
T 0
=
(
T,u0
) (for u
{c, a, n}). Recall that
T
= (
T 1
,
T 0
). More specifically, the
parameter of interest associated with the failure time distribution,
T
, is redefined
as
T
= (
T,c1
,
T,n1
,
T,a1
,
T,c0
,
T,n0
,
T,a0
). This is equivalent to
T
= (
T,uz
)
(for u
{a, c, n} and z {0, 1}). Here the subscripts indicate failure time (T )
with U = u for u
{a, c, n} and assigned treatment Z = z for z {0, 1}.
Similarly, we use
C,uz
for u
{a, c, n}, and z {0, 1} to denote the compo-
nents of
C
, the parameter associated with the censoring time distribution. In
other words, we have
C
= (
C,c1
,
C,n1
,
C,a1
,
C,c0
,
C,n0
,
C,a0
).
Hence, for the specific assigned treatment Z
i
= z (with known U
i
= u), equa-
tion (4.26) can be simplified as follows:
L
i
(
, u)
=
pr(T (z)
i
= t
|U
i
, Z
i
;
T,uz
)
×P r(C(z)
i
> t
|U
i
, Z
i
;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
;
u,zd
)
for R
i
= 1, T
i
(z) = t,
P r(T (z)
i
> t
|U
i
, Z
i
;
T,uz
)
×pr(C(z)
i
= t
|U
i
, Z
i
;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
;
u,zd
)
for R
i
= 0, C
i
(z) = t.
(4.27)
where R
i
is the observed indicator of censoring, which is 1 for the event of in-
terest and 0 for censoring. The parameter
u,zd
is associated with the condi-
tional probability of compliance, given Z
i
= z and D
i
(z) = d, and is defined by
u,zd
= P r(U
i
= u
|Z
i
= z, D
i
(z) = d) for u = (a, c, n) and z, d = 0, 1. Details
about the estimation of the parameter
u,zd
will be discussed in Section 8.
According to equation (4.27), an individual randomised to treatment Z
i
= 1
who dies or is censored at (observed) time t has a contribution to the likelihood
32

as follows:
L
i
{i:Z
i
=1}
(
, u)
=
pr(T (1)
i
= t
|U
i
= u, Z
i
= 1;
T,u1
)
×P r(C(1)
i
> t
|U
i
= u, Z
i
= 1;
C,u1
)
×P r(U
i
|D
i
(1), Z
i
= 1;
u,1d
)
for R
i
= 1, T (1)
i
= t,
P r(T (1)
i
> t
|U
i
= u, Z
i
= 1;
T,u1
)
×pr(C(1)
i
= t
|U
i
= u, Z
i
= 1;
C,u1
)
×P r(U
i
|D
i
(1), Z
i
= 1;
u,1d
)
for R
i
= 0, C(1)
i
= t.
(4.28)
for d = 0 or 1 and u
{c, a, n} .
Similarly, an individual randomised to treatment Z
i
= 0 who dies or is censored
at (observed) time t has a contribution to the likelihood for
as follows:
L
i
{i:Z
i
=0}
(
, u)
=
pr(T (0)
i
= t
|U
i
= u, Z
i
= 0;
T,u0
)
×P r(C(0)
i
> t
|U
i
= u, Z
i
= 0;
C,u0
)
×P r(U
i
|D
i
(0), Z
i
= 0;
u,0d
)
for R
i
= 1, T (0)
i
= t,
P r(T (0)
i
> t
|U
i
= u, Z
i
= 0;
T,u0
)
×pr(C(0)
i
= t
|U
i
= u, Z
i
= 0;
C,u1
)
×P r(U
i
|D
i
(0), Z
i
= 0;
u,0d
)
for R
i
= 0, C(0)
i
= t.
(4.29)
for d = 0 or 1 and u
{c, n, a} .
Recall from Table 3 that knowing the values of Z
i
= z and D
i
(z) = d means
that compliance type U
i
= u can be determined for up to two types. Hence, given
the value of z and d, equations (4.28) and (4.29) can then be expressed as the
33

following form:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u)
=
pr(T (z)
i
= t
|U
i
= u, Z
i
= z;
T,uz
)
×P r(C(z)
i
> t
|U
i
= u, Z
i
= z;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
= z;
u,zd
)
for R
i
= 1, T (z)
i
= t,
P r(T (z)
i
> t
|U
i
= u, Z
i
= z;
T,uz
)
×pr(C(z)
i
= t
|U
i
= u, Z
i
= z;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
= z;
u,zd
)
for R
i
= 0, C(z)
i
= t.
(4.30)
for z and d = 0 or 1 and u
{c, a, n} . Note that equation (4.30) given above is
a likelihood function of our non-ignorable PPO-survival model which is analo-
gous to the one given in O'Malley & Normand (2005) model that is for normally
distributed outcomes.
For notational convenience, we introduce the following conventions:
f
i
u
(v, t;
T,uv
) = pr(T
i
(v) = t
|U
i
= u,
T,uv
)
for v = 0 or 1,
S
i
u
(v, t;
T,uv
) = P r(T
i
(v) > t
|U
i
= u,
T,uv
)
for v = 0 or 1,
e
i
u
(v, t;
C,uv
) = pr(C
i
(v) = t
|U
i
= u,
C,uv
)
for v = 0 or 1,
G
i
u
(v, t;
C,uv
) = P r(C
i
(v) > t
|U
i
= u,
C,uv
) for v = 0 or 1.
(4.31)
As mentioned before, P r() and pr() denote the probability and probability
density function, respectively and v indicates potential treatment levels.
Equation (4.30) can then be rewritten as:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u)
= f
i
u
(z, t;
T,uz
)G
i
u
(z, t;
C,uz
)
R
i
S
i
u
(z, t;
T,uz
)e
i
u
(z, t;
C,uz
)
1-R
i
u,zd
.
(4.32)
34

Furthermore, we write:
F
i
u,zd
(
T,uz
,
C,uz
)
= f
i
u
(z, t;
T,uz
)G
i
u
(z, t;
C,uz
)
R
i
S
i
u
(z, t;
T,uz
)e
i
u
(z, t;
C,uz
)
1-R
i
.
(4.33)
Equation (4.30) can then be expressed more simply as:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u) = F
i
u,zd
(
T,uz
,
C,uz
)
u,zd
= l
i
(
; u)l
i
(
; u)
(4.34)
Because of the independent and identically distributed (i.i.d.) assumption for
all observable variables (conditional on model parameters), the likelihood function
for complete data denoted by L
F
() = L(, u) can be expressed by the product
of the fully likelihood function associated with each individual. In other words,
we have:
L
F
(
) =
i=1
L
i
(
, u) =
i=1
F
i
u,zd
(
T,uz
,
C,uz
)
u,zd
=
i=1
l
i
(
; u)l
i
(
; u). (4.35)
Given observed Z
i
= z and D
i
(z) = d, the likelihood form of for complete
data, in which compliance type U
i
(for each individual) is presumed to be known,
35

can then be expressed as:
L
F
(
) = L(, u) =
i
L
i
(
, u)
=
i
{i:U
i
=c,Z
i
=1,D
i
(1)=1}
F
i
c,11
(
T,c1
,
C,c1
)
c,11
i
{i:U
i
=a,Z
i
=1,D
i
(1)=1}
F
i
a,11
(
T,a1
,
C,a
)
a,11
i
{i:U
i
=c,Z
i
=0,D(0)=0}
F
i
c,00
(
T,c0
,
C,c0
)
c,00
i
{i:U
i
=n,Z
i
=0,D
i
(0)=0}
F
i
n,00
(
T,n0
,
C,n0
)
n,00
i
{i:U
i
=n,Z
i
=1,D(0)=0}
F
i
n,10
(
T,n0
,
C,n0
)
n,10
i
{i:U
i
=a,Z
i
=0,D(0)=1}
F
i
a,01
(
T,a1
,
C,a1
)
a,01
.
(4.36)
4.3.3
The likelihood form for incomplete data
Since compliance type U
i
is unobservable, U
i
needs to be treated as a variable
with missing data. Hence, data associated with the case where noncompliance
occurs have to be viewed as incomplete data.
Let
A(z, d) denote the subset of subjects with specified values of Z and D;
more specifically,
A(z, d) = {i : Z
i
= z, and D
i
= d
}. The likelihood form for
incomplete data for the non-ignorable PPO-survival models can then be written
as shown below:
L(
) =
u
{c,n,a}
L(
, u) =
u
{c,n,a} i
L
i
(
, u)
=
i
A(1,1)
F
i
c,11
(
T,c1
,
C,c1
)
c,11
+ F
i
a,11
(
T,a1
,
C,a1
)
a,11
i
A(0,0)
F
i
c,00
(
T,c0
,
C,c0
)
c,00
+ F
i
n,00
(
T,n0
,
C,n0
)
n,00
i
A(1,0)
F
i
n,10
(
T,n0
,
C,n0
)
n,10
i
A(0,1)
F
i
a,01
(
T,a1
,
C,a1
)
a,01
.
(4.37)
The likelihood function of
for incomplete data, given in equation (4.37),
36

does not appear to be readily broken down into products of functions involving
only
T
and
C
, respectively. That is L(
) = L(
T
)L(
C
). That is the case where
censoring is not ignorable (Baker, 1997).
5
Simplification of model parameter
In equation (4.36) (on page 36), we observe that the parameter
has the form
= (
T,uz
,
C,uz
,
u,zd
) for u = (a, c, n) and z, d = (0, 1). However, the number of
model parameters can be further reduced by following the assumption of compound
exclusion restriction (Angrist et al., 1996; Frangakis & Rubin, 1999).
The compound exclusion restriction assumes that all potential-outcome vectors
can be affected by the received treatment D
i
(Z), rather than by the assigned
treatment Z
i
. This implies that the outcome of interest for an individual, as
long as his/her received treatment has the same value, will have the same value
when he/she is assigned into either treatment group (if possible in practice). For
convenience to express the assumption of exclusion restriction mathematically, we
use the notation T
i
(Z
i
= z, D
i
(z) = d) and C
i
(Z
i
= z, D
i
(z) = d) to denote the ith
individual's failure and censoring time where his/her assigned treatment Z
i
and
actual received treatment D
i
(Z
i
) are given by Z
i
= z and D
i
(z) = d, respectively.
Recall that in a two-arm RCT, we have z and d
{1, 0}. Hence the exclusion
restriction assumption can be expressed as follows:
T
i
(Z
i
= 1, D
i
(1) = d) = T
i
(Z
i
= 0, D
i
(0) = d)
for failure time,
C
i
(Z
i
= 1, D
i
(1) = d) = C
i
(Z
i
= 0, D
i
(0) = d)
for censoring time.
(5.1)
This also implies that individuals who are never-takers or always-takers would
have the same failure and censoring times probability distributions, whether they
37

are in the treated group or in the control group. In other words:
P r(T
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 0, U
i
= u)
= P r(T
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 1, U
i
= u)
for u = a or n;
P r(C
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 0, U
i
= u)
= P r(C
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 1, U
i
= u)
for u = a or n.
(5.2)
For convenience, in the remainder of this thesis, we use T
i
(Z
i
) and C
i
(Z
i
) to
denote the failure and censoring times corresponding to the assigned treatment for
the ith individual, whose assigned treatment is Z
i
= z and whose actual received
treatment is D
i
(z) = d. Equation (5.2) can then be re-written as:
P r(T
i
(0)
|Z
i
= 0, U
i
= u)
= P r(T
i
(1)
|Z
i
= 1, U
i
= u)
for u = a or n;
P r(C
i
(0)
|Z
i
= 0, U
i
= u)
= P r(C
i
(1)
|Z
i
= 1, U
i
= u)
for u = a or n.
(5.3)
Consequently, the parameters associated with the outcome distributions (fail-
ure and censoring), which correspond to the subgroup with U
i
= n and Z
i
= 1 or
0, and to subgroup with U
i
= a and Z
i
= 1 or 0, are equivalent. Hence, we have
the following relationships:
T,n1
=
T,n0
T,a1
=
T,a0
C,n1
=
C,n0
C,a1
=
T,a0
.
(5.4)
Therefore we can use
T,n
to represent both
T,n1
and
T,n0
, as they are identical
under the assumption of exclusion restriction. Similarly, the two parameters
T,a1
and
T,a0
are identical under the same assumption. So we can use
T,a
to denote
38

both
T,a1
and
T,a0
. By following the same logic, we write
C,n
=
C,n1
=
C,n0
and
C,a
=
C,a1
=
C,a0
.
Equation (5.4) implies that the parameters
T,uz
and
C,uz
, which are associated
with the failure and censoring times for the subgroup with U
i
= u and assigned
treatment Z
i
= z, include the following components:
T,uz
= (
T,c1
,
T,c0
,
T,n
,
T,a
)
C,uz
= (
C,c1
,
C,c0
,
C,n
,
C,a
).
(5.5)
Note that no parameter for the subgroup with U
i
= d and Z
i
= 1 or 0 exists,
because under the assumption of Monotonicity, no defiers exist.
Thereby, because we have assumed monotonicity, up to two categories exist
for compliance type U
i
= u (knowing the value of Z
i
and D
i
(Z
i
)) (Table 3).
The parameter
u,zd
= P r(U
i
= u
|Z
i
= z, D
i
(z) = d), which is defined as the
conditional probability of compliance type U
i
given the assigned treatment Z
i
= z
and actual received treatment D
obs
= d, can now satisfy the following equations:
c,11
+
a,11
= 1 for u
{c, a} when Z
i
= 1 and D
i
(1) = 1,
c,00
+
n,00
= 1 for u
{c, n} when Z
i
= 0 and D
i
(0) = 0,
n,10
= 1
for u
{n}
when Z
i
= 1 and D
i
(1) = 0,
a,01
= 1,
for u
{a}
when Z
i
= 0 and D
i
(0) = 1.
(5.6)
The other
u,zd
for u
{c, n, a} and z, d = 1 or 0 have zero values.
Now the model parameters of interest
can then be specified as = (, ),
where
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
C,c1
,
C,c0
,
C,n
,
C,a
),
= (
c,11
,
c,00
).
(5.7)
In addition, using
u
to denote the probability of compliance type U
i
= u, i.e.
39

u
= P r(U
i
= u), and by following the theorem of total probability, we have:
P r(U
i
= u) =
d
{1,0} z{1,0}
P r(U
i
= u, D
i
(z) = d, Z
i
= z)
=
d
{1,0} v{1,0}
P r(U
i
= u
|D
i
(z) = d, Z
i
= z)P r(D
i
(z) = d, Z
i
= z).
(5.8)
Recall that
u,zd
= P r(U
i
= u
|D
i
(z) = d, Z
i
= z). Denote
zd
= P r(D
i
(z) =
d, Z
i
= z) to correspond to the probability of individuals with values (Z
i
=
z, D
i
(z) = d)) in the population. Then equation (5.8) can be re-written as:
u
=
d
{1,0} z{1,0}
u,zd
zd
.
(5.9)
By following this definition,
zd
can then be estimated by
zd
=
N
zd
N
. Here
N
zd
is the size of the subgroup in a trial where all individuals have Z
i
= z and
D
i
(z) = d, and N is the total number of individuals involved in the study trial.
Furthermore, we have P r(U
i
= u
|Z
i
= 1) = P r(U
i
= u
|Z
i
= 0) as randomisation
guarantees a balance between the treated (Z
i
= 1) and the control (Z
i
= 0)
groups. Therefore we can write
u
= P r(U
i
= u
|Z
i
= 1) = P r(U
i
= u
|Z
i
= 0).
Since the number of subjects who do or do not actually receive the treatment
in the treatment group (Z
i
= 1) are observable (denoted by N
11
and N
10
respec-
tively), we can then estimate
n
by
n
=
N
10
N
1
. If always-takers exist in a trial,
then the number of subjects N
01
in the control group who receive the treatment
is known. Then we have
a
=
N
01
N
0
. Also
c
+
a
+
n
= 1 because no defiers exist.
Then the probability of compliers can be estimated by
c
= 1
-
a
-
n
(O'Malley
40

& Normand, 2005). These can be expressed as:
n
=
N
10
N
1
;
a
=
N
01
N
0
;
c
= 1
-
a
-
n
.
(5.10)
Recall that individuals from the subgroup with Z
i
= 1 and D
i
(1) = 1 can
be written as i
A(1, 1). Their compliance type can be compliers, U
i
= c, or
always-takers, U
i
= a. Similarly, individuals from the subgroup with Z
i
= 0
and D
i
(0) = 0, denoted by i
A(0, 0), may either be compliers, U
i
= c, or
never-takers, U
i
= n. Then, following the definition of
u,zd
and by Table 3, we
have:
c,11
=
c
c
+
a
,
c,00
=
c
c
+
n
.
(5.11)
By following equation (4.35) and considering the relationships given in equa-
tion (5.11), the likelihood function L
F
(
) with respect to , can be replaced by
a product of l
F
(
) and l
F
(
). That is:
L
F
(
) = l
F
(
)l
F
(
),
(5.12)
where
l
F
(
) =
i=1
F
i
u,zd
(
T,uz
,
C,uz
),
l
F
(
) =
i=1
u,zd
.
(5.13)
Therefore, to maximise L
F
(
) is equivalent to maximise l
F
(
) and l
F
(
), sepa-
rately.
Because of the relationships given equation (5.11), which implies that
c,11
and
41

c,00
can be calculated directly through the value of
u
for u
{c, n, a}, we then
write:
= (
c
,
n
,
a
).
(5.14)
Therefore, to estimate parameter
= (, ) is equivalent to estimating (,
).
As we stated before, the MLE of
can be estimated by equation (5.10). Hence
we need only to maximise l
F
(
) for obtaining the MLE of . The EM algorithm is
employed here because an unobservable variable, compliance type U
i
, is involved
in the likelihood function l
F
(
).
In other words, there is only the first part of parameter
, denoted by need
to be further estimated through the EM algorithm. That is:
= (
T,uz
C,uz
);
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
C,c1
,
C,c0
,
C,n
,
C,a
).
(5.15)
The rest part of
, the parameter
can be regarded as a known parameter
when we estimate the parameter
. As a consequence, the parameter is also
known because of equation (5.11). For simplification of notation, in the remains
of this chapter we still use
rather than
to refer to the rest part of
.
6
EM algorithm
The EM algorithm is a generally applicable algorithm that provides an iterative
procedure for computing MLEs in such situations. Indeed, the EM algorithm is
a commonly used method of maximum likelihood to deal with incomplete data
(Laird & Louis, 1982; Baker, 1992; Meng & Rubin, 1991, 1993; Imbens & Rubin,
1997; O'Malley & Normand, 2005; Peng et al., 2004).
In general, the likelihood method requires the score equation to be solved.
42

Unfortunately, the likelihood function of
for incomplete data, as given in equa-
tion (4.37), cannot be formulated. This is because, at least for those individuals
assigned to the control group, their compliance type, U
i
, will never be observed.
The EM algorithm approaches this problem (of solving the incomplete data likeli-
hood score function) indirectly by proceeding iteratively in terms of the complete
data log likelihood function of
, denoted by log(L
F
(
)) (equation (5.12)).
6.1
The expectation step
According to Dempster et al. (1977), as introduced in Section 4.3.2, the first
step in the EM algorithm is to obtain the conditional expectation of log(L
F
(
)),
the Q-function Q(
,
k
). Now it is replaced by the conditional expectation of
log(l
F
(
)), the Q-function Q(,
k
). In other words:
Q(
,
k
) = E
u
{log(l
F
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d), U
i
= u, ;
k
},
(6.1)
where u
{c, n, a} and l
F
(
) denotes the likelihood function with respect to
parameter
for the complete data that is given in equation (5.13). (T
i
, C
i
)
obs
indicates the observable part of survival outcomes (T
i
, C
i
), which is (T
obs
i
=
t, C
obs
i
> t) if the ith individual experienced the event at time t; or (T
obs
i
>
t, C
obs
i
= t) if the ith individual is censored at time t.
Note that equations (5.11) and (5.10) have shown that
is only determined by
the observed values of subgroup sizes, which are denoted by N
1
, N
10
, N
0
, and N
01
.
This implies that for a certain dataset, L
F
(
) would be a constant because in the
dataset N
1
, N
10
, N
0
, and N
01
are fixed. Hence, to maximise l
F
(
) is equivalent
to maximise L
F
(
|) = l
F
(
)l
F
(
) = Kl
F
(
) when is known. (K refers to
any constant here). So we give the likelihood function with respect to parameter
43

as l
F
(
) = L
F
(
|), which has the same structure as L
F
(
), the likelihood
function of
given in equation (4.36), but is viewed as a known vector. The
advantage of using this form of the likelihood function l
F
(
) = L
F
(
|) will be
seen in Chapter 7 when we create an extended PPO-survival model for a more
complicated situation in time-to-event study.
As compliance type, U
i
, can be treated as a discrete random variable with
c, a, n (three states), then, by the definition of expectation of discrete random
variables, equation (6.1) can be expressed as follows:
Q
i
(
,
k
)
=
u
{c,n,a}
P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
)
{log(l(, u)}
(6.2)
where l(
, u) is another form of l
F
(
), the likelihood function with respect to
for complete data, we give its form as follows:
l
F
(
) = L(, u|),
(6.3)
because L
F
(
) = L(, u).
In addition, we can further assume that compliance type, U
i
, follows the
Bernoulli distribution, given observed data T
i
(z), C
i
(z), D
i
(z).
This is true
because of the assumptions of compound exclusion restriction and monotonicity,
given (Z
i
= z, D
i
(z) = d). Recall that compliance type has up to two compli-
ance categories for this development (refer to Table 3). More details about the
structure of the Q-function Q(
,
k
), the expectation function, are given Section
8.2.
44

6.2
The maximisation step
After obtaining the formulation of Q-function Q(
,
k
) given the current param-
eter values
k
, the second step of the EM algorithm is to find a new estimate,
k+1
, which maximizes the Q-function Q(
,
k
). This implies that the following
inequality is always satisfied:
Q(
k+1
,
k
)
Q(,
k
)
for all
,
(6.4)
where
is the parameter space of
.
Further information on the process of maximisation for the non-ignorable
PPO-survival model is given in Section 8.3.
7
Model specifications
For the purpose of showing how our non-ignorable models work, we focus mainly
on the Weibull model as the survival distribution. This is due to its special fea-
tures. Firstly, the Weibull distribution is flexible for many different patterns of
data. This can be seen from its varying types of hazard curve (Klein, 1997; Cox
& Oakes, 1984). Secondly, the Weibull distribution is associated with relatively
simple survival, hazards and probability density functions (p.19, Cox & Oakes
(1984)). This makes the Weibull a very popular parametric model in conventional
survival analysis. Furthermore, the Weibull distribution is the only distribution
which satisfies both the accelerated failure time (AFT) and the Cox proportional
hazard (PH) models (Cox & Oakes, 1984). The log-normal model is another plau-
sible parametric distribution used in conventional survival analysis. We also spec-
ify the log-normal distribution for the non-ignorable PPO-survival model (Section
7.1). In this thesis, our parametric potential-outcome (PPO) models are specified
45

in terms of the Weibull and the log-normal distributions. Other distributions are
the topic of future research.
7.1
Specific non-ignorable PPO-survival models
When accounting for a censoring time distribution, the non-ignorable survival
model can be specified in more than two forms. This is true, even when the
Weibull and the log-normal are the only pre-specified distributions. Note that
failure and censoring times may follow different types of distribution. For instance,
suppose that the failure time follows a Weibull distribution and its associated
censoring time has an underlying log-normal distribution. In this case, the non-
ignorable survival model has to be specified as a NIGN-WL model; namely a
non-ignorable survival Weibull log-normal model.
However, when failure and
censoring times follow the log-normal and the Weibull distribution, respectively,
we have the NIGN-LW model. In this thesis, we focus, however, on specific non-
ignorable survival models with the same distributions for both the failure and
censoring times. These are clearly denoted as the NIGN-WW model and the
NIGN-LL model. Table 4 gives the nomenclature for the non-ignorable survival
model specifications.
Table 4: Specification of the non-ignorable PPO-survival models.
Distribution
non-ignorable survival model
Weibull
NIGN-WW model
log-normal
NIGN-LL model
46

7.2
Parameter of interest
in the specific models
The parameter
= (
T,uz
,
C,uz
) is associated with failure and censoring time
distributions, respectively, corresponding to four subgroups each. In total,
in-
cludes 8
× 2 = 16 components. This is true, as both a Weibull and a log-normal
distribution include two model parameters: the scale parameter and the shape
parameter for the Weibull; the mean and the variance of the logarithms of
survival outcome for the log-normal.
7.2.1
Parameter
in the NIGN-WW model
Suppose failure time T
0i
, corresponding to the baseline, follows a Weibull distribu-
tion with parameter
0
T
and
0
T
. By following Theorem 1 (proved in Appendix 10),
the parameter of most interest,
T
=
T,uz
, which is associated with the distribu-
tions of failure time and is specified in equation (5.15) (the first four parameters
corresponding to 4 possible subgroups), has the following relationships with the
parameters associated with the standard distribution for the baseline failure time,
0
T
and
0
T
, and the regression coefficients in AFT model, which are:
T,c1
= (
T,c1
,
T,c1
);
where
T,c1
=
0
T
exp(
(c)
+
(c)
) and
T,c1
=
0
T
,
T,c0
= (
T,c0
,
T,c0
);
where
T,c0
=
0
T
exp(
(c)
) and
T,c0
=
0
T
,
T,n
= (
T,n
,
T,n
);
where
T,n
=
0
T
and
T,n
=
0
T
,
T,a
= (
T,a
,
T,a
);
where
T,a
=
0
T
exp(
(a)
) and
T,c0
=
0
T
.
(7.1)
Similarly, we suppose that the censoring time C
0
, corresponding to the base-
line, also follows a Weibull distribution with parameters
0
C
and
0
C
. In this case,
47

C
=
C,uz
) can be specified as follows:
C,c1
= (
C,c1
,
C,c1
);
where
C,c1
=
0
C
exp(
(c)
+
(c)
) and
C,c1
=
0
C
,
C,c0
= (
C,c0
,
C,c0
);
where
C,c0
=
0
C
exp(
(c)
) and
C,c0
=
0
C
,
C,n
= (
C,n
,
C,n
);
where
C,n
=
0
C
and
C,n
=
0
C
,
C,a
= (
c,a
,
C,a
);
where
C,a
=
0
C
exp(
(a)
) and
C,c0
=
0
C
.
(7.2)
Consequently, the parameter
given in equations (5.15) for the NIGN-WW
model can be written as follows:
= (
T,uz
,
C,uz
),
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
).
(7.3)
Here
T
=
0
T
and
C
=
0
C
denote the shape parameters of the Weibull models
associated with the failure and censoring times (which correspond to the baseline),
respectively. The number of components involved in is now reduced from 16
to 10. Note that equations (7.1) and (7.2) show that these scale parameters
(corresponding to the four subgroups), namely
T,c1
,
T,c0
,
T,n
,
T,a
, have been
linked to
0
T
( the scale parameter in the Weibull distribution, associated with
the failure time for the baseline), through
(c)
,
(c)
,
(a)
, the regression coefficients
in the AFT model. However, we use the scale parameters associated with the
four subgroups, rather than the AFT regression coefficients because at this stage,
the former clearly describe model parameters in which each subgroup (either for
failure time or censoring time) has its own scale parameter and these subgroups
share the same shape parameter. Recall that these four subgroups are as follows:
Subgroup 1 and Subgroup 2 consist of compliers who are all assigned to the treated
group (U
i
= c and Z
i
= 1) or the control group (U
i
= c and Z
i
= 0), respectively;
48

Subgroup 3 includes never-takers (U
i
= a) only; and Subgroup 4 includes always-
takers (U
i
= a).
7.2.2
Parameter
in the NIGN-LL model
As in the NIGN-WW model, the parameter
given in equation (5.15) for the
NIGN-LL model can then be specified as follows:
= (
T,uz
,
C,uz
),
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
).
(7.4)
The parameters
T
and
C
, given in equations (7.4), denote the variance pa-
rameters of the logarithm of the failure time log(T
0i
) (
(2)
T
) and the logarithm
of the censoring times log(C
0i
) (
(2)
C
) associated with the baseline
T
=
0
T
and
C
=
0
C
By following Theorem 1,
given in Appendix 10, the parameters associated with the failure time distribution
satisfy the following equations:
T,c1
= (
T,c1
,
T,c1
)
where
T,c1
=
0
T
+
(c)
+
(c)
and
T,c1
=
0
T
,
T,c0
= (
T,c0
,
T,c0
)
where
T,c0
=
0
T
+
(c)
and
T,c0
=
0
T
,
T,n
= (
T,n
,
T,n
)
where
T,n
=
0
T
and
T,n
=
0
T
,
T,a
= (
T,a
,
T,a
)
where
T,a
=
0
T
+
(a)
and
T,c0
=
0
T
.
(7.5)
Parameters associated with the censoring time distribution have the following
49

relationships:
C,c1
= (
C,c1
,
C,c1
)
where
C,c1
=
C
0
+
(c)
+
(c)
and
C,c1
=
0
C
,
C,c0
= (
C,c0
,
C,c0
)
where
C,c0
=
C
0
+
(c)
and
C,c0
=
0
C
,
C,n
= (
v,n
,
C,n
)
where
C,n
=
0
C
and
C,n
=
0
C
,
C,a
= (
C,a
,
C,a
)
where
C,a
=
0
C
+
(a)
and
C,c0
=
0
C
.
(7.6)
Table 5 shows the parameters involved in the NIGN-WW and the NIGN-LL
models.
Table 5: Non-ignorable PPO-survival model parameter vectors.
Model name
model parameter vector
NIGN-WW
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
NIGN-LL
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
8
Computational issues
Since the compliance type variable, U
i
= u, is unobservable, those general MLE
methods are not easily implemented for the PPO-survival model. Alternatively,
as mentioned before (Section 6), the EM algorithm (Dempster et al., 1977) can be
employed for our developments. Since the posterior probabilities of compliance
type U
i
, P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
), have been involved in the
Q-function, Q(
,
k
), (equation (6.2)), we need to formulate it first before giving
the specified Q-function Q(
,
k
).
50

8.1
Maximization of the likelihood function using the EM
algorithm
8.1.1
Posterior probabilities of compliance type U
i
For convenience of expression, we use P
i
ac
, P
i
nc
, P
i
n
and P
i
a
to denote the related
posterior probabilities for non-censored R
i
= 1 associated with the following cases:
(1) Z
i
= 1, D
i
(1) = 1; (2) Z
i
= 0, D
i
(0) = 0; (3) Z
i
= 1, D
i
(1) = 0; and (4)
Z
i
= 0, D
i
(0) = 1. Note that in the first two cases given above, compliance type
U
i
can possible be two categories, say compliers (U
i
= c) or always-takers (U
i
= a)
for case (1) and compliers (U
i
= c) or never-takers (U
i
= n) for case (2); while in
case (3), compliance type U
i
can only be never-takers (U
i
= n); and in case (4),
compliance type U
i
can only be always-takers (U
i
= a) (Table 3). This is because
of the monotonicity assumption, which excludes defiers.
So we define P
i
ac
, P
i
nc
, P
i
n
and P
i
a
as follows:
P
i
ac
= P r(U
i
= c
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 1;
k
)
for R
i
= 1 ,
P
i
nc
= P r(U
i
= c
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 0;
k
)
for R
i
= 1 ,
P
i
n
= P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 0;
k
)
for R
i
= 1 ,
P
i
a
= P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 1;
k
)
for R
i
= 1 .
(8.1)
Similarly, we use Q
i
ac
, Q
i
nc
, Q
i
n
and Q
i
a
to denote the related posterior probabilities
for censored R
i
= 0 associated with the aforementioned four cases, they are defined
as the following:
Q
i
ac
= P r(U
i
= c
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 1;
k
)
for R
i
= 0 ,
Q
i
nc
= P r(U
i
= c
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 0;
k
)
for R
i
= 0,
Q
i
n
= P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 0;
k
)
for R
i
= 0,
Q
i
a
= P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 1;
k
)
for R
i
= 0 .
(8.2)
51

Note that when the event occurs (non-censored and denoted by R
i
= 1), the
observed data includes: failure time T
i
(z), censoring time (C
i
(z) > T
i
(z)), assigned
treatment Z
i
= z and actual received treatment D
i
(z), and when an individual is
censored (denoted by R
i
= 0), the observed data includes: censoring time C
i
(z),
censored failure time T
i
(z) > C
i
(z), assigned treatment Z
i
and actual received
treatment D
i
(z).
The formulations for the posterior probabilities corresponding to the last two
cases, namely Z
i
= 1, D
i
(1) = 0 ; and Z
i
= 0, D
i
(0) = 1, are as follows:
P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
= 0;
k
) = 1;
for R
i
= 1;
P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
= 1;
k
) = 1;
for R
i
= 1;
P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
= 0;
k
) = 1;
for R
i
= 0;
P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
= 1;
k
) = 1;
for R
i
= 0.
(8.3)
This is true, because no defiers exist and by the definition of compliance type
U
i
, we know that an individual is a never-taker if he/she is assigned to the treat-
ment group Z
i
= 1 but does not receive the treatment D
i
(1) = 0 or is an always-
taker if the individual is assigned to the control group Z
i
= 0 but for some reason
does receive the treatment.
Equation (8.3) gives the following equivalences:
P
i
n
= 1
for i
C(1, 0, 1),
Q
i
n
= 1
for i
C(1, 0, 0),
P
i
a
= 1
for i
C(0, 1, 1),
Q
i
a
= 1
for i
C(0, 1, 0).
(8.4)
where
C(z, d, r) denotes the subset of individuals, which is defined by the following:
C(z, d, r) = {i : Z
i
= z, D
i
= d, R
i
= r
} for all z, d, r {1, 0}. Now, we
52

formulate the posterior probability of compliance type U
i
for the first two cases,
i.e. Z
i
= 1 and D
i
(1) = 1, and Z
i
= 0 and D
i
(0) = 0. Since both failure and
censoring times are involved in the non-ignorable PPO-survival model, so too
are their posterior probabilities. As per Bayes' theorem, these can be written as
the formulae given in equations (8.5) and (8.6) (see next page). For the non-
ignorable PPO-survival model, P
i
ac
, P
i
nc
, Q
i
ac
and Q
i
nc
can then be written as the
equivalences shown in equation (8.7) on page 55.
53

Pr
(U
i
=
u|
(T
i
(1)
,C
i
(1))
obs
,Z
i
=1
,D
i
(1)
=
1
;
k
)
=
f
u
(1
,t
;
T,
u
1
)G
u
(1
,t
;
C,
u
1
)
u,
11
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
+
f
a
(1
,t
;
T,
a
1
)G
a
(1
,t
;
C,
a
1
)
a,
11
,if
R
i
=1
,T
i
(1)
=
t;
Pr
(U
i
=
u|
(T
i
(1)
,C
i
(1))
obs
Z
i
=1
,D
i
(1)
=
1
;
k
)
=
S
u
(1
,t
;
T,
u
1
)e
u
(1
,t
;
C,
u
1
)
u,
11
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
+
S
a
(1
,t
;
T,
a
1
)e
a
(1
,t
;
C,
a
1
)
a,
11
,if
R
i
=0
,C
i
(1)
=
t;
(8.5)
for
u
{
c,
a}
;
and
Pr
(U
i
=
u|
(T
i
(0)
,C
i
(0))
obs
,Z
i
=0
,D
i
(0)
=
0
;
k
)
=
f
u
(0
,t
;
T,
u
0
)G
u
(1
,t
;
C,
u
0
)
u,
00
f
c
(0
,t
;
C,
c
0
)G
c
(0
,t
;
C,
c
0
)
c,
00
+
f
n
(0
,t
;
T,
n
0
)G
n
(0
,t
;
C,
n
0
)
n,
00
,
if
R
i
=1
,
T
i
(0)
=
t;
Pr
(U
i
=
u|
(T
i
(0)
,C
i
(0))
obs
,Z
i
=0
,D
i
(0)
=
0
;
k
)
=
S
u
(0
,t
;
T,
u
0
)e
u
(0
,t
;
C,
u
0
)
u,
00
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
+
S
n
(0
,t
;
T,
n
0
)e
n
(0
,t
;
C,
n
0
)
n,
00
,
if
R
i
=0
,C
i
(0)
=
t;
(8.6)
for
u
{
c,
n}
.
54

P
i
ac
=
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
+
f
a
(1
,t
;
T,
a
1
)G
a
(1
,t
;
C,
a
1
)
a,
11
;
for
iC
(1
,1
,1)
P
i
nc
=
f
c
(0
,t
;
T,
c
0
)G
c
(1
,t
;
C,
c
0
)
c,
00
f
c
(0
,t
;
T,
c
0
)G
c
(0
,t
;
C,
c
0
)
c,
00
+
f
n
0
(0
,t
;
T,
n
0
)G
n
0
(0
,t
;
C,
n
0
)
n,
00
;
for
iC
(0
,0
,1)
Q
i
ac
=
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
+
S
a
(1
,t
;
T,
a
1
)e
a
(1
,t
;
C,
a
1
)
a,
11
;
for
iC
(1
,1
,0)
Q
i
nc
=
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
+
S
n
(0
,t
;
T,
n
0
)e
n
(0
,t
;
C,
n
0
)
n,
00
;
for
iC
(0
,0
,0)
.
(8.7)
55

For convenience, we use W
i
(u,
k
) to denote the posterior probability of com-
pliance type P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d,
k
) given the observed data
(and parameter values) for the non-ignorable PPO-survival model. That is:
W
i
(u,
k
) = P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
),
(8.8)
where (T
i
, C
i
)
obs
is same as given in equation (6.1), and indicates the observable
part of survival outcomes (T
i
, C
i
). (T
i
, C
i
)
obs
is equivalent to T
obs
i
= T
i
(z) given
Z
i
= z, when R
i
= 1; (T
i
, C
i
)
obs
is equivalent to C
obs
i
= C
i
(z) given Z
i
= z, when
R
i
= 0; and
k
represents the given value of the relevant model parameters, the
components of which are:
(
T,c1
,
T,c0
,
T,n
,
T,a
, ,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
)
(8.9)
for the NIGN-WW model and
(
T,c1
,
T,c0
,
T,n
,
T,a
, ,
C,c1
,
C,c0
,
C,n
,
C,a
,
c
)
(8.10)
for the NIGN-LL model.
The posterior probability W
i
(u,
k
) of the ith individual can then be written
56

as follows:
W
i
(u,
k
) =
(P
i
ac
)
I
{U
i
=c}
(1
- P
i
ac
);
I
{U
i
=a}
for i
C(1, 1, 1),
(P
i
nc
)
I
{U
i
=c}
(1
- P
i
nc
);
I
{U
i
=n}
for i
C(0, 0, 1),
(Q
i
ac
)
I
{U
i
=c}
(1
- Q
i
ac
);
I
{U
i
=a}
for i
C(1, 1, 0),
(Q
i
nc
)
I
{U
i
=c}
(1
- Q
i
nc
);
I
{U
i
=n}
for i
C(0, 0, 0),
P
i
n
= 1;
for i
C(1, 0, 1),
Q
i
n
= 1;
for i
C(1, 0, 0),
P
i
a
= 1;
for i
C(0, 1, 1),
Q
i
a
= 1;
for i
C(0, 1, 0);
(8.11)
where I
{} is an indicator function, and P
i
ac
, P
i
nc
, Q
i
ac
and Q
i
nc
denote the pos-
terior probabilities for the specific cases, given by equation (8.7) for the non-
ignorable PPO-survival model. Recall that
C(z, d, r) denotes the subset of indi-
viduals
C(z, d, r) = {i : Z
i
= z, D
i
= d, R
i
= r
} for all z, d, r {1, 0}.
8.2
Structures of the expectation function Q(
,
k
)
After defining the posterior probability, W
i
(u,
k
), which is given in equation
(8.11), the expectation equation, Q-function Q(
,
k
) specified in equation (6.2)
can then be re-written as:
Q
i
(
,
k
) =
u
{c,n,a}
W
i
(u,
k
)
{log(l
F
(
)}
=
u
{c,n,a}
W
i
(u,
k
)
{log(l(, u)},
(8.12)
where l
F
(
) is given in equation (4.36).
57

After substituting equations (8.7), (8.11) and (4.36) into equation (8.12),
Q(
,
k
) can be specified as in equation (8.13) overleaf.
58

Q
(
,
k
)=
i
Q
i
(
,
k
)=
iC
(1
,1
,1)
{
lo
g
(f
i
c
(1
,t
;
T,
c
1
)G
i
c
(1
,t
;
C,
c
1
)
c,
11
)
P
i
ac
+
lo
g
(f
i
a
(1
,t
;
T,
a
)G
i
a
(1
,t
;
C,
a
)
a,
11
)(
1
-
P
i
ac
)}
iC
(1
,1
,0)
{
lo
g
(S
i
c
(1
,t
;
T,
c
1
)e
i
c
(1
,t
;
C,
c
1
)
c,
11
)
Q
i
ac
+
lo
g
(S
i
a
(1
,t
;
T,
a
)e
i
a
(1
,t
;
C,
a
)
a,
11
)(
1
-
Q
i
ac
)}
iC
(0
,0
,1)
{
lo
g
(f
i
c
(0
,t
;
T,
c
0
)G
i
c
(0
,t
;
C,
c
0
)
c,
00
)
P
i
nc
+
lo
g
(f
i
n
(0
,t
;
T,
n
)G
i
n
(0
,t
;
C,
n
)
n,
00
)(
1
-
P
i
nc
)}
iC
(0
,0
,0)
{
lo
g
(S
i
c
(0
,t
;
T,
c
0
)e
i
c
(0
,t
;
C,
c
0
)
c,
00
)
Q
i
nc
+
lo
g
(S
i
n
(0
,t
;
T,
n
)e
i
n
(0
,t
;
C,
n
)
n,
00
)(
1
-
Q
i
nc
)}
iC
(1
,0
,1)
lo
g
(f
i
n
(1
,t
;
T,
n
)G
i
n
(1
,t
;
C,
n
)
n,
10
)
iC
(1
,0
,0)
lo
g
(S
i
n
(1
,t
;
T,
n
)e
i
n
(1
,t
;
C,
n
)
n,
10
)
iC
(0
,1
,1)
lo
g
(f
i
a
(0
,t
;
T,
a
)G
i
a
(0
,t
;
C,
a
)
a,
01
)
iC
(0
,1
,0)
lo
g
(S
i
a
(0
,t
;
T,
a
)e
i
a
(0
,t
;
C,
a
)
a,
01
)
.
(8.13)
59

8.3
Maximization of the expectation function Q(
,
k
)
As stated in Section 6, the M-step aims to maximise the expectation function
given in equation (8.13). To realise this goal, the score equation (McLachlan &
Krishnan, 1997; Casella & Berger, 2002) is often used in the M-step. Specifically,
for the ith individual, we take the first partial derivative of Q
i
(,
k
), with respect
to the parameter in equation (8.13) for the non-ignorable PPO model. Then
the i
th
score equation is:
S
obs
i
(
) =
Q
i
(
,
k
) =
u
{c,n,a}
W
i
(u,
k
)
logl
i
(
, u).
(8.14)
As before, let S
i,
(u,
) denote
logl
i
(, u). Then the score equation for the
i
th
individual can be expressed as:
S
obs
i
(
) =
u
W
i
(u,
k
)S
i,
(u,
).
(8.15)
All parameters can then be estimated simultaneously by solving:
n
i
S
obs
i
(
) = 0.
(8.16)
Let
(k+1)
denote the solution of equation (8.16). We then substitute
(k)
in
Q(
,
(k)
) with
(k+1)
and repeat the M-step until convergence.
The models proposed in this thesis use a built-in function in MATLAB (7.4.0,
R2007a) (Higham & Higham, 2000). This function, fminsearch, is based on
the Nelder-Mead method (Lagarias et al., 1998) for multidimensional uncon-
strained minimisation.
We use it to minimise the negative expectation func-
tion
-Q(,
(k)
), which is equivalent to maximising the expectation function
Q(
,
(k)
).
60

8.4
Estimation of the standard errors
8.4.1
The standard error for the estimated model parameter
The EM algorithm relies only on the convergence of iterative procedures, in which
no gradient matrices are derived. As such, the EM algorithm does not provide the
information matrix required for traditional calculation of the asymptotic variance-
covariance matrix of the MLEs, from either the expected information matrix or
the observed information matrix (OIM). Consequently, no standard errors of the
MLEs can be obtained, even after possible adaptation of the EM algorithm Louis
(1982), however, proposed an approach for calculating the observed information
matrix (OIM) when the EM algorithm is used to find MLEs for so-called incom-
plete data problems. Louis's approach (Louis, 1982) only requires computation
of the complete-data gradient vector or computation of the second derivative ma-
trix, (which is not associated with the incomplete-data likelihood function). After
obtaining the OIM, the covariance matrix can then be calculated easily. Only
the inverse of the OIM is then required. The standard error of the MLEs vectors
can thus be obtained easily, in that the standard error is the square root of the
diagonals of the covariance matrix, where the latter is the inverse of the observed
information matrix (OIM).
Louis's method
1
Let Data
f ull
and Data
obs
denote the complete (full) and incomplete (observed)
data, respectively. The former includes the observed time (T, C)
obs
, (which is
equivalent to (Y (z), R(z)) in survival analysis, where Y (z) = min(T (z), C(z))
and R(z) is the censoring indicator), assigned treatment Z = z, received treat-
1
Equation (3.2 ) in Louis (1982) is strictly not correct. We believe that the rule of a non-
exchangeable vector order in the product was ignored. Details of our correction is given in
Appendix 10.
61

ment D(z) and the compliance type U . Thereby Data
f ull
= ((T, C)
obs
, Z =
z, D(z), U ). As compliance type U is unobservable, we have Data
obs
= ((T, C)
obs
, Z =
z, D(z)). By following equation (3.3) in Louis (1982), the observed information
matrix (OIM), I
obs
= I(^
), which can be viewed as the matrix at the MLEs,
where matrix I can be expressed in two parts: the conditional expected full data
OIM, I
f ull
, and the expected information for the conditional distribution of the
complete data I
f ull
|obs
. This is represented as follows:
I
obs
= I(^
) = I
f ull
(^
) - I
f ull
|obs
(^
).
(8.17)
Under the independent and identically distributed (i.i.d) assumption, these
two matrices I
f ull
and I
f ull
|obs
can be specified as:
I
f ull
= E
u
{-
n
i=1
2
2
logL
i
(
, u)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
} (8.18)
I
f ull
|obs
=
i
E
u
logl
F
i
(
)
T
logl
F
i
(
) |(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
+
i=j
E
u
logl
F
i
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
T
×
E
u
logl
F
j
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
(8.19)
where
T
denotes transpose. Consequently, the observed information matrix I
obs
62

can be expressed
2
, after correcting the classical method of Louis (1982) as:
I
obs
= I(^
)
=
n
i=1
E
u
-
2
2
logl
i
(
, u)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
|=^
-
i
E
u
logl
i
(
, u)
T
logl
i
(, u)
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
|=^
-
i=j
E
u
logl
i
(, u)
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
T
×
E
u
logl
j
(, u)
|(T
j
, C
j
)
obs
, Z
j
= z, D
j
(z) = d
|=^
.
(8.20)
Equation (8.20) (also see equation (D-28) in Appendix10) guarantees that the OIM
is symmetric. (Details are given on page 63.) As the inverse of I
obs
is the asymptotic
variance-covariance matrix of the MLEs, ^
, this can be denoted as V(^) = I
-1
(^
). Its
diagonal vector is the variance vector associated with each of the components of ^
and
is denoted by SE
^
. The standard error of ^
can then be calculated directly by taking
the square root of
2
^
, where
2
^
=
{V(^)}
ii
(i = 1, 2,
· · · , p), for p parameters.
8.4.2
The observed information matrix (OIM)
Note that the first term in equation (8.17) is equivalent to the second order partial
derivative of Q-function Q(
,
k
), which can be written as
2
Q(
,
(k)
)
2
and
is a interest
parameter vector with length p. If we write
= (
1
,
2
, . . . ,
p
), the element at the ith
row and the jth column in the matrix
2
Q(
,
(k)
)
2
, the second order derivative matrix
of Q-function Q(
,
(k)
) with respect to the ith and jth components of the interest
parameter
,
i
and
j
, can be written as follows:
2
Q(
,
(k)
)
2
=
2
Q(
,
(k)
)
i
j
ij
for i and j from 1 to p.
(8.21)
Since in the PPO-survival model, the interest parameter vector
, given in equation
2
This equation can be regarded as our correction of equation (3.2') of Louis (1982). Louis's
subscripts should specify that i = j rather than i < j.
63

(5.15), consists of the parameters associated with the distributions of failure and cen-
soring times for three subgroups. This leads to the fact of that majority of elements in
the matrix
2
Q(
,
(k)
)
2
are zeros. For instance, the second order partial derivative of the
Q-function,Q(
,
(k)
),
2
Q(
,
(k)
)
2
with respect to two scale parameters
1
=
T,c1
and
2
=
T,c0
(in Weibull PPO-survival model) is zero. That is:
2
Q(
,
(k)
)
i
j
=
2
Q(
,
(k)
)
1
2
= 0
(8.22)
As no an explicit relationship exists between
1
and
2
, which represent the scale pa-
rameter from two different subgroups here.
Even though in the same subgroup, however, with respect to the parameters asso-
ciated with the distributions of failure and censoring times,
2
Q(
,
(k)
)
i
j
can also be zero.
For example, let i = 1 and j = 5, in Weibull PPO-survival model we have
1
=
T,c1
and
5
=
C,c1
, where
2
Q(
,
(k)
)
1
5
= 0 always holds. This is because there is not an explicit
relationship between these two parameters
T,c1
and
C,c1
which are from two different
distributions. Recall that
T,c1
denotes the scale parameter of Weibull for failure time
in and
C,c1
denotes the scale parameter of Weibull for censoring time in subgroup1).
Therefore, in the PPO-survival model (including both Weibull and Log-normal
form), the first term of equation (8.20) can be simplified directly without knowing
the estimated value of parameter ^
. The simplified form of the matrix is given in Table
6 (page 65) where
i
and
j
denote any components of parameter
, for the specific
model, say the NIGN-WW or the NIGN-LL models,
is defined in Table 5 (on page
50). Note that Table 6 shows that in the matrix, [
2
Q(
,
(k)
)
i
j
]
10×10
, only few of elements
are not zero (denoted by
in the table) which can be calculated directly through the
standard likelihood function. In this thesis, the standard likelihood function were given
as either Weibull or log-normal form.
From Table 6, it is easy to see that [
2
Q(
,
(k)
)
i
j
]
10×10
, the second order partial deriva-
tive matrix of Q-function Q(
,
(k)
) is a symmetric matrix. In fact, the second term in
64

Table 6: The patten of second order partial derivative matrix of Q(
,
(k)
).
2
Q(
,
(k)
)
i
j
1
2
3
4
5
6
7
8
9
10
1
0
0
0
0
0
0
0
0
2
0
0
0
0
0
0
0
0
3
0
0
0
0
0
0
0
0
4
0
0
0
0
0
0
0
0
5
0
0
0
0
0
6
0
0
0
0
0
0
0
0
7
0
0
0
0
0
0
0
0
8
0
0
0
0
0
0
0
0
9
0
0
0
0
0
0
0
0
10
0
0
0
0
0
All these components of
are defined in Table 5 on page 50.
equation (8.20) also denotes a symmetric matrix as it is a summation of a product of
a vector and it's transpose; the third term in the same equation represents a symmet-
ric too, even though two different vectors were involved in the product term. This is
because that the later summation is taken over on i = j only, which means that those
single matrix that satisfied i = j was not included in the summation. Therefore, the
OIM, defined by equation (8.20) is really a symmetric matrix as their three components
are all symmetric matrices.
9
Application
This section gives demonstrations of using PPO-survival models via the NIGN-WW and
the NIGN-LL models which were used to re-analyse the HIP breast cancer data. Recall
that the background of this data was introduced in Section 2.
9.1
Pre-processing of the HIP breast cancer data
In the re-analysis of the HIP breast cancer data, two types of data subsets were used.
65

The original HIP data (Shapiro et al., 1988; Baker, 1998) includes 60,696 observa-
tions, in which the enrolment date and the last follow-up date for all individuals had
been recorded, as well as their survival status. The latter indicates whether an individ-
ual died of breast cancer during the observation period or not. Note that in the HIP
breast cancer trial, there were two examiners independently took the responsibility of
determining the course of each single case of death. For those individuals who died but
not due to breast cancer, they were treated as censored individuals in the trial (Shapiro
et al., 1988).
The observed time (from entry to death or last follow-up) was given in days in the
original HIP data. In this thesis, the survival status of an individual is the so-called
censoring indicator (denoted by R = 1 for death from breast cancer, 0 otherwise) and
observed times are counted in years, as in Baker (1998), which was obtained by the
following formula:
observed time (in years) =
days for entry to death or last follow-up
365
.
(9.1)
We note that 164 individuals in the original HIP data have the same date for their
enrolment and their last follow-up date. This implies that these individuals did not, in
fact, take part in the HIP study and thus should not be counted as "valid " individuals
in the HIP trial. Therefore, in this thesis, we focus on analyzing the subset of the HIP
data (60,696 - 160 = 60,532 women) rather than the original HIP data.
The HIP
18
dataset:
The first data subset studied follows Shapiro's report on the
HIP study (Shapiro et al., 1988), wherein the maximum observed time is fixed at 18
years. This is because Shapiro et al. (1988) purports that in the HIP study, the maxi-
mum follow-up period is 18 years. Thus, women whose observed times are greater than
18 years (in the original HIP data set) are all preset at 18. Their censoring indicators
are given as 0, as they can be viewed as censored at year 18 (after entry).
66

The HIP
7
dataset:
The second data subset is obtained by following the arguments
of Baker (1998). Baker mentions that:
"by seven years, the number of breast can-
cers in the control and screen groups have equalized, so presumably they represent all
breast cancers that could have benefitted from screening. " To reduce the variance of
estimators, Baker (1998) focused only on analyzing those individuals with breast cancer
diagnosed in their first seven years of follow-up. Baker also points out that the choice of
seven years (for the HIP study) was conservative; other authors have used earlier times
of so-called equalisation (Baker, 1998).
As with Baker (1998), we also chose to analyse the HIP data with seven years as
the maximum follow-up period. The benefit from the screening examination should
therefore be considered
"realised "
in seven rather than in eighteen years (Baker,
1998). Note that, as we are interested in investigating how the screening examination
impacts the longevity of women who are in general good health (rather than breast
cancer patients). Hence, unlike Baker (1998), in this thesis, we focus on analysing all
women in the HIP trial rather than those who had been diagnosed with breast cancer
during the observation period.
After truncating the observed time at 7 years, individuals whose observed times
were greater than seven years have been given 0 as their censored indicators value,
which shows that these individuals had not died at that time (7 years). Therefore, in
this HIP dataset, denoted by HIP
7
, the maximum observed time is 7 years; individuals
whose observed time are greater than 7 years (in the original HIP data) have R
i
= 0.
This indicates that individuals whose observed time has been truncated (in the HIP
7
data) did not die from breast cancer during the the follow-up period (7 years after
entry). Summary statistics for the three HIP datasets (the original HIP, HIP
18
and
HIP
7
) are given in Table 7 (on page 69).
As stated before, we focused on analysing the two aforementioned HIP data subsets,
namely HIP
18
and HIP
7
, with our NIGN-WW and NIGN-LL survival models. The
results are given in Section 9.2. Furthermore, for the purpose of comparison, these two
67

HIP data subsets are also re-analysed by the Frangakis-Rubin (F-R) model (Frangakis
& Rubin, 1999). To date, the F-R model has not yet been applied to any time-to-event
data in the literature. The results of the re-analysis of the HIP data subsets are given
in Section 9.3.
68

T
able
7:
Summary
statistics
of
HIP
data
sets
Original
HIP
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,696
30,131
30,565
20,147
9,984
40,549
Mean
of
observ
ed
time
¯ Y
i
14.051
14.1578
13.9457
14.905
12.6499
13.6266
Prop
ortion
of
ev
en
t
¯ R
i
0.0129
0.0123
0.0135
0.0123
0.0125
0.0133
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4964
1
0
1
1
0.2462
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3319
0.6686
0
1
0
0
HIP
18
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,532
30,058
30,474
20,132
9,926
40,400
Mean
of
observ
ed
time
¯ Y
i
13.9756
14.0816
13.8709
14.8112
12.6018
13.5591
Prop
ortion
of
ev
en
t
¯ R
i
0.0126
0.0119
0.0134
0.0119
0.0119
0.0130
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4966
1
0
1
1
0.2457
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3326
0.6698
0
1
0
0
HIP
7
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,532
30,058
30,474
20,132
9,926
4,0400
Mean
of
observ
ed
time
¯ Y
i
6.5503
6.5813
6.5197
6.7393
6.2607
6.4561
Prop
ortion
of
ev
en
t
¯ R
i
0.0030
0.0021
0.0038
0.0020
0.0024
0.0035
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4966
1.0000
0
1.0000
1.0000
0.2457
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3326
0.6698
0
1
0
0
Z
i
=
1
:
in
vitation
for
screening;
D
i
=
1
:
receipt
of
screening
and
R
i
=
1
:
death
from
breast
cancer.
69

9.2
HIP data analysis: via the non-ignorable censoring
mechanism PPO-survival models
For the HIP trial, it is assumed that whether a participant died from breast cancer or not
would not be related to any other participant's treatment status. We can then suppose
Assumption 1 (given in Section 3.2), the SUTVA assumption is satisfied. Assumption
2 (random assignment) clearly holds as all participants in the HIP trial were randomly
assigned to the screening or the control group (Shapiro et al., 1988). Some participants
assigned to the screening group refused to take the screening examination (Shapiro
et al., 1988; Baker, 1998). This implies that these individuals did not follow their
allocated randomised assigned treatment (Z
i
). Assumption 3, namely that of exclusion
restriction, can then be assumed. This means that the assigned treatment (Z
i
) affects
the outcome of interest only through the actual received treatment (D
i
(Z
i
)). We also
suppose Assumption 4, that of monotonicity, holds in the HIP trial, as it is more
acceptable to suppose that no defiers exist than to have defiers in a RCT (Angrist
et al., 1996; Frangakis & Rubin, 1999; Baker, 1998; Yau & Little, 2001; Peng et al.,
2004; O'Malley & Normand, 2005; Mattei & Mealli, 2007). Finally, we suppose that the
latent ignorability assumption holds in the HIP trial, i.e. that the potential failure time
and censoring time are independent, conditional on the latent compliance type. Hence,
all assumptions given in Section 3.2 are assumed satisfied in the HIP trial. In addition,
for all individuals in this trial, there are only two possible types of compliance behavior,
namely compliers or never-takers. Note that no woman assigned to the control group
could receive the screening examination, which implies that no always-takers exist in
the HIP breast screening trial. In other words, the assumption of no access to treatment
for the control (Loeys & Goetghebeur, 2003; Frangakis & Rubin, 1999) holds in the HIP
breast screening trial. However, it is not a necessary assumption for our PPO-survival
models (derived in Section 4).
70

9.2.1
Model specification for the HIP trial
Since no always-takers exist in the HIP trial, the number of compliance types in the
HIP data has thus been reduced to two, and the number of subgroups of interest in the
HIP data are thus reduced to three (Subgroup j ; j = 1, 2, 3), specified as follows:
Subgroup1 : compliers (U
i
= c) from the screening group (Z
i
= 1);
Subgroup2 : compliers (U
i
= c) from the control group (Z
i
= 0);
Subgroup3 : never-takers (U
i
= n) from either the screening (Z
i
= 1) or the control
(Z
i
= 0) group.
Consequently, for the HIP breast cancer data, the model parameter vector
, given
in equation (5.7), is simplified as follows:
= (, ); where
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
),
and
= (
c,00
).
(9.2)
Recall that only
need to be estimated using the EM algorithm, is the interest of
model parameter.
Note that the parameters corresponding to the always-takers subgroup in equation
(5.7) are thus excluded in equation (9.2), because no always-takers exist in the HIP trial.
This can be denoted by
a
= 0, which indicates that the ith individual in the HIP trial,
has 0 (zero) probability of being an always-taker. Therefore, the parameter
c,11
, defined
by the first equation in (5.11), is fixed as 1 (because
a
= 0), i.e.
c,11
1. So for the
HIP data,
c,11
is clearly not a parameter that needs to be estimated. Furthermore, the
parameter
c,00
, defined by the second equation in (5.11) is equivalent to
c
, (
c,00
c
)
because
c
+
n
1 in the case of the HIP study.
Hence, for the NIGN-WW model, which assumes that both failure and censoring
time follow a Weibull distribution, the interest model parameter vector,
W W
, includes
71

eight elements, i.e.:
W W
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
T
,
(9.3)
where
c1
,
c0
and
n
represent the scale parameters of three Weibull distributions for
failure time (T
i
) associated with the relevant subgroups (Subgroup1, Subgroup2 and
Subgroup3), with
T
as the common shape parameter for these Weibull distributions;
and
C,c1
,
C,c0
and
C,n
represent the scale parameters of three Weibull distributions
for censored time (C
i
) associated with the same three relevant subgroups (Subgroup1,
Subgroup2 and Subgroup3) and with
c
as the common shape parameter for these
distributions.
When the log-normal is assumed to be the distribution for failure and censoring
time , the interest NIGN-LL model parameter vector, denoted by
LL
, is as follows:
LL
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
),
(9.4)
where
c1
,
c0
and
n
denote the mean of the logarithm of failure time, corresponding to
the three extant subgroups (Subgroup1, Subgroup2 and Subgroup3);
T
is their common
variance (of the logarithm of failure time); and
C,c1
,
C,c0
,
C,n
and
C
represent the
parameters corresponding to the logarithm of censoring time.
The model parameters corresponding to the specific non-ignorable PPO-survival
models (for the HIP trial) are given in Table 8.
Table 8: Non-ignorable PPO-survival model parameters for the HIP trial.
Model name
model parameter vector
NIGN-WW
(
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
NIGN-LL
(
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
The estimated values of the interest model parameter vectors
, associated with the
two non-ignorable survival models, namely the NIGN-WW and NIGN-LL models, are
72

given in Appendix10. Specifically, Tables B-1 and B-2 (on page 118) give analogous
estimates for the HIP
18
data; Tables B-3 and B-4 (on page 119), for the HIP
7
data.
Details on the iteration process used in the EM algorithm (Dempster et al., 1977) for
these two specific non-ignorable PPO-survival models are detailed in Appendix
, where
Tables B-5 (on page 120) and B-6 (on page 121) give the iteration of convergence in the
EM algorithm for the HIP
18
dataset, and Tables B-7 (on page 122) and B-8 (on page
123), for the HIP
7
dataset.
9.2.2
Estimation of causal effects: CACE(t) and ACE(t)
Table 9: Estimated CACE(t) and associated standard errors per model.
Models
NIGN-WW model
NIGN-LL model
Data
HIP
18
Years
CACE (SE
CACE
)
CACE ( SE
CACE
)
1
0.19E-4 (0.10E-4)
0.15E-4 (0.07E-4)
5
3.56E-4 (1.86E-4)
5.22E-4 (2.19E-4)
10
12.6E-4 (6.57E-4)
17.9E-4 (7.17E-4)
15
26.2E-4 (13.7E-4)
32.1E-4 (13.2E-4)
Data
HIP
7
Years
CACE (SE
CACE
)
CACE (SE
CACE
)
1
0.66E-4 (0.24E-4)
0.62E-4 (0.25E-4)
5
14.8E-4 (3.83E-4)
15.7E-4 (4.02E-4)
For causal inference in time-to-event studies, we focus on estimating the causal
effects in terms of CACE(t) and ACE(t).
Tables 9 (on page 73) and 10 (on page 74) give estimated
CACE(t) and ACE(t) for the two non-ignorable PPO survival models (the NIGN-WW
and the NIGN-LL model). Their associated standard errors are also given in Tables 9
and 10, which are calculated by following the formulation given in equations (C-15) and
(C-24).
73
A

Table 10: Estimated ACE(t) and associated standard errors per model.
Models
N IGN
- W W model NIGN - LL model
Data
HIP
18
Years
ACE ( SE
ACE
)
ACE (SE
ACE
)
1
0.13E-4 (0.07E-4)
0.10E-4 (0.05E-4)
5
2.38E-4 (1.25E-4)
3.49E-4 (1.47E-4)
10
8.41E-4 (4.40E-4)
11.6E-4 (4.80E-4)
15
17.5E-4 (9.19E-4)
21.5E-4 (8.81E-4)
Data
HIP
7
Years
ACE (SE
ACE
)
ACE (SE
ACE
)
1
0.44E-4 (0.16E-4)
0.41E-4 (0.16E-4)
5
9.90E-4 (2.56E-4)
10.5E-4 (2.69E-4)
Since CACE(t) and ACE(t) both are the functions of the MLE and appealing to
large-sample normality of the MLE, the 95% lower and upper confidence limits (LCL
and UCL) of CACE(t) are calculated easily as follows:
LCL
cace
(t) = CACE(t)
- 1.96SE
cace
(t)
U CL
cace
(t) = CACE(t) + 1.96SE
cace
(t).
(9.5)
Similarly, the 95% lower and the upper confidence limits (LCL and UCL) of ACE(t)
are calculated via the following formulae:
LCL
ace
(t) = ACE(t)
- 1.96SE
ace
(t)
U CL
ace
(t) = ACE(t) + 1.96SE
ace
(t).
(9.6)
The associated Z values of CACE(t) and ACE(t) are also given for all analyses. The
Z value is calculated by using the following equations:
Z
- value
cace
(t) =
CACE(t)
- 0
SE
cace
(t)
(9.7)
74

and
Z
- value
ace
(t) =
ACE(t)
- 0
SE
ace
(t)
;
(9.8)
where 0 is the assumed under the null mean value of CACE(t) or ACE(t). The p-
value derives from 1
- (Z - value) (one tail), where () denotes the standard normal
cumulative probability function.
Further details regarding the estimated ACE(t) and CACE(t), obtained via our
non-ignorable PPO-survival models for the HIP
18
and HIP
7
data sets are given in
Appendix10, where Tables B-10, B-11 give the estimated ACE(t) for the HIP
18
dataset
via the NIGN-WW and NIGN-LL model, respectively, and Tables B-13 and B-14 give
the estimated CACE(t) for the HIP
18
data set via the same two models. Tables B-17,
B-18, B-19 and B-20 are analogous for the analysis of the HIP
7
data set.
Based on these estimated values of ACE(t) and CACE(t), the curves of the estimated
CACE(t) and ACE(t) against survival time (years survival post entry) are plotted in
the following figures per data subset analysed. For the the HIP
18
data, Figures 1 and
2 (on page 76) show the estimated CACE(t) curves obtained using the NIGN-WW and
NIGN-LL model, respectively, and their associated ACE(t) curves are given in Figures 3
and 4 (on page 77); the corresponding curves of estimated CACE(t) and ACE(t) (with
the same names) for the HIP
7
data are given in Figures 5 and 6 (on page 78) and
Figures 7 and 8 (on page 79).
75

2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ww
for HIP
18
LCL of CACE
nign-ww
for HIP
18
UCL of CACE
nign-ww
for HIP
18
Figure 1: Estimated CACE(t) curve using the NIGN-WW model with 95% con-
fidence interval: HIP
18
data. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ll
for HIP
18
LCL of CACE
nign-ll
for HIP
18
UCL of CACE
nign-ll
for HIP
18
Figure 2: Estimated CACE(t) curve using the NIGN-LL model with 95% confi-
dence interval: HIP
18
data. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
76

2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ww
for HIP
18
LCL of ACE
nign-ww
for HIP
18
UCL of ACE
nign-ww
for HIP
18
Figure 3: Estimated ACE(t) curve using the NIGN-WW model with 95% confi-
dence interval: HIP
18
data. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ll
for HIP
18
LCL of ACE
nign-ll
for HIP
18
UCL of ACE
nign-ll
for HIP
18
Figure 4: Estimated ACE(t) curve using the NIGN-LL model with 95% confidence
interval: HIP
18
data. The
dash-dot line
and the
dotted line
indicate the Lower
Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
77

1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ww
for HIP
7
LCL of CACE
nign-ww
for HIP
7
UCL of CACE
nign-ww
for HIP
7
Figure 5: Estimated CACE(t) curve with 95% confidence interval: HIP
7
data
using the NIGN-WW model. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ll
for HIP
7
LCL of CACE
nign-ll
for HIP
7
UCL of CACE
nign-ll
for HIP
7
Figure 6: Estimated CACE(t) curve with 95% confidence interval: HIP
7
data
using the NIGN-LL model. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
78

1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ww
for HIP
7
LCL of ACE
nign-ww
for HIP
7
UCL of ACE
nign-ww
for HIP
7
Figure 7: Estimated ACE(t) curve using the NIGN-WW model with 95% confi-
dence interval: HIP
7
data. The
dash-dot line
and the
dotted line
indicate the
Lower Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ll
for HIP
7
LCL of ACE
nign-ll
for HIP
7
UCL of ACE
nign-ll
for HIP
7
Figure 8: Estimated ACE(t) curve using the NIGN-LL model with 95% confidence
interval: HIP
7
data. The
dash-dot line
and the
dotted line
indicate the Lower
Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
79

9.2.3
Estimation of relevant survival probabilities
The key point distinguishing our non-ignorable PPO-survival models (the NIGN-WW
and NIGN-LL survival model) from conventional parametric survival models, is the fact
that an individual's compliance type is allowed for, as proposed in Chapter 5. Unlike
conventional parametric models, the NIGN-WW and the NIGN-LL models allow us to
estimate the survival probability for three afore-mentioned subgroups in the HIP trial,
namely Subgroup 1, Subgroup 2 and Subgroup 3 as defined in Section 9.2.1.
The estimated survival curves corresponding to these three subgroups (denoted by
S
c1
(t), S
c0
(t) and S
n
(t)) for the NIGN-WW and the NIGN-LL model application on the
HIP
18
and the HIP
7
subdata are given in Figure 9 (on page 82), where the top row
gives the estimated survival curves evaluated using the NIGN-LL model for the HIP
18
data (top left) and the HIP
7
data (top right). The figures located in the bottom
row of Figure 9 represent the corresponding estimated curves for the NIGN-LL model.
In addition, it is not difficult to obtain estimated survival probabilities from an ITT
analytic perspective via our PPO-survival model; Figure 10 (on page 83) shows the
relevant the ITT estimated survival curves. Equation (9.9) gives the formulations of
calculating the ITT survival probabilities S
0
(t) and S
1
(t).
S
0
(t) = (1
-
c
)S
n
(t) +
c
S
c0
(t);
S
1
(t) = (1
-
c
)S
n
(t) +
c
S
c1
(t).
(9.9)
All estimated values of the relevant survival probabilities, associated with Figure 9,
and Figure 10 are detailed in Tables B-21, B-22, B-25 and B-26 in Appendix10 for the
reader's perusal.
The survival curves plotted on the two graphs at the right-hand side of Figure 9 (on
page 82) correspond to the HIP
7
dataset via the NIGN-LL model (top graph) and the
NIGN-WW model (bottom graph), respectively. They show that Subgroup 1, which
consists of compliers, who are assigned to the screening (treated) group, has the highest
80

survival probability among the three subgroups; Subgroup 2 has the lowest survival
probability, in which individuals are all compliers and assigned to the control group;
while the estimated survival curve of Subgroup 3, which consists of all never-takers,
who are assigned to either the screening or the control groups, lies between the curves
of Subgroup 1 and Subgroup 2. These results can be interpreted thus: Subgroup 3
(never-takers) can be viewed as a healthy group, so it has a survival probability higher
than Subgroup 2. Note that it is reasonable to make this assumption as presumably
individuals in Subgroup 3 do not think a screening examination is necessary, for they
refuse to take the screening examination (when assigned to the screening group). The
estimated survival curves for the HIP
7
dataset also show that the NIGN-LL model and
the NIGN-WW model provide very simliar results.
Unlike the analysis of the HIP
7
dataset, for the HIP
18
dataset, the survival curves
estimated using the NIGN-LL model and the NIGN-WW model are slightly different.
The NIGN-WW model provides survival curves for Subgroups 2 and 3 that are closer
than does the NIGN-LL model. However, the estimated survival curve for Subgroup 3
still lies between these two survival curves for Subgroup 1 and 2 (the left-hand graph in
Figure 9). Therefore, the assumption mentioned above, that individuals in Subgroup 3
are all healthy -- or at least thought of themselves as healthy at that time -- is still valid.
Since this is a scenario of long term follow-up, individuals' health status, at enrolment,
may not affect their survival probability in later years. It is noteworthy that in the
analysis of the HIP
18
dataset, the survival curves of Subgroup 3 are closer to the survival
curves of Subgroup 2 than Subgroup 1. This also makes sense in that individuals, either
in Subgroup 2 or Subgroup 3, have not received a screening examination. It is thus not
surprising to see these two subgroups having similar estimated survival curves. Indeed,
the survival probability in later years is minimally related to the individual's health
status at enrolment.
It is important to note that an ordinary ITT analysis does not provide these nuances
nor this information ( Figure 9, on page 82 and Figure 10 on page 83).
81

0
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in NIGN-LL for HIP
18
S
c0
(t) in NIGN-LL for HIP
18
S
n
(t) in NIGN-LL for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in NIGN-LL for HIP
7
S
c0
(t) inNIGN-LL for HIP
7
S
n
(t) in NIGN-LL for HIP
7
0
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in NIGN-WW for HIP
18
S
c0
(t) inNIGN-WW for HIP
18
S
n
(t) in NIGN-WW for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probabilty
S
c1
(t) in NIGN-WW for HIP
7
S
c0
(t) in NIGN-WW for HIP
7
S
n
(t) in NIGN-WW for HIP
7
Figure
9:
Estimated
surviv
al
curv
es
using
the
NIGN-LL
mo
del
(top
ro
w)
and
the
NIGN-WW
mo
del
(b
ottom
ro
w)
for
the
HIP
18
(left)
and
HIP
7
(righ
t)
datasets.
The
solid
line
corresp
onds
the
surviv
al
curv
e
for
Subgroup
1:
compliers
assigned
to
the
screening
group;
the
dash
line
corresp
onds
the
surviv
al
curv
e
for
Subgroup
2:
compliers
assigned
to
the
con
trol
group;
the
dash-dot
line
corresp
onds
the
surviv
al
curv
e
for
Subgroup
3:
nev
er-tak
ers
assigned
to
either
the
con
trol
or
the
screening
groups.
82

2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in nign-ll for HIP
18
S
0
(t) in nign-ll for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in nign-ll for HIP
7
S
0
(t) in nign-ll for HIP
7
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in nign-ww for HIP
18
S
0
(t) in nign-ww for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in nign-ww for HIP
7
S
0
(t) in nign-ww for HIP
7
Figure
10:
The
ITT
estimated
surviv
al
curv
es
using
the
NIGN-LL
mo
del
(top
ro
w)
and
the
NIGN-WW
mo
del
(b
ottom
ro
w)
for
the
HIP
18
(left)
and
HIP
7
(righ
t)
datasets.
The
solid
line
corresp
onds
the
surviv
al
curv
e
for
the
group
assigned
for
screening
;
the
dash
line
corresp
onds
the
group
not
assigned
for
screening.
83

9.3
HIP data analysis: via the Frangakis-Rubin method
9.3.1
Frangakis-Rubin method
Let S
uz
(t) denote the survival function for the subgroup with U
i
= u and Z
i
= z, and
let S
z
(t) denote the survival function associated with two treatment levels, z = 1 for
the treatment and 0 for the control. According to equation (5.1) given in Frangakis
& Rubin (1999), the survival functions corresponding to the assigned treatment group
Z
i
= 1 (for i
{i : Z
i
= 1
}) and the control group Z
i
= 0 (for i
{i : Z
i
= 0
}) can be
re-written as follows:
S
0
(t) = (1
-
c
)S
n0
(t) +
c
S
c0
(t)
S
1
(t) = (1
-
c
)S
n1
(t) +
c
S
c1
(t);
(9.10)
where
c
is the probability of compliers in the population.
Under the assumption of exclusion restriction, we have S
n0
(t) = S
n1
(t). Using S
n
(t)
to replace S
n0
(t) and S
n1
(t) in equation (9.10), we then have a form of the Frangakis-
Rubin (F-R) model as follows:
S
0
(t) = (1
-
c
)S
n
(t) +
c
S
c0
(t);
S
1
(t) = (1
-
c
)S
n
(t) +
c
S
c1
(t).
(9.11)
Note that S
c1
(t) and S
n
(t) can be estimated with the Kalan-Meier estimator directly.
Because compliance type U
i
= u is unobservable, S
c0
(t) the survival function asso-
ciated with the subgroup with U
i
= c and Z
i
= 0, cannot be estimated directly with
the Kalan-Meier estimator through observation. Frangakis & Rubin (1999) provide a
formulation for evaluating this, which is given below:
^
S
c0
(t) = exp
-
t
0
dN
obs
0
(x)/N
0
- dN
n1
(x)/N
1
M
0
(x)/N
0
- M
n1
(x)/N
1
;
(9.12)
where N
1
and N
0
are the number of individuals randomly assigned to the treated and
the control group respectively, and N
obs
i
(x) and M
obs
i
(x) denote the stochastic processes
84

(for i = 1, . . . , n) and are defined as follows:
N
i
(x) := I(T
i
x, R
i
= 1);
M
i
(x) := I(T
i
x);
(9.13)
and
N
n1
(x) =
i
N
i
(x)I(U
i
= n)I(Z
i
= 1);
N
0
(x) =
i
N
i
(x)I(Z
i
= 0);
M
n1
(x) =
i
M
i
(x)I(U
i
= n)I(Z
i
= 1);
M
0
(x) =
i
M
i
(x)I(Z
i
= 0).
(9.14)
Given equations (9.13) and (9.14), we note that N
n1
(x) and M
n1
(x) indicate the
number of event occurrences or the number of survivors at or prior to the given time
t for individuals who are never-takers assigned to the treatment group; whereas N
0
(x)
and M
0
(x) denote the number of event occurrences or the number of survivors at the
given time t for individuals who are assigned to the control group.
The Frangakis-Rubin (F-R) method (Frangakis & Rubin, 1999) can then be re-
garded as a non-parametric form of the noncompliance and non-ignorable survival po-
tential outcomes model. This is true, because even though the F-R method does not, in
fact, conventionally evaluate CACE(t) and ACE(t), the survival function values, S
c1
(t),
S
c0
(t) and S
n
(t), which are required in the calculation of CACE(t) and ACE(t), can
all be obtained by using the F-R model via the Kaplan-Meier estimator and via equa-
tions (9.12) and (9.11). More specifically, S
c1
(t) and S
n
(t)), the survival probabilities
corresponding to Subgroup 1 and Subgroup 3, can be estimated directly via the Kaplan-
Meier estimator, and S
c0
(t), the survival probability corresponding to Subgroup 2, can
be calculated using equation (9.12) as was proposed by Frangakis & Rubin (1999). Then
S
1
(t) and S
0
(t), the survival probability corresponding to the assigned treatment Z
i
= 1
and the control Z
i
= 0 group, respectively, can be obtained directly via equation (9.11),
85

which was also proposed by Frangakis & Rubin (1999).
The two datasets from the HIP trial, (HIP
18
and HIP
7
), are analysed according to
the F-R method, using a programme written in MATLAB (Higham & Higham, 2000).
This code follows equations (9.12) and (9.14) as proposed by Frangakis & Rubin (1999).
9.3.2
Estimation of relevant survival probabilities
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in F-R for HIP
18
S
c0
(t) in F-R for HIP
18
S
n
(t) in F-R for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in F-R for HIP
7
S
c0
(t) in F-R for HIP
7
S
n
(t) in F-R for HIP
7
Figure 11: Estimated survival curves via the F-R for the HIP
18
(left) and for the
HIP
7
(right) data . The
solid line
corresponds to the survival curve for Subgroup
1: compliers assigned to the screening group; the
dash-dot line
corresponds to the
survival curve for Subgroup 2: compliers assigned to the control group; the
dotted
line
corresponds to the survival curve for the Subgroup 3: never-takers assigned
to either the control or the screening group.
Figure 11 shows the estimated survival curves obtained via the F-R method, cor-
responding to the three subgroups (S
c1
(t), S
c0
(t) and S
n
(t)) for the two HIP datasets,
namely the HIP
18
and HIP
7
subsets, where the left-hand graph corresponds to the
HIP
18
dataset, and the right-hand one to the HIP
7
dataset. Both of the graphs show
that Subgroup 1 has a higher survival probability than the other two subgroups. This
is similar to the results given in Section 9.2, which were obtained via the non-ignorable
PPO-survival model. However, we can also see that in Figure 11, the estimated survival
curves for Subgroup 3 are unlike those in Figure 9, especially for the HIP
18
dataset.
86

2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in F-R for HIP
18
S
0
(t) in F-R for HIP
18
1
2
3
4
5
6
7
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in F-R for HIP
7
S
0
(t) in F-R for HIP
7
Figure 12: The ITT estimated survival curves via the F-R for the HIP
18
(left)
and for the HIP
7
(right) data . The
solid line
corresponds to the survival curve
for the assigned screening group; the
dash line
corresponds to the survival curve
for the assigned control group.
Figure 11 (left) (on page 86) shows that Subgroup 3 has a higher value than Subgroup
2 before 12 years, after which time (t = 12) Subgroup 3 has the lowest value among
the three subgroups. These results are comprehensible, because the F-R model is a
non-parametric based method, yet at the same time they are harder to interpret than
results obtained via the non-ignorable PPO-survival model (Section 9.2.1).
While Figure 12 gives the estimated survival curves obtained using the F-R method
in an ITT analysis, S
1
(t) and S
0
(t), correspond to the assigned treated group Z
i
= 1
(the screen invited group) and the control group Z
i
= 0, respectively. All estimated
relevant survival values, which are associated with Figures 11 and 12, are given in Tables
B-23, B-24, B-27 and B-28 in Appendix10.
9.3.3
Estimation of causal effects: CACE(t) and ACE(t)
The estimated values of CACE(t) (obtained by the F-R method) and the associated 95%
confidence intervals (obtained by the bootstrap method (Efron & Tibshirani, 1993)) are
given in Table 11 ( for the HIP
7
dataset) and Table 12 ( for the HIP
18
dataset) on page
87

89. Estimated ACE(t) values are given in Tables 13 and 14 (on page 90). Note that no
parameters are estimated using the F-R model, as this is a non-parametric model. The
formulae given at the end of Section 9.2.2, including equations (9.5), (9.7), (9.6) and
(9.8), are also used to obtain Tables 11, 12, 13, 14, which provide estimated CACE(t)
and ACE(t) values for the F-R method.
Related CACE(t) curves for the HIP
7
and the HIP
18
data, estimated by the F-R
method, are given in Figures 13 and 14 (on page 93).
In Tables 12 and 14, it is noteworthy that CACE(t) and ACE(t) for years 4 to 13
(inclusive), are significant in the F-R analyses of the HIP
18
. This result is similar to
that obtained for the HIP
7
data, where CACE(t) and ACE(t) for years 4 to 7 are highly
significant (Tables 11 and 13).
Note that the estimated CACE(t) at the first seven years for the HIP
7
and the
HIP
18
datasets (Table 12 and Table 11) are slight different. This is because of that
these two datasets are not same even though we only consider the first seven years: 1)
the censoring indicators for individuals whose observed time are seven years in these
two HIP datasets are different. 2) the datasets re-sampled for Bootstrap are high likely
not same. Recall that for the F-R method, the Bootstrap approach is employed for
obtaining the standard errors of the estimated CACE(t) and ACE(t).
88

Table 11: Estimated CACE(t) (95% CI) using the F-R method for the HIP
7
data.
Method:
F-R
Data:
HIP
7
Year
CACE
F R
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.57E
-04 1.39E-04 -2.14E-04 3.29E-04 0.4127
0.3399
2
3.03E
-04 2.43E-04 -1.74E-04 7.80E-04 1.2464
0.1063
3
5.35E
-04 3.31E-04 -1.14E-04 11.8E-04 1.6159
0.0531
4
11.0E
-04 4.44E-04
2.32E
-04
19.7E
-04 2.4836
0.0065
5
18.1E
-04 5.85E-04
6.58E
-04
29.5E
-04 3.0846
0.0010
6
28.0E
-04 7.08E-04
14.1E
-04
41.9E
-04 3.9505
0.0000
7
28.0E
-04 7.08E-04
14.1E
-04
41.9E
-04 3.9505
0.0000
HIP
7
: HIP data with follow-up of 7 years
Table 12: Estimated CACE(t) (95% CI) using the F-R method for the HIP
18
data.
Method:
F-R
Data:
HIP
18
Year
CACE
F R
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.57E
-04 1.39E-04 -2.14E-04 3.29E-04 0.4127
0.3399
2
3.03E
-04 2.43E-04 -1.74E-04 7.80E-04 1.2464
0.1063
3
5.35E
-04 3.31E-04 -1.14E-04 11.8E-04 1.6159
0.0531
4
11.0E
-04 4.44E-04
2.32E
-04
19.7E
-04 2.4836
0.0065
5
18.1E
-04 5.85E-04
6.58E
-04
29.5E
-04 3.0846
0.0010
6
27.7E
-04 7.13E-04
1.38E
-03
41.7E
-04 3.8930
0.0000
7
25.6E
-04 7.91E-04
1.01E
-03
41.1E
-04 3.2321
0.0006
8
21.1E
-04 8.68E-04
4.05E
-04
38.1E
-04 2.4272
0.0076
9
25.8E
-04 9.69E-04
6.80E
-04
44.8E
-04 2.6616
0.0039
10
21.7E
-04 10.5E-04
1.23E
-04
42.3E
-04 2.0772
0.0189
11
25.9E
-04 11.3E-04
3.68E
-04
48.0E
-04 2.2849
0.0112
12
22.2E
-04 12.2E-04 -1.64E-04 46.1E-04 1.8252
0.0340
13
22.1E
-04 13.0E-04 -3.26E-04 47.5E-04 1.7082
0.0438
14
15.6E
-04 13.8E-04 -11.3E-04 42.6E-04 1.1349
0.1282
15
16.5E
-04 15.1E-04 -13.0E-04 46.1E-04 1.0962
0.1365
16
15.8E
-04 16.3E-04 -16.1E-04 47.6E-04 0.9682
0.1665
17
17.7E
-04 17.1E-04 -15.8E-04 51.1E-04 1.0353
0.1502
18
31.8E
-04 18.2E-04 -3.96E-04 67.5E-04 1.7427
0.0407
HIP
18
: HIP data with follow-up 18 years
89

Table 13: Estimated ACE(t) (95% CI) using the F-R method for the HIP
7
data.
Method:
F-R
Data:
HIP
7
Year
ACE
F R
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.38E
-04 0.93E-04 -1.44E-04 2.20E-04 0.4126
0.3399
2
2.03E
-04 1.63E-04 -1.16E-04 5.23E-04 1.2464
0.1063
3
3.58E
-04 2.22E-04 -0.76E-04 7.93E-04 1.6160
0.0530
4
7.38E
-04 2.97E-04
1.56E
-04
13.2E
-04 2.4843
0.0065
5
12.1E
-04 3.92E-04
4.41E
-04
19.8E
-04 3.0855
0.0010
6
18.7E
-04 4.74E-04
9.44E
-04
28.0E
-04 3.9507
0.0000
7
18.7E
-04 4.74E-04
9.44E
-04
28.0E
-04 3.9507
0.0000
HIP
7
: HIP data with follow-up of 7 years
Table 14: Estimated ACE(t)(95% CI) using the F-R method for the HIP
18
data.
Method:
F-R
Data:
HIP
18
Year
ACE
F R
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.38E
-04 0.93E-04 -1.44E-04 2.20E-04 0.4126
0.3399
2
2.03E
-04 1.63E-04 -1.16E-04 5.23E-04 1.2464
0.1063
3
3.58E
-04 2.22E-04 -0.76E-04 7.93E-04 1.6160
0.0530
4
7.38E
-04 2.97E-04
1.56E
-04
1.32E
-03 2.4843
0.0065
5
12.1E
-04 3.92E-04
4.41E
-04
1.98E
-03 3.0855
0.0010
6
18.6E
-04 4.77E-04
9.23E
-04
2.79E
-03 3.8932
0.0000
7
17.1E
-04 5.30E-04
6.74E
-04
2.75E
-03 3.2317
0.0006
8
14.1E
-04 5.81E-04
2.71E
-04
2.55E
-03 2.4268
0.0076
9
17.3E
-04 6.49E-04
4.55E
-04
3.00E
-03 2.6614
0.0039
10
14.6E
-04 7.01E-04
0.82E
-04
2.83E
-03 2.0769
0.0189
11
17.3E
-04 7.58E-04
2.46E
-04
3.22E
-03 2.2844
0.0112
12
14.9E
-04 8.15E-04 -1.10E-04 3.08E-03 1.8249
0.0340
13
14.8E
-04 8.68E-04 -2.19E-04 3.18E-03 1.7081
0.0438
14
10.5E
-04 9.21E-04 -7.60E-04 2.85E-03 1.1350
0.1282
15
11.1E
-04 10.1E-04 -8.72E-04 3.09E-03 1.0963
0.1365
16
10.5E
-04 10.9E-04 -10.8E-04 3.19E-03 0.9683
0.1665
17
11.8E
-04 11.4E-04 -10.6E-04 3.42E-03 1.0354
0.1502
18
21.3E
-04 12.2E-04 -2.66E-04 4.52E-03 1.7426
0.0407
HIP
18
: HIP data with follow-up of 18 years
90

9.4
Model comparisons
In Sections 9.2
and 9.3, we analysed the HIP data using three models, namely the
NIGN-WW model, the IGN-W model and the F-R method. In this section, we focus
on comparing these three models using the results of analysis of both the HIP
18
and
HIP
7
datasets.
We first compare the non-ignorable PPO-survival model and the F-R method (Sec-
tion 9.4.1). Secondly, we compare the ignorable PPO-survival model and the non-
ignorable PPO-survival model (Section 9.4.2).
9.4.1
Non-ignorable PPO-survival model vs the F-R method
Since the non-ignorable PPO-survival model is developed under the same assumptions
as those for the F-R method, it is reasonable to expect that the results from these two
approaches would be comparable. However, it also would not be surprising to see some
differences between these results on the HIP data; as indeed, these two approaches derive
from different methodologies. Recall that the non-ignorable PPO-survival model is
derived from a parametric perspective through the maximum likelihood method, whilst
the F-R method is a nonparametric approach which uses the Kaplan-Meier estimator
(p.84, Klein (1997)).
The results from the NIGN-WW and the NIGN-LL model are comparable to those
obtained by the Frangakis-Rubin (F-R) method, as applied to the HIP
7
data (Figure
15, on page 94). This was also observed after an analogous analysis of the HIP
18
data
(Figure 17, on page 96).
Since the F-R method derives from a nonparametric perspective, it would not be
surprising to see, however, some fluctuation in the F-R specific plot of CACE(t) (shown
on the left hand side of Figure 17 for the HIP
18
dataset). Note, however, from the
left hand side plots (Figure 15), that little fluctuation is evident in the F-R specific
CACE(t) plot for the HIP
7
dataset. This is because indeed the period of follow-up is
shortened from 18 to 7 years.
91

In addition to the results mentioned above, we have shown that the non-ignorable
PPO-survival models provide more precise estimators than those obtained by the F-R
method (Figures 15, and 17), evident by the discrepancy in width of the associated
confidence intervals.
92

1
2
3
4
5
6
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
FR
for HIP
7
LCL of CACE
FR
for HIP
7
UCL of CACE
FR
for HIP
7
Figure 13: Estimated CACE(t) using the F-R method (with 95% confidence in-
terval): HIP
7
data. The
dash-dot line
and the
dotted line
indicate the Lower
Confidence Limit(LCL) and the Upper Confidence Limit (UCL).
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
FR
for HIP
18
LCL of CACE
FR
for HIP
18
UCL of CACE
FR
for HIP
18
Figure 14: Estimated CACE(t) using the F-R method (with 95% confidence in-
terval): HIP
18
data. The
dash-dot line
and the
dotted line
indicate the Lower
Confidence Limit (LCL) and the Upper Confidence Limit (UCL).
93

1
2
3
4
5
6
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
FR
for HIP
7
LCL of CACE
FR
for HIP
7
UCL of CACE
FR
for HIP
7
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ww
for HIP
7
LCL of CACE
nign-ww
for HIP
7
UCL of CACE
nign-ww
for HIP
7
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ll
for HIP
7
LCL of CACE
nign-ll
for HIP
7
UCL of CACE
nign-ll
for HIP
7
Figure
15:
Estimated
CA
CE
(t
)
curv
e
with
95%
confidence
in
terv
al:
HIP
7
data
using
the
F-R
metho
d
(left),
the
NIGN-
WW
mo
del
(middle)
and
the
NIGN-LL
mo
del
(righ
t).
The
dash-dot
line
and
dotted
line
indicate
the
Lo
w
er
Confidence
Limit(LCL)
and
the
Upp
er
Confidence
Limit
(UCL).
94

1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
ACE(t)
ACE
FR
for HIP
7
UCL of ACE
FR
for HIP
7
LCL of ACE
FR
for HIP
7
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ww
for HIP
7
LCL of ACE
nign-ww
for HIP
7
UCL of ACE
nign-ww
for HIP
7
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ll
for HIP
7
LCL of ACE
nign-ll
for HIP
7
UCL of ACE
nign-ll
for HIP
7
Figure
16:
Estimated
AC
E
(t
)
curv
e
with
95%
confidence
in
terv
al:
HIP
7
data
using
the
F-R
metho
d
(left),
the
NIGN-
WW
mo
del
(middle)
and
the
NIGN-LL
mo
del
(righ
t).
The
dash
line
and
dotted
line
indicate
the
Lo
w
er
Confidence
Limit(LCL)
and
the
Upp
er
Confidence
Limit
(UCL).
95

2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
FR
for HIP
18
LCL of CACE
FR
for HIP
18
UCL of CACE
FR
for HIP
18
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ww
for HIP
18
LCL of CACE
nign-ww
for HIP
18
UCL of CACE
nign-ww
for HIP
18
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ll
for HIP
18
LCL of CACE
nign-ll
for HIP
18
UCL of CACE
nign-ll
for HIP
18
Figure
17:
Estimated
CA
CE
(t
)
curv
e
with
95%
confidence
in
terv
al:
HIP
18
data
using
the
F-R
metho
d
(left),
the
NIGN-WW
mo
del
(middle)
and
the
NIGN-LL
mo
del
(righ
t).
The
dash-dot
line
and
dotted
line
indicate
the
Lo
w
er
Confidence
Limit(LCL)
and
the
Upp
er
Confidence
Limit
(UCL).
96

2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
ACE(t)
ACE
FR
for HIP
18
UCL of ACE
FR
for HIP
18
LCL of ACE
FR
for HIP
18
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ww
for HIP
18
LCL of ACE
nign-ww
for HIP
18
UCL of ACE
nign-ww
for HIP
18
2
4
6
8
10
12
14
16
18
-2
-1
0
1
2
3
4
5
6
7
x 10
-3
Survival time in years
ACE(t)
ACE
nign-ll
for HIP
18
LCL of ACE
nign-ll
for HIP
18
UCL of ACE
nign-ll
for HIP
18
Figure
18:
Estimated
AC
E
(t
)
curv
es
with
95%
confidence
in
terv
al:
HIP
18
data
using
the
F-R
metho
d
(left),
the
NIGN-
WW
mo
del
(middle)
and
the
NIGN-LL
mo
del
(righ
t).
The
dash
line
and
dotted
line
indicate
the
Lo
w
er
Confidence
Limit(LCL)
and
the
Upp
er
Confidence
Limit
(UCL).
97

9.4.2
The NIGN-WW model vs the IGN-W model
1
2
3
4
5
6
7
0
0.5
1
1.5
2
2.5
3
3.5
x 10
-3
Survival time in years
CACE(t)
CACE
nign-ll
for HIP
7
CACE
ign-w
for HIP
7
CACE
nign-ww
for HIP
7
Figure 19: Comparisons of estimated CACE(t) curves using three models for the
HIP
7
data. The
dash-dot line
shows the IGN-W model, the
dotted line
shows
the NIGN-LL model and the
solid line
shows the NIGN-WW mdoel.
Figure 19 gives three model-specific curves of CACE(t) for the HIP
7
data, including
the CACE(t) curve that estimated by the IGN-W model. It shows that the CACE(t)
curve corresponding to the IGN-W model is farther from the two curves associated with
the NIGN-WW model and the NIGN-LL model. This implies that the IGN-W model
may overestimate CACE(t) on the HIP
7
data. Hence, in the HIP
7
data, we need to
assume latent ignorability; in which case, HIP
7
should be regarded as the special case
of a non-ignorable censoring mechanism. Furthermore, Figure 19 shows that the two
specific models from the non-ignorable PPO-survival model provide very similar values
for CACE(t) at each given time (in years). This provides strong evidence that the non-
ignorable-survival model is a robust model, in that it is not sensitive to the assumed
distribution. Column 2 of Table 15 gives the estimated CACE
3
(for the HIP
7
data)
3
According the definition given by Imbens & Rubin (1997), in this case, CACE can be
expressed as CACE =
c1
-
c0
where
c1
and
c0
denote the mean values of observed failure
corresponding to the treated and the control groups among compliers (U
i
= c).
98

Table 15: Comparisons between CACE and CACE(t) in two models: HIP
7
Model
CACE
¯
T
CACE(1)
CACE(5)
AIC =
-2ln(L()) + 2m
NIGN-WW
52.64
0.66E-04
14.8E-04
2.0949E+05
NIGN-LL
1312.8
0.62E-04
15.7E-04
2.8393E+05
with respect to failure time T, which can be expressed as
CACE = E(T
i
(1)
|Z
i
= 1, U
i
= c)
- E(T
i
(0)
|Z
i
= 0, U
i
= c).
The values of CACE obtained by using the NIGN-WW model and the NIGN-LL model
were 52.64 vs 1312.8. They are clearly showing that the estimator CACE strongly re-
lies on the assumed distribution; while Column 3 and 4 of Table 15 give the estimated
CACE(t)
4
for t = 1 and t = 5 on the HIP
7
dataset via the two aforementioned mod-
els, namely 0.66E
- 045 vs 0.62E - 04 (for t = 1) and 14.8E - 04 vs 15.7E - 04 (for
t = 5). These results provide evidence that inferences for the specific complier average
causal effect CACE(t) (for time-to-event data), are not as sensitive to distributional as-
sumptions as inference for CACE. Indeed, by comparing their AIC (Akaike Information
criterion
5
) for the HIP
7
data, the Weibull distribution (AIC
nign
-ww
= 2.0948E + 5)
fits better than the log-normal (AIC
nign
-ll
= 2.8392E + 5) distribution for the non-
ignorable PPO-survival model. But both models give similar results (Figures 5 and 6
on page 78).
On the other hand, estimated survival values for the HIP
18
and HIP
7
data, using
the IGN-W model (given on the right-hand graphs in Figure 20 on page 101 and Figure
22 on page 103), also show that the IGN-W model does not provide as reasonable results
as the non-ignorable PPO-survival models, (the NIGN-WW and the NIGN-LL model).
According to the estimated survival values for the three subgroups using the IGN-W
4
By the new definition which is given in equation (3.7)
5
A measure of the goodness fit of an estimated statistical model, AIC =
-2ln(L()) + 2m
where m is the number of parameters and ln(L()) is the maximum nature logarithm likelihood
function for the estimated model. The one with lowest AIC is the best model.
99

model, Subgroup 3 is the highest among the three subgroups, which is not consistent
with the results provided via the F-R method and the non-ignorable PPO-survival
models. This is hard to interpret reasonably.
In fact, both the NIGN-WW and the NIGN-LL models perform much better than
the IGN-W model (Figure 22 on page 103) and Figure 20 on page 101).
Figures 20 (on page 101) and 22 (on page 103) provide strong evidence to support
our argument, that in the case of both the HIP
18
and HIP
7
datasets, the NIGN-WW
model provides more precise results than does the IGN-W model. With respect to
the F-R and the NIGN-WW models, Subgroup 1 (individuals who took the screening
examination) has a higher survival probability than subgroups where individuals did not
take the screening examination. This shows a clear and significant treatment benefit of
breast screening.
100

2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in F-R for HIP
18
S
c0
(t) in F-R for HIP
18
S
n
(t) in F-R for HIP
18
0
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in NIGN-WW for HIP
18
S
c0
(t) inNIGN-WW for HIP
18
S
n
(t) in NIGN-WW for HIP
18
0
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
c1
(t) in IGN-W for HIP
18
S
c0
(t) in IGN-W for HIP
18
S
n
(t) in IGN-W for HIP
18
Figure
20:
Estimated
surviv
al
curv
es
using
the
F-R
(left),
the
NIGN-WW
(middle)
and
the
IGN-W
(righ
t)
mo
dels
for
the
HIP
18
data.
The
solid
line
corresp
onds
to
the
surviv
al
curv
e
for
Subgroup
1:
compliers
assigned
to
the
screening
group;
the
dash
line
corresp
onds
to
the
surviv
al
curv
e
for
Subgroup
2:
compliers
assigned
to
the
con
trol
group;
the
dash-dot
line
corresp
onds
to
the
surviv
al
curv
e
for
the
Subgroup
3:
nev
er-tak
ers
assigned
to
either
the
con
trol
or
the
screening
group.
101

2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in F-R for HIP
18
S
0
(t) in F-R for HIP
18
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in nign-ww for HIP
18
S
0
(t) in nign-ww for HIP
18
2
4
6
8
10
12
14
16
18
0.98
0.982
0.984
0.986
0.988
0.99
0.992
0.994
0.996
0.998
1
Survival time in years
Survival probability
S
1
(t) in ign-w for HIP
18
S
0
(t) in ign-w for HIP
18
Figure
21:
ITT
estimated
surviv
al
curv
es
using
the
F-R
(left),
the
NIGN-WW
(middle)
and
the
IGN-W
(righ
t)
mo
dels
for
the
HIP
18
data.
The
solid
line
corresp
onds
to
the
group
assigned
for
screening;
the
dash
line
corresp
onds
to
the
group
not
assigned
for
screening.
102

1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probability
S
c1
(t) in F-R for HIP
7
S
c0
(t) in F-R for HIP
7
S
n
(t) in F-R for HIP
7
1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probabilty
S
c1
(t) in NIGN-WW for HIP
7
S
c0
(t) in NIGN-WW for HIP
7
S
n
(t) in NIGN-WW for HIP
7
1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probability
S
c1
(t) in IGN-W for HIP
7
S
c0
(t) in IGN-W for HIP
7
S
n
(t) in IGN-W for HIP
7
Figure
22:
Estimated
surviv
al
curv
es
using
the
F-R
(left),
the
NIGN-WW
(middle)
and
the
IGN-W
(righ
t)
mo
dels
for
the
HIP
7
data.
The
solid
line
corresp
onds
to
the
surviv
al
curv
e
for
Subgroup
1:
compliers
assigned
to
the
screening
group;
the
dash-dot
line
corresp
onds
to
the
surviv
al
curv
e
for
Subgroup
2:
compliers
assigned
to
the
con
trol
group;
the
dotted
line
corresp
onds
to
the
surviv
al
curv
e
for
Subgroup
3:
nev
er-tak
ers
assigned
to
either
the
con
trol
or
the
screening
group.
103

1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probability
S
1
(t) in F-R for HIP
7
S
0
(t) in F-R for HIP
7
1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probability
S
1
(t) in nign-ww for HIP
7
S
0
(t) in nign-ww for HIP
7
1
2
3
4
5
6
7
0.995
0.9955
0.996
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
1
Survival time in years
Survival probability
S
1
(t) in ign-w for HIP
7
S
0
(t) in ign-w for HIP
7
Figure
23:
ITT
estimated
surviv
al
curv
es
using
the
F-R
(left),
the
NIGN-WW
(middle)
and
the
IGN-W
(righ
t)
mo
dels
for
the
HIP
7
data.
The
solid
line
corresp
onds
to
the
group
assigned
for
screening;
the
dash
line
corresp
onds
to
the
group
not
assigned
for
screening.
104

9.5
Results of the analysis of the HIP data via the non-
ignorable PPO-survival model
Table 9 (on page 73) shows the estimated complier average causal effect CACE(t) and
Table 10 (on page 74) shows the estimated average causal effect ACE(t) obtained via
the NIGN-WW and the NIGN-LL models. (Further details are given in Tables B-19,
B-20, B-13 and B-14 in Appendix 10). From these tables, we can see that CACE(t) and
ACE(t) are completely positive at each given observed year. This implies that a woman
undertaking screening would have a higher probability of surviving breast cancer than
she would without having been screened.
9.5.1
Estimated CACE(t) for the HIP
7
data
More specifically, from the NIGN-WW model on the HIP
7
data, CACE(5) = 14.8E
-04
and SE
cace(5)
= 3.83E
- 04 (Table 9 on page 73 or Table B-19 on page 132). This indi-
cates that after five years, a woman has about 0.148% more probability (with 0.0383%
standard error) of survival than otherwise. This can also be interpreted as follows: out
of 10,000, women about 14.8 (
±3.83) women would be saved from breast cancer-related
death five years after screening.
The result from the NIGN-LL model is similar, in that CACE(5) = 15.7E
- 04
(SE
cace(5)
= 4.02E
- 04) (Table 9 on page 73 or Table B-20 on page 132). This is
equivalent to about 15.7 (
±4.02) women in 10,000 being saved from breast cancer-
related death five years after screening.
9.5.2
Estimated CACE(t) for the HIP
18
data
For the HIP
18
data, the results show that in 10,000 women, about 3.56 (
±1.86) women
would be saved from breast cancer-related death five years after screening, as the com-
plier average causal effect (CACE(t)) with its associated standard error estimated (via
the NIGN-WW model) is CACE(5) = 3.56E
- 04 (SE
cace(5)
= 1.86E
- 04) (Table 9
105

on page 73 or Table B-13 on page 128). On the other hand, the CACE(t) obtained
via the NIGN-LL model is CACE(5) = 5.22E
- 04 (SE
cace(5)
= 2.19E
- 04) (Table 9
on page 73 or Table B-14 on page 129), which is equivalent to stating that about 5.22
(
±2.19) women in 10,000 are being saved from breast cancer-related death five years
after screening.
Even though the estimated values of CACE(5) from these two specific non-ignorable
PPO-survival models are different, they all tend to support the conclusion that women
(over 40 years), who are screened would have a significantly reduced risk of breast
cancer-related death.
By comparing the estimated complier average causal effects at the first year (CACE(1))
in the HIP trial, it is easy to see that CACE(1) < CACE(5) (Table 9 on page 73) for
the two specific non-ignorable PPO-survival models. More details are given in Tables
B-19 , B-20, B-13 and B-14 in Appendix 10. These results indicate that out of 10,000
women, the number of women saved from breast cancer related-death at the first year
is significantly less than at the fifth year after screening. For the HIP
7
dataset, the
number of women saved from breast cancer related-death at the first year is 0.66(
±0.24)
vs 14.8(
±3.83) the number of women being saved at the fifth year (using the NIGN-WW
model); with the NIGN-LL model, it is 0.62(
±0.25) vs 15.7(±4.02).
This provides strong evidence to support, what is nowadays, accepted as com-
mon sense: the earlier breast cancer is detected, (and the earlier it is treated), the
greater a woman's chance of survival. This conclusion is consistent with that reported
by Shapiro et al. (1988). Mammography screening is now commonplace in the US
(www.radiologyinfo.org), Australia (www.health.gov.au) and New Zealand (www.nzbcf.org.nz).
However, the results for the HIP
18
and HIP
7
datasets (Table 9 on page 73) obtained
using the NIGN-WW and the NIGN-LL models are different. One needs to make a
decision as to which model is closer to the truth. As we stated in section 9.2.3, in a
scenario of long-term follow-up, individuals's health status at beginning may not affect
their survival probability in later years. Especially, Figure 11 (on page 86) shows that
106

around 12 years it appears a crossing of the survival curves corresponding to Subgroup
2 S
c0
(t) and Subgroup 3 S
n
(t). This implies that for the HIP trial, the non-ignorable
PPO-Survival models, (e.g. the NIGN-WW and the NIGN-LL models) would perform
better in analysing a short term follow-up data (the HIP
7
) than a long-term follow-up
data (the HIP
18
).
Furthermore, by comparison with the F-R method, which gives CACE(5) = 18.1E
-
04 (SE
cace(5)
= 5.85E
- 04) for both the HIP
18
and HIP
7
datasets (Table 9 on page
73), we believe that the non-ignorable PPO-survival model is optimal for the HIP
7
data than the HIP
18
data, because the results of the HIP
7
data obtained by the non-
ignorable PPO-survival model, CACE(5) = 14.8E
- 04 (SE
cace(5)
= 3.83E
- 04) (in
the NIGN-WW model) and CACE(5) = 15.7E
- 04 (SE
cace(5)
= 4.02E
- 04) (in the
NIGN-LL model), are much more closer to those obtained via the F-R method than the
results of the HIP
18
data, which are CACE(5) = 3.56E
- 04 (SE
cace(5)
= 1.86E
- 04)
(in the NIGN-WW model) and CACE(5) = 5.22E
- 04 (SE
cace(5)
= 2.19E
- 04) (in
the NIGN-LL model).
Theoretically, the non-ignorable PPO-survival model and the F-R method are com-
parable, as they are both derived under the same assumptions. The difference between
them is only that the former is a parametric survival model, while the latter is a non-
parametric survival model. Table 16 (on page 108) shows the results of the HIP
7
data from the non-ignorable and F-R models. Clearly, the non-ignorable PPO-survival
models provide significantly smaller standard errors for CACE(t) compared to the F-R
method.
10
Conclusions and Discussion
This paper presents a likelihood-based approach to the problem of inference for causal
effects in randomized time-to-event studies with non-compliance and possibly non-
ignorable censoring. Our approach can be viewed as a parametric, likelihood based
107

Table 16: Results of the analysis of the HIP
7
data via the non-ignorable PPO sur-
vival model. Numbers indicate the number of increased survivals due to screening
per 10,000 (on HIP
7
data). The associated standard error is given in brackets.
Survival year
NIGN-WW model
NIGN-LL model
F-R method
1
0.66(
±0.24)
0.62(
±0.24)
0.57 (
±1.39)
5
14.8(
±3.83)
15.7(
±4.02)
18.1 (
±5.85)
alternative to Frangakis & Rubin (1999)'s non-parametric method of moments estima-
tor (section 5) and extends the likelihood methods of O'Malley & Normand (2005) to
the case of time to event studies, and also given a new definition of ACE and CACE
for time-to-event data, which is the extension of Peng et al. (2004), the latter is based
on general location models.
Likelihood theory guarantees consistency of point estimates, as is also the case for
Frangakis & Rubin (1999) moment estimates. In the context of normally distributed
outcomes, the simulation study of O'Malley & Normand (2005) suggests practical, finite
sample, advantages for likelihood based methods in terms of efficiency, mean-squared
error and confidence interval coverage, compared to moment estimation. In O'Malley
& Normand (2005)'s study these advantages persisted even when the distributional
assumptions of the likelihood construction were incorrect.
The fundamental idea underpinning this paper and previous work on causal infer-
ence under non-compliance and subsequent missing outcomes (Frangakis & Rubin, 1999;
O'Malley & Normand, 2005) is that adjustment for bias related to post-randomization
self-selection of treatment, can be achieved by the introduction of a compliance type
covariate which characterizes treatment received under all possible treatment assign-
ments. Censoring is assumed to be ignorable conditional on compliance type (latent
ignorability), but because this covariate is only partially observed the assumed censor-
ing mechanism is, in general, not ignorable given the observable data. This leads to a
more complex likelihood construction than in standard time to event analyzes because
108

the model for the censoring time does not drop out of the likelihood. As a consequence,
the censoring distribution needs to be explicitly modeled. While the assumption of
latent ignorability cannot be verified from observed data we agree with Frangakis &
Rubin (1999) that because compliance type may be associated with loss to follow-up
ignorability of the censoring mechanism conditional on compliance type will generally
be a more plausible assumption than unconditional ignorability. We note, however, that
in some randomized time to event studies with non-compliance, it will be reasonable to
ignore the censoring mechanism. For example in some studies, particularly those with
short follow-up time it is possible that censoring is caused solely by the conclusion of
the study. In which case the censoring time is effectively a known constant for each in-
dividual (end of study date minus entry date) and therefore does not enter the analysis
Cox & Oakes (1984); Klein (1997).
Our approach permits both estimation of the causal effect of treatment received and
the causal effect of treatment assignment. The former quantifies treatment efficacy in the
ideal case where all individuals assigned treatment actually comply with the treatment
programme. In the case of a drug trial this provides information on the biological effect
of the drug by contrasting outcomes for a group of compliers given the new treatment
with outcomes for that same group had they been give an standard treatment. On the
other hand the causal effect of assigned treatment quantifies the overall population effect
of introducing a new treatment regime or public health initiative, such as a screening
programme. The overall effectiveness of a such a programme will reflect both the efficacy
of the procedure and the proportion of the target population who take up the treatment
and these two concepts are combined in the paper. This is reflected in the definition
of the average causal effect of assigned treatment given in (3.8) under the compound
exclusion and monotonicity assumptions Frangakis & Rubin (1999) show that when
censoring time is associated with compliance type the usual intention to treat estimators,
based on comparison of Kaplan-Meier survival curves for the randomized and non-
randomized groups is inconsistent for the causal effect of assigned treatment. In contrast
109

their method of moments estimator and our likelihood based method provide consistent
estimates of this effect.
Although the issue of non-compliance with assigned treatment shifts the analysis
of randomized studies towards the analysis of observational studies, the randomization
of assigned treatment gives randomized studies a distinct advantage over observational
studies because it ensures that the distribution of compliance type can be assumed to
be the same in each treatment arm Together with plausible assumptions such as the ex-
clusion assumption this ensures that the observed study data are informative about the
distribution of compliance type The observational study counterpart to non-compliance
in randomized studies is the problem of confounding by unobserved covariates and be-
cause the observed study data are uninformative about unobserved covariates, causal
inference in the observational study setting can be highly sensitive to prior assump-
tions about the adequacy of control for confounding variables Greenland et al. (1999);
Greenland (2004).
Because the likelihood approach requires explicit modelling of both censoring time
and survival time distributions, one potential drawback to this approach is the potential
sensitivity of causal inferences to distributional assumptions.
In a specific application sensitivity can be explored by fitting the model under dif-
ferent distributional assumptions and the usual sorts of likelihood based model selection
criteria can be applied (AIC, BIC). The impact of distributional assumptions could be
lessened by developing a semi-parametric model in which both the survival time and
censoring time are modelled by Cox type proportional hazards models. As in this paper,
likelihood maximization for such a semi-parametric approach would be facilitated by an
E-M algorithm which exploits the conditional independence of the survival and censoring
time given the compliance type variable. Alternatively, moving to a Bayesian perspec-
tive when the likelihood construction in Section 4,is combined with a prior distribution
over the model parameters, posterior computation via a Gibbs sampler, alternating be-
tween imputation of unobserved compliance types and sampling of parameters from the
110

conditional posterior distribution, given the compliance states, is suggested. Cox type
models can be fitted within a Bayesian framework Gong et al. (2005) and therefore both
fully parametric and semi-parametric Bayesian modelling should prove feasible using the
general likelihood construction of section 4.
ACKNOWLEDGEMENT
The first author would like to thank the following organizations and persons: the
New Zealand Institute of Mathematics and its Application (NZIMA) for providing
her three year Ph.D. scholarship to do this research. the New Zealand Statistics
Association (NZSA) and the Department of Mathematics and Statistics for also
sponsoring her to attend relevant conferences.
Dr. Stuart G. Baker and his
assistant Diane Erwin, National Cancer Institute of USA, for kindly organising
the HIP data for this research.
111

References
Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal
effects using instrumental variables. Journal of the American Statistical Asso-
ciation, 91(434), 444­455.
Baker, J. G., Rounds, J. B., & Zevon, M. A. (2000). A comparison of graded
response and rasch partial credit models with subjective well-being. Journal of
Educational and Behavioral Statistics, 25(3), 253­270.
Baker, S. G. (1992). A simple method for computing the observed information
matrix when using the EM algorithm with categorical data. The American
Statistical Association, 1(1), 63­67.
Baker, S. G. (1994). Gender, ethnicity, and homelessness - accounting for demo-
graphic diversity on the streets. American Behavioral Scientist, 37(4), 476­504.
Baker, S. G. (1997). All-or-none compliance. In S. Kotz, C. Read, & D. Banks
(Eds.), The Encyclopedia of Statistical Sciences, volume 1 (pp. 134­138). New
York: Wiley.
Baker, S. G. (1998).
Analysis of survival data from a randomised trial with
all-or-none compliance: Estimating the cost-effectiveness of a cancer screening
program. Journal of the American Statistical Association, 93(443), 929­934.
Baker, S. G. & Kramer, B. S. (2005). Simple maximum likelihood estimates of
efficacy in randomised trials and before-and-after studies, with implications for
meta-analysis. Statistical Methods in Medical Research, 14, 349­367.
Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal
stratification approach to broken randomised experiments: A case study of
112

School Choice vouchers in New York City. Journal of the American Statistical
Association, 98(462), 299­311.
Casella, G. & Berger, R. L. (2002). Statistical inference. Australia ; Pacific Grove,
CA: Thomson Learning, 2nd edition.
Cox, D. & Oakes, D. (1984). Analysis of Survival Data. Monographs on Statistics
and Applied Probability. London; New York: Chapman and Hall.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via EM algorithm. Journal of the Royal Statistical Society
Series B-Methodological, 39(1), 1­38.
Efron, B. & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York:
Chapman& Hall.
Fisher, L. D. (1999). Advances in clinical trials in the twentieth century. Annu.
Rev. Public Health, 20, 109­124.
Fleming, T. & Lin, D. (2000). Surivival analysis in clinical trials: Past develop-
ments and future directions. Biometrics, 56, 971­983.
Frangakis, C. E. & Rubin, D. B. (1999). Addressing complications of intention-to-
treat analysis in the combined presence of all-or-none treatment-noncompliance
and subsequent missing outcomes. Biometrika, 86(2), 365­379.
Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference.
Biometrics, 58(1), 21­29.
Gong, Z., Hudson, I., & Graham, P. (2005). A Bayesian Causal Cox Model (CCM)
for all-or-nothing compliance. In A. Francis, K. Matawie, A. Oshlack, & G.
113

Smyth (Eds.), 20th International Workshop in Statistical Modelling, Sydney,
Australia (pp. 227­230).
Graham, P. (2000). Bayesian inference for a generalized population attributable
fraction: the impact of early vitamin A levels on chronic lung disease in very
low birth weight infants. Statistics in Medicine, 19, 937­956.
Graham, P. (2001). Bayesian Causal Inference for Epidemiological and Clinical
Studies. Phd thesis, Otago, NZ.
Greenland, S. (2004). An overview of methods for causal inference from observa-
tional studies. In A. Gelman & X. L. Meng (Eds.), Applied Bayesian Modeling
and Causal Inference from Incomplete-Data Perspectives. John Wiley and Sons.
Greenland, S., Robins, J. M., & Pearl, J. (1999). Confounding and collapsibility
in causal inference. Statistical Science, 14(1), 29­46.
Herring, A. H., I. J. G. & Lipsitz, S. (2004). Non-ignorable missing covariate data
in survival analysis: A case-study of an international breast cancer study group
trial. Application of Statistics, 53(Part 2), 293­310.
Higham, D. & Higham, N. (2000). MATLAB Guide. SIAM.
Hirano, K. & Imbens, G. W. (2000). Assessing the effect of an influenza vaccine
in an encouragement design. Biometrics, 1, 69­88.
Holland, P. (1986). Statistics and causal inference. Journal of the American
Statistical Association, 81(396), 945­970.
Ibrahim, J. G., Chen, M. H., & Sinha, D. (2001). Bayesian Survival Analysis.
New York: Springer-Verlag.
114

Imbens, G. & Rubin, D. (1997). Bayesian inference for causal effects in randomised
experiments with noncompliance. The Annals of Statistics, 25(1), 305­327.
Jo, B. (2002). Statistical power in randomised intervention studies with noncom-
pliance. Psychological Methods, 7(2), 178­193.
Kalbfleisch, J. D. & Prentice, R. L. (2002). The Statistical Analysis of Failure
Time Data. New York: Wiley, 2nd edition.
Klein, J. P. (1997). Survival Analysis. Statistics for Biology and Health. New
York: Springer-Verlag, 1st edition.
Lagarias, J. G., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence
properties of the Nelder-Mead simplex method in low dimensions. Society for
Industrial and Applied Mathematics, 9(1), 112­147.
Laird, N. M. & Louis, T. A. (1982). Approximate posterior distributions for
incomplete data. Journal of Royal Statistical Society , Series B, 44(2), 190­
200.
Levy, D. E., O'Malley, A. J., & Normand, S. T. (2004). Covariate adjustment in
clinical trials with non-ignorable missing data and non-compliance. Statistics
in Medicine, 23, 2319­2339.
Loeys, T. & Goetghebeur, E. (2003). A causal proportional hazards estimator
for the effect of treatment actually received in a randomised trial with all-or-
nothing compliance. Biometrics, 59(1), 100­105.
Louis, T. A. (1982). Finding the observed information matrix when using the EM
algorithm. Journal of the Royal Statistical Society, Series B., 44(2), 226­233.
115

Mattei, A. & Mealli, F. (2007). Application of principal stratification approach
to the Faenza randomised experiment on breast self-examination. Biometrics,
63(June), 437­446.
McLachlan, G. J. & Krishnan, T. (1997). The EM Algorithm and Extensions.
New York: Wiley.
Mealli, F., Imbens, G. W., Ferro, S., & Biggeri, A. (2004). Analyzing a randomised
trial on breast self-examination with noncompliance and missing outcomes. Bio-
statistics, 5(2), 207­222.
Meng, X.-L. & Rubin, D. B. (1991). Using EM to obtain asymptotic variance-
covariance matrices: the SEM algorithm. Journal of the American Statistical
Association, 86(December), 899­909.
Meng, X.-L. & Rubin, D. B. (1993). Maximum likelihood estimation via the ECM
algorithm: a general framework. Biometrika, 80(2), 267­278.
O'Malley, A. J. & Normand, S. T. (2005). Likelihood methods for treatment
noncompliance and subsequent nonresponse in randomised trials. Biometrics,
61(June), 325­334.
Pearl, J. (2001). Causal inference in health sciences: a conceptual introduction.
Health Services and Outcomes Methodology, 2, 189­220.
Peng, Y., Little, R. J. A., & Raghunathan, T. E. (2004). An extended general
location model for causal inferences from data subject to noncompliance and
missing values. Biometrics, 60(September), 598­670.
Robins, J. M. & Tsiatis, A. A. (1991). Correcting for noncompliance in randomised
trials using rank preserving structural failure time models. Communications in
Statistics-Theory and Methods, 20(8), 2609­2631.
116

Rubin, D. (1978). Bayesian inference for causal effects: the role of randomisation.
The Annals of Statistics, 6(1), 34­58.
Rubin, D. (1986). Which ifs have causal answers?
Journal of the American
Statistical Association, 81, 961­962.
Rubin, D. (2005). Causal inference using potential outcomes: Design, modeling,
decisions. Journal of the American Statistical Association, 100(469), 322­331.
Shapiro, S. (1977). Evidence on screening for breast cancer from a randomized
trial. Cancer, 39(June Supplement), 2772­2782.
Shapiro, S., Venet, W., Strax, P., & Venet, L. (1988). Periodic Screening for Breast
Cancer: The Health Insurance Plan Project and Its Sequelae, 1963-1986. The
Johns Hopkins Series in Contemporary Medicine and Public Health. Baltimore:
The Johns Hopkins University Press.
Shapiro, S., Venet, W., Strax, P., Venet, L., & Roeser, R. (1982). Ten to fourteen
year effect of screening on breast cancer mortality. Journal of the National
Cancer Institute, 69(No.2), 349­355.
Tanner, M. A. (1993). Tools for statistical inference : methods for the exploration
of posterior distributions and likelihood functions. Springer series in statistics.
New York: Springer-Verlag, 2nd edition.
White, I. (2005). Uses and limitations of randomisation-based efficacy estimators.
Statistical Methods in Medical Research, 14, 327­347.
Yau, L. H. Y. & Little, R. J. (2001). Inference for the complier-average causal
effect from longitudinal data subject to noncompliance and missing data, with
application to a job training assessment for the unemployed. Journal of the
American Statistical Association, 96(456), 1232­1244.
117

Appendix B
Tables
Table B-1: Estimated model parameter ^
in the NIGN-WW model for the HIP
18
data.
Method:
NIGN-WW
Data:
HIP
18
^
Estimates
SE
LCL
UCL
T,c1
0.0057
0.0005
0.0047
0.0067
T,c0
0.0064
0.0006
0.0052
0.0076
T,n
0.0064
0.0006
0.0053
0.0075
T
1.8254
0.0613
1.7053
1.9456
C,c1
0.0622
0.0001
0.0620
0.0625
C,c0
0.0639
0.0002
0.0636
0.0642
C,n
0.0663
0.0003
0.0657
0.0669
C
3.1928
0.0006
3.1916
3.1940
HIP
18
: HIP data with follow-up of 18 years.
Table B-2: Estimated model parameter ^
in the NIGN-LL model for the HIP
18
data.
Method:
NIGN-LL
Data:
HIP
18
^
Estimates
SE
LCL
UCL
T,c1
6.2597
0.1157
6.0331
6.4864
T,c0
6.1094
0.1134
5.8871
6.3317
T,n
6.1481
0.1233
5.9065
6.3898
T
1.5595
0.0479
1.4657
1.6533
C,c1
2.6173
0.0040
2.6094
2.6252
C,c0
2.6217
0.0041
2.6136
2.6298
C,n
2.3269
0.0055
2.3162
2.3376
C
0.5674
0.0000
0.5673
0.5674
HIP
18
: HIP data with follow-up of 18 years.
118

Table B-3: Estimated model parameter ^
in the NIGN-WW model for the HIP
7
data.
Method:
NIGN-WW
Data:
HIP
7
^
Estimates
SE
LCL
UCL
T,c1
0.0059
0.0014
0.0031
0.0087
T,c0
0.0092
0.0020
0.0053
0.0131
T,n
0.0069
0.0016
0.0037
0.0101
T
1.9362
0.1389
1.6640
2.2084
C,c1
0.1438
0.0001
0.1436
0.1440
C,c0
0.1446
0.0001
0.1444
0.1448
C,n
0.1452
0.0001
0.1450
0.1454
C
10.3780
0.0004
10.3770
10.3790
HIP
7
: HIP data with follow-up of 7 years.
Table B-4: Estimated model parameter ^
in the NIGN-LL model for the HIP
7
data.
Method:
NIGN-LL
Data:
HIP
7
^
Estimates
SE
LCL
UCL
T,c1
6.7587
0.3394
6.0935
7.4239
T,c0
6.2969
0.3073
5.6947
6.8992
T,n
6.6068
0.3393
5.9419
7.2718
T
1.6828
0.1134
1.4605
1.9052
C,c1
1.8900
0.0020
1.8861
1.8939
C,c0
1.8808
0.0017
1.8775
1.8840
C,n
1.7764
0.0034
1.7697
1.7831
C
0.2831
0.0000
0.2830
0.2832
HIP
7
: HIP data with follow-up of 7 years.
119

T
able
B-5:
Iterations
of
the
EM
algorithm
using
the
NIGN-WW
mo
del
for
the
HIP
18
data.
Mo
del:
NIGN-WW
Data:
HIP
18
Iter
k
(1)
T,
c
1
(1)
T,
c
0
(1)
T,
n
0
T
(1)
C,
c
1
(1)
C,
c
0
(1)
C,
n
0
C
Mflag
a
b
1
174.74
156.41
155.26
1.83
16.07
15.64
15.09
3.19
1
113.9300
2
174.75
156.20
155.52
1.83
16.07
15.64
15.09
3.19
1
0.2520
3
174.77
156.06
155.70
1.83
16.07
15.65
15.08
3.19
1
0.1870
4
174.77
155.96
155.82
1.83
16.07
15.65
15.08
3.19
1
0.1172
5
174.78
155.89
155.91
1.83
16.07
15.65
15.08
3.19
1
0.0887
6
174.78
155.85
155.96
1.83
16.07
15.65
15.08
3.19
1
0.0569
7
174.78
155.81
156.00
1.83
16.07
15.65
15.08
3.19
1
0.0390
8
174.78
155.79
156.03
1.83
16.07
15.65
15.08
3.19
1
0.0272
9
174.79
155.78
156.05
1.83
16.07
15.65
15.08
3.19
1
0.0173
10
174.78
155.77
156.06
1.83
16.07
15.65
15.08
3.19
1
0.0130
11
174.78
155.76
156.07
1.83
16.07
15.65
15.08
3.19
1
0.0094
12
174.78
155.76
156.07
1.83
16.07
15.65
15.08
3.19
1
0.0055
13
174.78
155.75
156.07
1.83
16.07
15.65
15.08
3.19
1
0.0052
14
174.78
155.75
156.08
1.83
16.07
15.65
15.08
3.19
1
0.0044
HIP
18
:
HIP
data
with
follo
w-up
of
18
y
ears.
a
Mfl
a
g
=
1
indicates
the
maxim
um
of
Q
(,
(k
)
is
reac
hed
in
the
k
th
iteration.
b
represen
ts
the
maxim
um
comp
onen
t
o
f
|
(k
+1)
-
(k
)
|,
the
absolute
v
ector
of
difference.
120

T
able
B-6:
Iterations
of
EM
algorithm
for
the
NIGN-LL
mo
del
on
the
HIP
18
data
Mo
del:
NIGN-LL
Data:
HIP
18
Iter
k
(1)
T,
c
1
(1)
T,
c
0
(1)
T,
n
0
(1)
C,
c
1
(1)
C,
c
0
(1)
C,
n
0
c
Mflag
a
b
1
6.2597
6.1094
6.1481
1.5595
2.6173
2.6217
2.3269
0.5674
1
0.0008
HIP
18
:
the
HIP
data
with
follo
wing-up
in
18
y
ears.
a
Mfl
a
g
=
1
indicates
the
maxim
um
of
Q
(,
(k
)
is
reac
hed
in
the
k
th
iteration.
b
represen
ts
the
maxim
um
comp
onen
t
o
f
|
(k
+1)
-
(k
)
|,
the
absolute
v
ector
of
difference.
121

T
able
B-7:
Iterations
of
the
EM
algorithm
for
the
NIGN-WW
mo
del
on
the
HIP
7
data
Mo
del:
NIGN-WW
Data:
HIP
7
Iter
(1)
T,
c
1
(1)
T,
c
0
(1)
T,
n
0
(1)
C,
c
1
(1)
C,
c
0
(1)
C,
n
0
c
Mflag
a
b
1
169.16
109.66
142.30
1.94
6.95
6.91
6.89
10.38
1
5984.8000
2
169.18
109.32
143.14
1.94
6.95
6.91
6.89
10.38
1
0.8389
3
169.19
109.10
143.67
1.94
6.95
6.91
6.89
10.38
1
0.5288
4
169.20
108.96
144.00
1.94
6.95
6.91
6.89
10.38
1
0.3367
5
169.21
108.88
144.23
1.94
6.95
6.91
6.89
10.38
1
0.2229
6
169.22
108.83
144.36
1.94
6.95
6.91
6.89
10.38
1
0.1329
7
169.22
108.80
144.45
1.94
6.95
6.91
6.89
10.38
1
0.0887
8
169.21
108.77
144.49
1.94
6.95
6.91
6.89
10.38
1
0.0405
9
169.22
108.76
144.53
1.94
6.95
6.91
6.89
10.38
1
0.0388
10
169.21
108.75
144.54
1.94
6.95
6.91
6.89
10.38
1
0.0166
11
169.22
108.75
144.56
1.94
6.95
6.91
6.89
10.38
1
0.0190
12
169.23
108.75
144.58
1.94
6.95
6.91
6.89
10.38
1
0.0162
13
169.22
108.74
144.58
1.94
6.95
6.91
6.89
10.38
1
0.0102
14
169.22
108.74
144.58
1.94
6.95
6.91
6.89
10.38
1
0.0045
HIP
7
:
HIP
data
with
follo
w-up
of
7
y
ears.
a
Mfl
a
g
=
1
indicates
the
maxim
um
of
Q
(,
(k
)
is
reac
hed
in
the
k
th
iteration.
b
represen
ts
the
maxim
um
comp
onen
t
o
f
|
(k
+1)
-
(k
)
|,
the
absolute
v
ector
of
difference.
122

T
able
B-8:
Iterations
of
the
EM
algorithm
for
the
NIGN-LL
mo
del
on
the
HIP
7
data
Mo
del:
NIGN-LL
Data:
HIP
7
Iter
k
(1)
T,
c
1
(1)
T,
c
0
(1)
T,
n
0
(1)
C,
c
1
(1)
C,
c
0
(1)
C,
n
0
c
Mflag
a
b
1
8.22
7.56
8.02
2.18
1.89
1.88
1.80
0.28
1
1.9602
2
6.76
6.29
6.62
1.68
1.89
1.87
1.78
0.28
1
1.4641
3
6.76
6.29
6.61
1.68
1.89
1.88
1.78
0.28
1
0.0068
4
6.76
6.30
6.61
1.68
1.89
1.88
1.78
0.28
1
0.0041
HIP
7
:
HIP
data
with
follo
w-up
of
7
y
ears.
a
Mfl
a
g
=
1
indicates
the
maxim
um
of
Q
(,
(k
)
is
reac
hed
in
the
k
th
iteration.
b
represen
ts
the
maxim
um
comp
onen
t
o
f
|
(k
+1)
-
(k
)
|,
the
absolute
v
ector
of
difference.
123

Table B-9: Estimated ACE(t)(95% CI) using the F-R method for the HIP
18
data.
Method:
F-R
Data:
HIP
18
Year
ACE
F R
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.38E
-04 0.93E-04 -1.44E-04 2.20E-04 0.4126
0.3399
2
2.03E
-04 1.63E-04 -1.16E-04 5.23E-04 1.2464
0.1063
3
3.58E
-04 2.22E-04 -0.76E-04 7.93E-04 1.6160
0.0530
4
7.38E
-04 2.97E-04
1.56E
-04
1.32E
-03 2.4843
0.0065
5
12.1E
-04 3.92E-04
4.41E
-04
1.98E
-03 3.0855
0.0010
6
18.6E
-04 4.77E-04
9.23E
-04
2.79E
-03 3.8932
0.0000
7
17.1E
-04 5.30E-04
6.74E
-04
2.75E
-03 3.2317
0.0006
8
14.1E
-04 5.81E-04
2.71E
-04
2.55E
-03 2.4268
0.0076
9
17.3E
-04 6.49E-04
4.55E
-04
3.00E
-03 2.6614
0.0039
10
14.6E
-04 7.01E-04
0.82E
-04
2.83E
-03 2.0769
0.0189
11
17.3E
-04 7.58E-04
2.46E
-04
3.22E
-03 2.2844
0.0112
12
14.9E
-04 8.15E-04 -1.10E-04 3.08E-03 1.8249
0.0340
13
14.8E
-04 8.68E-04 -2.19E-04 3.18E-03 1.7081
0.0438
14
10.5E
-04 9.21E-04 -7.60E-04 2.85E-03 1.1350
0.1282
15
11.1E
-04 10.1E-04 -8.72E-04 3.09E-03 1.0963
0.1365
16
10.5E
-04 10.9E-04 -10.8E-04 3.19E-03 0.9683
0.1665
17
11.8E
-04 11.4E-04 -10.6E-04 3.42E-03 1.0354
0.1502
18
21.3E
-04 12.2E-04 -2.66E-04 4.52E-03 1.7426
0.0407
HIP
18
: HIP data with follow-up of 18 years
124

Table B-10: Estimated ACE(t) (95% CI) using the NIGN-WW model for the
HIP
18
data.
Method:
NIGN-WW
Data:
HIP
18
Year
ACE
nign
-ww
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.13E
-04
0.07E
-04 -0.01E-04 0.26E-04 1.8643
0.0311
2
0.45E
-04
0.24E
-04 -0.02E-04 0.91E-04 1.8920
0.0292
3
0.94E
-04
0.49E
-04 -0.03E-04 1.91E-04 1.9030
0.0285
4
1.59E
-04
0.83E
-04 -0.04E-04 3.22E-04 1.9084
0.0282
5
2.38E
-04
1.25E
-04 -0.06E-04 4.83E-04 1.9111
0.0280
6
3.32E
-04
1.74E
-04 -0.08E-04 6.73E-04 1.9123
0.0279
7
4.40E
-04
2.30E
-04 -0.11E-04 8.91E-04 1.9127
0.0279
8
5.61E
-04
2.93E
-04 -0.14E-04 11.4E-04 1.9126
0.0279
9
6.95E
-04
3.63E
-04 -0.17E-04 14.1E-04 1.9121
0.0279
10
8.41E
-04
4.40E
-04 -0.21E-04 17.0E-04 1.9113
0.0280
11
10.0E
-04
5.23E
-04 -0.26E-04 20.3E-04 1.9104
0.0280
12
11.7E
-04
6.13E
-04 -0.31E-04 23.7E-04 1.9094
0.0281
13
13.5E
-04
7.09E
-04 -0.376E-04 27.4E-04 1.9083
0.0282
14
15.5E
-04
8.11E
-04 -0.43E-04 31.4E-04 1.9072
0.0282
15
17.5E
-04
9.19E
-04 -0.50E-04 35.5E-04 1.9061
0.0283
16
19.7E
-04
10.3E
-04 -0.57E-04 39.9E-04 1.9049
0.0284
17
21.9E
-04
11.5E
-04 -0.65E-04 44.5E-04 1.9037
0.0285
18
24.3E
-04
12.8E
-04 -0.73E-04 49.4E-04 1.9026
0.0285
HIP
18
: HIP data with follow-up of 18 years
125

Table B-11: Estimated ACE(t) (95% CI) using the NIGN-LL model for the HIP
18
data.
Method:
NIGN-LL
Data:
HIP
18
Year
ACE
nign
-ll
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.10E
-04
0.05E
-04 0.01E-04 0.19E-04 2.1060
0.0176
2
0.53E
-04
0.23E
-04 0.07E-04 0.98E-04 2.2561
0.0120
3
1.27E
-04
0.55E
-04 0.20E-04 2.34E-04 2.3214
0.0101
4
2.27E
-04
0.96E
-04 0.38E-04 4.16E-04 2.3576
0.0092
5
3.49E
-04
1.47E
-04 0.62E-04 6.37E-04 2.3804
0.0086
6
4.89E
-04
2.04E
-04 0.89E-04 8.89E-04 2.3958
0.0083
7
6.42E
-04
2.67E
-04 1.19E-04 11.7E-04 2.4068
0.0080
8
8.08E
-04
3.34E
-04 1.52E-04 14.6E-04 2.4150
0.0079
9
9.82E
-04
4.06E
-04 1.87E-04 17.8E-04 2.4213
0.0077
10
11.6E
-04
4.80E
-04 2.24E-04 21.1E-04 2.4262
0.0076
11
13.5E
-04
5.57E
-04 2.62E-04 24.5E-04 2.4302
0.0075
12
15.5E
-04
6.36E
-04 3.01E-04 27.9E-04 2.4334
0.0075
13
17.5E
-04
7.16E
-04 3.41E-04 31.5E-04 2.4361
0.0074
14
19.5E
-04
7.98E
-04 3.82E-04 35.1E-04 2.4384
0.0074
15
21.5E
-04
8.81E
-04 4.23E-04 38.8E-04 2.4404
0.0073
16
23.6E
-04
9.65E
-04 4.65E-04 42.5E-04 2.4421
0.0073
17
25.6E
-04
10.5E
-04 5.07E-04 46.2E-04 2.4435
0.0073
18
27.7E
-04
11.3E
-04 5.49E-04 49.9E-04 2.4448
0.0072
HIP
18
: HIP data with follow-up of 18 years.
126

Table B-12: Estimated CACE(t) (95% CI) using the F-R method for the HIP
18
data.
Method:
F-R
Data:
HIP
18
Year
CACE
F R
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.57E
-04 1.39E-04 -2.14E-04 3.29E-04 0.4127
0.3399
2
3.03E
-04 2.43E-04 -1.74E-04 7.80E-04 1.2464
0.1063
3
5.35E
-04 3.31E-04 -1.14E-04 11.8E-04 1.6159
0.0531
4
11.0E
-04 4.44E-04
2.32E
-04
19.7E
-04 2.4836
0.0065
5
18.1E
-04 5.85E-04
6.58E
-04
29.5E
-04 3.0846
0.0010
6
27.7E
-04 7.13E-04
1.38E
-03
41.7E
-04 3.8930
0.0000
7
25.6E
-04 7.91E-04
1.01E
-03
41.1E
-04 3.2321
0.0006
8
21.1E
-04 8.68E-04
4.05E
-04
38.1E
-04 2.4272
0.0076
9
25.8E
-04 9.69E-04
6.80E
-04
44.8E
-04 2.6616
0.0039
10
21.7E
-04 10.5E-04
1.23E
-04
42.3E
-04 2.0772
0.0189
11
25.9E
-04 11.3E-04
3.68E
-04
48.0E
-04 2.2849
0.0112
12
22.2E
-04 12.2E-04 -1.64E-04 46.1E-04 1.8252
0.0340
13
22.1E
-04 13.0E-04 -3.26E-04 47.5E-04 1.7082
0.0438
14
15.6E
-04 13.8E-04 -11.3E-04 42.6E-04 1.1349
0.1282
15
16.5E
-04 15.1E-04 -13.0E-04 46.1E-04 1.0962
0.1365
16
15.8E
-04 16.3E-04 -16.1E-04 47.6E-04 0.9682
0.1665
17
17.7E
-04 17.1E-04 -15.8E-04 51.1E-04 1.0353
0.1502
18
31.8E
-04 18.2E-04 -3.96E-04 67.5E-04 1.7427
0.0407
HIP
18
: HIP data with follow-up 18 years
127

Table B-13: Estimated CACE(t) (95% CI) using the NIGN-WW model for the
HIP
18
data.
Method:
NIGN-WW
Data:
HIP
18
Year
CACE
nign
-ww
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.19E
-04
0.10E
-04 -0.01E-04 0.39E-04 1.8643
0.0311
2
0.67E
-04
0.35E
-04 -0.02E-04 1.36E-04 1.8921
0.0292
3
1.40E
-04
0.74E
-04 -0.04E-04 2.85E-04 1.9031
0.0285
4
2.37E
-04
1.24E
-04 -0.06E-04 4.80E-04 1.9084
0.0282
5
3.56E
-04
1.86E
-04 -0.09E-04 7.21E-04 1.9111
0.0280
6
4.96E
-04
2.59E
-04 -0.12E-04 10.0E-04 1.9124
0.0279
7
6.57E
-04
3.43E
-04 -0.16E-04 13.3E-04 1.9128
0.0279
8
8.37E
-04
4.38E
-04 -0.21E-04 17.0E-04 1.9126
0.0279
9
10.4E
-04
5.42E
-04 -0.26E-04 21.0E-04 1.9121
0.0279
10
12.6E
-04
6.57E
-04 -0.32E-04 25.4E-04 1.9114
0.0280
11
14.9E
-04
7.81E
-04 -0.39E-04 30.2E-04 1.9105
0.0280
12
17.5E
-04
9.15E
-04 -0.46E-04 35.4E-04 1.9095
0.0281
13
20.2E
-04
10.6E
-04 -0.55E-04 40.9E-04 1.9084
0.0282
14
23.1E
-04
12.1E
-04 -0.64E-04 46.8E-04 1.9073
0.0282
15
26.2E
-04
13.7E
-04 -0.74E-04 53.0E-04 1.9061
0.0283
16
29.4E
-04
15.4E
-04 -0.85E-04 59.6E-04 1.9049
0.0284
17
32.8E
-04
17.2E
-04 -0.97E-04 66.5E-04 1.9038
0.0285
18
36.3E
-04
19.1E
-04 -1.09E-04 73.7E-04 1.9026
0.0285
HIP
18
: HIP data with follow-up of 18 years.
128

Table B-14: Estimated CACE(t) (95% CI) using the NIGN-LL models for the
HIP
18
data.
Method:
NIGN-LL
Data:
HIP
18
Year
CACE
nign
-ll
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.15E
-04
0.07E
-04 0.01E-04 0.29E-04 2.1061
0.0176
2
0.78E
-04
0.35E
-04 0.10E-04 1.47E-04 2.2562
0.0120
3
1.89E
-04
0.82E
-04 0.30E-04 3.49E-04 2.3215
0.0101
4
3.39E
-04
1.44E
-04 0.57E-04 6.21E-04 2.3577
0.0092
5
5.22E
-04
2.19E
-04 0.92E-04 9.51E-04 2.3805
0.0086
6
7.30E
-04
3.05E
-04 1.33E-04 13.3E-04 2.3959
0.0083
7
9.59E
-04
3.98E
-04 1.78E-04 17.4E-04 2.4069
0.0080
8
12.1E
-04
4.99E
-04 2.27E-04 21.8E-04 2.4151
0.0079
9
14.7E
-04
6.06E
-04 2.79E-04 26.5E-04 2.4214
0.0077
10
17.4E
-04
7.17E
-04 3.34E-04 31.4E-04 2.4263
0.0076
11
20.2E
-04
8.31E
-04 3.91E-04 36.5E-04 2.4303
0.0075
12
23.1E
-04
9.49E
-04 4.50E-04 41.7E-04 2.4336
0.0075
13
26.1E
-04
10.7E
-04 5.09E-04 47.0E-04 2.4363
0.0074
14
29.1E
-04
11.9E
-04 5.70E-04 52.4E-04 2.4386
0.0074
15
32.1E
-04
13.2E
-04 6.32E-04 57.9E-04 2.4405
0.0073
16
35.2E
-04
14.4E
-04 6.94E-04 63.4E-04 2.4422
0.0073
17
38.3E
-04
15.7E
-04 7.57E-04 68.9E-04 2.4437
0.0073
18
41.4E
-04
16.9E
-04 8.20E-04 74.5E-04 2.4449
0.0072
HIP
18
: HIP data with follow-up of 18 years.
129

Table B-15: Estimated ACE(t) (95% CI) using the F-R method for the HIP
7
data.
Method:
F-R
Data:
HIP
7
Year
ACE
F R
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.38E
-04 0.93E-04 -1.44E-04 2.20E-04 0.4126
0.3399
2
2.03E
-04 1.63E-04 -1.16E-04 5.23E-04 1.2464
0.1063
3
3.58E
-04 2.22E-04 -0.76E-04 7.93E-04 1.6160
0.0530
4
7.38E
-04 2.97E-04
1.56E
-04
13.2E
-04 2.4843
0.0065
5
12.1E
-04 3.92E-04
4.41E
-04
19.8E
-04 3.0855
0.0010
6
18.7E
-04 4.74E-04
9.44E
-04
28.0E
-04 3.9507
0.0000
7
18.7E
-04 4.74E-04
9.44E
-04
28.0E
-04 3.9507
0.0000
HIP
7
: HIP data with follow-up of 7 years
Table B-16: Estimated CACE(t) (95% CI) using the F-R method for the HIP
7
data.
Method:
F-R
Data:
HIP
7
Year
CACE
F R
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.57E
-04 1.39E-04 -2.14E-04 3.29E-04 0.4127
0.3399
2
3.03E
-04 2.43E-04 -1.74E-04 7.80E-04 1.2464
0.1063
3
5.35E
-04 3.31E-04 -1.14E-04 11.8E-04 1.6159
0.0531
4
11.0E
-04 4.44E-04
2.32E
-04
19.7E
-04 2.4836
0.0065
5
18.1E
-04 5.85E-04
6.58E
-04
29.5E
-04 3.0846
0.0010
6
28.0E
-04 7.08E-04
14.1E
-04
41.9E
-04 3.9505
0.0000
7
28.0E
-04 7.08E-04
14.1E
-04
41.9E
-04 3.9505
0.0000
HIP
7
: HIP data with follow-up of 7 years
130

Table B-17: Estimated ACE(t) (95% CI) using the NIGN-WW model for the
HIP
7
data.
Method:
NIGN-WW
Data:
HIP
7
Year
ACE
nign
-ww
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.44E
-05
0.16E
-04 0.13E-04 0.75E-05 2.7777
0.0027
2
1.68E
-04
0.51E
-04 0.69E-04 2.67E-04 3.3248
0.0004
3
3.69E
-04
1.02E
-04 1.69E-04 5.68E-04 3.6237
0.0001
4
6.43E
-04
1.70E
-04 3.10E-04 9.76E-04 3.7839
0.0001
5
9.90E
-04
2.56E
-04 4.87E-04 14.9E-04 3.8593
0.0001
6
14.1E
-04
3.63E
-04 6.97E-04 21.2E-04 3.8822
0.0001
7
19.0E
-04
4.89E
-04 9.36E-04 28.5E-04 3.8729
0.0001
HIP
7
: HIP data with follow-up of 7 years.
Table B-18: Estimated ACE(t) (95% CI) using the NIGN-WW model for the
HIP
7
data.
Method:
NIGN-LL
Data:
HIP
7
Year
ACE
nign
-ll
SE
ace
LCL
ace
U CL
ace
Zvalue
p
- value
1
0.41E
-04
0.16E
-04 0.09E-04 0.74E-04 2.5238
0.0058
2
1.86E
-04
0.58E
-04 0.73E-04 2.99E-04 3.2280
0.0006
3
4.15E
-04
1.15E
-04 1.89E-04 6.41E-04 3.5969
0.0002
4
7.07E
-04
1.86E
-04 3.42E-04 10.7E-04 3.7917
0.0001
5
10.5E
-04
2.69E
-04 5.20E-04 15.8E-04 3.8924
0.0000
6
14.3E
-04
3.62E
-04 7.18E-04 21.4E-04 3.9413
0.0000
7
18.4E
-04
4.64E
-04 9.28E-04 27.5E-04 3.9612
0.0000
HIP
7
: HIP data with follow-up of 7 years
131

Table B-19: Estimated CACE(t) (95% CI) using the NIGN-WW model for the
HIP
7
data.
Method:
NIGN-WW
Data:
HIP
7
Year
CACE
nign
-ww
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.66E
-04
0.24E
-04 0.19E-04 1.12E-04 2.7779
0.0027
2
2.51E
-04
0.76E
-04 1.03E-04 3.99E-04 3.3251
0.0004
3
5.50E
-04
1.52E
-04 2.53E-04 8.48E-04 3.6241
0.0001
4
9.60E
-04
2.54E
-04 4.63E-04 14.6E-04 3.7843
0.0001
5
14.8E
-04
3.83E
-04 7.27E-04 22.3E-04 3.8597
0.0001
6
21.0E
-04
5.41E
-04 10.4E-04 31.6E-04 3.8827
0.0001
7
28.3E
-04
7.31E
-04 14.0E-04 42.6E-04 3.8734
0.0001
HIP
7
: HIP data with follow-up of 7 years
Table B-20: Estimated CACE(t) (95% CI) using the the NIGN-LL model for the
HIP
7
data.
Method:
NIGN-LL
Data:
HIP
7
Year
CACE
nign
-ll
SE
cace
LCL
cace
U CL
cace
Zvalue
p
- value
1
0.62E
-04
0.25E
-04 0.14E-04 1.10E-04 2.5240
0.0058
2
2.78E
-04
0.86E
-04 1.09E-04 4.47E-04 3.2283
0.0006
3
6.19E
-04
1.72E
-04 2.82E-04 9.57E-04 3.5972
0.0002
4
10.6E
-04
2.78E
-04 5.10E-04 16.0E-04 3.7921
0.0001
5
15.7E
-04
4.02E
-04 7.77E-04 23.5E-04 3.8929
0.0000
6
21.3E
-04
5.41E
-04 10.7E-04 31.9E-04 3.9418
0.0000
7
27.4E
-04
6.92E
-04 13.9E-04 41.0E-04 3.9617
0.0000
HIP
7
: HIP data with follow-up of 7 years
132

Table B-21: Estimated survival values using the non-ignorable PPO survival mod-
els for the HIP
18
data.
Data:
HIP
18
Model:
NIGN-WW
NIGN-LL
Years
S
c1
(t)
S
c0
(t)
S
n
(t)
S
c1
(t)
S
c0
(t)
S
n
(t)
1
0.9999
0.9999
0.9999
1.0000
1.0000
1.0000
2
0.9997
0.9997
0.9997
0.9998
0.9997
0.9998
3
0.9994
0.9993
0.9993
0.9995
0.9993
0.9994
4
0.9990
0.9988
0.9988
0.9991
0.9988
0.9989
5
0.9985
0.9981
0.9981
0.9986
0.9980
0.9982
6
0.9979
0.9974
0.9974
0.9979
0.9972
0.9974
7
0.9972
0.9965
0.9966
0.9972
0.9962
0.9965
8
0.9964
0.9956
0.9956
0.9963
0.9951
0.9955
9
0.9956
0.9945
0.9945
0.9954
0.9939
0.9943
10
0.9946
0.9934
0.9934
0.9944
0.9927
0.9932
11
0.9936
0.9921
0.9921
0.9934
0.9913
0.9919
12
0.9925
0.9908
0.9908
0.9922
0.9899
0.9906
13
0.9913
0.9893
0.9893
0.9911
0.9885
0.9892
14
0.9901
0.9878
0.9878
0.9899
0.9870
0.9878
15
0.9888
0.9861
0.9862
0.9886
0.9854
0.9863
16
0.9874
0.9844
0.9845
0.9873
0.9838
0.9848
17
0.9859
0.9826
0.9827
0.9860
0.9822
0.9832
18
0.9843
0.9807
0.9808
0.9846
0.9805
0.9817
Table B-22: Estimated survival values using the non-ignorable PPO survival mod-
els for the HIP
7
data.
Data:
HIP
7
Model:
NIGN-WW
NIGN-LL
Years
S
c1
(t)
S
c0
(t)
S
n
(t)
S
c1
(t)
S
c0
(t)
S
n
(t)
1
1.0000
0.9999
0.9999
1.0000
0.9999
1.0000
2
0.9998
0.9996
0.9998
0.9998
0.9996
0.9998
3
0.9996
0.9990
0.9994
0.9996
0.9990
0.9995
4
0.9993
0.9983
0.9990
0.9993
0.9982
0.9990
5
0.9989
0.9974
0.9985
0.9989
0.9973
0.9985
6
0.9984
0.9963
0.9979
0.9984
0.9963
0.9979
7
0.9979
0.9951
0.9972
0.9979
0.9951
0.9972
133

Table B-23: Estimated survival values using the F-R method for the HIP
18
data.
Data:
HIP
18
Model:
F-R
Years
S
c1
(t)
S
c0
(t)
S
n
(t)
1
0.9999
0.9998
0.9998
2
0.9997
0.9993
0.9996
3
0.9994
0.9989
0.9992
4
0.9991
0.9980
0.9985
5
0.9985
0.9967
0.9977
6
0.9979
0.9951
0.9969
7
0.9971
0.9945
0.9959
8
0.9960
0.9939
0.9949
9
0.9950
0.9924
0.9941
10
0.9940
0.9918
0.9930
11
0.9932
0.9907
0.9916
12
0.9922
0.9900
0.9896
13
0.9910
0.9888
0.9879
14
0.9898
0.9883
0.9860
15
0.9887
0.9870
0.9841
16
0.9872
0.9857
0.9825
17
0.9857
0.9840
0.9816
18
0.9845
0.9813
0.9809
Table B-24: Estimated survival values using the F-R method for the HIP
7
data.
Data:
HIP
18
Model:
F-R
Years
S
c1
(t)
S
c0
(t)
S
n
(t)
1
0.9999
0.9998
0.9998
2
0.9997
0.9993
0.9996
3
0.9994
0.9989
0.9992
4
0.9991
0.9980
0.9985
5
0.9985
0.9967
0.9977
6
0.9979
0.9951
0.9972
7
0.9979
0.9951
0.9972
134

Table B-25: The ITT estimated survival values using the non-ignorable PPO
survival models for the HIP
18
data.
Data:
HIP
18
Model:
NIGN-WW
NIGN-LL
Years
S
1
(t)
S
0
(t)
S
1
(t)
S
0
(t)
1
0.9999
0.9999
1.0000
1.0000
2
0.9997
0.9997
0.9998
0.9998
3
0.9994
0.9993
0.9995
0.9994
4
0.9989
0.9988
0.9990
0.9988
5
0.9984
0.9981
0.9984
0.9981
6
0.9977
0.9974
0.9977
0.9972
7
0.9970
0.9965
0.9969
0.9963
8
0.9961
0.9956
0.9960
0.9952
9
0.9952
0.9945
0.9951
0.9941
10
0.9942
0.9934
0.9940
0.9928
11
0.9931
0.9921
0.9929
0.9915
12
0.9919
0.9908
0.9917
0.9902
13
0.9907
0.9893
0.9905
0.9887
14
0.9893
0.9878
0.9892
0.9872
15
0.9879
0.9862
0.9879
0.9857
16
0.9864
0.9844
0.9865
0.9841
17
0.9848
0.9826
0.9851
0.9825
18
0.9832
0.9808
0.9837
0.9809
Table B-26: The ITT estimated survival values using the non-ignorable PPO
survival models for the HIP
7
data.
Data:
HIP
7
Model:
NIGN-WW
NIGN-LL
Years
S
1
(t)
S
0
(t)
S
1
(t)
S
0
(t)
1
1.0000
0.9999
1.0000
0.9999
2
0.9998
0.9996
0.9998
0.9996
3
0.9996
0.9992
0.9996
0.9991
4
0.9992
0.9986
0.9992
0.9985
5
0.9988
0.9978
0.9988
0.9977
6
0.9983
0.9969
0.9982
0.9968
7
0.9977
0.9958
0.9977
0.9958
135

Table B-27: The ITT estimated survival values using the F-R method for the
HIP
18
data.
Data:
HIP
18
Model: F-R
Years
S
1
(t)
S
0
(t)
1
0.9999
0.9998
2
0.9996
0.9994
3
0.9994
0.9990
4
0.9989
0.9982
5
0.9983
0.9970
6
0.9976
0.9957
7
0.9967
0.9950
8
0.9956
0.9942
9
0.9947
0.9930
10
0.9937
0.9922
11
0.9927
0.9910
12
0.9914
0.9899
13
0.9900
0.9885
14
0.9886
0.9875
15
0.9872
0.9861
16
0.9857
0.9846
17
0.9843
0.9832
18
0.9833
0.9812
Table B-28: The ITT estimated survival values using the F-R method for the
HIP
7
data.
Data:
HIP
7
Model: F-R
Years
S
1
(t)
S
0
(t)
1
0.9999
0.9998
2
0.9996
0.9994
3
0.9994
0.9990
4
0.9989
0.9982
5
0.9983
0.9971
6
0.9977
0.9958
7
0.9977
0.9958
136

Appendix C
Calculations of standard error (SE) of
CACE(t) and ACE(t)
As noted in Section 3, estimating CACE(t) and ACE(t) rather than esti-
mating the model parameters
T,uz
may be significantly more valuable and more
interpretable. The standard errors corresponding to the estimated CACE(t) and
ACE(t) have not been derived to date, to the best of the author's knowledge.
We now consider the derivation of formulae for calculating the standard errors
of the estimated CACE(t) and ACE(t) via the estimated model parameters
T,uz
.
Recall that CACE(t) was defined as the difference in survival values between
compliers in the treated group and the control group
By following
the notation of the customary survival function (for subgroups), CACE(t) can
thus be re-written as follows:
CACE(t) = S
c1
(t;
T,c1
)
- S
c0
(t;
T,c0
).
(C-1)
According to the property of variance of linear combinations (Theorem 4.5.6,
p.171, Casella & Berger (2002)), the variance of CACE(t) can be easily expressed
as:
V ar(CACE(t)) = V ar(S
c1
(t;
T,c1
)) + V ar(S
c0
(t;
T,c0
))
- 2Cov(S
c1
(t;
T,c1
), S
c0
(t;
T,c0
)),
(C-2)
where S
cz
(t;
T,cz
) (for z = 1 or 0) denotes the survival function at time t associ-
ated with the subgroup of individuals who have compliance type U
i
= c and who
137

have been assigned into the group of Z
i
= z (z = 1 for the treated group and
z = 0 for the control group).
Note that except for the exponential distribution, most common parametric
models for survival outcomes, such as the Weibull, Gamma, log-normal, and the
log-logistic models have two parameters (Klein, 1997; Ibrahim et al., 2001). Let
T,uz
= (
(1)
T,uz
,
(2)
T,uz
), and we treat V ar(S
uz
(t;
T,uz
)) as the variance of a function
of two variables (
(1)
T,uz
,
(2)
T,uz
). Then the Cov(S
c1
(t;
T,c1
), S
c0
(t;
T,c0
)) has to be
viewed as the covariance function of two variables (
(1)
T,uz
,
(2)
T,uz
). As such, we deal
with the variance and covariance of functions of multiple random variables. The
method (Klein, 1997) (see below) can then be used to derive formulae for such
approximations.
The method
The method, also known as the method of statistical
differential (Klein, 1997), is based on a Taylor series expansion of a continuous
function of the MLEs of a vector of parameters (
(1)
T,uz
,
(2)
T,uz
) for u = c and v = 1
or 0. For convenience, let f (x, y) denote a function of two variables, e.g. x and y;
and suppose that at the point (x
0
, y
0
), the first partial derivative of f (x, y) with
respect to x and y exist. These are denoted by f
x
(x
0
, y
0
) and f
y
(x
0
, y
0
). Then the
first order Taylor series expansion at the point (x
0
, y
0
) has the form as follows:
f (x, y) = f (x
0
, y
0
) + f
x
(x
0
, y
0
)(x
- x
0
) + f
y
(x
0
, y
0
)(y
- y
0
).
(C-3)
If x and y are both random variables with mean values, ¯
x and ¯
y, then setting
x
0
= ¯
x and y
0
= ¯
y, allows equation (C-3) to be expressed as:
f (x, y) = f (¯
x, ¯
y) + f
x
x, ¯
y)(x
- ¯x) + f
y
x, ¯
y)(y
- ¯y).
(C-4)
138

Variance of the function of two variables
From equation (C-4), it is
straightforward to observe that the function of two random variables, f (x, y),
has the mean value f (¯
x, ¯
y), as the last two terms in equation (C-4) are zeros.
By following the properties of variance (Theorem 4.5.6, p.171 Casella & Berger
(2002)), we have:
V ar(f (x, y)) = f
x
x, ¯
y)
2
V ar(x) + f
y
x, ¯
y)
2
V ar(y)
+ 2f
x
x, ¯
y)f
y
x, ¯
y)Cov(x, y),
(C-5)
because variances for constants are zeros, we have V ar(f (¯
x, ¯
y)) = 0, V ar(¯
x) = 0
and V ar(¯
y) = 0.
By following equation (C-5), we can write:
V ar(S
uz
(t;
T,uz
)) = V ar(S
uz
(t;
(1)
T,uz
,
(2)
T,uz
))
=
S
uz
(t;
(1)
T,uz
,
(2)
T,uz
)
(1)
T,uz
2
(
(1)
T,uz
,
(2)
T,uz
)
var(
(1)
T,uz
)
+
S
uz
(t;
(1)
T,uz
,
(2)
T,uz
)
(2)
T,uz
2
(
(1)
T,uz
,
(2)
T,uz
)
var(
(2)
T,uz
)
(C-6)
+
S
uz
(t;
(1)
T,uz
,
(2)
T,uz
)
(1)
T,uz
(
(1)
T,uz
,
(2)
T,uz
)
S
uz
(t;
(1)
T,uz
,
(2)
T,uz
)
(2)
T,uz
(
(1)
T,uz
,
(2)
T,uz
)
cov(
(1)
T,uz
,
(2)
T,uz
).
The approximate formula of V ar(S
u
(v, t;
(1)
T,uz
,
(2)
T,uz
)), the variance of the survival
function, can then be expressed by equation (C-6).
Covariance of functions of two variables
Similarly, if h(u, v) denotes an-
other function of two random variables u and v, with mean values ¯
u and ¯
v, then the
function h(u, v) has a mean value h(¯
u, ¯
v).
By the definition of covariance, the covariance between f (x, y) and h(u, v) is then
139

the expectation of the product of two factors (f (x, y)
- f(¯x, ¯y)) and (h(u, v) - h(¯u, ¯v));
that is:
Cov(f (x, y), h(u, v)) = E
{(f(x, y) - f(¯x, ¯y))(h(u, v) - h(¯u, ¯v))}.
(C-7)
By first order Taylor series expansion, equation (C-7) can be expressed by first order
partial derivative values at the mean value points, i.e.:
Cov(f (x, y), h(u, v))
= E
{f
x
x, ¯
y)(x
- ¯x) + f
y
x, ¯
y)(y
- ¯y))(h
u
u, ¯
v)(u
- ¯u) + h
v
u, ¯
v)(v
- ¯v))}.
(C-8)
Furthermore, the covariance between f (x, y) and h(u, v) can be simplified to
Cov(f (x, y), h(u, v)) = f
x
x, ¯
y)h
u
u, ¯
v)Cov(x, u) + f
y
x, ¯
y)h
u
u, ¯
v)Cov(y, u)
+ f
x
x, ¯
y)h
v
u, ¯
v)Cov(x, v) + f
y
x, ¯
y)h
v
u, ¯
v)Cov(y, v).
(C-9)
By following equation (C-9), the covariance between survival functions, say S
c1
(t;
T,c1
)
and S
c0
(t;
T,c0
), can then be expressed as follows:
Cov(S
c1
(t;
T,c1
), S
c0
(t;
T,c0
)) =
S
c1
(t;
(1)
T,c1
,
(2)
T,c1
)
(1)
T,c1
(
(1)
T,c1
,
(2)
T,c1
)
S
c0
(t;
(1)
T,c0
,
(2)
T,c0
)
(1)
T,c0
(
(1)
T,c0
,
(2)
T,c0
)
cov(
(1)
T,c1
,
(1)
T,c0
)
+
S
c1
(t;
(1)
T,c1
,
(2)
T,c1
)
(2)
T,c1
(
(1)
T,c1
,
(2)
T,c1
)
S
c0
(t;
(1)
T,c0
,
(2)
T,c0
)
(1)
T,c0
(
(1)
T,c0
,
(2)
T,c0
)
cov(
(2)
T,c1
,
(1)
T,c0
)
+
S
c1
(t;
(1)
T,c1
,
(2)
T,c1
)
(1)
T,c1
(
(1)
T,c1
,
(2)
T,c1
)
S
c0
(t;
(1)
T,c0
,
(2)
T,c0
)
(2)
T,c0
(
(1)
T,c0
,
(2)
T,c0
)
cov(
(1)
T,c1
,
(2)
T,c0
)
+
S
c1
(t;
(1)
T,c1
,
(2)
T,c1
)
(2)
T,c1
(
(1)
T,c1
,
(2)
T,c1
)
S
c0
(t;
(1)
T,c0
,
(2)
T,c0
)
(2)
T,c0
(
(1)
T,c0
,
(2)
T,c0
)
cov(
(2)
T,c1
,
(2)
T,c0
).
(C-10)
140

Equations (C-6) and (C-10) both can be simplified via vector operation as follows:
V ar (S
cz
(t;
T,cz
))
=
S
cz
(t;
T,cz
)
(1)
T,cz
,
S
cz
(t;
T,cz
)
(2)
T,cz
1
S
cz
(t;
T,cz
)
(1)
T,cz
,
S
cz
(t;
T,cz
)
(2)
T,cz
T
,
(C-11)
and
Cov (S
c1
(t;
T,c1
), S
c0
(t;
T,c0
))
=
S
c1
(t;
T,c1
)
(1)
T,c1
,
S
c1
(t;
T,c1
)
(2)
T,c1
2
S
c0
(t;
T,c0
)
(1)
T,c0
,
S
c0
(t;
T,c0
)
(2)
T,c0
;
T
(C-12)
where
1
and
2
represent the covariance matrix of all relevant parameters. More
specifically,
1
and
2
can be written as:
1
=
var(
(1)
T,cz
)
cov(
(1)
T,cz
,
(2)
T,cz
)
cov(
(2)
T,cz
,
(1)
T,cz
)
var(
(2)
T,cz
)
(C-13)
and
2
=
cov(
(1)
T,c1
,
(1)
T,c0
)
cov(
(1)
T,c1
,
(2)
T,c0
)
cov(
(2)
T,c1
,
(1)
T,c0
)
cov(
(2)
T,c1
,
(2)
T,c0
)
.
(C-14)
Then the standard error of CACE(t) can be calculated by taking the square root
of the variance:
SE
CACE(t)
=
V ar(CACE(t)).
(C-15)
Since we have ACE(t) =
c
CACE(t), the variance of ACE(t) can then be calculated
141

by the following equation:
V ar(ACE(t)) = V ar(
c
CACE(t))
= V ar (
c
S
c1
(t;
T,c1
)
-
c
S
c0
(t;
T,c0
))
= V ar (
c
S
c1
(t;
T,c1
)) + V ar (
c
S
c0
(t;
T,c0
))
- 2cov(
c
S
c1
(t;
T,c1
),
c
S
c0
(t;
T,c0
)).
(C-16)
Note that as
T,c1
= (
1
T,c1
,
2
T,c1
) and
T,c0
= (
1
T,c0
,
2
T,c0
), the two terms in the last
line of equation (C-16), namely
c
S
c1
(t;
T,c1
) and
c
S
c0
(t;
T,c0
), can both be viewed
as functions of three variables, which are (
c
,
1
T,c1
,
2
T,c1
) (in
c
S
c1
(t;
T,c1
)) and (
c
,
1
T,c0
,
2
T,c0
) in
c
S
c0
(t;
T,c0
).
By following the logic used to derive the formulations of the variance (equation
(C-11)) and covariance (equation (C-12)) of the survival function, (S
uz
(t;
T,uz
) (for
z = 1 and 0)), the variance of
c
S
uz
(t;
T,uz
), denoted by V ar(
c
S
uz
(t;
T,uz
)), can then
be expressed as follows:
V ar (
c
S
cz
(t;
T,uz
))
=
S
cz
(t;
T,cz
),
c
S
cz
(t;
T,cz
)
(1)
T,cz
,
c
S
cz
(t;
T,cz
)
(2)
T,cz
3
S
cz
(t;
T,cz
),
c
S
cz
(t|
T,cz
)
(1)
T,cz
,
c
S
cz
(t;
T,cz
)
(2)
T,cz
T
,
(C-17)
where
3
is given by:
3
=
var(
(1)
T,cz
)
cov(
(1)
T,cz
,
(2)
T,cz
)
cov(
(1)
T,cz
,
c
)
cov(
(2)
T,cz
,
(1)
T,cz
)
var(
(2)
T,cz
)
cov(
(2)
T,cz
,
c
)
cov(
c
,
(1)
T,cz
)
cov(
c
,
(2)
T,cz
)
var(
c
,
c
)
.
(C-18)
Thus the covariance matrix of
c
and
T,cz
and the covariance of these products cov(
c
S
c1
(t),
c
S
c0
(t))
142

can be written as:
Cov (
c
S
c1
(t;
T,c1
),
c
S
c0
(t;
T,c0
))
=
c
S
c1
(t;
T,c1
)
(1)
T,c1
,
c
S
c1
(t;
T,c1
)
(2)
T,c1
, S
c1
(t)
4
c
S
c0
(t;
T,c0
)
(1)
T,c0
,
c
S
c0
(t;
T,c0
)
(2)
T,c0
, S
c0
(t)
T
(C-19)
where
4
, the covariance matrix of relevant parameters, is given by:
4
=
cov(
(1)
T,c1
,
(1)
T,c0
)
cov(
(1)
T,c1
,
(2)
T,c0
)
cov(
(1)
T,c1
,
c
)
cov(
(2)
T,c1
,
(1)
T,c0
)
cov(
(2)
T,c1
,
(2)
T,c0
)
cov(
(2)
T,c1
,
c
)
cov(
c
,
(1)
T,c0
)
cov(
c
,
(2)
T,c0
)
var(
c
,
c
)
.
(C-20)
Therefore, (C-16) can be re-written as:
V ar(ACE(t))
=
c
S
c1
(t;
T,c1
)
(1)
T,c1
,
c
S
c1
(t;
T,c1
)
(2)
T,c1
, S
c1
(t;
T,c1
)
31
c
S
c1
(t;
T,c1
)
(1)
T,c1
,
c
S
c1
(t;
T,c1
)
(2)
T,c1
, S
c1
(t;
T,c1
)
T
+
c
S
c0
(t;
T,c0
)
(1)
T,c0
,
c
S
c0
(t;
T,c0
)
(2)
T,c0
, S
c0
(t;
T,c0
)
30
c
S
c0
(t|
T,c0
)
(1)
T,c0
,
c
S
c0
(t;
T,c0
)
(2)
T,c0
, S
c0
(t;
T,c0
)
T
-2
c
S
c1
(t;
T,c1
)
(1)
T,c1
,
c
S
c1
(t;
T,c1
)
(2)
T,c1
, S
c1
(t;
T,c1
)
4
c
S
c0
(t;
T,c0
)
(1)
T,c0
,
c
S
c0
(t;
T,c0
)
(2)
T,c0
, S
c0
(t;
T,c0
)
T
(C-21)
where
31
and
30
are given by equation (C-18) with z = 1 and 0, respectively.
i.e.:
31
=
var(
(1)
T,c1
)
cov(
(1)
T,c1
,
(2)
T,c1
) cov(
(1)
T,c1
,
c
)
cov(
(2)
T,c1
,
(1)
T,c1
)
var(
(2)
T,c1
)
cov(
(2)
T,c1
,
c
)
cov(
c
,
(1)
T,c1
)
cov(
c
,
(2)
T,c1
)
var(
c
,
c
)
(C-22)
and
30
=
var(
(1)
T,c0
)
cov(
(1)
T,c0
,
(2)
T,c0
) cov(
(1)
T,c0
,
c
)
cov(
(2)
T,c1
,
(1)
T,c0
)
var(
(2)
T,c0
)
cov(
(2)
T,c0
,
c
)
cov(
c
,
(1)
T,c0
)
cov(
c
,
(2)
T,c0
)
var(
c
,
c
)
.
(C-23)
143

Recall that
4
is given in (C-20). All these covariance matrices are then available
for our PPO-survival models. Consequently, it is straightforward to obtain the
standard error of ACE(t) by taking the square root of its variance V ar(ACE(t),
that is:
SE
ACE(t)
=
V ar(ACE(t)).
(C-24)
144

Appendix D
Correction of equation (3.2') in Louis (1982)
The likelihood function can be expressed as the product of each individual's
likelihood function L
F
() =
n
i=1
L
F
i
(). After taking the logarithm of the com-
plete data likelihood L
F
(), the observed information matrix (OIM), I
obs
, (given
in equation (8.17) in Section 8.4.1) can be re-written as below:
I
obs
= I( ^
) = I
1
- I
2
where
I
1
= E
-
n
i=1
2
2
logL
F
i
()
|u {c, a, n},
k
,
I
2
= E
n
i=1
logL
F
i
()
T
n
j=1
logL
F
j
()
|u {c, a, n},
k
.
(D-25)
Note that the descriptive index with respect to the summation in the second term
in equation (D-25) above are not the same, (denoted by i and j) here. By dividing
the so-called index set, 1) i = j; 2) i = j, we can re-write the term, I
2
= E
1
+ E
2
,
which is equivalent to:
E
n
i=1
logL
F
i
()
T
n
j=1
logL
F
j
()
|u {a, c, n},
k
= E
1
+ E
2
where
E
1
=
i
E
logL
F
i
()
T
logL
F
i
()
|u {a, c, n},
k
E
2
=
i=j
E
logL
F
i
()
T
|u {a, c, n},
k
E
logL
F
j
()
|u {a, c, n},
k
.
(D-26)
145

Note that in the first term in equation (D-26), the summation is overall obser-
vations (for i = 1 to n); whereas in the second term, it is only a summation for
i = j.
Suppose that the parameter is a vector with p dimensions. Then
logL
i
()
denotes a row vector with p components, for different individuals, say the i
th
and j
th
, in general,
logL
F
i
() =
logL
F
j
(). Therefore, their corresponding
expected vectors are not the same, that is:
E
logL
F
i
()
|u {c, n, a},
k
= E
logL
F
j
()
|u {c, n, a},
k
.
Consequently, it is easy to produce:
E
logL
F
i
()
T
|u {c, n, a},
k
E
logL
F
j
()
|u {c, n, a},
k
= E
logL
F
j
()
T
|u {c, n, a},
k
E
logL
F
i
()
|u {c, n, a},
k
,
(D-27)
which implies that the product of E
logL
F
i
()
T
|u {c, n, a},
k
and
E
logL
F
j
()
|u {c, n, a},
k
are not exchangeable.
Hence, we give the corrected form of equation (3.2') in Louis (1982) as follows:
I
obs
= I( ^
) = I
1
- E
1
- E
2
where
I
1
=
n
i=1
E
-
2
2
logL
i
()
|u {c, n, a},
k
E
1
=
i
E
logL
F
i
()
T
logL
F
i
()
|u {c, n, a},
k
E
2
=
i=j
E
logL
F
i
()
T
|u {c, n, a},
k
E
logL
F
j
()
|u {c, n, a},
k
.
(D-28)
Note that equation (D-28) guarantees that the OIM, I
obs
, is a p
× p symmetric,
146

but Louis's equation (3.2') in Louis (1982) cannot yield a symmetric matrix even
though the author had thought I
obs
should be symmetric (Louis, 1982). The
distinction between these two equations, namely Louis's equation (3.2') in Louis
(1982) and its corrected form equation (D-28) given above, are that the last term
in the former, given as:
2
i<j
E
logL
F
i
()
T
|u {c, n, a},
k
E
logL
F
j
()
|u {c, n, a},
k
,
is not symmetric as the summation is only for i < j rather than overall observa-
tions as in equation (D-28), where the last term is a symmetric matrix. This is
because that after taking summation with respect to all observations both for i
and j but excluding i = j,
i=j
E
logL
F
i
()
T
|u {c, n, a},
k
E
logL
F
j
()
|u {c, n, a},
k
is a symmetric matrix, even though matrix E
[logL
F
i
()]
T
E
[logL
F
j
()]
is
not symmetric for i = j.
In other words, as E
[logL
F
i
()]
is not a scalar, we cannot follow the
rule of ab + ba = 2ab. This is the key error in equation (3.2') of Louis (1982).
147

Appendix E
An AFT property with its proof
In this Appendix, we propose and prove Theorem 1, a useful property of
AFT for the Weibull or log-normal distribution only. However, it is not hard to
show that by following Theorem 1,it is always true in our PPO-survival models:
each subgroup shares the shape parameter (for Weibull model) or the covariance
parameter (for log-normal model).
Theorem 1 Suppose a failure time variable T satisfies an accelerated failure time
(AFT) model T = T
0
exp( X), where T
0
represents the failure time corresponding
to the baseline; X and denote covariate vector and its associated regression
coefficients, both of them are column vectors with the same length p.
1. If T
0
= t
0
Weib(
0
,
0
), t
0
follows a Weibull distribution with scale
parameter
0
and shape parameter
0
, the new variable T = t then fol-
lows a Weibull distribution with =
0
exp(
- X) and =
0
, namely
t
Weib(, ).
2. If T
0
= t
0
Logn(
0
,
0
), t
0
follows a log-normal distribution with two
parameters,
0
and
0
, the mean and variance of log(T
0
), the new variable
T = t then follows a log-normal distribution with =
0
+ X and =
0
,
i.e. T = t
Logn(, ).
Proof 1 By following the definition of the survival function with respect to the
148

failure time T , the survival function can be expressed as follows:
S(t) = P r(T > t) = P r(T
0
exp( X) > t);
T = T
0
exp( X);
= P r(T
0
> t exp(
- X));
exp( X) > 0;
= S
0
(t exp(
- X)); by the definition of S
0
(t).
(E-29)
In addition, by the definition of a probability density function (p.d.f.), we have
the following:
f (t) =
-
dS(t)
dt
;
(by definition);
=
dS
0
(t
0
)
dt
0
dt
0
dt
;
for t
0
= t exp(
- X)
= f
0
(t
0
) exp(
- X).
(E-30)
1)
T
0
= t
0
Weib(
0
,
0
), we have
f
w0
(t) =
0
0
(
0
t)
(
0
-1)
exp(
-(
0
t)
0
),
(E-31)
S
w0
(t) = exp(
-(
0
t)
0
).
(E-32)
where f
w0
(t) and S
w0
(t) denote the probability density (p.d.f.) and survival func-
tions (S) with respect to T
0
.
By following equations (E-29) and (E-30), the survival function and the proba-
bility density function (p.d.f.), f
w
(t), with respect to the failure time T in a Weibull
model, can be expressed as:
S
w
(t) = P r(T > t) = S
w0
(t exp(
- X));
= exp(
-[
0
t exp(
- X)]
0
),
(E-33)
149

and the probability density function (p.d.f.), f
w
(t) as:
f
w
(t) = f
w0
(t exp(
- X)) exp(- X);
=
0
0
(
0
t exp(
- X))
(
0
-1)
exp(
-[
0
t exp(
- X)]
0
) exp(
- X);
=
0
0
exp(
- X)(
0
t exp(
- X))
(
0
-1)
exp(
-[
0
t exp(
- X)]
0
). (E-34)
Let =
0
exp(
- X) and =
0
. Equations (E-35) and (E-35) can then be
re-written as:
S
w
(t) = P r(T > t) = exp(
-[
0
t exp(
- X)]
0
)
= exp(
-(t)
),
(E-35)
and
f
w
(t) = =
0
0
exp(
- X)(
0
exp(
- X)t)
(
0
-1)
exp(
-[
0
exp(
- X)t]
0
);
= (t)
(-1)
exp(
-(t)
).
(E-36)
Because equations (E-35) or (E-36) hold, we have T = t
Weib(, ) where
=
0
exp(
- X) and =
0
.
2)
T
0
= t
0
Logn(
o
,
0
), we can write:
f
l0
(t
0
) =
exp[
-
1
2
(
log[t
0
]-
0
0
)
2
]
t
0
(2)
1/2
0
= (
log[t
0
]
-
)/t
0
;
(E-37)
where () represents the standard normal probability density function, which can
be written as (x) =
1
2
exp(
-
x
2
2
); and
S
l0
(t) = 1
- (
log[t]
-
0
0
);
(E-38)
150

where () is the cumulative distribution function of a standard normal variable,
which has the general form (x) =
x
-
(u)du. By following equations (E-29) and
(E-30), the survival function and the probability density function (p.d.f.) f
l
(t) with
respect the failure time T in a log-normal model can be expressed as follows:
S
l
(t) = P r(T > t) = S
l0
(t exp(
- X));
= 1
- (
log[t exp(
- X)] -
0
0
);
1
- (
log[t]
- X -
0
0
);
(E-39)
and
f
l
(t) = f
l0
(t exp(
- X)) exp(- X);
=
exp[
-
1
2
(
log[t exp(- X)]-
0
0
)
2
]
t exp(
- X)(2)
1/2
0
exp(
- X);
=
exp[
-
1
2
(
log[t]- X-
0
0
)
2
]
t exp(
- X)(2)
1/2
0
exp(
- X);
=
exp[
-
1
2
(
log[t]-( X+
0
)
0
)
2
]
t(2)
1/2
0
.
(E-40)
Using =
0
+ X and =
0
, equations (E-39) and (E-40) can be re-written
as follows:
S
l
(t) = 1
- (
log[t]
-
),
(E-41)
and
f
l
(t) =
exp[
-
1
2
(
log[t]-
)
2
]
t(2)
1/2
.
(E-42)
Therefore, T = T
0
exp( X) follows a log-normal distribution with the param-
eters and . Explicitly, T = t
Logn(, ) where =
0
+ X and =
0
.
151
Ende der Leseprobe aus 154 Seiten

Details

Titel
Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance
Untertitel
Causal Inference in Survival analysis
Hochschule
University of Canterbury  (Department of Mathematics and Statistics)
Veranstaltung
Statistics
Note
A
Autoren
Jahr
2009
Seiten
154
Katalognummer
V357300
ISBN (eBook)
9783668438620
ISBN (Buch)
9783668438637
Dateigröße
1013 KB
Sprache
Englisch
Schlagworte
Causal inference, Noncompliance, Survival analysis, Maximum likelihood, EM algorithm, Weibull distribution, HIP breast cancer trial, CACE, ITT analysis
Arbeit zitieren
Irene Hudson (Autor:in)Zhaojing Gong (Autor:in)Patrick Graham (Autor:in), 2009, Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance, München, GRIN Verlag, https://www.grin.com/document/357300

Kommentare

  • Noch keine Kommentare.
Blick ins Buch
Titel: Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance



Ihre Arbeit hochladen

Ihre Hausarbeit / Abschlussarbeit:

- Publikation als eBook und Buch
- Hohes Honorar auf die Verkäufe
- Für Sie komplett kostenlos – mit ISBN
- Es dauert nur 5 Minuten
- Jede Arbeit findet Leser

Kostenlos Autor werden