Univariate and Multivariate Methods for the Analysis of Repeated Measures Data

Thesis (M.A.) 1999 64 Pages

Mathematics - Statistics


Table of Contents

1. Introduction

2. Exploring The Data

3. Time by Time ANOVA

4. Univariate Approach to Repeated Measures
4.1 Repeated Measures ANOVA
4.2 Testing For Compound Symmetry

5. Multivariate Approach to Repeated Measures
5.1 Profile Analysis
5.2 Assumptions of Profile Analysis
5.3 Testing For Multivariate Normality
5.4 Testing For The Equality of Covariance Matrices
5.5 Hypothesis Tests for Profile Analysis

6. The Generalised Multivariate Analysis of Variance
6.1 Growth Curves
6.2 Hypothesis Tests For Growth Curves
6.3 Testing Polynomial Adequacy
6.4 Testing For The Equality of Growth Curves

7. Conclusion


Appendix A

1. Introduction

Milk from two groups of lactating cows, one group vaccinated, the other not, was analysed every month after calving for eight months in order to measure the amount of bacteria in the milk. The primary goal of the experiment was to determine if a vaccine developed by the Royal Melbourne Institute of Technology’s Biology Department led to a significant decrease in mean bacteria production compared to the control group.

Experiments such as this fit into the family of designs known in the literature as repeated measures data, longitudinal models, or growth curves. Data from these models generally arise whenever more than two observations of the same variable are made on an individual subject or experimental unit. These models are especially common in biology, agriculture, and medicine and most often occur when observations on a group of subjects are repeated over a period of time.

Repeated measures data such as these require different statistical treatment from the more common analyses because the observations are not independent. This lack of independence lies at the core of repeated measures analysis and is what differentiates it from more commonly used methods. The implications of a lack of independence within a subject’s responses are serious, as will be explained in Chapter 4. For the data collected by the RMIT Biology Department, this implies that the amount of bacteria found in a litre of a cow’s milk at any given time will be correlated with the amount found in that cow’s milk at preceding or subsequent times. In addition, the correlation between the amounts of bacteria produced at different times tends to be stronger the shorter the time interval. In other words, the amount of bacteria produced per litre of milk is more strongly related to the amount produced one month ago than to the amount produced five months ago.

Correlation between observations is usually present in these types of experiments. Nearby plots in a field trial, for example, are usually more similar than plots further apart. When different levels of a factor are applied, the effects of this correlation are generally overcome by randomisation: the levels are randomly allocated to the experimental subjects. Randomisation ensures that in the long run there is no correlation between the factor levels, so that observations with any given factor level are not more similar to some factor levels than to others. Since time is treated as a factor, with the eight months as its eight levels, randomisation is impossible: the monthly observations must follow their natural sequence of month 2, month 3, month 4 and so on. As a result, the means of two milk samples taken a month apart, for example, tend to be more highly correlated than those taken six months apart. Consequently, the precision of the difference between two monthly means tends to drop as the time interval increases, which rules out the use of a single standard error of difference for the time factor.
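The last point can be made concrete with a short worked equation. It is a sketch under simplifying assumptions that are not stated in the text at this stage: a common variance \sigma^2 at every month, n independent cows in the group, and a correlation \rho_{st} between the same cow’s readings at months s and t.

```latex
\operatorname{Var}(\bar{y}_{s} - \bar{y}_{t})
  = \frac{\sigma^{2}}{n} + \frac{\sigma^{2}}{n} - \frac{2\rho_{st}\,\sigma^{2}}{n}
  = \frac{2\sigma^{2}}{n}\,\bigl(1 - \rho_{st}\bigr)
```

If \rho_{st} falls as the gap between s and t grows, the standard error of the difference grows with the gap, so no single standard error of difference can serve for every pair of months.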

This variation in correlation between levels of the time factor means that it is inappropriate to analyse the data as if time were a randomised factor. This thesis covers some of the most commonly used techniques for analysing such data. Frequently there is no single best approach; the choice depends on the questions that need to be answered, and it is often useful to apply two or more approaches to the same data.

The analysis carried out on a set of repeated measures data is determined largely by the questions the researcher wants answered. For this study, the most important question is that concerning the vaccine: is there a significant difference, over time, between the mean number of cells found in a litre of milk produced by cows in the treatment group and that produced by cows in the control group? Secondary questions may include such things as the change in cell production over the months irrespective of group membership. The first question when studying repeated measures data, or in fact any data, should not be how to analyse the data, but what the experimenter is interested in finding out (Lindsey, 1993). Once this is known, together with knowledge of the techniques available, the selection of an appropriate technique becomes much easier.

2. Exploring The Data

The Biology Department of RMIT has developed a vaccine which is thought to reduce the number of cells of mastitis, hereafter referred to as ‘cells’, found in a cow’s milk. The vaccine was tested on a randomly chosen sample of 23 cows, while a randomly chosen sample of 18 cows was used as the control. Readings of the cell count of each cow’s milk were taken at 2, 3, 4, 5, 6, 7, 8, and 9 months after calving.

One of the attractive features of repeated measures data is that they can be displayed in graphical plots that are readily interpretable and require little effort or training to read (Lindsey, 1993). Data plotting is essential in order to get some feeling for what patterns are present in the data, whether expected trends have occurred, what unexpected features are apparent and what questions deserve analytical consideration. It should always precede detailed analysis of the data. Figure 1 (a)-(d) shows various plots of the two groups.

illustration not visible in this excerpt

The skew towards higher values of cells can be seen over most of the time periods, although some months are worse than others. The boxplots, where the boxes contain 50% of the data, tend to vary over time in the control group. The pattern of variation is roughly similar for the treatment group. The treatment group appears to have comparatively smaller variances over time than the control group. If it were not for the larger variance at month 3, the variance of the control group would be increasing with time. The variance of the treatment group looks as if it decreases slightly over time. Plots of the individual cows’ results show that the control group cows’ cell production increases linearly over time, although the trend is not pronounced. The treatment group does not show any apparent increase over time. There appears to be little evidence of a quadratic or cubic growth curve from the plots.

The outliers were noted and checked for accuracy with the RMIT Biology department to make sure there were no transcription errors or similar problems. The outliers were all legitimate observations.

The sample means of each group are plotted in Figure 2(a). Plotting the medians, in Figure 2(b), as well as the means allows one to look at the data from a slightly different perspective, one that is resistant to outliers. Since the observations for any given month are generally skewed, the medians are a useful adjunct. The control group’s mean response is not as stable as the treatment group’s.

Figure 2.

illustration not visible in this excerpt
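The month-by-month means and medians described above can be computed along the following lines. This is a minimal sketch only: the readings are synthetic right-skewed values generated for illustration, not the RMIT data, and the names `simulate_group` and `monthly_summary` are hypothetical.

```python
import random
import statistics

random.seed(1)

MONTHS = list(range(2, 10))  # months 2..9 after calving, as in the study


def simulate_group(n_cows):
    """Return n_cows rows of 8 right-skewed, purely synthetic readings."""
    return [[random.lognormvariate(5.0, 0.8) for _ in MONTHS]
            for _ in range(n_cows)]


def monthly_summary(group):
    """Mean and median across cows for each month (column-wise)."""
    columns = list(zip(*group))
    means = [statistics.mean(col) for col in columns]
    medians = [statistics.median(col) for col in columns]
    return means, medians


control = simulate_group(18)    # same group sizes as the study,
treatment = simulate_group(23)  # but invented values

for name, group in (("control", control), ("treatment", treatment)):
    means, medians = monthly_summary(group)
    for month, mean, median in zip(MONTHS, means, medians):
        print(f"{name:9s} month {month}: mean={mean:8.1f} median={median:8.1f}")
```

With skewed data a single extreme reading pulls the monthly mean upwards while leaving the median almost unchanged, which is why the two summaries are worth plotting side by side.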

Fitting ordinary linear regression equations to the data gives an indication of the linear trend for each group. Figure 3(a) shows that the control group has a positive slope, which highlights the increase in cells over time, while Figure 3(b) shows that the treatment group’s slope is negative and flatter. At first it might appear that a possible model for these observations is the general linear model. The problem, as with ANOVA, is that linear regression assumes the observations are independent. In Chapter 6, growth curves will be fitted to the data using a multivariate approach which does not have the restrictive assumption of independence.

Figure 3.

illustration not visible in this excerpt
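The ordinary least squares fit referred to above can be sketched as follows. The readings for the two cows are hypothetical values chosen for illustration, and `ols_line` is an assumed helper name. The fit pools every (month, reading) pair and so ignores the within-cow dependence, which is exactly the weakness noted in the text.

```python
def ols_line(x, y):
    """Ordinary least squares intercept and slope for paired lists x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


months = [2, 3, 4, 5, 6, 7, 8, 9]
# Hypothetical readings for two cows in one group (not thesis data).
cow_a = [110, 120, 135, 140, 150, 165, 170, 180]
cow_b = [90, 95, 100, 110, 115, 118, 130, 135]

# Stack all (month, reading) pairs for the group, as a pooled fit would.
x = months + months
y = cow_a + cow_b
intercept, slope = ols_line(x, y)
print(f"fitted line: cells = {intercept:.1f} + {slope:.2f} * month")
```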

Although plotting the data is imperative with any analysis, it does not allow the analyst to make anything more than general statements summarising the apparent behaviour of the subjects being studied. What is really needed is to be able to quantify the responses and formally test the research questions given in Chapter 1 as hypotheses. Chapter 3 describes the simplest method for doing this.

3. Time by Time ANOVA

One of the simplest forms of longitudinal analysis is a time-by-time ANOVA. It consists of p separate analyses, one for each subset of data corresponding to each time of observation t. For more than two groups each analysis is a conventional ANOVA; however, since only two groups are being compared in this study, the ANOVA reduces to a two-sample t-test of H0: μ_control = μ_treatment at each of the p = 8 times of measurement. Table 1 shows the time-by-time ANOVA results for the data.

Table 1.

illustration not visible in this excerpt

The time-by-time analysis indicates that the mean cell count does not differ significantly between the control and treatment groups in any of the 8 months. This suggests that the two mean response profiles are alike. Month 5 has a large t-test statistic (t = -1.56), but it is not large enough to be significant.
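The time-by-time procedure can be sketched as one two-sample t-test per month. The sketch below assumes the pooled-variance form of the test and a control-minus-treatment sign convention; the excerpt does not say which variant or convention was used for Table 1. The data are synthetic and `pooled_t` is a hypothetical helper.

```python
import math
import random
import statistics


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / se, df


random.seed(0)
months = range(2, 10)
# Synthetic stand-ins: 18 control and 23 treatment cows, 8 readings each.
control = [[random.gauss(100, 20) for _ in months] for _ in range(18)]
treatment = [[random.gauss(100, 20) for _ in months] for _ in range(23)]

# One separate test per month, each using only that month's column of data.
for k, month in enumerate(months):
    t_stat, df = pooled_t([cow[k] for cow in control],
                          [cow[k] for cow in treatment])
    print(f"month {month}: t = {t_stat:6.2f} on {df} df")
```

Each test uses only one column of the data, so nothing in the loop accounts for the correlation between a cow’s readings in different months.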

A time-by-time ANOVA is reasonably clear and uncomplicated; however, Diggle, Liang and Zeger (1994) point out two major limitations. Firstly, it cannot address questions concerning treatment effects which relate to the longitudinal growth of the mean response curves, i.e. the growth rates between successive months. Secondly, the inferences made within each of the p separate tests are not independent of each other, nor is it clear how they should be combined. For example, a succession of marginally significant group mean differences may be compelling with weakly correlated data, but much less so if there are strong correlations between successive observations on each cow.

The principal virtue of the time-by-time ANOVA approach to longitudinal studies is its simplicity. The computational operations are elementary and the approach uses familiar procedures in finding a solution to the problem.

In summary, whilst the time-by-time ANOVA may be useful in particular circumstances, Diggle, Liang and Zeger (1994) do not recommend it as a viable approach to longitudinal data analysis.

4. Univariate Approach to Repeated Measures

A more sophisticated approach than the time-by-time t-tests is a repeated measures analysis of variance (ANOVA). In contrast to the time-by-time approach, Diggle, Liang and Zeger (1994) regard it as a first attempt to provide a single analysis of a complete longitudinal data set.

4.1 Repeated Measures ANOVA

Experiments utilising repeated measures designs differ from the usual ANOVA models in that the levels of time cannot be randomly assigned to one or more of the experimental units in the experiment. In this case the levels of time cannot be assigned at random to the time intervals, and thus the usual ANOVA models may not be valid. Because of the non-random assignment of time, the errors corresponding to the respective experimental units may have a covariance matrix which does not conform to those for which the usual ANOVA analyses are valid.

The inherent dependence that is associated with repeated measures data introduces extra complications into the analysis. Unfortunately, the simplifying properties arising from data which are independently and identically distributed can no longer be relied upon. To yield conclusions which are valid the analyst must take into account the possible dependence within subjects. Fortunately, Diggle, Liang and Zeger (1994) and Vonesh and Chinchilli (1997) have outlined methods which modify the problem so that independence based methods like ANOVA can be used.

Superficially, the N cows by p months structure of the data resembles that of a randomised block or split plot design, so there is a temptation to carry out a standard two-factor group × month ANOVA. This approach holds pitfalls for the unwary. A standard ANOVA model would regard the control and treatment groups as a factor on two levels and, more importantly, would regard time as a factor on p levels. One of the difficulties with this approach is that the allocation of times to the p observations within each cow cannot be randomised.

As was mentioned in the introduction, randomisation of the various levels of a factor is an essential requirement of ANOVA. Usually treatment factors are randomised within blocks. This is assumed in ANOVA. For example, the various treatments in Block 1 can be randomised.


illustration not visible in this excerpt

The times, if they are regarded as factor levels, cannot be randomised as above, even though randomisation is an essential requirement of ANOVA. They must follow their natural sequence. The first measurement is the first measurement; it cannot be taken third. In general, these extra complications mean that the simple univariate ANOVA F-tests will no longer be valid.

An approach that takes advantage of the fact that the p = 8 measurements on each cow are repeated observations on the same response variable, namely, cells, is given in Vonesh and Chinchilli (1997). The additive model is

illustration not visible in this excerpt


illustration not visible in this excerpt

In the general model for this experiment, 18 cows were randomly assigned to the control group and 23 cows were randomly assigned to the treatment group, and each cow’s milk was tested on eight occasions (months).

Geisser (1980) gives the form of the ANOVA table

ANOVA Table

illustration not visible in this excerpt
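Since the table itself is not visible here, the following sketch computes the degrees of freedom that a groups by time repeated measures ANOVA would carry under the standard split-plot layout with complete data. The row labels and the function name `split_plot_df` are assumptions, not taken from the table; only the group sizes (18 and 23) and the number of months (8) come from the text.

```python
def split_plot_df(group_sizes, n_times):
    """Degrees of freedom for a groups x time repeated measures ANOVA.

    Assumes complete data: every subject is measured at every time.
    """
    n_groups = len(group_sizes)
    n_subjects = sum(group_sizes)
    return {
        "group": n_groups - 1,
        "subjects within groups": n_subjects - n_groups,
        "time": n_times - 1,
        "group x time": (n_groups - 1) * (n_times - 1),
        "within-subject error": (n_subjects - n_groups) * (n_times - 1),
        "total": n_subjects * n_times - 1,
    }


df = split_plot_df([18, 23], 8)
for source, value in df.items():
    print(f"{source:25s} {value}")
```

The two error rows correspond to the two error terms of the model: the group effect is tested against variation between cows within groups, while month and the group by month interaction are tested against the within-cow error.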

Note that in this model there are two error terms, where [illustration not visible in this excerpt] represents the random error due to cow i within group j. In addition, there is the assumption that the [illustration not visible in this excerpt]’s and the [illustration not visible in this excerpt]’s are independent with

illustration not visible in this excerpt

For this approach to be appropriate, Milliken and Johnson (1984) claim that the variance structure of S = Cov([illustration not visible in this excerpt]) must satisfy the assumption of compound symmetry. A covariance matrix S is of compound symmetry form if it can be expressed as

illustration not visible in this excerpt

The compound symmetry condition implies that the random variables are equally correlated and have equal variances. In other words, the variances of the differences between pairs of errors, such as [illustration not visible in this excerpt] are equal for all [illustration not visible in this excerpt]. The variance structure is also called the uniform-variance, equi-variance or the equi-correlation structure.
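The equal-variance-of-differences property can be checked numerically. The sketch below builds a compound symmetry matrix in the variance and common-correlation parametrisation, which may differ from the notation of the display that is not visible in this excerpt. The values sigma2 = 4.0 and rho = 0.5 and the function names are illustrative assumptions.

```python
def compound_symmetry(p, sigma2, rho):
    """p x p covariance matrix with variance sigma2 and common correlation rho."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(p)]
            for i in range(p)]


def var_of_difference(cov, s, t):
    """Var(e_s - e_t) = Var(e_s) + Var(e_t) - 2 Cov(e_s, e_t)."""
    return cov[s][s] + cov[t][t] - 2 * cov[s][t]


# Illustrative values only; p = 8 matches the number of months in the study.
cov = compound_symmetry(p=8, sigma2=4.0, rho=0.5)

# Collect the variance of the difference for every distinct pair of times.
diffs = {var_of_difference(cov, s, t)
         for s in range(8) for t in range(8) if s != t}
print(diffs)  # one value only, equal to 2 * sigma2 * (1 - rho)
```

Under this structure the variance of a difference does not depend on how far apart the two months are, which is the opposite of the decaying correlation described in the introduction and is why the assumption needs to be tested.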

Assuming that the univariate repeated measures approach is appropriate, i.e. S has the compound symmetry structure described above, the F-tests for the hypotheses of parallelism (no group × month interaction), coincidence (no differences between groups), and constancy (no differences among months) are more powerful than the corresponding multivariate tests where no structure is assumed for S (Vonesh & Chinchilli, 1997).

Tests for compound symmetry exist. First, however, the above model will be applied to the raw data and the residuals checked against the usual assumptions, before the variance structure of the residuals is tested for compound symmetry.

The results from carrying out the analysis in Minitab are

illustration not visible in this excerpt

The ANOVA table shows a significant effect for the group × month interaction, as well as for month alone. Before any conclusions can be drawn, the residuals need to be analysed. Figure 4(a) shows the residuals plotted against the fitted values from the above model. There is evidence of increasing variance as the predicted values increase. There also appear to be many outliers with larger than expected positive values. Also alarming is the obvious lack of normality of the residuals in Figure 4(b).


