Randomisation II: order of sample collection and processing
The previous article on randomisation discussed the need to ensure that the arrangement of samples is not confounded with spatial artifacts. This article discusses another situation where technical artifacts can lead to erroneous conclusions: when the values obtained are related to the order in which samples are collected or measurements are taken. If this order is confounded with treatment effects, then the experiment can be added to the rubbish bin.
A striking example of order effects (from two independent labs) is discussed in Bland (2005) in the context of a homeopathy experiment. Since appropriate randomisation was used, it was possible to (1) detect the presence of such effects, and (2) adjust for them, demonstrating that there was no effect of the homeopathic treatment. This is not surprising, since the homeopathic practice of diluting a compound until none remains contradicts everything that is known about pharmacology. It's kind of like buying a bottle of aspirin, tossing out the pills, filling the bottle with water, and then drinking it. The therapeutic magic was in the aspirin tablets, not the bottle they came in! The occasional positive finding in homeopathy studies is often attributed to Type I errors (if you do enough studies, a few will be positive due to chance), but technical incompetence is also a likely explanation (see Maddox et al. (1988) for a remarkable account of what happened when a team from Nature visited a lab).
One obvious example where the order of sample collection is important is when there is a circadian rhythm in the outcome variable. For example, if you are interested in estimating plasma levels of corticosterone (a stress hormone with a known circadian rhythm) in rats, and take blood samples from all of the control animals first, followed by all of the treated animals, there may be differences in corticosterone levels between groups which are due simply to a time-of-day effect. Alternatively, true differences between groups could be reduced and thus missed (depending on when in the cycle the samples are taken). The magnitude of the bias will partly depend on the time of day and the length of time it takes to collect all of the samples. A better approach would be to alternate between control and treated animals. A remarkable aspect of this type of bias is that it is reproducible (Rosenbaum, 2001). If the entire experiment were repeated (to really make sure these exciting results are real) and the sample collection started at the same time of day, with the controls first followed by the treated animals, the same circadian effect would be picked up and misinterpreted as a treatment effect. Here, an "independent replication" of a bad sampling design offers no protection against nonsensical results.
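The time-of-day bias described above can be illustrated with a small deterministic sketch. Everything in it is hypothetical and chosen only for illustration: the cosine-shaped rhythm, its baseline, amplitude and peak hour, the 09:00 start, and the ten minutes per sample are assumptions, not measured corticosterone values. The treatment has no true effect, so any difference between groups comes purely from the collection order.

```python
import math
from statistics import mean

def circadian_level(hour, baseline=100.0, amplitude=30.0, peak_hour=18.0):
    """Hypothetical outcome that depends only on time of day (24 h cosine)."""
    return baseline + amplitude * math.cos(2 * math.pi * (hour - peak_hour) / 24.0)

def group_difference(order, start_hour=9.0, minutes_per_sample=10.0):
    """Mean(treated) - mean(control) when the treatment has NO true effect.

    `order` is the sequence of group labels in the order samples are taken.
    """
    values = {"control": [], "treated": []}
    for i, group in enumerate(order):
        hour = start_hour + i * minutes_per_sample / 60.0
        values[group].append(circadian_level(hour))
    return mean(values["treated"]) - mean(values["control"])

n = 12  # animals per group (illustrative)
sequential = ["control"] * n + ["treated"] * n   # all controls first
alternating = ["control", "treated"] * n         # control, treated, control, ...

seq_diff = group_difference(sequential)
alt_diff = group_difference(alternating)
replicate_diff = group_difference(sequential)    # "replication" of the same design

print(f"sequential order:  spurious difference = {seq_diff:.2f}")
print(f"alternating order: spurious difference = {alt_diff:.2f}")
print(f"replicate of sequential design:          {replicate_diff:.2f}")
```

Under these assumptions the sequential design produces a large spurious difference, and repeating it at the same time of day reproduces exactly the same bias. Alternating shrinks the difference considerably but does not remove it entirely, because in this sketch the control animal is always sampled first within each pair.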
There is another seemingly innocent situation where order effects can be important: when a procedure involves multiple steps, and the time to carry out the steps varies. This is illustrated in the figure below, where rats are injected with a substance that causes inflammation, and this procedure takes one hour to complete (light blue bar). After the last injection, one hour is allowed to elapse for the substance to have an effect, and then sample collection begins (at t=2) and takes four hours to complete (red bar). Typically, the order of injection is the same as the order of sample collection (assume that a blood sample is being collected, and a marker of inflammation is the main outcome variable). The problem is that the length of time from injection to collection is two hours for the first animal but five hours for the last animal (arrows), simply because blood collection takes longer than the injections. The animals in between will have a gradient of time lags between injection and collection, and if the inflammation response is not constant (e.g. there is an increase to a maximal response which eventually subsides), then the measured value of the inflammation marker will be related to the order of sample collection. A problem then arises if the controls were injected first and the treated animals second: it will be difficult to separate the treatment effects from the order effects.
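The arithmetic behind the two-hour and five-hour lags can be checked with a short sketch. It assumes (this is not stated in the text) that injections and collections are evenly spaced, that the first injection is at t=0 and the last at t=1, and that the first collection is at t=2 and the last at t=6. The function name and the choice of seven animals are hypothetical.

```python
def injection_to_collection_lags(n_animals, inject_hours=1.0,
                                 wait_hours=1.0, collect_hours=4.0):
    """Lag in hours between injection and collection for each animal.

    Assumes animals are collected in the same order as they were injected,
    and that both procedures are evenly spaced over their durations.
    """
    if n_animals < 2:
        raise ValueError("need at least two animals")
    lags = []
    for i in range(n_animals):
        frac = i / (n_animals - 1)            # 0 for first animal, 1 for last
        injected_at = frac * inject_hours
        collected_at = inject_hours + wait_hours + frac * collect_hours
        lags.append(collected_at - injected_at)
    return lags

lags = injection_to_collection_lags(7)
for animal, lag in enumerate(lags, start=1):
    print(f"animal {animal}: {lag:.1f} h between injection and collection")
```

With these assumptions the lag grows steadily from two hours for the first animal to five hours for the last, so if all controls are injected first, group membership is tied to the lag.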
Order effects can appear anywhere; they may be due to a one-off event, or they may be stable across replicate experiments. The good news is that the influence of order effects can be taken into account if an appropriate design has been used. The general principle is: randomise, block, or cycle through the experimental conditions when collecting and processing samples, even if you don't think any order effects are present. This will not eliminate these technical effects, but it will allow them to be detected and taken into account, ultimately still providing valid inferences. There is no statistical fix when order effects are completely confounded with treatment effects. It's kind of like having an all-female control group and an all-male treated group: there is just no way to separate the effect of sex from the effect of the treatment.
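One way to put the "randomise, block" principle into practice is to generate the collection order in advance. The sketch below is a minimal illustration, not a prescribed procedure: the function name, the condition labels, the number of blocks, and the seed are all hypothetical. Each block contains every condition once, in a random order, so no condition is systematically collected earlier or later than another.

```python
import random

def blocked_random_order(conditions, n_blocks, seed=None):
    """Return a list of (block, condition) pairs giving the collection order.

    Within each block, every condition appears exactly once, in random order.
    """
    rng = random.Random(seed)                 # seeded so the order can be recorded
    order = []
    for block in range(1, n_blocks + 1):
        block_conditions = list(conditions)   # copy so the input is not modified
        rng.shuffle(block_conditions)
        order.extend((block, condition) for condition in block_conditions)
    return order

order = blocked_random_order(["control", "treated"], n_blocks=6, seed=1)
for position, (block, condition) in enumerate(order, start=1):
    print(f"sample {position:2d}: block {block}, {condition}")
```

Recording the block and the position alongside each measurement is what later allows order effects to be detected and adjusted for in the analysis.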
Bland M (2005). The Horizon homeopathic dilution experiment. Significance 2(3):106–109.
Maddox J, Randi J, Stewart WW (1988). "High-dilution" experiments a delusion. Nature 334:287–290.
Rosenbaum PR (2001). Replicating effects and biases. The American Statistician 55(3):223–227.