Chapter 6 covers exploratory data analysis in more detail.
Graphing repeated measures data
The first principle is that you must not fool yourself—and you are the easiest person to fool.
–Richard P. Feynman
Graphs serve two purposes: first, to understand the data, and second, to communicate results to others. Not all graphs meet these objectives equally well. One of the most common displays is the "mean and error bar graph", where the mean is represented by the top of a bar or by a point, and the error bars usually represent one standard error of the mean (SEM). These graphs are so popular because of what they hide, not what they reveal; outliers, unequal variances, clusters of points, non-normal distributions, dropped data points, etc. can all be concealed behind the mean and SEM. It is one thing to present such data to others; it is another thing entirely not to understand what actually happened in the experiment.
The data below are from a real experiment. Animals were injected with a compound (at time = 0), while the control animals received a placebo. Locomotor activity in an open field was recorded, and the amount of activity in each 15-minute interval is presented (actual values were divided by 1000). The first figure is the typical mean and SEM plot, and this is usually the only figure produced before an analysis is conducted. If you were to verbally describe the effect of the compound, you might say that the two groups are similar 15 minutes after the injection, that the compound then increases activity between 15 and 30 minutes, where it reaches its maximum effect, and that activity remains stable until the last observation. The controls remain relatively flat, and the difference between the groups looks fairly convincing... just need some p-values before writing it up for publication.
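As a sketch of what such a summary plot actually computes, the per-interval mean and SEM can be obtained in a few lines. The actual data are not reproduced in the text, so the values below are simulated stand-ins with a shape resembling the description (controls flat, treated rising after 15 minutes); all numbers are made up.

```python
# Sketch only: simulated stand-in data, not the actual experiment.
import numpy as np

rng = np.random.default_rng(42)
times = np.arange(15, 91, 15)   # six 15-minute intervals after injection

# One row per animal, one column per time point (8 animals per group;
# group sizes and values are assumptions for illustration).
control = rng.normal(2.0, 0.5, size=(8, 6))
treated = rng.normal([2.0, 5.0, 6.0, 6.0, 6.0, 6.0], 1.5, size=(8, 6))

def mean_sem(x):
    """Per-time-point mean and standard error of the mean (SEM)."""
    return x.mean(axis=0), x.std(axis=0, ddof=1) / np.sqrt(x.shape[0])

ctrl_mean, ctrl_sem = mean_sem(control)
trt_mean, trt_sem = mean_sem(treated)
```

Note that two numbers per time point are all that survive of each group's distribution, which is precisely why such plots can mislead.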
A remarkable aspect of this data set (and many others) that is completely obscured by this type of graph is that, in the treated group, no single animal follows the trajectory of the population mean. This can be seen in the second figure, which plots the profile for each animal across time, and where it is clear that the compound had no effect on some animals, a rapid effect on others, and a slower-acting effect on the rest.
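A per-animal profile plot of this kind is straightforward to produce. The sketch below assumes matplotlib is available and again uses simulated stand-in data, since the real values are not shown; the point is simply one line per animal rather than one line per group mean.

```python
# Sketch of a per-animal ("spaghetti") profile plot on simulated data.
import matplotlib
matplotlib.use("Agg")            # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
times = np.arange(15, 91, 15)
treated = rng.normal([2, 5, 6, 6, 6, 6], 1.5, size=(8, 6))  # made-up values

fig, ax = plt.subplots()
for animal in treated:           # one line per animal, not one group mean
    ax.plot(times, animal, color="grey", alpha=0.7)
ax.set_xlabel("Time after injection (min)")
ax.set_ylabel("Activity (counts / 1000)")
fig.savefig("profiles.png")
```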
To better see these differences, the animals in the drug group were clustered (k-means clustering, with three groups), and are plotted in separate panels in the third figure. Animals in cluster one look similar to the control animals (compare with the previous figure), animals in cluster two have a rapid onset but then drop back down to low activity levels by 60 minutes, and animals in cluster three have a gradual increase in activity that plateaus at around 60 minutes. It can be seen how sub-groups of animals with different profiles can produce the misleading average profile in the first figure. It is also clear why the error bars are bigger in the treated group: not just because the data are "more variable", but because the compound is having different effects on different animals.
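The clustering step might look like the following minimal sketch. The text does not give an implementation, so this is a plain k-means written in numpy and applied to simulated profiles with the three patterns described; the initial centres are seeded one per expected sub-group purely to keep the example deterministic.

```python
# Minimal k-means on simulated activity profiles (stand-in data).
import numpy as np

rng = np.random.default_rng(0)
# Three simulated sub-groups matching the description (all values made up):
flat = rng.normal([2, 2, 2, 2, 2, 2], 0.4, size=(3, 6))  # no response
fast = rng.normal([2, 7, 6, 2, 2, 2], 0.4, size=(3, 6))  # rapid onset, drops by 60 min
slow = rng.normal([2, 3, 4, 6, 6, 6], 0.4, size=(3, 6))  # gradual rise, plateau
profiles = np.vstack([flat, fast, slow])

def kmeans(x, init, iters=50):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    centres = x[init].astype(float)
    for _ in range(iters):
        # squared Euclidean distance from every profile to every centre
        d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(axis=0)
    return labels

labels = kmeans(profiles, init=[0, 3, 6])  # one seed centre per sub-group
```

In practice a library routine (e.g. scikit-learn's `KMeans`) would be used instead, and the animals in each resulting cluster would then be plotted in separate panels as in the third figure.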
There are two explanations that could account for these results. First, there is some very interesting biology going on here: despite the animals being genetically similar and housed under identical conditions, they show dramatically different responses. This is the stuff that scientific discoveries are made of; such anomalous results lead to further hypotheses, for example about the compound's mechanism of action. These unplanned aspects of an experiment can often be the most informative. The second (and perhaps more likely) explanation is that uncontrolled technical issues affected the results. Obvious factors should be checked: do the clusters correspond to the housing cages, to the animals' sex, to the order in which the observations were obtained, or to different open field boxes being used? The compound was injected intraperitoneally, and it is possible that some of the injections were off-target, with some ending up in the intestine (non-responders), some intravascular (fast responders), and some on target (slower responders).
It is clear that plotting only summary statistics (i.e. the mean and SEM) misses important aspects of the data, and these would not have been picked up by a typical ANOVA either. The raw data should always be plotted, both to understand what is actually going on and as a quality control check (are there any outliers, etc.?).