Statistics for Experimental Biologists

Key books for biologists

These books cover "what they didn't teach you in first-year statistics-for-biologists, but what you need to know to do science well". They are listed in no particular order, and clicking on the icons will take you to Amazon for further details. Many other good books cover similar material, so consider these a good place to start.

Math Level Key:

♦ Little or no mathematics/statistics
♦♦ Some parts require an introductory statistics course
♦♦♦ Some parts contain more advanced mathematics, but much of the book is still relevant


Glass

Math level: ♦

Experimental Design for Biologists (Glass DJ)
This book contains zero statistics. It is about approaching research problems with a question-and-answer framework rather than a hypothesis-testing framework. It discusses how to validate an experimental system, the various types of experimental controls, and how to build and validate a biological model. If you only read one book on this list, it should be this one.

Judd, McClelland, Ryan

Math level:

Data Analysis: A Model Comparison Approach (Judd CM, et al.)
This book is recommended for those looking for a good introduction to statistics. It teaches statistics from a unified statistical modelling perspective, rather than the usual cookbook approach of "this test goes with this type of data". The t-test, ANOVA, and regression are all examples of linear models, and the book shows how to model (i.e. analyse) data, rather than apply statistical tests to data. The only drawback is that the examples are from the social sciences.
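The claim that a t-test is a special case of a linear model can be checked numerically. The sketch below is not taken from the book: the function names and the two small data sets are made up for illustration, and it uses only the Python standard library. It computes the classic pooled-variance t statistic and, separately, the t statistic for the slope of a regression on a 0/1 group indicator.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Classic two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss_within / (na + nb - 2)
    return (mb - ma) / sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t(a, b):
    """t statistic for b1 in the model y = b0 + b1 * group, group coded 0/1."""
    x = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope = difference in group means
    b0 = my - b1 * mx                   # intercept = mean of group 0
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return b1 / se_b1


# Hypothetical measurements, invented for this example only.
control = [4.1, 5.0, 4.6, 5.3, 4.8]
treated = [5.9, 6.4, 5.2, 6.8, 6.1]

t_test = pooled_t(control, treated)
t_model = slope_t(control, treated)
print(t_test, t_model)
```

The two printed values agree up to floating-point rounding, because the slope of the regression is the difference in group means and its standard error reduces to the pooled-variance formula.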

Cleveland

Math level:

Visualizing Data (Cleveland WS)
This hugely influential book is about how to understand your data by graphing it. A simple idea, but often not done well. The graphical methods are implemented in the lattice package for R. It also has a companion volume (see next book).

Cleveland

Math level:

The Elements of Graphing Data (Cleveland WS)
A second book by Cleveland (see previous entry), which focuses on basic principles for constructing good graphics. Required reading for anyone who makes graphs to present data. The graphical methods are implemented in the lattice package for R.

Borenstein, Hedges, Higgins, Rothstein

Math level:

Introduction to Meta-Analysis (Borenstein M et al.)
What happens when multiple studies are conducted to address a research question, and the results are p=0.12, p=0.032, p=0.002? Are these results conflicting? Is the effect real? Scientists usually evaluate such results with "vote counting" (one against vs. two for... but one p-value is just barely significant... hmm, maybe there's something there). This is not the way to proceed, but unfortunately this is how many scientists struggle to understand the results of multiple experiments. A meta-analysis allows one to numerically combine information across studies to get an overall picture of what is going on. The equations are also simple enough to do by hand.
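As a rough illustration of how simple the arithmetic can be, the sketch below combines three studies with inverse-variance weights under a fixed-effect model, which is only one of several possible meta-analytic models. The effect sizes and standard errors are hypothetical and do not correspond to the p-values quoted above; the function name is also made up.

```python
from math import erfc, sqrt


def fixed_effect(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate.

    Returns the pooled effect, its standard error, the z statistic,
    and a two-sided p-value from the normal distribution.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    z = pooled / pooled_se
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, z, p


# Hypothetical effect sizes and standard errors, for illustration only.
effects = [0.40, 0.55, 0.62]
std_errors = [0.26, 0.26, 0.20]

pooled, pooled_se, z, p = fixed_effect(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

The pooled standard error is smaller than that of any single study, which is why combining studies gives a clearer picture than counting significant and non-significant results.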

Anderson

Math level:

Model Based Inference in the Life Sciences: A Primer on Evidence (Anderson DR)
This book is a good introduction to information-theoretic approaches. Examples are mostly from ecology, but it provides an alternative theoretical perspective on statistical inference, and the methods are easy to implement. The content is a subset of the next book, with most of the mathematics removed, and thus provides a more accessible book for biologists.
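One commonly used information-theoretic quantity is the Akaike weight, which rescales the AIC values of a set of candidate models into relative support that sums to one. The sketch below assumes the AIC values have already been computed elsewhere; the model names and numbers are hypothetical.

```python
from math import exp


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights.

    Each weight is exp(-delta / 2) normalised over the model set,
    where delta is the difference from the smallest AIC.
    """
    best = min(aic_values)
    relative = [exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC values for three candidate models.
aic = {"null": 210.4, "dose": 203.1, "dose + sex": 204.6}

weights = dict(zip(aic, akaike_weights(list(aic.values()))))
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

The weights only compare the models in the set supplied, so they say nothing about whether any of those models fits the data well in absolute terms.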

Burnham, Anderson

Math level: ♦♦♦

Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (Burnham KP, Anderson DR)
This book also provides an introduction to information-theoretic methods, but contains proofs and more advanced theoretical topics than the previous book.

Gelman, Hill

Math level: ♦♦

Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman A, Hill J)
This book has nothing to do with experimental biology, but it is remarkable in how smoothly it transitions from regression models (familiar ground for biologists) to hierarchical models (which should be used more often, given the hierarchical nature of many data sets) to Bayesian methods. This book is also a good introduction to statistical modelling in general, and examples are in R (and BUGS for the Bayesian examples).

Kruschke

Math level:

Doing Bayesian Data Analysis: A Tutorial with R and BUGS (Kruschke J)
Another great introduction to Bayesian methods using R and BUGS, but this time the examples are from psychology. Unlike the previous book, this one is purely Bayesian and starts at a more basic level. It has been receiving great reviews from statisticians and scientists alike.

Shipley

Math level: ♦♦

Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference (Shipley B)
Laboratory-based biologists are fortunate because most factors are under experimental control and/or can be held constant. However, this is not always the case. For example, the interest may be in manipulating X and observing Y, but X also affects Z, which is known to affect Y. To what extent does X directly affect Y (if at all), and how much of X's effect is through Z? These types of questions can be addressed with structural equation models, and it is important to know how causality can be inferred from such data. Demonstrating cause-and-effect relationships is a core aspect of biological research, and therefore familiarity with these methods should be a part of every scientist's toolkit.
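The X, Z, Y question can be illustrated with the simplest possible linear path model, in which the total effect of X on Y splits into a direct part and an indirect part through Z. The sketch below is a product-of-coefficients decomposition on simulated data, not a full structural equation model, and the causal reading holds only under assumptions such as linearity and no unmeasured confounders. The path coefficients used to simulate the data (0.8, 0.3, 0.5) and the function names are invented for the example.

```python
import random
from statistics import mean


def simple_slope(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def two_predictor_slopes(x, z, y):
    """OLS slopes of y regressed on x and z (with intercept)."""
    mx, mz, my = mean(x), mean(z), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    slope_x = (sxy * szz - szy * sxz) / det
    slope_z = (szy * sxx - sxy * sxz) / det
    return slope_x, slope_z


# Simulated data with made-up path coefficients: X -> Z, X -> Y, Z -> Y.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

a = simple_slope(x, z)                    # path X -> Z
direct, b = two_predictor_slopes(x, z, y) # paths X -> Y and Z -> Y
indirect = a * b                          # effect of X on Y through Z
total = simple_slope(x, y)                # total effect of X on Y

print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
```

For ordinary least squares with these three variables, the total effect equals the direct effect plus the indirect effect exactly, which is what makes the decomposition interpretable.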

Berger, Wong

Math level: ♦♦♦

An Introduction to Optimal Designs for Social and Biomedical Research (Berger MPF, Wong WK)
Optimal design theory deals with getting the most information out of an experiment. Suppose you want to test the effect of a compound and are restricted to 20 animals. How do you decide on the number of groups, the actual dose levels, and the sample size in each group? Your choice will have profound implications for statistical power. In general, optimal designs allow you to do more with less, and to determine where further effort will be wasted.
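A small calculation shows why the allocation of animals matters. The sketch below assumes a straight-line dose-response with constant variance, in which case the variance of the estimated slope is proportional to 1 / Sxx, where Sxx is the sum of squared deviations of the doses from their mean. The dose range of 0 to 10 (arbitrary units) and both candidate designs are hypothetical.

```python
from statistics import mean


def slope_variance_factor(doses):
    """Return 1 / Sxx; the slope variance is sigma^2 times this factor."""
    m = mean(doses)
    return 1 / sum((d - m) ** 2 for d in doses)


# Two hypothetical ways to allocate 20 animals over doses 0 to 10.
five_groups = [d for d in (0, 2.5, 5, 7.5, 10) for _ in range(4)]  # 5 groups of 4
two_groups = [0] * 10 + [10] * 10                                  # 2 groups of 10

v_five = slope_variance_factor(five_groups)
v_two = slope_variance_factor(two_groups)
efficiency = v_two / v_five  # below 1 means the two-group design is more precise

print(f"five groups: {v_five:.4f}, two groups: {v_two:.4f}, ratio: {efficiency:.2f}")
```

Under these assumptions the two-group design halves the variance of the slope for the same 20 animals. The gain depends entirely on the straight-line assumption, since two dose levels give no way to detect curvature, so the best design depends on the model you are prepared to assume.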