I’ve been working with research for a long time now but have always found it hard to understand p-values, confidence intervals and statistical significance. I thought that writing a clear explanation in my own words could help. I’m posting my attempt here for my own benefit – hopefully it leads to useful comments that improve my understanding.
I’ll focus on trying to understand this in the context of a randomised controlled trial (RCT), as this is where I usually encounter these statistics. If you are unsure what a randomised controlled trial is, this is a good introduction.
I’ll write this out over a series of posts. This first post will describe the fundamental problem that p-values are designed to address: uncertainty.
To understand p-values and the like, we first need to adopt a particular way of thinking about uncertainty. According to this way of thinking, we do RCTs because we want to know the impact of an intervention on an outcome for a population. For example, this could be the impact of a reading programme on the reading ability of all children in primary schools. But to estimate the impact on our population we need to do several things that introduce uncertainty and make it impossible to get a perfect estimate.
Usually, it’s not practically possible to do research on a whole population. We can’t do an RCT on all children in all primary schools, so we need to select a smaller number of children to work with. This smaller group of children is called a sample. There is always a possibility that the sample we take differs in some important way from our population. Perhaps the children in the sample respond unusually well to the intervention compared to the population as a whole. This introduces uncertainty about whether an RCT using this sample provides a good estimate of the effect of the intervention for our population. This type of uncertainty is called sampling uncertainty.
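If it helps to see this as a quick simulation, here is a rough sketch in Python. All the numbers (the population size, the average gain of 5 points, the sample size of 100) are made up purely for illustration. It draws several samples of 100 children from the same imaginary population and shows how the estimated effect wobbles around the true population effect simply because of who happens to end up in the sample.

```python
import numpy as np

rng = np.random.default_rng(42)

# An imaginary population of 100,000 children whose individual gains from the
# reading programme average 5 points, with plenty of child-to-child variation.
# These values are illustrative assumptions, not from any real trial.
population_gains = rng.normal(loc=5, scale=15, size=100_000)
true_effect = population_gains.mean()

# Draw five different samples of 100 children and estimate the effect from each.
for i in range(5):
    sample = rng.choice(population_gains, size=100, replace=False)
    print(f"Sample {i + 1}: estimated effect = {sample.mean():.2f} "
          f"(true population effect = {true_effect:.2f})")
```

Each sample gives a slightly different answer, even though nothing about the population or the intervention has changed between them. That variation is sampling uncertainty.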
Running an RCT requires us to randomly assign some participants to an intervention group and others to a control group. We use random assignment to try to ensure that the only difference between the control and intervention groups is whether or not they receive the intervention. However, we can’t guarantee that this has been successful. There is still a risk that the intervention and control groups differ in an important way. Perhaps the control group is full of participants who, for some reason unrelated to the intervention, will make smaller gains in the outcome. For example, perhaps the control group in our example RCT includes a greater number of children who have reading difficulties. This introduces the chance that any effect we see in the RCT is not caused by the intervention but is instead related to other differences between the two groups. This type of uncertainty is called allocation uncertainty.
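Again, a rough sketch might make this concrete (the baseline scores below are invented). It takes one sample of 100 children, randomly splits it into intervention and control groups of 50 several times without applying any intervention at all, and shows that the two groups can still differ at baseline just through the luck of the allocation.

```python
import numpy as np

rng = np.random.default_rng(7)

# One imaginary sample of 100 children with baseline reading scores
# (mean 50, standard deviation 10; purely illustrative values).
baseline_scores = rng.normal(loc=50, scale=10, size=100)

# Randomly allocate the same 100 children into two groups of 50, five times over.
# No intervention is applied, so any difference is down to the allocation alone.
for i in range(5):
    shuffled = rng.permutation(baseline_scores)
    intervention, control = shuffled[:50], shuffled[50:]
    difference = intervention.mean() - control.mean()
    print(f"Allocation {i + 1}: intervention minus control at baseline = {difference:+.2f}")
```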
A final type of uncertainty is introduced because, to make a comparison between the intervention and control groups, we need to measure outcomes for both groups. In our example RCT we’ll use a reading test to try to understand children’s reading ability. This reading test can’t give us a perfect picture of a child’s reading ability, as reading ability is just too complex a concept to capture perfectly in a one-hour reading test. Every measurement we take is going to differ in some way from the ‘true’ picture of the child’s reading ability. This type of uncertainty is called measurement uncertainty.
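One more sketch, with made-up numbers: imagine a child whose ‘true’ reading ability sits at 60 on our test’s scale, and a test that adds a little random noise every time it is sat. Each observed score differs somewhat from the true ability, which is measurement uncertainty in miniature.

```python
import numpy as np

rng = np.random.default_rng(3)

# An imaginary child whose 'true' reading ability is 60 on our test's scale.
true_ability = 60

# Each sitting of the test adds some random measurement noise
# (a standard deviation of 4 points is an assumption for illustration).
observed_scores = true_ability + rng.normal(loc=0, scale=4, size=5)

for sitting, score in enumerate(observed_scores, start=1):
    print(f"Test sitting {sitting}: observed score = {score:.1f} "
          f"(true ability = {true_ability})")
```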
So this is the problem that p-values and associated statistics are trying to help us to solve. Every time we estimate the effect of an intervention in an RCT, we will face uncertainty about the effect. These statistics aim to give a quantitative sense of the uncertainty around that estimate of effect. More on how they do this in a future post.