What is a confidence interval?

I think confidence intervals might be the most commonly misunderstood statistic. I’ve been trying to understand them better and this is what I’ve learnt. (In an earlier post on this blog I explained p-values and statistical significance, and I recommend reading that first.)

How do we create confidence intervals?

We typically use p-values to test the null hypothesis – that there was no effect. But you can test other hypotheses too. You could test the hypothesis that an effect size is 0.1 or 0.2. Each time you test a hypothesis you are asking: ‘assuming this hypothesis were true, what is the probability I’d see an effect at least as far from it as the one I calculated in my study?’

If you set your test hypothesis as the value of your observed effect, you will calculate a p-value of p=1. Given that your test hypothesis and observed effect are the same, all possible values in both directions are at least as different from the test hypothesis as your observed effect.
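To make this concrete, here’s a minimal sketch in Python. Everything in it is assumed for illustration: a normal approximation, a known standard error, and made-up numbers chosen to line up with the example in the next paragraph.

```python
# A minimal sketch, assuming a normal approximation with a known
# standard error. The observed effect (0.2) and standard error
# (0.05 / 1.96) are hypothetical numbers, chosen to match the
# worked example that follows.
from scipy.stats import norm

observed_effect = 0.2
standard_error = 0.05 / 1.96  # assumed, for illustration

def p_value(test_hypothesis):
    """Two-sided p-value: the probability of an effect at least as
    far from the test hypothesis as the observed effect, if the
    test hypothesis were true."""
    z = abs(observed_effect - test_hypothesis) / standard_error
    return 2 * norm.sf(z)  # sf(z) = 1 - cdf(z)

print(p_value(0.0))  # the usual null hypothesis: a tiny p-value
print(p_value(0.2))  # testing the observed effect itself: p = 1.0
```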

Say your observed effect size was 0.2. If you used 0.2 as your test hypothesis then the p-value would be 1. You’d also have a range of possible effect sizes around 0.2 where, if you used them as your test hypothesis, you would calculate a p-value of p > 0.05. Let’s say these effects range from 0.15 to 0.25. Any effects outside this range will have p < 0.05 when tested. If the true effect were outside this range, you would be unlikely (<5% probability) to see an effect at least as far from it as the one you observed by chance. So effects outside this range do not seem compatible with your data, while effects inside the range seem more compatible. This range is your 95% confidence interval.
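And here’s a sketch of ‘inverting the test’ to recover that interval, under the same assumptions as above: scan candidate effect sizes, test each one, and keep the ones with p > 0.05.

```python
# A sketch of inverting the test to get the confidence interval,
# under the same assumed normal approximation as before.
import numpy as np
from scipy.stats import norm

observed_effect = 0.2
standard_error = 0.05 / 1.96  # assumed, so the interval is ~0.15 to 0.25

def p_value(test_hypothesis):
    z = abs(observed_effect - test_hypothesis) / standard_error
    return 2 * norm.sf(z)

# Keep every candidate effect size we cannot reject at the 5% level.
candidates = np.linspace(0.0, 0.4, 4001)
compatible = [h for h in candidates if p_value(h) > 0.05]
print(min(compatible), max(compatible))  # roughly 0.15 and 0.25

# The familiar closed form gives the same interval directly:
half_width = 1.96 * standard_error
print(observed_effect - half_width, observed_effect + half_width)
```

Because the interval and the p-value are built from the same test, they always agree about ‘significance’.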

Many sceptics of p-values argue that p-values should be ditched in favour of confidence intervals. Sometimes these sceptics ignore the fact that the two are closely linked. The confidence interval shows the range of values for which, if you did a hypothesis test, the p-value would be > 0.05.

What does the 95% confidence interval mean?

What’s important to remember about the 95% confidence interval is that the ‘95%’ label does not mean you are 95% confident about anything regarding that single interval. It means something else: if you calculate many 95% confidence intervals and the assumptions used to calculate them hold, then 95% of them will contain the true effect size.
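To see where the 95% comes from, here’s a quick simulation with made-up numbers: draw many samples from a population whose true mean we know, build a 95% interval from each sample, and count how often the intervals contain the truth.

```python
# A sketch simulating coverage: how often do 95% intervals,
# built from repeated samples, contain the true value?
# All numbers are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_mean, sd, n, trials = 0.2, 1.0, 100, 10_000

hits = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sd, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lower = sample.mean() - 1.96 * se
    upper = sample.mean() + 1.96 * se
    hits += lower <= true_mean <= upper

print(hits / trials)  # close to 0.95
```

Any single interval from this simulation either contains 0.2 or it doesn’t; the 95% is a property of the procedure, not of one interval.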

I think the easiest way to interpret a confidence interval is that it gives a range of effect sizes that are fairly compatible with your data. If the interval is wide, you have an imprecise estimate. If the interval is narrow, you have a more precise estimate.

Some misconceptions

It’s worth running through some common misconceptions and dealing with them explicitly.

Misconception: A particular 95% confidence interval has a 95% chance of including the true effect size. How could this be true? The true effect size is either within your single confidence interval or not. You don’t know the true effect size – that’s why you’re trying to estimate it. So how could you come up with this probability? Instead, the 95% refers to the idea that if you calculate many valid confidence intervals then 95% of them will contain the true effect size.

Misconception: If the confidence interval includes 0, the true effect size is 0. This makes exactly the same mistake as treating p = 0.05 as a hard cut-off. An interval that includes 0 means 0 is among the effects compatible with your data, not that 0 is the true effect. The confidence interval can help you think about what effects are compatible with your data. It can’t confirm particular effects as ‘true’.

As ever, grateful for thoughts.
