Monday, May 23, 2011

Means and p values.

When comparing two groups’ means (or averages), it’s not sufficient to compare only the means, because an average is just one statistic that summarises a whole distribution of scores.

In the picture below, the mean age at which these kids first drank alcohol was around 14. But some kids started earlier, and some started later.
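To see why the mean alone is not enough, here is a minimal sketch using two made-up groups of ages (hypothetical numbers, not the data in the picture). Both groups have the same mean, but the scores are spread out very differently.

```python
from statistics import mean, stdev

# Hypothetical ages at first drink for two made-up groups (not real data).
group_a = [13, 14, 14, 14, 15]
group_b = [10, 12, 14, 16, 18]

for name, ages in (("A", group_a), ("B", group_b)):
    # Same mean in both groups, but a very different spread around it.
    print(f"Group {name}: mean = {mean(ages):.1f}, "
          f"sd = {stdev(ages):.2f}, range = {min(ages)}-{max(ages)}")
```

The standard deviation and the range show what the mean hides: group B's scores are far more spread out than group A's.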

When comparing two means, it is important to determine whether the two distributions differ so much that it is unlikely that they both come from the same bigger population.

If they differ enough, the null hypothesis (that there is no difference) is rejected. If they don’t, we fail to reject the null hypothesis, which is not the same as proving that the two groups are the same.

Notice: although the means differ in B, the overlap in distributions is quite large.

Depending on the scale of the data (nominal, ordinal, interval or ratio), the properties of the distributions (normally distributed or not) and the kind of comparison that’s required (two independent groups, e.g. boys and girls; or two measures for the same group, e.g. the average for boys before the programme and after it), different statistics may be used.

Usually, we do a t test, which yields a t statistic, or an ANOVA, which yields an F statistic, or their non-parametric equivalents: the Mann-Whitney or Kruskal-Wallis test. Because it isn’t easy to know offhand whether a t of 112 is good or bad, these statistics are converted to a p value (probability value), which indicates how probable it would be to get a result at least this extreme if the null hypothesis were true.
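As a rough illustration, the sketch below computes a t statistic for two independent groups by hand, and then estimates a p value by shuffling the group labels many times (a permutation test). Note that this is a stand-in: statistics packages normally look the p value up from the t distribution instead. The scores, the function names `pooled_t` and `permutation_p`, and the number of shuffles are all assumptions made for the example.

```python
import random
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(a) - mean(b)) / se, df

def permutation_p(a, b, n_shuffles=5000, seed=0):
    """Two-sided p value: the share of random relabellings whose |t| is
    at least as large as the observed |t|."""
    rng = random.Random(seed)
    observed = abs(pooled_t(a, b)[0])
    combined = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(combined)
        t_shuffled, _df = pooled_t(combined[:len(a)], combined[len(a):])
        if abs(t_shuffled) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical scores for two made-up groups (not real data).
boys = [12, 13, 13, 14, 15, 15, 16]
girls = [14, 15, 15, 16, 16, 17, 18]

t, df = pooled_t(boys, girls)
p = permutation_p(boys, girls)
print(f"t ({df}) = {t:.2f}, estimated p = {p:.3f}")
```

The shuffling makes the meaning of the p value concrete: if the group labels did not matter, how often would a t at least this far from zero turn up by chance?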

If the p value is smaller than 0.05, the null hypothesis is rejected: if the two groups really came from the same population, a difference this large would turn up less than 5% of the time.

Just look carefully at that criterion: p values of 0.5 (50%) and 0.06 (6%) are bigger than 0.05, so the null hypothesis will not be rejected. A p value of 0.045 (4.5%), or a value that software prints as 0.000 (better reported as p < 0.001), is smaller than 0.05 and would therefore mean the null hypothesis should be rejected - in other words, the two means differ statistically significantly.

A cut-off of p = 0.05 is conventional, but a p of 0.1 (10%) or 0.01 (1%) is sometimes used as a cut-off criterion (depending on how much risk of Type I and Type II errors is acceptable).
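The decision rule can be written as a tiny function. This is only a sketch: the function name `decide` and its `alpha` parameter are made up for illustration, and the p values are the examples used above.

```python
def decide(p_value, alpha=0.05):
    """Return the conventional verdict for a p value at the cut-off alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

for p in (0.5, 0.06, 0.045, 0.0004):
    print(f"p = {p}: {decide(p)}")

# The same p value can lead to a different verdict under a different cut-off.
print(f"p = 0.06 with a cut-off of 0.1: {decide(0.06, alpha=0.1)}")
```

Notice that the comparison is strict: a p value of exactly 0.05 is not smaller than 0.05.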

A result like the one below means:
t (163) = -2.68, p < .05
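The t statistic for the difference between the two groups’ means, with 163 degrees of freedom, is -2.68 and is statistically significant at the 5% level. (For two independent groups, 163 degrees of freedom corresponds to 165 cases in total.)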

F (2, 1015) = 111.286, p < .001
The F statistic, with 2 and 1015 degrees of freedom (i.e. three groups and, in a one-way ANOVA, 1018 cases), is 111.286 and is statistically significant at the 0.1% level.
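The numbers in brackets are degrees of freedom, not sample sizes. As a sketch, assuming an independent two-group t test and a one-way ANOVA (the post does not say which designs produced these results, and the function names are made up), the sample sizes can be worked back like this:

```python
def t_test_total_n(df):
    """Independent two-group t test: df = n1 + n2 - 2, so N = df + 2."""
    return df + 2

def anova_groups_and_n(df_between, df_within):
    """One-way ANOVA: df_between = k - 1 and df_within = N - k."""
    k = df_between + 1
    return k, df_within + k

print(t_test_total_n(163))           # total cases behind t (163)
print(anova_groups_and_n(2, 1015))   # (groups, total cases) behind F (2, 1015)
```

For a paired t test (the same group measured twice), the rule would be different: df = n - 1, so t (163) would mean 164 pairs.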

The smaller the p value is, the happier you should be – because it means that you will have something interesting to report on!

1 comment:

Benita Williams said...

Ooh - some credit is due. The pictures come from a "Quantitative Methods" course that Katherine McKnight presented at the AEA 2010 conference in San Antonio. She's a good lecturer and the course is useful if you want to learn, or if you want to brush up!