Errors in hypothesis testing

Large $P$-values do not prove the null hypothesis

50 g chocolate bars

A manufacturer wishes to claim that the average weight of its chocolate bars is 50 grams. To check this, a random sample of 40 bars is taken and weighed; the sample mean is 49.5 g and the sample standard deviation is 2.5 g. A test of the null hypothesis that $\mu$, the true mean weight, is 50 g is carried out, and the $P$-value is 0.21. How can we interpret this $P$-value?

Is the manufacturer justified in claiming that the large $P$-value proves that the true mean weight is 50 g? Clearly not, yet large $P$-values are sometimes interpreted as providing 'proof' that the null hypothesis is true. The observed data are consistent with the null hypothesis, $\mu = 50$ g, but they are also consistent with other values of the population parameter. If, for example, the manufacturer had tested the null hypothesis that the true mean weight was 49.8 g, the $P$-value would have been 0.45. If the null hypothesis had been that the true mean weight was 50.1 g, the $P$-value would have been 0.13. These are also relatively large $P$-values, and it would be illogical to claim that there was 'proof' that these hypotheses were also true, since the true mean cannot equal 49.8 g, 50 g and 50.1 g at the same time.
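The three $P$-values quoted above can be reproduced from the summary statistics alone. The sketch below is illustrative only: it assumes a two-sided test based on the normal approximation, with the sample standard deviation standing in for the population value. The text does not state which test was used, and a $t$-test would give slightly larger $P$-values. The function name `two_sided_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the chocolate bar example
n = 40              # sample size
sample_mean = 49.5  # grams
sample_sd = 2.5     # grams


def two_sided_p_value(null_mean):
    """Two-sided P-value for H0: mu = null_mean.

    Assumption: normal approximation (z-test) using the sample
    standard deviation; the source does not name the test.
    """
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same data give a "large" P-value for several different null values
for null_mean in (50.0, 49.8, 50.1):
    p = two_sided_p_value(null_mean)
    print(f"H0: mu = {null_mean} g  ->  P-value = {p:.2f}")
```

Under these assumptions, the same sample yields a large $P$-value for each of the three null values, so none of them can be singled out as 'proved'.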

Again, there is value in interpreting the $P$-value together with the confidence interval; the 95% confidence interval for the true mean weight is 48.7 g to 50.3 g. The null hypothesis of interest, $\mu = 50$ g, is just one of the plausible values for the true mean that are included in the confidence interval.
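As a rough check, the confidence interval can be computed from the same summary statistics. This sketch assumes a normal-approximation interval (sample mean plus or minus about 1.96 standard errors), which is an assumption rather than the method stated in the text; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the chocolate bar example
n = 40              # sample size
sample_mean = 49.5  # grams
sample_sd = 2.5     # grams

standard_error = sample_sd / sqrt(n)

# Assumption: normal-approximation interval; the critical value is about 1.96
z_crit = NormalDist().inv_cdf(0.975)
lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error
print(f"95% CI for the true mean weight: {lower:.1f} g to {upper:.1f} g")

# Each null value considered in the text lies inside the interval
for null_mean in (49.8, 50.0, 50.1):
    print(f"{null_mean} g inside interval: {lower <= null_mean <= upper}")
```

All three null values discussed earlier (49.8 g, 50 g and 50.1 g) fall inside this interval, which is why none of them is rejected at the 5% level.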
