## Errors in hypothesis testing

### $P$-values do not measure the importance of a result

A small $P$-value suggests that we have observed a surprising result, given our assumption that the null hypothesis is true. Sometimes a result that is surprising (or statistically improbable) is interpreted as being important, and the $P$-value is treated as quantifying the importance of the result: small $P$-values are taken to reflect important findings, and large $P$-values unimportant ones. This is (very) wrong.

The importance of a statistical finding depends on many things. One obvious consideration is the point estimate. In our chocolate bar example, the estimated mean weight was 49.5 g. We need to ask the question: would it matter if the bars were underweight by an average of 0.5 g? This is an average of 1% underweight. Such a mean difference might not be important to the manufacturer but could be important to the consumer! Another important consideration is the confidence interval: 48.7 g to 50.3 g. The confidence interval quantifies the uncertainty in the estimate of the true population mean.
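To see why a $P$-value cannot stand in for importance, it can help to hold the estimated difference fixed and vary only the sample size. The following is a minimal sketch, not part of the original example. It assumes a nominal weight of 50 g (implied by the 1% figure above), a hypothetical standard deviation of 2 g that is treated as known, and a normal approximation in place of whatever test the original example used. The function name `two_sided_p` and the sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(sample_mean, null_mean, sd, n):
    """Two-sided P-value for a mean, using a normal approximation."""
    se = sd / sqrt(n)                   # standard error of the sample mean
    z = (sample_mean - null_mean) / se  # standardised difference
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed values for illustration only.
NULL_MEAN = 50.0    # nominal bar weight in grams (implied by the text)
SAMPLE_MEAN = 49.5  # estimated mean weight from the example
ASSUMED_SD = 2.0    # hypothetical standard deviation in grams

# Same 0.5 g shortfall each time; only the sample size changes.
p_values = {
    n: two_sided_p(SAMPLE_MEAN, NULL_MEAN, ASSUMED_SD, n)
    for n in (16, 64, 256)
}

for n, p in p_values.items():
    print(f"n = {n:3d}: difference = {SAMPLE_MEAN - NULL_MEAN:+.1f} g, P = {p:.5f}")
```

Under these assumptions the 0.5 g shortfall is identical in every row, yet the $P$-value shrinks as $n$ grows. The $P$-value reflects how much data we have as well as how large the difference is, so it cannot by itself tell us whether 0.5 g matters.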

Exercise 4

A researcher wished to investigate whether children's habits were consistent with the Australian Government's Department of Health recommendation to "limit use of electronic media for entertainment to no more than two hours a day". She asked the parents of sixty randomly chosen 11-year-old children to record the amount of time per day that their child spends using electronic media. She wanted to know: is there evidence that the average amount of electronic media use for 11-year-olds is consistent with the upper limit of two hours per day?

The sample mean $\bar{x}$ is 3.1 hours per day, and the standard deviation $s$ is 2.5 hours per day. The researcher carries out a test of the null hypothesis that the true mean is 2 hours per day: $\mu = 2$. The $P$-value is 0.001; the confidence interval is 2.5 hours to 3.7 hours.

Based on this, which of the following are reasonable claims for the researcher to make about 11-year-old children?

1. The probability that the average electronic media use of 11-year-olds meets the Government recommendations is 0.001.
2. The probability that the study is wrong is very small.
3. The average electronic media use of 11-year-olds is estimated to be more than one hour above the Government recommendation.
4. The result is not consistent with the Government recommendation, as $P = 0.001$.
5. The confidence interval suggests that average electronic media use of 11-year-olds could be between 0.5 and 1.7 hours above the recommendation.
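The summary figures quoted in this exercise can be reproduced from the sample mean, standard deviation and sample size. The sketch below assumes a 95% confidence level and a large-sample normal approximation; the exercise does not state which method the researcher used, and the helper name `summarise_mean_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def summarise_mean_test(sample_mean, sd, n, null_mean, level=0.95):
    """Return (P-value, CI lower, CI upper) using a normal approximation."""
    std_normal = NormalDist()
    se = sd / sqrt(n)                                # standard error of the mean
    z = (sample_mean - null_mean) / se               # test statistic
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-sided P-value
    z_crit = std_normal.inv_cdf(0.5 + level / 2)     # about 1.96 for 95%
    return p_value, sample_mean - z_crit * se, sample_mean + z_crit * se


# Values from the exercise: n = 60, mean 3.1 h, sd 2.5 h, null mean 2 h.
p_value, lower, upper = summarise_mean_test(
    sample_mean=3.1, sd=2.5, n=60, null_mean=2.0
)
print(f"P = {p_value:.3f}, 95% CI: {lower:.1f} to {upper:.1f} hours per day")
# prints "P = 0.001, 95% CI: 2.5 to 3.7 hours per day"
```

Note that the code returns the point estimate's uncertainty alongside the $P$-value; the interval, not the $P$-value, is what describes how far above two hours the true mean might plausibly be.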

Exercise 5

Consider the study above, but now the researcher asks the parents of forty-five randomly sampled 6-year-old children. The sample mean $\bar{x}$ is 2.5 hours per day, and the standard deviation $s$ is 2.5 hours per day. The $P$-value is 0.18 (testing $\mu = 2$). The confidence interval is 1.8 hours to 3.2 hours.

Based on this, which of the following are reasonable claims for the researcher to make about 6-year-old children?

1. The large $P$-value shows that average electronic media use of 6-year-olds meets the Government recommendations.
2. The average electronic media use of 6-year-olds is estimated to be half an hour above the Government recommendation.
3. The result is consistent with the Government recommendation, as $P = 0.18$.
4. The confidence interval suggests that average electronic media use of 6-year-olds could be as much as 1.2 hours above the recommendation.
5. The Government need not be concerned about the electronic media use of 6-year-olds as the $P$-value is 0.18.
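The summary figures quoted for this second sample can be checked by hand. The working below is a sketch that assumes a 95% confidence level and a large-sample normal approximation (multiplier 1.96); the exercise itself does not state which method was used.

$$
\mathrm{SE} = \frac{s}{\sqrt{n}} = \frac{2.5}{\sqrt{45}} \approx 0.373,
\qquad
z = \frac{\bar{x} - 2}{\mathrm{SE}} = \frac{2.5 - 2}{0.373} \approx 1.34
$$

A standardised difference of about 1.34 corresponds to a two-sided $P$-value of about 0.18. The interval is

$$
\bar{x} \pm 1.96 \times \mathrm{SE} = 2.5 \pm 1.96 \times 0.373 \approx 2.5 \pm 0.73,
$$

that is, about 1.77 to 3.23 hours, which rounds to the quoted 1.8 to 3.2 hours.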
