Theories about \(\mu\)
In the canned tuna example, instead of knowing or assuming that the value of the population mean was \(\mu = 95\) g, we could instead test the theory or hypothesis that \(\mu = 95\). All of the logic that we have seen so far applies to the method of testing a specific theory about \(\mu\).
"Weights and measures" is the area of government regulation that is concerned with fairness in the amounts of products sold, according to their labelled or advertised size or volume. In this context, one approach to testing whether the manufacturer of a particular product is violating the regulations, is to test the hypothesis that the population mean of their product is equal to the label value.
If cans of tuna have the label "95 g", we may test the hypothesis that \(\mu = 95\). This is a very specific hypothesis. We then ask: are the data consistent with this value of \(\mu\)? Or are they inconsistent with this hypothesised value? These questions are answered by the \(P\)-value.
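As a concrete sketch of how such a test might be carried out, the code below computes a \(P\)-value for the hypothesis \(\mu = 95\) from a sample of can weights, using a z-statistic. The sample values and the population standard deviation are invented for illustration, and the standard deviation is assumed known; only the Python standard library is used.

```python
import math

def two_sided_p_value(sample, mu0, sigma):
    """P-value for the hypothesis mu = mu0, with population sd assumed known."""
    n = len(sample)
    xbar = sum(sample) / n
    se = sigma / math.sqrt(n)            # standard error of the sample mean
    z = (xbar - mu0) / se                # how many standard errors xbar is from mu0
    # probability of a sample mean at least this far from mu0, in either direction,
    # given that the hypothesis is true: P(|Z| >= |z|) for standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# hypothetical sample of can weights in grams; sigma = 1.2 g is an assumption
sample = [94.2, 95.1, 93.8, 94.9, 95.4, 94.0, 94.6, 95.2, 93.9, 94.5]
p = two_sided_p_value(sample, mu0=95, sigma=1.2)
```

Here the sample mean is about 94.6 g, less than one and a half standard errors below 95, so the resulting \(P\)-value is not small: data like these are consistent with the hypothesis \(\mu = 95\).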
When a particular hypothesis about a population parameter is tested, the hypothesis is known as the null hypothesis. There is a reason for this name. In this module we are looking at a simple context of testing hypotheses about a single population mean \(\mu\). We have suggested contexts in which this might be done. In research more generally, it is common to be testing hypotheses that arise from a comparison of populations. We might want to compare the average time that people survive on cancer treatments A and B, or the average score of students taught using methods A and B, or the average height of plants grown using treatments A and B.
If we compare two populations and we are especially interested in their means, \(\mu_1\) and \(\mu_2\), then a natural hypothesis to test is that \(\mu_1 = \mu_2\), which is equivalent to the hypothesis \(\mu_1 - \mu_2 = 0\). This represents the pessimist's view of the world: that there is no difference between the population means. And now we see where "null" comes from; nullus is the Latin word for 'none', or 'not any', and hence associated with zero. Here we are testing that the difference between the means is equal to zero. In other inference settings there are different parameters but, again, we often test the hypothesis that represents an absence of effect, which is therefore aptly named the "null" hypothesis.
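The two-population comparison can be sketched in the same way: under the null hypothesis \(\mu_1 - \mu_2 = 0\), the standardised difference in sample means is approximately standard normal. The function and the plant-height data below are hypothetical, and both population standard deviations are assumed known, purely for illustration.

```python
import math

def two_sample_p_value(a, b, sigma_a, sigma_b):
    """Two-sided P-value for the null hypothesis mu1 - mu2 = 0
    (population standard deviations assumed known)."""
    na, nb = len(a), len(b)
    diff = sum(a) / na - sum(b) / nb     # observed difference in sample means
    se = math.sqrt(sigma_a**2 / na + sigma_b**2 / nb)  # standard error of the difference
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z

# invented plant heights (cm) under treatments A and B; sigma = 1.5 cm assumed
heights_a = [31.2, 29.8, 30.5, 32.0, 30.9]
heights_b = [28.7, 29.5, 28.1, 29.9, 28.8]
p = two_sample_p_value(heights_a, heights_b, sigma_a=1.5, sigma_b=1.5)
```

Identical samples give a difference of zero and hence a \(P\)-value of 1, while samples whose means are far apart relative to the standard error give a small \(P\)-value, evidence against the null hypothesis of no difference.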
In contrast to the null hypothesis, we have the alternative hypothesis. Often, this is taken to be the denial of the null hypothesis. For example, if the null hypothesis is taken to be \(\mu =95\), the alternative hypothesis could be \(\mu \neq 95\). This is known as a "two-sided" alternative hypothesis, because it allows for either of the two logically possible alternatives to \(\mu = 95\), namely, \(\mu < 95\) and \(\mu > 95\). In this module we do not consider one-sided alternative hypotheses.
A null hypothesis is an assertion about a population or model, and a very specific assertion. In this module, we study the situation in which the null hypothesis is \(\mu = \mu_0\), where \(\mu_0\) is a specified numerical value. Since \(\mu\) is a parameter, we will always be uncertain about its value, and therefore unable to conclude, for sure, whether or not the null hypothesis is true. But a random sample from the relevant population does shed some light on the hypothesis, and we can sensibly ask how consistent the sample is with the null hypothesis. The answer to this question is provided by the \(P\)-value.
The \(P\)-value is a probability. It is used for testing a hypothesis, often about an unknown population parameter. In this module, we are only considering the context of inference and hypothesis testing about the population mean \(\mu\), but there is a general definition for the \(P\)-value that applies to the testing of any null hypothesis.
This definition is as follows:
The \(P\)-value for testing a null hypothesis is defined to be the probability of a result at least as extreme as that observed, given that the null hypothesis is true.
There are several important aspects to this definition.
- The probability is conditional on the null hypothesis. It assumes that the null hypothesis is true.
- (Therefore) the \(P\)-value cannot be construed as the probability that the null hypothesis is true.
- The \(P\)-value depends on the data (the "result"). To calculate it, we need a test statistic: a function of the data whose distribution is known, at least approximately, when the null hypothesis is true, and has a markedly different distribution when the null hypothesis is not true.
- The word "extreme" is important in the definition, since the way we calculate the \(P\)-value in a particular case is determined by what constitutes more "extreme" than that observed. The alternative hypothesis plays a key role in determining the correct interpretation of "at least as extreme".
- Small \(P\)-values are evidence against the null hypothesis, and the smaller the \(P\)-value, the stronger the evidence.
- On the other hand, large \(P\)-values indicate that we have data that are consistent with the null hypothesis. That does not mean that large \(P\)-values provide strong evidence that the null hypothesis is true. We explore this point further, later.
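The first and last points in this list can be illustrated by simulation. A standard fact about continuous test statistics is that, when the null hypothesis is true, \(P\)-values are spread evenly over the interval from 0 to 1: small \(P\)-values still occur about 5% of the time, and a large \(P\)-value is entirely unremarkable, which is why it shows consistency rather than proof. The sketch below (hypothetical parameters, standard library only) draws many samples from a population in which \(\mu = 95\) really is true and computes a two-sided \(P\)-value for each.

```python
import math
import random

def two_sided_p_value(sample, mu0, sigma):
    """Two-sided P-value for mu = mu0, population sd assumed known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
# simulate 2000 samples of size 10 from a population where the null
# hypothesis (mu = 95) is TRUE, with sd 1.2 (both values invented)
pvals = [
    two_sided_p_value([random.gauss(95, 1.2) for _ in range(10)], mu0=95, sigma=1.2)
    for _ in range(2000)
]
mean_p = sum(pvals) / len(pvals)                       # close to 0.5
frac_small = sum(p < 0.05 for p in pvals) / len(pvals) # close to 0.05
```

So even when the null hypothesis holds, roughly one sample in twenty produces \(P < 0.05\); and since large \(P\)-values occur routinely under the true null, observing one cannot, by itself, establish that the null hypothesis is true.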