## Content

### Using probability theory to make an inference

The module Binomial distribution introduces the concept of a binomial random variable. Recall that, if a random variable \(X\) has a binomial distribution with parameters \(n\) and \(p\), we write \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\). A binomial random variable can always be thought of as the number of successes in \(n\) independent Bernoulli trials, each with probability of success \(p\). For this reason, the binomial distribution is often assumed to be an appropriate model when we count the number of units with a characteristic of interest in a random sample of size \(n\), taken from a population in which the proportion of units with the characteristic is \(p\).

In the module Binomial distribution, the value of \(p\) is generally assumed to be known. However, in many realistic and relevant research contexts, the value of \(p\) is not known, but we are very interested in its value, because it relates to a research question of some importance.

We have often appealed to an argument based on symmetry and appropriate random mixing (such as shaking a die in a cup) to justify particular numerical choices for probabilities: for example, that the chance of rolling a four using a fair die is \(\frac{1}{6}\).

But knowing probabilities, or even having a basis for assuming particular values, is not a common scenario. The opposite is the case. We are often confronted with a situation where we believe that a binomial model is appropriate and we know the size \(n\) of the random sample of units, but we do not know \(p\). And we would like to know it.

One of the main reasons for studying probability distributions, such as the binomial, is that this theory is the foundation for making inferences about unknown population characteristics, such as \(p\). In general terms, this is known as **statistical inference**.

Here are two quite different contexts with the same underlying binomial structure, where it is clear that \(p\) is unknown:

- A random sample of voters is asked about their current political preference. We are interested in using the sample to draw an inference about the proportion of the population of voters who currently prefer Labor.
- Consider a standard drawing pin with a circular, slightly rounded head. If this drawing pin is tossed (in a similar way to a normal coin toss) and allowed to land on a flat surface, there are two ways it can finish:

- leaning on the point of the pin (as shown on the left in figure 1)
- lying flat with the pin pointing straight up (as shown on the right).

What is the chance that it finishes with the pin pointing straight up?

Figure 1: The two ways a drawing pin can finish after being tossed like a coin.

It is part of the power of probability models, and their application in statistics, that such diverse problems as these two can be dealt with in the same way.

In this module, we use the binomial structure to think about the following specific inferential problem: If we have an observation from a binomial random variable with known \(n\) but unknown \(p\), how can we make an inference about \(p\)?

Next page - Content - The sample proportion as an estimator of \(p\)