## Assumed knowledge

The content of the modules:

- Discrete probability distributions
- Binomial distribution
- Exponential and normal distributions
- Random sampling

## Motivation

- Why can we rely on random samples to provide information about the proportion of a population with a particular characteristic?
- Should we worry that different random samples taken from the same population will give different results?
- How variable are the results obtained from different random samples?
- How can we quantify the uncertainty (imprecision) in the results from a sample?

The module Random sampling introduces sampling from a binomial distribution. Underlying the binomial distribution are Bernoulli trials and Bernoulli random variables. A Bernoulli random variable is a discrete random variable that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\). If we have a random sample of \(n\) observations on a Bernoulli random variable, then the sum of the observations \(X\) has a binomial distribution with parameters \(n\) and \(p\). A random sample of \(n\) Bernoulli observations thus gives a single observation from the \(\mathrm{Bi}(n,p)\) distribution.
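The link between Bernoulli observations and a binomial observation can be illustrated with a short simulation. The following is a minimal sketch, not part of the original module: the function name `bernoulli_sample`, the values \(n = 20\) and \(p = 0.3\), and the seed are all assumptions chosen for illustration.

```python
import random


def bernoulli_sample(n, p, rng):
    """Return n independent Bernoulli(p) observations as a list of 0s and 1s.

    Hypothetical helper for illustration; rng is a random.Random instance.
    """
    return [1 if rng.random() < p else 0 for _ in range(n)]


# Illustrative values only (assumptions, not taken from the module).
rng = random.Random(2024)  # fixed seed so the run is repeatable
n, p = 20, 0.3

observations = bernoulli_sample(n, p, rng)

# The sum of the n Bernoulli observations is a single observation
# from the Bi(n, p) distribution.
x = sum(observations)

print("Bernoulli observations:", observations)
print("Binomial observation x =", x)
```

Each run with a different seed gives a different list of 0s and 1s, and hence a different value of \(x\) between 0 and \(n\).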

The binomial distribution allows us to model sampling from an essentially infinite population of units in which a proportion \(p\) of the units have a particular characteristic. If we choose a unit at random from this population, the probability that it has the characteristic is equal to \(p\), and the probability that it does not have the characteristic is equal to \(1-p\). If we choose \(n\) units at random, the number \(X\) with the characteristic has a binomial distribution: \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\).

In practice, we usually do not know the value of the population proportion \(p\), and we are interested in obtaining an estimate of \(p\). A single observation \(x\) of \(X\) can be used to provide a point estimate of the unknown population proportion \(p\): the sample proportion \(\frac{x}{n}\) is an estimate of the population proportion \(p\). There will be some imprecision associated with a single point estimate, and we would like to quantify this sensibly.
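To see why a single point estimate carries some imprecision, it can help to simulate many random samples from the same population and compare the resulting sample proportions. The sketch below is illustrative only: the function name `sample_proportion`, the values \(n = 100\) and \(p = 0.4\), the number of repetitions and the seed are assumptions. In a real study \(p\) is unknown and only one sample is usually available.

```python
import random


def sample_proportion(n, p, rng):
    """Draw n units at random and return the sample proportion x / n.

    Hypothetical helper for illustration; each unit has the
    characteristic with probability p, independently of the others.
    """
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n


# Illustrative values only (assumptions, not taken from the module).
rng = random.Random(1)  # fixed seed so the run is repeatable
n, p = 100, 0.4
repetitions = 1000

# Each entry is the point estimate of p from one random sample of size n.
estimates = [sample_proportion(n, p, rng) for _ in range(repetitions)]

mean_estimate = sum(estimates) / len(estimates)

print("Smallest estimate:", min(estimates))
print("Largest estimate:", max(estimates))
print("Average of the estimates:", round(mean_estimate, 3))
```

The estimates differ from sample to sample, but they cluster around the population proportion used in the simulation. The spread of these estimates is the uncertainty we would like to quantify.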

In this module, we discuss the distribution of observations from a binomial distribution to illustrate how it serves as a basis for using a sample proportion to estimate an unknown population proportion \(p\). By considering the approximate distribution of sample proportions, we can quantify the uncertainty in an estimate of the population proportion. This quantification takes the form of a confidence interval for the unknown population proportion \(p\).

This provides methods for answering questions like:

- What is our best estimate of the proportion of Australians who plan to vote for Labor in the next federal election?
- What is the uncertainty in this estimate of the proportion of Australians who plan to vote for Labor in the next federal election?
- What is our best estimate of the proportion of physically inactive Australian adults?
- What is the uncertainty in this estimate of the proportion of physically inactive Australian adults?
