## Assumed knowledge

The content of the modules:

## Motivation

• Why can we rely on random samples to provide information about the proportion of a population with a particular characteristic?
• Should we worry that different random samples taken from the same population will give different results?
• How variable are the results obtained from different random samples?
• How can we quantify the uncertainty (imprecision) in the results from a sample?

The module Random sampling introduces sampling from a binomial distribution. Underlying the binomial distribution are Bernoulli trials and Bernoulli random variables. A Bernoulli random variable is a discrete random variable that takes the value 1 with probability $$p$$ and the value 0 with probability $$1-p$$. If we have a random sample of $$n$$ observations on a Bernoulli random variable, then the sum of the observations $$X$$ has a binomial distribution with parameters $$n$$ and $$p$$. A random sample of $$n$$ Bernoulli observations thus gives a single observation from the $$\mathrm{Bi}(n,p)$$ distribution.
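The link between Bernoulli observations and a single binomial observation can be illustrated with a short simulation. This is a minimal sketch rather than part of the module: the function names `bernoulli` and `binomial_observation`, the seed, and the values of `n` and `p` are all assumptions chosen for illustration.

```python
import random


def bernoulli(p, rng):
    """Return 1 with probability p and 0 with probability 1 - p."""
    return 1 if rng.random() < p else 0


def binomial_observation(n, p, rng):
    """Sum n independent Bernoulli(p) observations: one draw from Bi(n, p)."""
    return sum(bernoulli(p, rng) for _ in range(n))


rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
n, p = 50, 0.3             # illustrative values, not taken from the module

x = binomial_observation(n, p, rng)
print(x)  # a single whole number between 0 and n
```

Each call to `binomial_observation` uses up a whole random sample of $$n$$ Bernoulli observations and returns just one number, which is the sense in which the sample gives a single observation from $$\mathrm{Bi}(n,p)$$.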

The binomial distribution allows us to model sampling from an essentially infinite population of units in which a proportion $$p$$ of the units have a particular characteristic. If we choose a unit at random from this population, the probability that it has the characteristic is equal to $$p$$, and the probability that it does not have the characteristic is equal to $$1-p$$. If we choose $$n$$ units at random, the number $$X$$ with the characteristic has a binomial distribution: $$X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)$$.

In practice, we usually do not know the value of the population proportion $$p$$, and we are interested in obtaining an estimate of $$p$$. A single observation $$x$$ of $$X$$ can be used to provide a point estimate of the unknown population proportion $$p$$: the sample proportion $$\frac{x}{n}$$ is an estimate of the population proportion $$p$$. There will be some imprecision associated with a single point estimate, and we would like to quantify this sensibly.
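To see why a single point estimate carries some imprecision, it may help to simulate several random samples from the same population and compare their sample proportions. The sketch below assumes a hypothetical population proportion `true_p` (which would be unknown in practice), a hypothetical sample size, and an arbitrary seed; the function name `sample_proportion` is also an illustrative choice.

```python
import random


def sample_proportion(n, p, rng):
    """Choose n units at random from a population in which a proportion p
    has the characteristic, and return the sample proportion x / n."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n


rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable
true_p = 0.4            # hypothetical value; unknown in a real study
n = 200                 # hypothetical sample size

# Five separate random samples from the same population.
estimates = [sample_proportion(n, true_p, rng) for _ in range(5)]
for estimate in estimates:
    print(f"{estimate:.3f}")
```

The printed estimates will generally differ from one another and from `true_p`, even though every sample comes from the same population. That sample-to-sample variation is the imprecision we would like to quantify.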

In this module, we discuss the distribution of observations from a binomial distribution and show how it serves as a basis for using a sample proportion to estimate an unknown population proportion $$p$$. By considering the approximate distribution of sample proportions, we can quantify the uncertainty in an estimate of the population proportion. This quantification takes the form of a confidence interval for the unknown population proportion $$p$$.

These ideas provide methods for answering questions such as:

• What is our best estimate of the proportion of Australians who plan to vote for Labor in the next federal election?
• What is the uncertainty in this estimate of the proportion of Australians who plan to vote for Labor in the next federal election?
• What is our best estimate of the proportion of physically inactive Australian adults?
• What is the uncertainty in this estimate of the proportion of physically inactive Australian adults?
