Assumed knowledge

The content of the modules:

Motivation

The module Random sampling introduces sampling from a binomial distribution. Underlying the binomial distribution are Bernoulli trials and Bernoulli random variables. A Bernoulli random variable is a discrete random variable that takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\). If we have a random sample of \(n\) observations on a Bernoulli random variable, then the sum \(X\) of the observations has a binomial distribution with parameters \(n\) and \(p\). A random sample of \(n\) Bernoulli observations thus gives a single observation from the \(\mathrm{Bi}(n,p)\) distribution.
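As an informal check of this relationship, the short Python sketch below draws \(n\) Bernoulli observations, adds them up to obtain one observation of \(X\), and repeats this many times. The helper names `bernoulli` and `binomial_observation`, and the values \(n = 20\), \(p = 0.3\), are illustrative assumptions only. If the sums really come from \(\mathrm{Bi}(n,p)\), their average should be close to the binomial mean \(np\).

```python
import random

def bernoulli(p, rng):
    """Hypothetical helper: return 1 with probability p and 0 with probability 1 - p."""
    return 1 if rng.random() < p else 0

def binomial_observation(n, p, rng):
    """Hypothetical helper: the sum of n independent Bernoulli(p) observations,
    which is a single observation from Bi(n, p)."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(2024)  # fixed seed so the run is reproducible
n, p = 20, 0.3             # illustrative parameter values
draws = [binomial_observation(n, p, rng) for _ in range(10_000)]

# The average of many Bi(n, p) observations should be close to n * p = 6.
mean_draw = sum(draws) / len(draws)
print(mean_draw)
```

Each draw is a whole number between 0 and \(n\), as a binomial observation must be.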

The binomial distribution allows us to model sampling from an essentially infinite population of units in which a proportion \(p\) of the units have a particular characteristic. If we choose a unit at random from this population, the probability that it has the characteristic is equal to \(p\), and the probability that it does not have the characteristic is equal to \(1-p\). If we choose \(n\) units at random, the number \(X\) with the characteristic has a binomial distribution: \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\).
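For reference, the probability that exactly \(k\) of the \(n\) chosen units have the characteristic is given by the binomial probability function \(\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\), for \(k = 0, 1, \ldots, n\). The sketch below evaluates this in Python; the function name `binomial_pmf` and the values \(n = 10\), \(p = 0.25\) are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Hypothetical helper: Pr(X = k) for X distributed as Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.25  # illustrative parameter values
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over k = 0, 1, ..., n should add to 1 (up to rounding error).
print(sum(probs))
```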

In practice, we usually do not know the value of the population proportion \(p\), and we are interested in obtaining an estimate of \(p\). A single observation \(x\) of \(X\) provides a point estimate of the unknown population proportion \(p\), namely the sample proportion \(\frac{x}{n}\). Any single point estimate has some imprecision associated with it, and we would like to quantify this sensibly.
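One way to see this imprecision is to pretend that \(p\) is known, repeat the sampling many times, and look at how much the sample proportions \(\frac{x}{n}\) vary. The Python sketch below does this under the illustrative assumptions \(n = 100\) and \(p = 0.4\); the helper name `sample_proportion` is hypothetical. The estimates centre on \(p\), and their standard deviation should be close to \(\sqrt{p(1-p)/n} \approx 0.049\), the standard deviation of \(\frac{X}{n}\) when \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\).

```python
import random
import statistics

def sample_proportion(n, p, rng):
    """Hypothetical helper: draw n Bernoulli(p) observations and return x / n."""
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

rng = random.Random(7)  # fixed seed so the run is reproducible
n, p = 100, 0.4         # illustrative values; in practice p is unknown
estimates = [sample_proportion(n, p, rng) for _ in range(5_000)]

est_mean = statistics.mean(estimates)   # should be close to p
est_sd = statistics.stdev(estimates)    # should be close to sqrt(p * (1 - p) / n)
print(est_mean, est_sd)
```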

In this module, we discuss the distribution of observations from a binomial distribution to show how it provides a basis for using a sample proportion to estimate an unknown population proportion \(p\). By considering the approximate distribution of sample proportions, we can quantify the uncertainty in an estimate of the population proportion. This quantification takes the form of a confidence interval for the unknown population proportion \(p\).
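As a preview, a commonly used approximate 95% confidence interval for \(p\) is based on the normal approximation to the distribution of the sample proportion \(\hat{p} = \frac{x}{n}\): \(\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}\). The Python sketch below computes this interval; it assumes this standard normal-approximation form, which may differ in detail from the development later in the module. The function name `approx_ci` and the data \(x = 42\), \(n = 100\) are hypothetical.

```python
import math

def approx_ci(x, n, z=1.96):
    """Hypothetical helper: approximate confidence interval for p based on the
    normal approximation to the sample proportion (z = 1.96 gives roughly 95%)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 42 of 100 sampled units have the characteristic.
lower, upper = approx_ci(x=42, n=100)
print(f"{lower:.3f} to {upper:.3f}")
```

The interval is centred at the sample proportion, and its width shrinks in proportion to \(1/\sqrt{n}\) as the sample size grows.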

This provides methods for answering questions like:
