
The central limit theorem

We have already described two important properties of the distribution of the sample mean \(\bar{X}\) that are true for any value of the sample size \(n\). These two properties are that \(\mathrm{E}(\bar{X}) = \mu\) and \(\mathrm{var}(\bar{X}) = \dfrac{\sigma^2}{n}\).
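These two properties can be checked by simulation. The following is a minimal sketch, not part of the original module: it assumes an exponential parent distribution with \(\mu = 1\) and \(\sigma^2 = 1\), and the helper name `sample_means`, the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return `reps` sample means, each from a random sample of size n.

    Assumed parent distribution: exponential with rate 1 (mean 1, variance 1).
    """
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(1)      # fixed seed so the run is repeatable
mu, sigma2 = 1.0, 1.0       # mean and variance of the assumed parent distribution

results = {}
for n in (2, 5, 25):
    means = sample_means(n, 20000, rng)
    # Mean and variance of the simulated sample means
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.3f}, "
          f"variance = {results[n][1]:.4f}, sigma^2/n = {sigma2 / n:.4f}")
```

Even for a sample size as small as \(n = 2\), the simulated mean and variance of \(\bar{X}\) should be close to \(\mu\) and \(\dfrac{\sigma^2}{n}\), which is consistent with these properties holding for every \(n\).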

There is a third property of the distribution of the sample mean \(\bar{X}\) that does depend on the value of the sample size \(n\). However, this remarkable property does not depend on the shape of the distribution of \(X\), the parent distribution from which the random sample is taken. It is known as the central limit theorem and is stated as follows.

Theorem (Central limit theorem)

For large samples, the distribution of the sample mean is approximately Normal. If we have a random sample of size \(n\) from a parent distribution with mean \(\mu\) and finite variance \(\sigma^2\), then as \(n\) grows large the distribution of the sample mean \(\bar{X}\) becomes ever closer to a Normal distribution with mean \(\mu\) and variance \(\dfrac{\sigma^2}{n}\).

The extremely useful implication of this result is that, for large samples, we can use the Normal distribution to approximate probabilities involving the sample mean, whatever the shape of the parent distribution.

This is a startling property because there are no restrictions on the shape of the population distribution of \(X\); all we specify are its mean and variance. The population distribution might be a uniform distribution, an exponential distribution or some other shape: a U-shape, a triangular shape, extremely skewed or quite irregular.
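One way to see this numerically is to track the skewness of the simulated sample means, since a Normal distribution has skewness zero. The following is a rough sketch under stated assumptions: the parent distribution is taken to be exponential with rate 1 (an extremely skewed shape), and the function name `skewness`, the seed, the sample sizes and the number of repetitions are illustrative choices rather than anything prescribed by the theorem.

```python
import random

def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count
    m3 = sum((v - mean) ** 3 for v in values) / count
    return m3 / m2 ** 1.5

rng = random.Random(2)  # fixed seed so the run is repeatable

skew_by_n = {}
for n in (1, 4, 16, 64):
    # 20000 sample means, each from a random sample of size n
    # drawn from the assumed exponential parent distribution
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(20000)]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:2d}: skewness of sample means = {skew_by_n[n]:.2f}")
```

For this particular parent distribution, the skewness of \(\bar{X}\) is \(\dfrac{2}{\sqrt{n}}\), so the printed values should shrink towards zero as \(n\) increases, as expected if the distribution of the sample mean is becoming more symmetric and closer to Normal.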

Note. For the central limit theorem to apply, we do need the parent distribution to have a finite mean and variance! There are some strange distributions for which either the variance, or the mean and the variance, do not exist. But we need not worry about such distributions here.

The central limit theorem has a long history and very wide application. It is beyond the scope of the curriculum to provide a proof, but we have already seen empirical evidence of its truth: examples showing the behaviour of the distribution of the sample mean as the sample size \(n\) increases.

Because the averages of samples from a distribution of any shape tend to have a Normal distribution, provided the sample size is large enough, we do not need to know the shape of the parent distribution to describe the properties of the distribution of sample means. Therein lies the power of the central limit theorem, since limited knowledge about the parent distribution is the norm. We have a basis for using the sample mean to make inferences about the population mean, even in the usual situation where we don't know the distribution of \(X\), the random variable we are sampling.

The central limit theorem is the result behind the phenomenon we have seen in the examples in the previous sections. Each time we looked at samples of a large size, the histogram of the sample means was bell-shaped and symmetrically positioned around the mean of the parent distribution.

This important result is used for inference about the unknown population mean \(\mu\), but there is one more step in this process.
