The mean and variance of \(\bar{X}\)

We have seen that sample means can vary from sample to sample, and hence that the sample mean \(\bar{X}\) has a distribution. The way to think about this distribution is to imagine an endless sequence of samples taken from a single population under identical conditions. From this imagined sequence, we could work out each sample mean, and then look at the distribution of them. This thought experiment helps us to understand what is meant by 'the distribution of \(\bar{X}\)'. We have approximated this thought experiment in the previous section (figure 5), using only 100 samples. This is a long way short of an endless sequence, but illustrates the idea.
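To make the thought experiment concrete, the Python sketch below replaces the endless sequence with a large but finite number of samples. It assumes the study-score setting of the previous section (\(X \sim \mathrm{N}(30, 7^2)\), samples of size \(n = 10\)). The helper name `simulate_sample_means` and the fixed seed are illustrative choices, not part of the module.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Hypothetical helper: draw num_samples independent random samples of
    size n from N(mu, sigma^2) and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so each run is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed setting from the study-score example: X ~ N(30, 7^2), n = 10.
# Compare 100 samples (as in figure 5) with a much longer sequence.
for num_samples in (100, 20_000):
    means = simulate_sample_means(30, 7, 10, num_samples)
    print(f"{num_samples:>6} samples: "
          f"mean of sample means = {statistics.fmean(means):.3f}, "
          f"sd of sample means = {statistics.stdev(means):.3f}")
```

The longer the imagined sequence, the more closely the simulated distribution of \(\bar{X}\) settles down; its mean and spread are exactly the quantities studied in the rest of this section.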

What is the mean of this distribution? And its variance?

The visual impression we get from the example of 100 samples of study scores in the previous section (figure 5) is that the mean of the distribution of the sample mean \(\bar{X}\) is about 30, the mean \(\mu\) of the underlying parent distribution. This suggests that the distribution of \(\bar{X}\) is centred around \(\mu\). We will prove that this is true in general.

Since \(\bar{X}\) comes from a random sample on \(X\), it is hardly surprising that the properties of the distribution of \(\bar{X}\) are related to the distribution of \(X\).

To obtain the mean and variance of \(\bar{X}\), we use two results covered in the module Binomial distribution, which we restate here:

  1. For \(n\) random variables \(X_1, X_2, \dots, X_n\), we have \[ \mathrm{E}(X_1 + X_2 + \dots + X_n) = \mathrm{E}(X_1) + \mathrm{E}(X_2) + \dots + \mathrm{E}(X_n). \]
  2. If \(Y_1, Y_2, \dots, Y_n\) are independent random variables, then \[ \mathrm{var}(Y_1 + Y_2+ \dots + Y_n) = \mathrm{var}(Y_1) + \mathrm{var}(Y_2) + \dots + \mathrm{var}(Y_n). \]
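Both results can be checked exactly for a small case. The sketch below assumes two independent fair dice (an illustrative choice of distribution) and uses exact rational arithmetic from Python's `fractions` module, so the comparisons involve no rounding.

```python
from fractions import Fraction
from itertools import product

# Illustrative distribution: a fair six-sided die, each face with probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

def mean(values_probs):
    """Expected value of a discrete distribution given as (value, probability) pairs."""
    return sum(v * pr for v, pr in values_probs)

def var(values_probs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    m = mean(values_probs)
    return sum((v - m) ** 2 * pr for v, pr in values_probs)

one_die = [(f, p) for f in faces]
# Two independent dice: every ordered pair (a, b) has probability (1/6) * (1/6).
sum_of_two = [(a + b, p * p) for a, b in product(faces, repeat=2)]

print(mean(sum_of_two), 2 * mean(one_die))  # prints: 7 7
print(var(sum_of_two), 2 * var(one_die))    # prints: 35/6 35/6
```

Result 1 holds whether or not the variables are independent; the variance check relies on the two dice being independent, as result 2 requires.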

Theorem (Mean of the sample mean)

For a random sample of size \(n\) on \(X\), where \(\mathrm{E}(X) = \mu\), we have

\[ \mathrm{E}(\bar{X}) = \mu. \]

Proof. Each random variable \(X_i\) in the random sample has the same distribution as \(X\), and so \(\mathrm{E}(X_i) = \mu\). Also, recall that if \(Y = aV+b\), then \(\mathrm{E}(Y) = a\,\mathrm{E}(V) + b\). Hence,

\begin{align*}
\mathrm{E}(\bar{X}) &= \mathrm{E}\Bigl(\dfrac{X_1 + X_2 + \dots + X_n}{n}\Bigr) \\
&= \dfrac{1}{n}\, \mathrm{E}(X_1 + X_2 + \dots + X_n) \\
&= \dfrac{1}{n}\, \bigl(\mu + \mu + \dots + \mu\bigr) \\
&= \dfrac{1}{n}\, n\mu \\
&= \mu.
\end{align*}
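For a small discrete parent distribution, \(\mathrm{E}(\bar{X})\) can be computed exactly by listing every possible ordered sample. The sketch below assumes a fair die as the parent (so \(\mu = 7/2\)); the function `expected_sample_mean` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction
from itertools import product

# Illustrative parent distribution: a fair die, so mu = 7/2.
dist = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in dist.items())

def expected_sample_mean(dist, n):
    """Exact E(X-bar) for a random sample of size n, found by enumerating
    every ordered sample and weighting its mean by its probability."""
    total = Fraction(0)
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= dist[x]  # independence: the probabilities multiply
        total += prob * Fraction(sum(sample), n)
    return total

# Each line shows n, E(X-bar) and mu; the last two agree for every n.
for n in (1, 2, 3):
    print(n, expected_sample_mean(dist, n), mu)
```

Brute-force enumeration grows as \(6^n\), so it only suits very small \(n\); the theorem gives the same answer for every \(n\) without any enumeration.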


This is an important result. It tells us that, on average, the sample mean is neither too low nor too high: its expected value is the population mean \(\mu\). We may find this result intuitively compelling or, at least, unsurprising, but it is important nonetheless. It tells us that using the sample mean to estimate \(\mu\) is an unbiased method: over the imagined endless sequence of samples, the sample means are centred on \(\mu\), so on average we will be right.

The formula for the variance of \(\bar{X}\) is not obvious. What is clear from our example in the previous section (figure 5), however, is that the variance of \(\bar{X}\) can be considerably smaller than the variance of \(X\) itself: for samples of size 10, the distribution of \(\bar{X}\) is a lot narrower than that of \(X\). This is a very useful phenomenon when it comes to statistical inference about \(\mu\), as we shall see.

Theorem (Variance of the sample mean)

For a random sample of size \(n\) on \(X\), where \(\mathrm{var}(X) = \sigma^2\), we have

\[ \mathrm{var}(\bar{X}) = \dfrac{\sigma^2}{n}. \]

Proof. First, note that \(\mathrm{var}(X_i) = \sigma^2\) and that, in a random sample, \(X_1, X_2, \dots, X_n\) are mutually independent. Also, if \(Y = aV+b\), then \(\mathrm{var}(Y) = a^2\,\mathrm{var}(V)\). Hence,

\begin{align*}
\mathrm{var}(\bar{X}) &= \mathrm{var}\Bigl(\dfrac{X_1 + X_2 + \dots + X_n}{n}\Bigr) \\
&= \Bigl(\dfrac{1}{n}\Bigr)^2\, \mathrm{var}(X_1 + X_2 + \dots + X_n) \\
&= \Bigl(\dfrac{1}{n}\Bigr)^2\, \bigl(\sigma^2 + \sigma^2 + \dots + \sigma^2\bigr) \qquad\qquad (\text{since the } X_i\text{'s are independent}) \\
&= \Bigl(\dfrac{1}{n}\Bigr)^2\, n\sigma^2 \\
&= \dfrac{\sigma^2}{n}.
\end{align*}
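The variance result can be checked exactly in the same way, this time with a deliberately skewed parent distribution. The sketch assumes an illustrative distribution with \(\Pr(X=0) = 1/2\), \(\Pr(X=1) = 3/10\) and \(\Pr(X=4) = 1/5\), for which \(\mu = 11/10\) and \(\sigma^2 = 229/100\); `var_sample_mean` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Illustrative skewed parent distribution: P(X=0)=1/2, P(X=1)=3/10, P(X=4)=1/5.
dist = {0: Fraction(1, 2), 1: Fraction(3, 10), 4: Fraction(1, 5)}
mu = sum(x * p for x, p in dist.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in dist.items())

def var_sample_mean(dist, n):
    """Exact var(X-bar) for a random sample of size n, by full enumeration."""
    outcomes = []  # (value of X-bar, probability) for every ordered sample
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= dist[x]  # independent observations
        outcomes.append((Fraction(sum(sample), n), prob))
    m = sum(v * p for v, p in outcomes)
    return sum((v - m) ** 2 * p for v, p in outcomes)

# Each line shows n, the enumerated var(X-bar) and sigma^2 / n.
for n in (1, 2, 5):
    print(n, var_sample_mean(dist, n), sigma2 / n)
```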


A corollary of this result is that

\[ \mathrm{sd}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}. \]

This is a more tangible and relevant version of the result, since the standard deviation of \(\bar{X}\) is in the same units as \(X\) and \(\mu\).

Think about the implications of \(\sqrt{n}\) being in the denominator of \(\mathrm{sd}(\bar{X})\). This tells us that the spread of the distribution of the sample mean is smaller for larger values of \(n\). Since the distribution is centred around \(\mu\), this implies that for large values of \(n\) it is very likely that \(\bar{X}\) will be close to \(\mu\).
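The effect of \(\sqrt{n}\) in the denominator can be tabulated directly. The short sketch below assumes \(\sigma = 7\), as in the study-score example; `sd_sample_mean` is simply the formula \(\sigma/\sqrt{n}\) written as a function.

```python
import math

def sd_sample_mean(sigma, n):
    """Standard deviation of X-bar for a random sample of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 7  # population sd assumed from the study-score example
for n in (1, 4, 10, 16, 100, 400):
    print(f"n = {n:>3}: sd(X-bar) = {sd_sample_mean(sigma, n):.3f}")
```

Because of the square root, quadrupling the sample size (from 4 to 16, or from 100 to 400) only halves \(\mathrm{sd}(\bar{X})\).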


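Consider the study-score example illustrated in the previous section, in which random samples of size \(n=10\) are obtained from \(\mathrm{N}(30,7^2)\). In this case, \(\mathrm{E}(\bar{X}) = 30\), \(\mathrm{var}(\bar{X}) = \dfrac{7^2}{10} = 4.9\) and \(\mathrm{sd}(\bar{X}) = \dfrac{7}{\sqrt{10}} \approx 2.21\).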

It is important to understand that these results for the mean, variance and standard deviation of \(\bar{X}\) do not require the distribution of \(X\) to have any particular form or shape; all that is required is for the parent distribution to have a mean \(\mu\) and a variance \(\sigma^2\). Further, the results are true for all values of the sample size \(n\).
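The claim that no particular shape is required can be checked by simulation with a strongly skewed parent. The sketch below assumes an exponential parent with mean 2 (so variance 4), generated with `random.expovariate(0.5)`; the helper name `sample_mean_stats`, the seed and the number of repetitions are illustrative choices.

```python
import random
import statistics

def sample_mean_stats(n, num_samples, seed=1):
    """Simulate sample means of size n from an exponential parent with
    mean 2 and variance 4; return the mean and variance of those means."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    means = [statistics.fmean([rng.expovariate(0.5) for _ in range(n)])
             for _ in range(num_samples)]
    return statistics.fmean(means), statistics.variance(means)

for n in (2, 5):
    m, v = sample_mean_stats(n, 50_000)
    print(f"n = {n}: mean of X-bar ~ {m:.3f} (theory 2), "
          f"var of X-bar ~ {v:.3f} (theory {4 / n:.3f})")
```

Even for samples as small as \(n = 2\), the simulated mean and variance of \(\bar{X}\) sit close to \(\mu\) and \(\sigma^2/n\), although the parent distribution is far from symmetric.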

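In summary, for a random sample of \(n\) observations on a random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\):

  1. \(\mathrm{E}(\bar{X}) = \mu\);
  2. \(\mathrm{var}(\bar{X}) = \dfrac{\sigma^2}{n}\);
  3. \(\mathrm{sd}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\).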
