Mean and variance

Let \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\). We now consider the mean and variance of \(X\).

The mean \(\mu_X\) is \(np\), a result we will shortly derive. Note that this value for the mean is intuitively compelling. If we are told that 8% of the population is left-handed, then on average how many left-handers will there be in a sample of 100? We expect, on average, 8% of 100, which is 8. In a sample of 200, we expect, on average, 8% of 200, which is 16. The formula we are applying here is \(n \times p\), where \(n\) is the sample size and \(p\) the proportion of the population with the binary characteristic (in this example, being left-handed).
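The left-hander example can be checked with a short simulation. The sketch below is ours, not part of the text: it draws many samples of size \(n = 100\) with \(p = 0.08\) and averages the counts, which should come out close to \(np = 8\).

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n, p = 100, 0.08  # sample size and proportion of left-handers (from the text)

# Simulate many samples of size n and average the number of left-handers.
trials = 20_000
total = sum(
    sum(1 for _ in range(n) if random.random() < p)
    for _ in range(trials)
)
average = total / trials
print(average)  # close to n * p = 8
```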

We will use the following two general results without proving them: one for the mean of a sum of random variables, and the other for the variance of a sum of independent random variables.

  1. For \(n\) random variables \(X_1,X_2,\dots,X_n\), we have \[ \mathrm{E}(X_1 + X_2 + \dots + X_n) = \mathrm{E}(X_1) + \mathrm{E}(X_2) + \dots + \mathrm{E}(X_n). \]
  2. If \(Y_1,Y_2,\dots,Y_n\) are independent random variables, then \[ \mathrm{var}(Y_1 + Y_2 + \dots + Y_n) = \mathrm{var}(Y_1) + \mathrm{var}(Y_2) + \dots + \mathrm{var}(Y_n). \]

Note the important difference in the conditions of these two results. The mean of the sum equals the sum of the means, without any conditions on the random variables. The corresponding result for the variance, however, requires that the random variables are independent.
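The contrast between the two results can be verified exactly with a small, deliberately dependent pair. The choice \(Y = X\) for a fair coin \(X\) is ours, purely for illustration: the means still add, but the variances do not.

```python
# Exact check with a deliberately dependent pair: Y = X for a fair coin X,
# so X + Y = 2X. (Values and probabilities chosen only for illustration.)
outcomes = [(0, 0.5), (1, 0.5)]  # (value of X, probability)

def mean(dist):
    """Expected value of a list of (value, probability) pairs."""
    return sum(v * pr for v, pr in dist)

ex = mean(outcomes)
sum_dist = [(2 * v, pr) for v, pr in outcomes]  # distribution of X + Y = 2X

e_sum = mean(sum_dist)
var_x = mean([((v - ex) ** 2, pr) for v, pr in outcomes])
var_sum = mean([((v - e_sum) ** 2, pr) for v, pr in sum_dist])

print(e_sum, ex + ex)          # means add: 1.0 1.0
print(var_sum, var_x + var_x)  # variances do not: 1.0 versus 0.5
```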

Recall that a binomial random variable is defined to be the number of successes in a sequence of \(n\) independent Bernoulli trials, each with probability of success \(p\). Let \(B_i\) be the number of successes on the \(i\)th trial. Then \(B_i\) is a Bernoulli random variable with parameter \(p\). Counting the total number of successes over \(n\) trials is equivalent to summing the \(B_i\)'s. In the section Bernoulli trials, we showed that \(\mathrm{E}(B_i) = p\). Since \(X = B_1 + B_2 + \dots + B_n\), it follows that

\begin{align*} \mathrm{E}(X) &= \mathrm{E}(B_1) + \mathrm{E}(B_2) + \dots + \mathrm{E}(B_n) \\ &= p + p + \dots + p \\ &= np. \end{align*}

We may also establish this result in a far less elegant way, using the probability function of \(X\). This requires a result involving combinations: for \(a \geq b \geq 1\), we have

\begin{align*} \dbinom{a}{b} &= \dfrac{a!}{(a-b)!b!} \\ &= \dfrac{a(a-1)\dotsm(a-b+1)}{b!} \\ &= \dfrac{a}{b}\, \dfrac{(a-1)(a-2)\dotsm(a-b+1)}{(b-1)!} \\ &= \dfrac{a}{b}\, \dbinom{a-1}{b-1}. \end{align*}
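This identity is easy to spot-check numerically. The sketch below (ours, not from the text) verifies the equivalent integer form \(b\binom{a}{b} = a\binom{a-1}{b-1}\), which avoids any division.

```python
from math import comb

# Spot-check the identity C(a, b) = (a/b) * C(a-1, b-1) for a >= b >= 1,
# in the exact integer form b * C(a, b) == a * C(a-1, b-1).
ok = all(
    b * comb(a, b) == a * comb(a - 1, b - 1)
    for a in range(1, 30)
    for b in range(1, a + 1)
)
print(ok)  # True
```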

We can use this result to calculate \(\mathrm{E}(X)\) directly:

\begin{align*} \mathrm{E}(X) &= \sum_{x=0}^n x\, p_X(x) &&(\text{definition of expected value}) \\ &= \sum_{x=0}^n x\, \dbinom{n}{x}\, p^x (1-p)^{n-x} \\ &= np \sum_{x=1}^n \dbinom{n-1}{x-1}\, p^{x-1} (1-p)^{(n-1)-(x-1)} &&(\text{using the result above}) \\ &= np \sum_{y=0}^m \dbinom{m}{y}\, p^y (1-p)^{m-y} &&(\text{where } m=n-1 \text{ and } y=x-1) \\ &= np \times 1 &&\text{(by the binomial theorem)} \\ &= np. \end{align*}
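The conclusion \(\mathrm{E}(X) = np\) can be confirmed exactly by summing \(x\, p_X(x)\) for a few illustrative choices of \(n\) and \(p\) (the particular values below are ours; exact fractions avoid rounding issues).

```python
from fractions import Fraction
from math import comb

# Exact check that the sum of x * pX(x) over x = 0, ..., n equals n*p.
for n in (1, 5, 12):
    for p in (Fraction(8, 100), Fraction(1, 2), Fraction(3, 4)):
        mean = sum(
            x * comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)
        )
        assert mean == n * p
print("E(X) = np confirmed for all cases tested")
```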

Now we consider the variance of \(X\). It is possible, but cumbersome, to derive the variance directly, using the definition \(\mathrm{var}(X) = \mathrm{E}[(X - \mu_X)^2]\) and a further result involving combinations. Instead, we will apply the general result for the variance of a sum of independent random variables. As before, we note that \(X = B_1 + B_2 + \dots + B_n\), where \(B_1,B_2,\dots,B_n\) are independent Bernoulli random variables with parameter \(p\). In the section Bernoulli trials, we showed that \(\mathrm{var}(B_i) = p(1-p)\). Hence,

\begin{align*} \mathrm{var}(X) &= \mathrm{var}(B_1) + \mathrm{var}(B_2) + \dots + \mathrm{var}(B_n) \\ &= p(1-p) + p(1-p) + \dots + p(1-p) \\ &= np(1-p). \end{align*} It follows that, for \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\), the standard deviation of \(X\) is given by \[ \mathrm{sd}(X) = \sqrt{np(1-p)}. \]
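The formula \(\mathrm{var}(X) = np(1-p)\) can also be confirmed exactly from the probability function, here using the standard identity \(\mathrm{var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2\) (equivalent to the definition quoted above); the test values are ours.

```python
from fractions import Fraction
from math import comb

# Exact check of var(X) = n*p*(1-p) via var(X) = E(X^2) - (E(X))^2.
for n in (1, 6, 15):
    for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
        ex = sum(x * q for x, q in enumerate(pmf))
        ex2 = sum(x * x * q for x, q in enumerate(pmf))
        assert ex2 - ex**2 == n * p * (1 - p)
print("var(X) = np(1-p) confirmed for all cases tested")
```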

Note that the spread of the distribution of \(X\), reflected in the formulas for \(\mathrm{var}(X)\) and \(\mathrm{sd}(X)\), is the same for \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\) and \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,1-p)\). This agrees with the pattern observed in figure 2: the distribution for \(p = \theta\) is a mirror image of the distribution for \(p = 1-\theta\), and therefore has the same spread.
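The mirror-image property behind this observation can be verified exactly: \(p_X(x)\) under \(\mathrm{Bi}(n,p)\) equals \(p_X(n-x)\) under \(\mathrm{Bi}(n,1-p)\). The particular values of \(n\) and \(p\) below are ours, for illustration.

```python
from fractions import Fraction
from math import comb

def pmf(n, p, x):
    """Binomial probability function pX(x) for Bi(n, p), computed exactly."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact check of the mirror-image property: pX(x) under Bi(n, p) equals
# pX(n - x) under Bi(n, 1 - p), so the two distributions have the same spread.
n, p = 9, Fraction(2, 10)  # illustrative values
mirrored = all(pmf(n, p, x) == pmf(n, 1 - p, n - x) for x in range(n + 1))
print(mirrored)  # True
```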

Exercise 4
Consider the nine binomial distributions represented in figure 2.
  1. Determine the mean and standard deviation of \(X\) in each case.
  2. Among the nine distributions, when is the standard deviation smallest? When is it largest?
Exercise 5
Suppose that \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\).
  1. Sketch the graph of the variance of \(X\) as a function of \(p\).
  2. Using calculus, or otherwise, show that the variance is largest when \(p = 0.5\).
  3. Find the variance and standard deviation of \(X\) when \(p = 0.5\).
Exercise 6

Let \(0 < p < 1\) and suppose that \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\). Consider the following claim:

As \(n\) tends to infinity, the largest value of \(p_X(x)\) tends to zero.

Is this true? Explain.
