Content - Calculating confidence intervals

Answers to exercises

Exercise 1

The following table gives \(\mathrm{sd}(\hat{P})\) for various values of \(p\) and \(n\), to two decimal places. Note the symmetry in the table: \(\mathrm{sd}(\hat{P})\) is the same for \(p=\theta\) and \(p=1-\theta\).

\(n\)	\(p = 0.1\)	\(p = 0.3\)	\(p = 0.5\)	\(p = 0.7\)	\(p = 0.9\)
10	0.09	0.14	0.16	0.14	0.09
50	0.04	0.06	0.07	0.06	0.04
100	0.03	0.05	0.05	0.05	0.03
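
The table entries can be reproduced with a few lines of Python. This is only a sketch: it uses the formula \(\mathrm{sd}(\hat{P}) = \sqrt{p(1-p)/n}\), and the names `sd_p_hat` and `table` are chosen here for illustration.

```python
from math import sqrt

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

ps = [0.1, 0.3, 0.5, 0.7, 0.9]
ns = [10, 50, 100]

# One row per sample size, rounded to two decimal places as in the table.
table = {n: [round(sd_p_hat(p, n), 2) for p in ps] for n in ns}

for n, row in table.items():
    print(n, row)
```

Each printed row reads the same forwards and backwards, which is the symmetry between \(p=\theta\) and \(p=1-\theta\) noted above.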

Exercise 2

We can think of the \(m\) intervals as a sequence of \(m\) independent Bernoulli trials, with each trial having probability of success \(p=0.95\). Then \(Y\) is the number of successes in the \(m\) trials, where a success is counted whenever the confidence interval includes the unknown parameter value. So \(Y\) has a binomial distribution with parameters \(n=m\) and \(p=0.95\), that is, \(Y \stackrel{\mathrm{d}}{=} \mathrm{Bi}(m,0.95)\).
\(\mathrm{E}(Y) = 0.95 m\).
Now assume \(m = 100\). Then \(Y \stackrel{\mathrm{d}}{=} \mathrm{Bi}(100,0.95)\).
1. The chance that exactly 95 intervals include the parameter is \(\Pr(Y = 95) = 0.18\). This can be obtained in Excel using \(\sf \text{=BINOM.DIST(95, 100, 0.95, FALSE)}\).
2. The chance that at least 95 intervals include the parameter is \(\Pr(Y \geq 95) = 0.62\). This can be obtained in Excel by summing the probabilities for values of \(Y\) from 95 to 100. But there is also a more direct method. To obtain \(\Pr(Y = y)\) in Excel, we use \(\sf \text{FALSE}\) as the fourth argument. If we use \(\sf \text{TRUE}\) instead, the result is the cumulative probability \(\Pr(Y \leq y)\). Note that \(\Pr(Y \geq 95) = 1-\Pr(Y \leq 94)\). We can find \(\Pr(Y \leq 94)\) in Excel using \(\sf \text{=BINOM.DIST(94, 100, 0.95, TRUE)}\). We obtain \(\Pr(Y \geq 95) = 1-\Pr(Y \leq 94) = 1-0.384 = 0.616\), which is 0.62 to two decimal places.
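
For readers without Excel, the same two probabilities can be checked in Python using only the standard library. This is a minimal sketch; the helper `binom_pmf` is a name introduced here, not a library function.

```python
from math import comb

def binom_pmf(y, m, p):
    """Pr(Y = y) for Y ~ Bi(m, p)."""
    return comb(m, y) * p**y * (1 - p)**(m - y)

m, p = 100, 0.95

# Counterpart of =BINOM.DIST(95, 100, 0.95, FALSE)
pr_exactly_95 = binom_pmf(95, m, p)

# Counterpart of =BINOM.DIST(94, 100, 0.95, TRUE): cumulative Pr(Y <= 94)
pr_at_most_94 = sum(binom_pmf(y, m, p) for y in range(0, 95))

# Complement rule: Pr(Y >= 95) = 1 - Pr(Y <= 94)
pr_at_least_95 = 1 - pr_at_most_94

print(round(pr_exactly_95, 2), round(pr_at_most_94, 3), round(pr_at_least_95, 2))
```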

Exercise 3

A 100% confidence interval would mean that, in the long run, 100% of confidence intervals would include the true parameter value. In the case of estimating a proportion, we can be certain that the true proportion is between 0 and 1; hence, the 100% confidence interval is \((0, 1)\). That is the only way we could guarantee that every single confidence interval includes the true value. Of course, this is not a useful confidence interval in any practical sense. This reminds us, however, why we choose a confidence level less than 100%.

Exercise 4

We use the general result that \(\mathrm{E}(aX+b) = a\,\mathrm{E}(X) + b\), for constants \(a\) and \(b\). Note that \(p\) and \(\sqrt{\dfrac{p(1-p)}{n}}\) are constants, and so \[ \mathrm{E}\Biggl(\dfrac{\hat{P}-p}{\sqrt{\dfrac{1}{n}p(1-p)}}\Biggr) = \dfrac{\mathrm{E}(\hat{P}-p)}{\sqrt{\dfrac{1}{n}p(1-p)}} = \dfrac{\mathrm{E}(\hat{P})-p}{\sqrt{\dfrac{1}{n}p(1-p)}} = \dfrac{p-p}{\sqrt{\dfrac{1}{n}p(1-p)}} = 0. \]
Using the general result \(\mathrm{var}(aX+b) = a^2\,\mathrm{var}(X)\), we have \[ \mathrm{var}\Biggl(\dfrac{\hat{P}-p}{\sqrt{\dfrac{1}{n}p(1-p)}}\Biggr) = \dfrac{\mathrm{var}(\hat{P}-p)}{\dfrac{1}{n}p(1-p)} = \dfrac{\mathrm{var}(\hat{P})}{\dfrac{1}{n}p(1-p)} = \dfrac{\dfrac{1}{n}p(1-p)}{\dfrac{1}{n}p(1-p)} = 1. \] The variance is 1, and so the standard deviation is 1.
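
As a numerical sanity check on these two results, the mean and variance of the standardised proportion can be computed exactly from the binomial probabilities. The sketch below assumes illustrative values \(n = 50\) and \(p = 0.3\), which are not part of the exercise; the function name `standardised_moments` is also introduced here.

```python
from math import comb, sqrt

def standardised_moments(n, p):
    """Exact mean and variance of (P_hat - p) / sqrt(p(1-p)/n), where n * P_hat ~ Bi(n, p)."""
    sd = sqrt(p * (1 - p) / n)
    # Binomial probabilities for x = 0, 1, ..., n successes
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    # Standardised value of the sample proportion x/n for each x
    z = [(x / n - p) / sd for x in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, z))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, z))
    return mean, var

mean, var = standardised_moments(50, 0.3)  # n and p are illustrative choices
print(mean, var)
```

Up to floating-point rounding, the mean comes out as 0 and the variance as 1, whatever values of \(n\) and \(p\) are used.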

Exercise 5

First we note that these data satisfy the guideline for the Normal approximation to be adequate: \(x=20\) and \(n=180\), so both \(x\) and \(n-x\) are greater than 10.

If the advertised claim is true, the proportion of winning wrappers Casey can expect to get (in a long-run average) is \(\dfrac{1}{6} = 0.167\). So the expected number of winning wrappers in 180 purchases is \(\dfrac{180}{6} = 30\).
The proportion of winning wrappers in Casey's sample is \(\dfrac{20}{180} = \dfrac{1}{9} = 0.111\).
We have \(\hat{p} = \dfrac{1}{9} = 0.111\), so \[ 1.96 \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 1.96 \sqrt{\dfrac{0.111(1-0.111)}{180}} = 0.0459. \] Thus the approximate 95% confidence interval is \(0.1111 \pm 0.0459\), or \((0.0652, 0.1570)\). In percentage terms, the confidence interval is \(6.52\%\) to \(15.70\%\).
The 95% confidence interval for the true proportion is \((0.065, 0.157)\); these are values for the true proportion that are consistent with Casey's observation of 20 winning wrappers in a sample of 180 wrappers. The expected proportion of 0.167, according to the advertised claim, is outside the confidence interval; it is greater than the upper bound. Casey's sample of Venus bars provides some basis for being suspicious.
The method used for finding the confidence interval assumes that Casey's sample of Venus bar wrappers is a random sample from the population of Venus bars produced for the promotion. Of course, if Casey buys Venus bars from shops with some old, pre-promotion stock, he would not expect to get \(\dfrac{1}{6}\) winners. We assume that the 180 Bernoulli trials are independent; that is, Casey's success (or failure) in finding a winning wrapper on one day is not related to his success (or failure) on another day. In assessing this assumption, we need to think about the distribution of winning wrappers and Casey's buying patterns. For example: Are bars with winning wrappers randomly mixed among all bars? Is there a limit on the number of winners per box? Does Casey always buy from the same place?
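
The interval calculation in this exercise can be sketched in Python as follows. The function name `proportion_ci` is hypothetical, and the factor \(z\) is taken from the standard library's `statistics.NormalDist` rather than rounded to 1.96, so the final digits may differ very slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, level=0.95):
    """Approximate confidence interval for a proportion, using the Normal approximation."""
    p_hat = x / n
    # z is about 1.96 when level = 0.95
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(20, 180)  # Casey's 20 winners in 180 wrappers
claimed = 1 / 6                        # advertised proportion of winners

print(round(lower, 4), round(upper, 4), claimed > upper)
```

The final comparison checks whether the advertised proportion lies above the upper bound of the interval.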

Exercise 6

The bounds of the 99% confidence interval will be further from the point estimate than the bounds of the 95% confidence interval. Your estimate for the lower bound of the 99% confidence interval should be less than 0.065, and your estimate for the upper bound should be greater than 0.157.
The value of the factor \(z\) from the standard Normal distribution for a 99% confidence interval is 2.576. (Reading it from figure 21 gives 2.6.) The ratio of the values of \(z\) for the 99% and 95% confidence intervals is \(\dfrac{2.576}{1.96} \approx 1.3\). Hence, the margin of error for the 99% confidence interval will be about 1.3 times the margin of error for the 95% confidence interval. It will be about 0.06, making the 99% confidence interval about \((0.05, 0.17)\).
We have \(\hat{p} = \dfrac{1}{9} = 0.111\), so \[ 2.576 \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 2.576 \sqrt{\dfrac{0.111(1-0.111)}{180}} = 0.0603. \] Hence, the 99% confidence interval is \(0.1111 \pm 0.0603\), or \((0.0508, 0.1714)\). In percentage terms, the confidence interval is \(5.08\%\) to \(17.14\%\).
The 99% confidence interval includes the claimed true proportion of \(16.7\%\).
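
The comparison between the 95% and 99% intervals can be sketched in Python as below. The variable names are illustrative, and the \(z\) values come from `statistics.NormalDist` rather than from figure 21.

```python
from math import sqrt
from statistics import NormalDist

x, n = 20, 180
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard deviation of P_hat

z95 = NormalDist().inv_cdf(0.975)    # about 1.96
z99 = NormalDist().inv_cdf(0.995)    # about 2.576
ratio = z99 / z95                    # factor by which the margin of error grows

margin99 = z99 * se
ci99 = (p_hat - margin99, p_hat + margin99)

print(round(ratio, 2), round(margin99, 4), tuple(round(b, 4) for b in ci99))
```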

Exercise 7

Based on the margin of error provided, the approximate 95% confidence interval is \(0.56 \pm 0.026\), or \((0.534, 0.586)\); in percentage terms, it is \((53.4\%, 58.6\%)\).
For a 95% confidence interval, the margin of error is \(1.96 \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\). The maximum margin of error occurs when \(\hat{p} = 0.5\). We have \begin{alignat*}{2} &&\quad 1.96 \sqrt{\dfrac{0.5(1-0.5)}{n}} &= 0.026 \\ &\implies & \sqrt{\dfrac{0.5 \times 0.5}{n}} &= \dfrac{0.026}{1.96} \\ &\implies & \dfrac{0.25}{n} &= \Bigl(\dfrac{0.026}{1.96}\Bigr)^2 \\ &\implies & n = 0.25\,\Bigl(\dfrac{1.96}{0.026}\Bigr)^2 &= 1421 \quad\text{(to the nearest whole number).} \end{alignat*} Hence, the sample size is 1421.
Even if the sample size remains constant for the different outcomes that are reported, the margin of error will vary because it depends on \(\hat{p}\). The reporting of the uncertainty in the survey results is simplified by reporting the margin of error that is the maximum for the sample size involved.
Based on the margin of error provided, the approximate 95% confidence interval is \(0.40 \pm 0.026\), or \((0.374, 0.426)\). This confidence interval is conservative (wider than it should be), because it is calculated using the maximum margin of error. The maximum margin of error arises when \(\hat{p} = 0.5\), but in this case \(\hat{p} = 0.4\). So the actual margin of error, based on the Normal approximation, is less than 0.026. Not by much, however, as \(1.96 \sqrt{\dfrac{0.4 \times 0.6}{1421}} = 0.025\).
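
The sample-size calculation, and the comparison of the maximum and actual margins of error, can be checked with a short Python sketch. The variable names are chosen here for illustration.

```python
from math import sqrt

z = 1.96
margin_max = 0.026   # reported (maximum) margin of error

# Solve 1.96 * sqrt(0.25 / n) = 0.026 for n
n_exact = 0.25 * (z / margin_max) ** 2
n = round(n_exact)

# Actual margin of error when p_hat = 0.4, for the same sample size
margin_at_04 = z * sqrt(0.4 * 0.6 / n)

print(n, round(margin_at_04, 3))
```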

Exercise 8

Since the circle has radius 1, it has area equal to \(\pi\). As one quarter of the circle is in the unit square, the proportion of the area of the square that is covered by the circle is equal to \(p = \dfrac{\pi}{4}\).

Because you are simulating data, there is no unique answer. If you can set up the Excel spreadsheet so it calculates everything (including the point estimate and approximate 95% confidence interval) based on formulas typed in cells, then hitting the F9 key will produce a new simulation and a different confidence interval; you can use the F9 key many times to see how often your confidence interval includes the true value.

The true value being estimated is \(\dfrac{\pi}{4} = 0.7854\). The following table gives the point estimates and approximate 95% confidence intervals from five independent simulations.

Point estimate \(\hat{p}\)	Approximate 95% CI
0.7843	(0.7762, 0.7924)
0.7898	(0.7818, 0.7978)
0.7847	(0.7766, 0.7928)
0.7928	(0.7849, 0.8007)
0.7847	(0.7766, 0.7928)

As it happens, these 95% confidence intervals all include the true value.

An inference for \(\dfrac{\pi}{4}\) can be converted into an inference for \(\pi\) by multiplying through by 4. For example, for the first result in the table, the point estimate for \(\pi\) is \(4 \times 0.7843 = 3.137\), and the approximate 95% confidence interval for \(\pi\) is \((3.105, 3.169)\).
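
The simulation can also be sketched outside Excel. The Python code below is one possible version, not the spreadsheet described above: it assumes 10,000 random points per simulation (a choice made here that gives interval widths similar to those in the table), and the function name `estimate_quarter_circle` and the seed are likewise illustrative.

```python
import random
from math import pi, sqrt

def estimate_quarter_circle(n, seed=None):
    """Estimate the proportion of the unit square covered by the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()   # random point in the unit square
        if x * x + y * y <= 1:              # point falls inside the circle of radius 1
            hits += 1
    p_hat = hits / n
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - margin, p_hat + margin)

# n = 10_000 and seed = 1 are assumptions; change the seed for a new simulation.
p_hat, (lo, hi) = estimate_quarter_circle(10_000, seed=1)

# Multiply through by 4 to turn the inference into one about the circle's area.
area_hat, area_ci = 4 * p_hat, (4 * lo, 4 * hi)

print(p_hat, (lo, hi), lo <= pi / 4 <= hi)
print(area_hat, area_ci)
```

Changing the seed plays the role of pressing F9 in the spreadsheet: each run gives a new point estimate and a new interval, and in the long run about 95% of the intervals should include the true value.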

This is a different method for approximating \(\pi\) from the one used by Archimedes…