The sample proportion as an estimator of \(p\)

Even without using any ideas from probability or distribution theory, it seems compelling that the sample proportion should tell us something about the population proportion. If we have a random sample from the population, the sample is representative of the population in an 'expected' sense. So we should be able to use the sample proportion as an estimate of the population proportion.

Assume that \(X \stackrel{\mathrm{d}}{=} \mathrm{Bi}(n,p)\). We define the sample proportion to be \(\hat{P} = \dfrac{X}{n}\). It is a suitable name, because \(\hat{P}\) reflects the proportion in the sample with the characteristic of interest. Once we obtain an actual observation \(x\) of the random variable \(X\), we have an actual observation \(\hat{p} = \dfrac{x}{n}\) of the sample proportion.

As there is a distinction to be made between the random variable \(\hat{P}\) and its corresponding observed value \(\hat{p}\), we refer to the random variable as the estimator \(\hat{P}\), and the observed value as the estimate \(\hat{p}\); note the use of upper and lower case.
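The distinction between estimator and estimate can be made concrete with a minimal Python sketch. This is an illustration only: the function names `draw_sample_proportion` and `point_estimate`, and the values \(n = 100\), \(p = 0.3\) and \(x = 30\), are hypothetical. The estimator \(\hat{P} = X/n\) is modelled as a procedure that simulates a fresh random count \(X\) (successes in \(n\) independent trials, each with success probability \(p\)) and returns \(X/n\). The estimate \(\hat{p} = x/n\) is simply a number computed from one observed count.

```python
import random

def draw_sample_proportion(n, p, rng):
    """Return one realisation of the estimator P-hat = X / n.

    X ~ Bi(n, p) is simulated as the number of successes in n
    independent trials, each a success with probability p.
    """
    x = sum(1 for _ in range(n) if rng.random() < p)
    return x / n

def point_estimate(x, n):
    """Return the estimate p-hat = x / n for an observed count x."""
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    return x / n

rng = random.Random(2024)  # seeded so the run is reproducible

# Estimator: a random quantity; another call (or seed) can give a different value.
print(draw_sample_proportion(100, 0.3, rng))

# Estimate: a fixed number from one observed count (x = 30 is hypothetical).
print(point_estimate(30, 100))  # 0.3
```

Calling `draw_sample_proportion` repeatedly mimics taking new random samples, whereas `point_estimate` always returns the same value for the same observed data.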

More specifically, the observed value \(\hat{p}\) is referred to as a point estimate of \(p\).

Example: Survey of voters

Suppose we obtain a random sample of 500 voters, and we find that 227 prefer Labor. The observed sample proportion preferring Labor is \(\frac{227}{500} = 0.454\), and we say that 0.454 is a point estimate of the unknown population proportion preferring Labor.

In this example:

\(p\) is the proportion of all Australian voters who prefer Labor
\(n = 500\) is the sample size
the random variable \(X\) is the number of voters who prefer Labor in a random sample of 500 voters
the random variable \(\hat{P} = \frac{X}{500}\) is the proportion of voters who prefer Labor in a random sample of 500 voters
\(x = 227\) is an observation of \(X\)
\(\hat{p} = \frac{227}{500} = 0.454\) is the corresponding observation of \(\hat{P}\).
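The quantities listed above map directly onto a few lines of Python. This short sketch only reproduces the arithmetic of the survey example; the variable names are illustrative.

```python
n = 500        # sample size
x = 227        # observed number preferring Labor (an observation of X)
p_hat = x / n  # observed sample proportion (an observation of P-hat)

print(p_hat)           # 0.454
print(f"{p_hat:.1%}")  # 45.4%
```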

In the previous example, we would be very lucky if the true population proportion turned out to be exactly 0.454. It is far more probable that the true proportion differs from this value, because samples vary. After all, even when we know the population proportion \(p\), we do not (and should not!) expect the sample proportion to be exactly equal to \(p\). For example, if we toss a fair coin 30 times, obtaining exactly 15 heads is by no means guaranteed: although 15 is the single most likely number of heads, it occurs with probability only about 0.14.
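The coin-tossing claim can be checked exactly from the binomial probability function \(\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\). The following Python sketch (the helper name `binomial_pmf` is our own) computes the full distribution of the number of heads in 30 tosses of a fair coin, finds the most likely outcome, and reports the probability of getting exactly 15 heads.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 30, 0.5  # 30 tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The most likely number of heads (the mode of the distribution).
mode = max(range(n + 1), key=lambda k: pmf[k])
print(mode)                    # 15
print(round(pmf[15], 4))       # 0.1445
print(round(1 - pmf[15], 4))   # 0.8555, the chance of NOT getting exactly 15 heads
```

So even the most likely single outcome fails to occur more than 85% of the time.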

This discussion reminds us that the sample proportion \(\hat{P}\) is a random variable: it varies from one sample to the next. In the next section, we explore this important fact in detail.
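A quick simulation illustrates this variability. The Python sketch below draws several independent random samples of size 500 and records the sample proportion from each. The population proportion \(p = 0.45\) is a hypothetical value chosen only for illustration (in practice \(p\) is unknown), and the function name `simulate_sample_proportions` is our own.

```python
import random

def simulate_sample_proportions(n, p, reps, seed=0):
    """Draw `reps` independent samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X ~ Bi(n, p)
        props.append(x / n)
    return props

# Hypothetical population proportion p = 0.45, sample size n = 500.
props = simulate_sample_proportions(500, 0.45, reps=10)
print(props)                   # values scattered around 0.45
print(min(props), max(props))  # the spread from sample to sample
```

The ten values are typically not all equal, even though every sample comes from the same population: this is exactly the sense in which \(\hat{P}\) is a random variable.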