Random sampling in finite populations

Consider the two words in the term random sample. As the noun suggests, a sample consists of data 'sampled', or taken, from something else. The adjective, 'random', indicates that the mechanism used to obtain the sample is based on probability, and not on conscious or unconscious preferences.

Random sampling has subtle aspects when considered formally. There are two important cases. The first is a random sample from a finite population of units.

A unit is a single member of a finite population we wish to study. A unit might be a person, animal, plant, school, company or other object. The population is the complete set of units we wish to study, and a census takes measurements on the entire population. A sample is a set of units (a subset of the population) on which we take measurements.

A simple random sample is a random sample selected by a method which ensures that all possible samples, of a given size, are equally likely to be chosen.

Example: Small raffle

Twelve players from a basketball club on an end-of-season trip check into a hotel. One of the twin rooms available has a balcony and a spa; the other rooms are basic. They decide to choose the two players who will get the best room by a simple raffle. Their twelve names are put into a hat and the contents are shaken well. The service manager at the reception desk is asked to draw two names out of the hat.

Assuming that the names in the hat are properly and randomly mixed, each possible pair of names is equally likely to be chosen. So to work out the chance of a particular pair being chosen, we need to find the number of distinct pairs. There are \(\tbinom{12}{2} = 66\) possible pairs from among twelve individuals. So the chance of a particular one of these 66 pairs being chosen is equal to \(\dfrac{1}{66} \approx 0.015\).

In this process, the order of the names is regarded as unimportant. So it does not matter, in the successful pair, which of the two names is drawn first.
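The count of 66 pairs can be checked by listing them. The following is a minimal Python sketch; the player names are placeholders introduced for illustration and are not part of the example. It also counts the ordered draws, to show that each unordered pair corresponds to exactly two orders of drawing.

```python
from itertools import combinations, permutations
from fractions import Fraction

# Placeholder names for the twelve players (an assumption for illustration).
players = [f"Player {i}" for i in range(1, 13)]

# Unordered pairs: the order in which the two names are drawn is ignored.
pairs = list(combinations(players, 2))
num_pairs = len(pairs)

# Ordered draws: first name drawn, then second name drawn.
num_ordered = len(list(permutations(players, 2)))

# If every pair is equally likely, each has probability 1 / (number of pairs).
prob_specific_pair = Fraction(1, num_pairs)

print(num_pairs)           # 66
print(num_ordered)         # 132, i.e. two orders for each pair
print(prob_specific_pair)  # 1/66
```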

The general result is that, for a simple random sample of size \(n\) chosen from a finite population of size \(N\), the number of possible samples (combinations) is equal to the number of ways of choosing \(n\) units from \(N\), and this is equal to

\[ \dbinom{N}{n} = \dfrac{N!}{n!(N-n)!} = \dfrac{N \times (N-1) \times (N-2) \times \dots \times (N-n+2) \times (N-n+1)}{n \times (n-1) \times (n-2) \times \dots \times 2 \times 1}. \]

For a simple random sample, all of these combinations are equally likely, so the probability of a specific combination is \(\dfrac{1}{\binom{N}{n}}\).
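As a rough illustration, the general result can be written as a short Python function. This sketch assumes Python 3.8 or later (for `math.comb`); the function name `prob_specific_sample` is made up for this illustration.

```python
from fractions import Fraction
from math import comb, factorial

def prob_specific_sample(N, n):
    """Probability that one particular set of n units is the
    simple random sample drawn from a population of N units."""
    return Fraction(1, comb(N, n))

# Check against the raffle example: N = 12 players, n = 2 names drawn.
N, n = 12, 2

# comb(N, n) agrees with the factorial form N! / (n! (N - n)!).
assert comb(N, n) == factorial(N) // (factorial(n) * factorial(N - n))

print(comb(N, n))                  # 66
print(prob_specific_sample(N, n))  # 1/66
```

Using `Fraction` keeps the probability exact rather than rounding it to a decimal.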

Example: Tattslotto

In the Saturday evening draw of Tattslotto, six winning balls are drawn at random from 45 balls numbered 1 to 45. There are 8 145 060 different ways of choosing six numbers from 45; that is, \(\tbinom{45}{6} = 8\ 145\ 060\). So the probability of any specific combination of six balls being chosen is \(\dfrac{1}{8\ 145\ 060}\).

An equivalent way to derive the same probability is to think of the balls being chosen in sequence. Consider a specific choice of six balls, such as \(\{3, 4, 20, 37, 40, 45\}\). The probability that the first ball chosen is in this specific set is \(\dfrac{6}{45}\). The conditional probability that the second ball chosen is one of the remaining five balls in the set, given that the first ball chosen was in the set, is equal to \(\dfrac{5}{44}\), and so on. The conditional probability that the sixth ball chosen is in the set, given that the first five chosen were, is equal to \(\dfrac{1}{40}\). Putting all this together using the multiplication theorem (from the module Probability), we obtain the probability of the specific set being chosen:

\[ \Pr(\text{specific set is chosen}) = \dfrac{6}{45} \times \dfrac{5}{44} \times \dfrac{4}{43} \times \dfrac{3}{42} \times \dfrac{2}{41} \times \dfrac{1}{40} = \dfrac{1}{8\ 145\ 060}, \]

as before.
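The agreement between the two derivations can be confirmed with exact arithmetic. The sketch below, assuming Python 3.8 or later, multiplies the six conditional probabilities and compares the product with the reciprocal of the binomial coefficient; the variable names are chosen for illustration only.

```python
from fractions import Fraction
from math import comb

N, n = 45, 6  # 45 balls in the barrel, 6 balls drawn

# Multiply the conditional probabilities 6/45, 5/44, ..., 1/40.
prob_sequential = Fraction(1)
for k in range(n):
    # After k balls from the set have been drawn, n - k of the set
    # remain among the N - k balls still in the barrel.
    prob_sequential *= Fraction(n - k, N - k)

prob_counting = Fraction(1, comb(N, n))

print(prob_sequential)                   # 1/8145060
print(prob_sequential == prob_counting)  # True
```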

What kind of process is needed so that each possible combination has the same chance of selection? Each week we can watch the physical process that has been designed to ensure, as far as possible, that a random sample of numbers is selected. Apart from their numbers, the balls need to be carefully calibrated so that they match in size, shape and weight. The balls are dropped into a transparent barrel and mixed by jets of air blowing into the barrel. After each ball is selected, jets of air mix the remaining balls again.

The purpose of this process is to mix the 45 balls thoroughly enough that every combination of six balls has the same probability of selection. A consequence is that every ball in the barrel has the same chance of being selected.
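The claim that each ball has the same chance of being selected can be checked by counting. If all combinations of six balls are equally likely, the chance that a given ball is among the six equals the proportion of combinations that contain it. The Python sketch below (assuming Python 3.8 or later, with illustrative variable names) does this count; the result does not depend on which ball is considered.

```python
from fractions import Fraction
from math import comb

N, n = 45, 6  # 45 balls in the barrel, 6 balls drawn

# Combinations containing one given ball: that ball is fixed, and the
# other n - 1 balls are chosen from the remaining N - 1 balls.
samples_with_ball = comb(N - 1, n - 1)
all_samples = comb(N, n)

prob_ball_selected = Fraction(samples_with_ball, all_samples)

print(prob_ball_selected)                    # 2/15
print(prob_ball_selected == Fraction(n, N))  # True, since 6/45 = 2/15
```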

The previous example illustrates an important property of random samples: what makes a sample random is how it is chosen, not what it consists of.

Exercise 1

Consider the mixing of the balls with jets of air in Tattslotto.

  1. Is the selection of the first ball alone a random sample?
  2. If six balls were selected with the initial mixing, but without the re-mixing after each ball was selected, would this be considered a random sample?
  3. What may be the purpose of the re-mixing after each ball is selected?
