Content
Mechanisms for generating random samples
In some situations, generating a random sample can be straightforward. Tattslotto provides an exemplar. We have an accurate list of all the units in the population we wish to sample from (the 45 balls), and we use a random mechanism (thorough and chaotic mixing) in sampling. However, the population here is very small.
Tattslotto uses a physical randomising device. Other similar devices are referred to in the module Probability: for example, the use of marbles with numbers on them corresponding to dates for conscripting young men for service in Vietnam in the 1960s and 1970s.
Let's say you wish to take a random sample of 30 students from the 300 Year 11 and 12 students in your school. It is cumbersome to put 300 names in a container and shake them vigorously, and however we do it, we may have reservations about the effectiveness of the chaotic mixing, without something designed specifically for the purpose. It is more practical to use a random number generator in a computer.
As students are likely to have access to a computer with Microsoft Excel installed, we describe a method that uses Excel.
First obtain a list of all the students in a single column of an Excel worksheet. In the next column, enter the Excel formula \(\sf \text{=RAND()}\) in the cell next to the first student name, and copy this formula down the column for 300 rows. This will generate, for each student, a random number between 0 and 1 from a uniform distribution. (Uniform distributions are discussed in the module Continuous probability distribution.)
The \(\sf \text{RAND()}\) function is known (in Excel terms) as 'volatile', which means that the values will change every time an action is carried out on the worksheet. So it is important that, once a random number is generated for each student, the values (rather than the formulas) are saved. This can be done by copying the created numbers and using the 'Paste Special' function to paste the 'values'.
So now we have the list of 300 names, and alongside each name is a random number from the \(\mathrm{U}(0,1)\) distribution, which means it is equally likely to be any number in the interval \((0,1)\). To take a 10% sample of students (30 of the 300) we find, for example, the students with the lowest 10% of the random numbers. This can be done by sorting the name and random-number columns according to the values of the random numbers. It would be just as reasonable to find the students with the top 10% of random numbers; however, it is vital to decide which 10% will constitute the sample before assigning the random numbers.
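The same procedure can be sketched outside Excel. The following Python example is only an illustration, not part of the method described above: the student names are placeholders, and the seed value is an arbitrary assumption made so that the draw can be repeated.

```python
import random

# Hypothetical roll: 300 placeholder names standing in for the real student list.
students = [f"Student {i:03d}" for i in range(1, 301)]

# Fixed seed (an arbitrary choice) so that the same draw can be reproduced.
rng = random.Random(2024)

# Attach one U(0,1) random number to each student, as RAND() does in each row.
keyed = [(rng.random(), name) for name in students]

# The rule is decided in advance: the 30 smallest random numbers form the sample.
keyed.sort()
sample = [name for _, name in keyed[:30]]

print(len(sample))
print(sample[:5])
```

Python's standard library also offers `random.sample(students, 30)`, which draws a simple random sample in one step; the longer version is shown here because it mirrors the spreadsheet steps.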
It is sometimes said or thought that, if every unit in the population has the same chance of being chosen, then the sample is a simple random sample. This is not true, as the following exercise shows.
Exercise 2
Consider sampling the 300 students, in a different way, to obtain a sample of 150 students. Alongside each name, the numbers 1 and 2 are listed alternately (\(1,2,1,2,\dots\)). A fair coin is tossed. If the outcome is heads, the sample is taken to be the 150 students with a '1' next to their name. If the outcome is tails, the sample is the 150 students with a '2' next to their name.
- Using this method, how many possible samples of 150 can be obtained?
- Does every student have the same chance of selection?
- Is this a simple random sample?
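After attempting the exercise, one way to check your answers empirically is to simulate the coin-toss scheme many times. The Python sketch below is an illustration only: the seed, the number of trials and all variable names are assumptions, not part of the exercise. It counts how many different samples turn up, estimates each student's chance of selection, and prints the number of samples of size 150 that a simple random sample could produce.

```python
import random
from math import comb

N = 300
# Labels 1, 2, 1, 2, ... written alongside the 300 names.
labels = [1 if i % 2 == 0 else 2 for i in range(N)]

def coin_toss_sample(rng):
    """Return the set of student indices selected by one toss of a fair coin."""
    chosen_label = 1 if rng.random() < 0.5 else 2  # heads -> 1, tails -> 2
    return frozenset(i for i in range(N) if labels[i] == chosen_label)

rng = random.Random(1)   # arbitrary seed, for repeatability
trials = 10_000          # arbitrary number of repetitions
counts = [0] * N         # how often each student is selected
distinct = set()         # the different samples that occur

for _ in range(trials):
    s = coin_toss_sample(rng)
    distinct.add(s)
    for i in s:
        counts[i] += 1

print("different samples seen:", len(distinct))
print("estimated selection chances range from",
      min(counts) / trials, "to", max(counts) / trials)
print("samples of size 150 available to a simple random sample:", comb(N, 150))
```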
Sampling using groups
Practical sampling problems often involve large, sometimes complex, populations. Understanding the structure of the population can help in the design of a practical random sampling method. Sometimes the population can be divided into natural groups, or clusters. We may be able to get a good sample by sampling whole clusters, and this can be more efficient and less costly than taking a simple random sample. For example, it is a lot more convenient to survey all students at 20 schools than to survey a simple random sample of 1000 school children. Provided the 20 schools are chosen at random, this is a random sample of clusters.
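A random sample of clusters might be sketched as follows in Python. Everything about the population here is invented for illustration: the 200 schools, their enrolments (between 30 and 79 students each) and the seed are assumptions.

```python
import random

rng = random.Random(7)  # arbitrary seed, for repeatability

# Hypothetical population: 200 schools, each with a made-up roll of 30 to 79 students.
schools = {
    f"School {s:03d}": [f"S{s:03d}-{k:02d}" for k in range(rng.randint(30, 79))]
    for s in range(1, 201)
}

# Random step: a simple random sample of 20 clusters (schools).
chosen_schools = rng.sample(sorted(schools), 20)

# No further randomness: every student in each chosen school is surveyed.
cluster_sample = [
    student for school in chosen_schools for student in schools[school]
]

print(len(chosen_schools), "schools,", len(cluster_sample), "students")
```

Note that the sample size in students is not fixed in advance; it depends on which schools happen to be chosen.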
In other cases, we may wish to ensure that we sample from different groups, or strata, in our population. For example, we might wish to survey school children from different sectors (government schools, private schools, etc.). Taking a simple random sample within each of a few strata of the population is known as stratified random sampling.
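Sampling within strata can be sketched in the same way. In the Python example below, the sector names, the stratum sizes, the 10% sampling fraction and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(11)  # arbitrary seed, for repeatability

# Hypothetical strata: the sizes are invented for illustration.
strata = {
    "government": [f"G{i:04d}" for i in range(600)],
    "private": [f"P{i:04d}" for i in range(250)],
    "other": [f"O{i:04d}" for i in range(150)],
}

fraction = 0.10  # assumed: the same sampling fraction in every stratum

# A separate simple random sample is drawn inside each stratum.
stratified_sample = {
    sector: rng.sample(children, round(fraction * len(children)))
    for sector, children in strata.items()
}

for sector, chosen in stratified_sample.items():
    print(sector, len(chosen))
```

Unlike a single simple random sample of 100 children from all 1000, this design guarantees that every sector is represented.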
Random selection is a vital element no matter what aspects of the structure of the population we wish to exploit in sampling. Physical randomisation devices might work well in simple situations, but most often computers are used to select random samples.
Example: Estimating the unemployment rate
Every month, the Australian Bureau of Statistics releases the national unemployment rate. It is estimated from a survey of the civilian Australian population aged 15 years or older. The survey samples dwellings, and questions are asked of the individuals in a sampled dwelling. The dwellings are sampled in several stages. Within metropolitan regions, for example, a geographic area is randomly sampled first. The geographic areas are divided into 'blocks' (groups of dwellings with boundaries like roads, parks and creeks), and a number of blocks are randomly selected. Finally, a set of dwellings is sampled within each block.
The Australian Bureau of Statistics uses random sampling at each stage of selection; a procedure like the selection of a simple random sample in Excel is used at each stage.
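The idea of random selection at each stage can be sketched in Python. This is a toy illustration under invented assumptions, not the Australian Bureau of Statistics' actual design: the frame of 50 areas, 40 blocks per area and 25 dwellings per block, the numbers selected at each stage, the seed and the function name `multistage_sample` are all hypothetical.

```python
import random

rng = random.Random(42)  # arbitrary seed, for repeatability

# Hypothetical frame: 50 areas, each with 40 blocks, each block with 25 dwellings.
frame = {
    f"area{a}": {
        f"area{a}-block{b}": [f"area{a}-block{b}-dwelling{d}" for d in range(25)]
        for b in range(40)
    }
    for a in range(50)
}

def multistage_sample(frame, n_areas, n_blocks, n_dwellings, rng):
    """Randomly select areas, then blocks within each area, then dwellings within each block."""
    dwellings = []
    for area in rng.sample(sorted(frame), n_areas):              # stage 1: areas
        blocks = frame[area]
        for block in rng.sample(sorted(blocks), n_blocks):       # stage 2: blocks
            dwellings.extend(rng.sample(blocks[block], n_dwellings))  # stage 3: dwellings
    return dwellings

selected = multistage_sample(frame, n_areas=5, n_blocks=4, n_dwellings=3, rng=rng)
print(len(selected))  # 5 areas x 4 blocks x 3 dwellings = 60
```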