Content
Populations and sample frames
To obtain a random sample from a defined population, we need to be able to describe the population of interest, so that we can design a method for selecting a sample from it. The set of units that describes the universe from which we can take a sample is called the sample frame. The sample frame is the 'practical' population: what we actually sample from. Although this is often difficult to achieve, it is important that the sample frame matches the real population of interest as closely as possible. In large populations this can be particularly challenging.
Consider, for example, how we could obtain a sample of Australian businesses with over 20 employees. We may be able to obtain lists from employer or business organisations. If we relied on such lists for our sample frame, we would have concerns such as:
- How do businesses with more than 20 employees get on the list? Do they have to be on it? If not, which businesses are typically omitted?
- How current is the information? How often is it updated?
- What is the quality of the information? If we are going to use the list to contact the businesses selected in our sample, are the contact details accurately recorded?
As we will see, problems with the sample frame can seriously undermine the integrity of a sample.
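The concerns listed above can be made concrete with a small sketch in Python. It compares a sample frame with the population of interest and reports which units are missing from the frame, which units are on the frame but should not be, and what fraction of the population the frame covers. The business identifiers and the function name frame_coverage are invented for illustration; in practice the real population is rarely known this precisely.

```python
# Hypothetical data: these business IDs are invented for illustration only.
population = {"B01", "B02", "B03", "B04", "B05", "B06"}  # businesses of interest
frame = {"B01", "B02", "B04", "B07"}                     # businesses on the list


def frame_coverage(population, frame):
    """Compare a sample frame with the population of interest."""
    missing = population - frame   # in the population, but can never be sampled
    extra = frame - population     # on the list, but not in the population
    coverage = len(population & frame) / len(population)
    return missing, extra, coverage


missing, extra, coverage = frame_coverage(population, frame)
print("Missing from frame:", sorted(missing))
print("On frame but not of interest:", sorted(extra))
print("Proportion of population covered:", coverage)
```

In this toy case half of the population cannot be reached through the list, however carefully the sample is then drawn from it.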
Example: Estimating the unemployment rate (sample frame)
The sample frame used by the Australian Bureau of Statistics for estimating the unemployment rate describes dwellings in Australia by including three components: private dwellings, discrete Indigenous communities, and non-private dwellings such as retirement homes and motels. The sample frame divides Australia into many small geographic areas. At the time of the Australian Census (every five years), a description of the dwellings within each of these small geographic areas is recorded. For the unemployment-rate survey, some of the small areas are sampled and then a subset of dwellings within each sampled area is selected. Once a small area has been selected, its description of dwellings is checked to make sure it is up to date.
The Australian Bureau of Statistics has a well-defined population and access to a vast sample frame. Each month the unemployment rate can be estimated from the sample selected from this frame.
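The two-stage selection described in this example (first areas, then dwellings within the chosen areas) can be sketched as follows. This is a minimal illustration, not the Bureau's actual design: the function name two_stage_sample, the toy frame, and the choice of equal-probability sampling at both stages are all assumptions made for the sketch.

```python
import random


def two_stage_sample(frame, n_areas, n_dwellings, seed=None):
    """Sample areas, then dwellings within each sampled area.

    `frame` maps an area identifier to the list of dwellings in that area.
    Both stages use simple random sampling (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Stage 1: choose some of the small areas.
    areas = rng.sample(sorted(frame), n_areas)
    sample = {}
    for area in areas:
        dwellings = frame[area]
        # Stage 2: choose a subset of dwellings within the chosen area.
        k = min(n_dwellings, len(dwellings))
        sample[area] = rng.sample(dwellings, k)
    return sample


# Hypothetical frame: 10 small areas, each containing 8 dwellings.
toy_frame = {
    f"area{a}": [f"area{a}-dwelling{d}" for d in range(8)]
    for a in range(10)
}

chosen = two_stage_sample(toy_frame, n_areas=3, n_dwellings=2, seed=1)
for area, dwellings in chosen.items():
    print(area, dwellings)
```

Only the dwellings in the selected areas need an up-to-date description, which is one practical reason for sampling in stages.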
Example: Jury duty
How are people selected for jury duty in the state of Victoria? In each of the last few years, over 60 000 people have been summoned to attend the courts for jury service.
Potential jurors are selected randomly from the Victorian electoral roll. In 2010, there were around 3.5 million voters enrolled in Victoria. Enrolment is compulsory for Australian citizens (and qualified British subjects) aged 18 years or over who have lived in Victoria for at least one month at their current address. Over the last few years, about 10% of the summoned potential jurors have been empanelled.
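As a rough illustration of random selection from a roll, the sketch below draws a simple random sample from a small made-up list of voters. The roll, the sample size, and the function name draw_summons are hypothetical, and the sketch assumes a single draw made without replacement; the text does not describe how the actual selection is carried out.

```python
import random


def draw_summons(roll, n, seed=None):
    """Draw a simple random sample of n entries from the roll."""
    rng = random.Random(seed)
    # random.Random.sample selects without replacement, so no entry
    # can appear twice within this one draw.
    return rng.sample(roll, n)


# Hypothetical roll of 1000 voters (invented identifiers).
toy_roll = [f"voter{i:04d}" for i in range(1000)]

summoned = draw_summons(toy_roll, 20, seed=42)
print(len(summoned), "voters summoned")
```

Every entry on the roll has the same chance of being chosen in such a draw, which is what makes the roll's accuracy as a sample frame so important.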
Exercise 3
Consider the process of obtaining a random sample of potential jurors in Victoria.
- Assume that you want to take a sample of eligible voters. Would the electoral roll provide a perfect sample frame? Would there be any potential biases?
- Assume that you had electronic access to the electoral roll and could take a simple random sample. What is the (approximate) probability of being selected as a potential juror in one year?
- Could someone be called up for jury duty twice in one year? What would you need to know to determine this?
- Assume that you only had access to a hard copy of the electoral roll, organised by electorate. What might be a practical way to take an approximately random sample? Would your method be a simple random sample?