## Assumed knowledge

• A basic understanding of sampling, as covered by the series of TIMES modules Data investigation and interpretation (Years F–10), particularly the Year 8 module.
• The content of the modules:
• Familiarity with the idea of a random variable. A random variable $$X$$ is a quantitative outcome of a random procedure. 'Random' here refers to the inherent uncertainty of the outcome, rather than to something haphazard. Random variables can be discrete or continuous. The probability distribution of a discrete random variable $$X$$ specifies the probability of each possible value of $$X$$. For a continuous random variable, the probability distribution is described by its probability density function.

## Motivation

• What does it mean to take a 'random' sample?
• Can exit polls predict the outcome of an election?
• How do we know if our sample is random?
• What happens if we don't take a random sample?
• Why is the Australian Census only conducted every five years?

In earlier years, students have seen different ways in which a 'sample' of data might arise or be used: surveying a sample of people, taking measurements on a sample of other objects (animals, trees, schools, companies and so on), conducting an experiment on a sample involving the random allocation of its members to different groups, and making observations on different groups or samples. This is covered by the series of modules Data investigation and interpretation (Years F–10).

We study random samples here to understand why data from a sample can be relied on to provide quantitative information about the population from which the sample was taken. This matters because, very often, the questions we ask are general in nature:

• Who do Australians prefer for Prime Minister?
• Who do Australian women prefer for Prime Minister?
• Is the proportion of Australian women who prefer candidate J for Prime Minister higher than the proportion of men with the same preference?
• What are the vitamin D levels of Australian newborns?
• How much 'screen time' do preschool children typically have per week?
• Does drug and alcohol use in adolescence predict success in adulthood?
• Will primary school children using 'daily computer-assisted practice' master arithmetic operations more quickly than those without access to the program?
• Does acupuncture have a stronger effect on the severity of migraine headaches than standard drug treatments?

In these examples, it is impractical, and often impossible, to find the exact value of the quantity of interest in the population (such as the proportion of women preferring candidate J as Prime Minister). Instead, we can obtain an estimate based on a sample.

There are many reasons for using samples. Most often, the cost in time and effort prohibits gathering information from the entire population of interest. It can also be easier to ensure that the information is of high quality when it is collected from a sample. Importantly, with high-quality data collection methods and appropriate ways of selecting a sample, we can obtain accurate information about a population. In some cases, we want to infer a property of a single population from a sample: for example, the proportion of Australians who prefer candidate J. In other cases, we may wish to compare the properties of two populations: for example, the proportions of Australian men and women preferring candidate J.
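The idea of estimating a population proportion from a sample can be illustrated with a small simulation. The following is a minimal sketch only: the population size, the number of people preferring candidate J, the sample size and the function name `estimate_proportion` are all invented for illustration and do not come from any real poll.

```python
import random


def estimate_proportion(population, sample_size, rng):
    """Estimate the proportion of 1s in `population` from a simple random sample."""
    # rng.sample draws without replacement, so no member is chosen twice.
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Fixed seed so that the sketch gives the same result each time it is run.
rng = random.Random(2024)

# Hypothetical population (assumed numbers): 100,000 people,
# of whom 42,000 prefer candidate J (coded 1) and the rest do not (coded 0).
population_size = 100_000
number_preferring_j = 42_000
population = [1] * number_preferring_j + [0] * (population_size - number_preferring_j)
true_proportion = number_preferring_j / population_size

# Estimate the proportion from a random sample of 1,000 people.
estimate = estimate_proportion(population, 1_000, rng)

print(f"True proportion in the population: {true_proportion:.3f}")
print(f"Estimate from the random sample:   {estimate:.3f}")
```

In this sketch the true proportion is known only because the population was constructed artificially. In practice it is unknown, and the sample estimate is all we have. Running the code with different seeds shows that the estimate varies from sample to sample, but it usually falls close to the true value.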

Populations are rarely static; it may not be possible to capture an entire population because it extends into the future. In asking about vitamin D levels in Australian newborns, it is likely that we want to draw a general conclusion that applies to newborns born today as well as those born tomorrow and in the future. Often we envisage that the conclusion drawn will be relevant to all humans, including humans in the future. This involves making an assumption about the stability of the world and its patterns.
