Variability in samples

We generally collect data from a sample to answer questions about the whole population. It is important to remember that a different sample can give different results.

Categorical data

Observations that fall into a number of distinct categories are known as categorical data. These occur everywhere in everyday life. For example:

gender
hair colour
place of birth
suburb of residence.

If we have census data, we can simply report percentages or proportions for a country. If we have sample data that are representative of some more general situation, we use them to estimate proportions for that situation. In either case, for categorical data we are interested in the relative frequencies or proportions of the different categories.

Relative frequency of samples

Column graph showing the results from selecting a sample of 20 balls: 6 red, 9 blue, 5 green

A large bin contains 200 red, blue and green balls, thoroughly mixed. We wish to estimate the number of red balls in the bin.

Twenty balls are drawn and their colours recorded. The categories are the colours. The population is the bin of balls, and the random sample consists of the 20 balls withdrawn.

The column graph gives a picture of the results.
One way of describing these data is through relative frequency, or proportion.
Relative frequency (proportion) = \( \dfrac{\text {frequency}}{\text {size of data set}}\), often expressed as a percentage.

In this case, the sample data set has 20 items.

Category	Red	Blue	Green
Frequency	6	9	5
Relative frequency as a percentage	30%	45%	25%
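
The percentages in the table can be checked with a short calculation. The following is a minimal Python sketch that applies the relative frequency formula to the sample counts; the names `frequencies` and `relative_frequencies` are illustrative only and do not come from the original resource.

```python
# Sample counts taken from the table above.
frequencies = {"Red": 6, "Blue": 9, "Green": 5}


def relative_frequencies(counts):
    """Return each category's share of the data set as a percentage."""
    total = sum(counts.values())  # size of the data set (20 here)
    return {category: 100 * count / total for category, count in counts.items()}


percentages = relative_frequencies(frequencies)
for category, percent in percentages.items():
    print(f"{category}: {percent:.0f}%")
```

The three percentages add to 100%, because every ball in the sample falls into exactly one category.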

Further investigation would be needed before we could make any prediction concerning the proportion of red balls in the bin.
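
One way to see why a single sample is not enough is to simulate drawing several samples and compare the proportion of red balls in each. The sketch below is only an illustration: the text does not give the true make-up of the bin, so the counts in `ASSUMED_BIN` are hypothetical (chosen so that they total 200), and the function name `draw_sample` is likewise an assumption.

```python
import random

# ASSUMPTION: the true make-up of the bin is not given in the text.
# These counts are hypothetical and chosen only so that they total 200.
ASSUMED_BIN = {"red": 70, "blue": 80, "green": 50}


def draw_sample(bin_counts, sample_size, rng):
    """Draw balls without replacement; return each colour's relative frequency."""
    balls = [colour for colour, n in bin_counts.items() for _ in range(n)]
    sample = rng.sample(balls, sample_size)
    return {colour: sample.count(colour) / sample_size for colour in bin_counts}


rng = random.Random(1)  # fixed seed so that a rerun repeats the same samples
results = [draw_sample(ASSUMED_BIN, 20, rng) for _ in range(5)]
for i, proportions in enumerate(results, start=1):
    print(f"Sample {i}: red = {proportions['red']:.0%}")
```

Under these assumptions the bin is 35% red, yet the proportion of red balls will typically differ from one sample of 20 to the next. Changing the seed or the sample size shows how much the estimates vary.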