
Random variables

A random variable is a variable whose value is determined by the outcome of a random procedure. The concept of a random procedure was discussed in the module Probability. What makes the variable random is that — unlike the kind of variable we see in a quadratic equation — we cannot say what the observed value of the random variable is until we actually carry out the random procedure.

Example: Tetris

Consider the following example of a random procedure from the module Probability: During a game of Tetris, we observe a sequence of three consecutive pieces.

Each Tetris piece has one of seven possible shapes, which are labelled by the letters \(\mathtt{I}\), \(\mathtt{J}\), \(\mathtt{L}\), \(\mathtt{O}\), \(\mathtt{S}\), \(\mathtt{T}\) and \(\mathtt{Z}\). So in this random procedure, we can observe a sequence such as \(\mathtt{JLL}\), \(\mathtt{ZOS}\), \(\mathtt{ZSZ}\), \(\mathtt{III}\) and so on.

Based on this random procedure, we may define a number of random variables. For example:

Define \(X\) to be the number of occurrences of ‘\(\mathtt{Z}\)’ in a sequence of three pieces. Then \(X\) can take the values 0, 1, 2 or 3.
Define \(Y\) to be the number of different shapes in a sequence of three pieces. Then \(Y\) can take the values 1, 2 or 3.

These are not the only random variables that could be defined in this context.

This example illustrates that a random variable takes a specific numerical value each time the random procedure is carried out.
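To make this concrete, the Python sketch below treats \(X\) and \(Y\) as functions that assign a number to each possible sequence of three pieces. It enumerates all \(7^3 = 343\) sequences and counts how many of them give each value. The function names and the use of a three-letter string for a sequence are illustrative choices, not part of the module. The counts are numbers of sequences, not probabilities: turning them into probabilities would require an assumption about how the game generates pieces, and this sketch makes no such assumption.

```python
from collections import Counter
from itertools import product

# The seven Tetris shapes, labelled as in the text.
SHAPES = "IJLOSTZ"

# Every possible sequence of three consecutive pieces: 7**3 = 343 outcomes.
outcomes = ["".join(seq) for seq in product(SHAPES, repeat=3)]

def X(seq: str) -> int:
    """Number of occurrences of 'Z' in the sequence."""
    return seq.count("Z")

def Y(seq: str) -> int:
    """Number of different shapes in the sequence."""
    return len(set(seq))

# Each outcome of the procedure gives one numerical value of X and of Y.
for seq in ["JLL", "ZOS", "ZSZ", "III"]:
    print(seq, X(seq), Y(seq))

# How many of the 343 sequences give each value.
# These are counts of sequences, not probabilities: no assumption is made
# about how the game generates pieces.
x_counts = Counter(X(s) for s in outcomes)
y_counts = Counter(Y(s) for s in outcomes)
print(sorted(x_counts.items()))  # [(0, 216), (1, 108), (2, 18), (3, 1)]
print(sorted(y_counts.items()))  # [(1, 7), (2, 126), (3, 210)]
```

The sets of keys in `x_counts` and `y_counts` recover exactly the possible values listed above: \(\{0, 1, 2, 3\}\) for \(X\) and \(\{1, 2, 3\}\) for \(Y\).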

By convention, random variables are denoted by capital letters, usually from near the end of the alphabet.

Example: Five people born in 1995

Consider another example of a random procedure from the module Probability: Five babies born in 1995 are followed up over their lives, and major health and milestone events are recorded.

This example is only vaguely described, and would be more tightly defined in practice. But we can see, again, that a number of random variables could be defined:

Let \(U_i\) be the total number of times that individual \(i\) moves residence up to age 18. Then \(U_i\) can take values \(0, 1, 2, \dots\).
Let \(V_i\) be the total number of mobile phones owned by individual \(i\) up to age 18. The possible values for \(V_i\) are \(0, 1, 2, \dots\), which incidentally are the same as those for \(U_i\).
Let \(W\) be the average height of the five people at age 18. Then the value of \(W\) must be positive, but there is no obvious upper bound. The common practice in such cases is to say that the possible values are \(W > 0\); we will assign extremely low probabilities to large values.
Let \(T_i\) be the total time spent on Facebook by individual \(i\) up to age 18. In this case, \(T_i\) is bounded by the length of the period being considered: if we measure \(T_i\) in years, then \(0 \leq T_i \leq 18\). Again, values anywhere near the logical maximum of 18 years will be assigned essentially zero probability.

The random variables given in the previous example are of two distinct types, which are handled in different ways:

A discrete random variable takes values confined to a range of separate or ‘discrete’ values. (More formally, a discrete random variable takes either a finite number of values or a countably infinite number of values.) In the example, the first two random variables \(U_i\) and \(V_i\) are counts: they can only take non-negative integer values. (A person cannot have moved residence 2.3 times by age 18, nor can a person have owned 9.6 mobile phones.) A random variable based on a count is an example of a discrete random variable.
A continuous random variable can take any value in an interval. In the example, the third and fourth random variables \(W\) and \(T_i\) are continuous random variables.

This module concerns discrete random variables. The module Continuous probability distributions deals with continuous random variables.

It is important to see that defining a random variable requires specifying what is observed or recorded. In some situations this is only implied, but it is always required, either implicitly or explicitly.

Example: Spinning a coin

A fair coin is spun vertically on a flat surface. (This example comes from an exercise in the module Probability.) Here are two related random variables:

Let \(X\) be the number of heads showing when the coin comes to rest. Then \(X\) takes the value 0 if the coin finishes up ‘tails’, or 1 if the coin finishes up ‘heads’.
Let \(Y\) be the time between the commencement of the spin and the coin coming to rest, measured in seconds.

Here \(X\) is discrete and \(Y\) is continuous.

A special case of a discrete random variable is one that can take only a finite number of values. We call this a simple random variable. Since \(X\) in the previous example can only take values 0 and 1, it is a simple random variable.
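If the set of possible outcomes of the random procedure is finite, then any random variable defined on it can take only finitely many values, and so it is simple. The Python sketch below illustrates this by listing the possible values directly. It models only the face showing when the coin comes to rest (not the spin time \(Y\)), and it reuses the Tetris random variable \(Y\) for comparison. The helper `possible_values` and the other names are hypothetical, chosen for illustration. By contrast, a count such as \(U_i\), with possible values \(0, 1, 2, \dots\), has no finite list of values, so it is discrete but not simple.

```python
from itertools import product

def possible_values(outcomes, rv):
    """Set of values taken by the random variable rv over a finite list of outcomes."""
    return {rv(o) for o in outcomes}

# Spinning coin: only the face showing at rest is recorded.
coin_outcomes = ["heads", "tails"]

def coin_X(face: str) -> int:
    """Number of heads showing: 1 for 'heads', 0 for 'tails'."""
    return 1 if face == "heads" else 0

# Tetris: all 343 sequences of three pieces.
tetris_outcomes = ["".join(s) for s in product("IJLOSTZ", repeat=3)]

def tetris_Y(seq: str) -> int:
    """Number of different shapes in the sequence."""
    return len(set(seq))

print(sorted(possible_values(coin_outcomes, coin_X)))      # [0, 1]
print(sorted(possible_values(tetris_outcomes, tetris_Y)))  # [1, 2, 3]
```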
