The probability axioms

We now consider probabilities of events. What values can a probability take, and what rules govern probabilities?

Some notation: We have used capital letters \(A, B, \dots\) for events, and we have observed that probability can be thought of as a function that has an event as its argument. The domain is the collection of all events, that is, all possible subsets of \(\mathcal{E}\). By analogy with usual function notation, such as \(f(x)\), we use the notation \(\Pr(A)\) to denote the probability of the event \(A\).

There are just three basic rules or axioms for probabilities, from which several other important rules can be derived. These fundamental rules were first spelled out in a formal way by the Russian mathematician Andrey Kolmogorov.

The three axioms of probability
  1. \(\Pr(A) \geq 0\), for each event \(A\).
  2. \(\Pr(\mathcal{E}) = 1\).
  3. If events \(A\) and \(B\) are mutually exclusive, that is, if \(A \cap B = \varnothing\), then \[ \Pr(A \cup B) = \Pr(A) + \Pr(B). \]
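The three axioms can be checked concretely on a small example. The sketch below, assuming a fair six-sided die (an illustrative choice, not part of the text above), defines the probability of an event as the sum of the probabilities of its outcomes and verifies each axiom in turn.

```python
# A minimal sketch of the three axioms, assuming a fair six-sided die.
# The event space is E = {1, ..., 6}, with each outcome having probability 1/6.
from fractions import Fraction

event_space = set(range(1, 7))
p_outcome = {outcome: Fraction(1, 6) for outcome in event_space}

def pr(event):
    """Probability of an event (a subset of the event space)."""
    return sum(p_outcome[outcome] for outcome in event)

A = {1, 2}   # the event "roll a 1 or a 2"
B = {5, 6}   # the event "roll a 5 or a 6"

# Axiom 1: Pr(A) >= 0 for each event A.
assert all(pr({outcome}) >= 0 for outcome in event_space)

# Axiom 2: Pr(E) = 1.
assert pr(event_space) == 1

# Axiom 3: A and B are mutually exclusive (A intersect B is empty),
# so Pr(A union B) = Pr(A) + Pr(B).
assert A & B == set()
assert pr(A | B) == pr(A) + pr(B)
```

Using exact fractions rather than floating-point numbers keeps the equality checks in the assertions exact.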

The first axiom fits with our experience of measuring things, like length and area. Just as the lowest possible value for a length is zero, the lowest possible value for a probability is zero. Any other choice for the minimum numerical value of a probability would not work.

The second axiom says that the probability that something will happen is one. Another way of thinking about the second axiom is this: When the random process is carried out, we are certain that one of the outcomes in the event space \(\mathcal{E}\) will occur.

The third axiom determines the way we work out probabilities of mutually exclusive events. The axiom says that, if \(A\) and \(B\) are mutually exclusive, then the probability that at least one of them occurs is the sum of the two individual probabilities. While this seems very compelling, it cannot be proved; mathematically, it must be assumed.

Taken together, the three axioms imply that probabilities must be between zero and one (inclusive): \(0 \leq \Pr(A) \leq 1\), for any event \(A\). We will prove this in the next section.

The choice of this numerical range for probabilities fits with relative frequencies from data, which are always in this range. In earlier years, students learned how relative frequencies are estimates of probabilities. We may estimate the probability of being left-handed by finding the relative frequency, or proportion, of left-handed people in a random sample of people. Any relative frequency of this sort must be a fraction between zero and one; it cannot be negative, and it cannot be greater than one. This basic observation fits with the range for probabilities themselves: \(0 \leq \Pr(A) \leq 1\).
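The left-handedness example can be mimicked in a short simulation. The sketch below assumes, purely for illustration, a true probability of 0.10 of being left-handed, draws a random sample, and computes the relative frequency, which necessarily falls between zero and one.

```python
# A minimal sketch of estimating a probability by a relative frequency.
# The value 0.10 is an assumed probability, chosen only for this simulation.
import random

random.seed(0)
true_probability = 0.10
n = 10_000

# Each trial: is this randomly sampled person "left-handed"?
sample = [random.random() < true_probability for _ in range(n)]
relative_frequency = sum(sample) / n

# A relative frequency is always in [0, 1], matching 0 <= Pr(A) <= 1
# for the probability it estimates.
assert 0 <= relative_frequency <= 1
print(relative_frequency)
```

For large samples the relative frequency will be close to the underlying probability, though it varies from sample to sample.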

The probability scale is between zero and one, but in ordinary discourse about probabilities the percentage scale is often used. In the media, in particular, we may read that 'the chance of the government being re-elected is regarded as no better than 40%'. In such usage, the scale for probability is 0% to 100%. There is no real difficulty with this, provided it is very clear which scale is being used. This becomes particularly important for small probabilities: if we say that the chance of an outcome is \(0.5\%\), this is the same as saying that the probability (on the usual zero-to-one scale) is 0.005. It is important to be alert to the potential confusion here.

Technical note. In this module, we only consider examples where the event space \(\mathcal{E}\) is finite or countably infinite. The more general axiomatic treatment of probability, which also covers examples where \(\mathcal{E}\) is uncountable, is very similar to our treatment here. But in the general situation, we do not insist that every subset of \(\mathcal{E}\) is an event (and therefore has an associated probability).
