## Content

### Independence

The concept of independence of events plays an important role throughout probability and statistics. The most helpful way to think about independent events is as follows.

We first consider two events \(A\) and \(B\), and assume that \(\Pr(A) \ne 0\) and \(\Pr(B) \ne 0\). The events \(A\) and \(B\) are **independent** if

\[ \Pr(A|B) = \Pr(A). \]

This equation says that the conditional probability of \(A\), given \(B\), is the same as the unconditional probability of \(A\). In other words, given that we know that \(B\) has occurred, the probability of \(A\) is unaffected.

Using the rule for conditional probability, we see that the events \(A\) and \(B\) are independent if and only if

\[ \Pr(A \cap B) = \Pr(A) \times \Pr(B). \]

This equation gives us a useful alternative characterisation of independence in the case of two events. The symmetry of this equation shows that independence is not a directional relationship: the events \(A\) and \(B\) are independent if and only if \(\Pr(B|A) = \Pr(B)\). So, for independent events \(A\) and \(B\), whether \(B\) occurs has no effect on the probability that \(A\) occurs, and similarly \(A\) has no effect on the probability that \(B\) occurs.

If events \(A\) and \(B\) are not independent, we say that they are **dependent**. This does not necessarily mean that they are directly causally related; it just means that they are not independent in the formal sense defined here.

Events that are physically independent are also independent in the mathematical sense. Mind you, physical independence is quite a subtle matter (keeping in mind such phenomena as the "butterfly effect" in chaos theory, the notion that a butterfly flapping its wings in Brazil could lead to a tornado in Texas). But we often take physical independence as the working model for events separated by sufficient time and/or space. We may also assume that events are independent based on our consideration of the independence of the random processes involved in the events. Very commonly, we assume that observations made on different individuals in a study are independent, because the individuals themselves are separate and unrelated.

For example, if \(A\) = "my train is late" and \(B\) = "the clue for 1 across in today's cryptic crossword involves an anagram", we would usually regard \(A\) and \(B\) as independent. However, if \(C\) = "it is raining this morning", then it may well be that \(A\) and \(C\) are not independent, while \(B\) and \(C\) are independent.

While physical independence implies mathematical independence, the converse is not true. Events that are part of the same random procedure and not obviously physically independent may turn out to obey the defining relationship for mathematical independence, as the following example demonstrates.

#### Example

Suppose a fair die is rolled twice. Consider the following two events:

- \(A\) = "a three is obtained on the second roll"
- \(B\) = "the sum of the two numbers obtained is less than or equal to 4".

In this case, somewhat surprisingly, we can show that \(\Pr(B|A) = \Pr(B)\). Thus \(A\) and \(B\) are mathematically independent, even though at face value they seem related.

To see this, we write the elementary events in this random process as \((x,y)\), with \(x\) the result of the first roll and \(y\) the result of the second roll. Then:

- \(A = \{(1,3), (2,3), (3,3), (4,3), (5,3), (6,3)\}\)
- \(B = \{(1,1), (1,2), (1,3), (2,1), (2,2), (3,1)\}\).

There are 36 elementary events, each with probability \(\dfrac{1}{36}\), by symmetry. So \(\Pr(A) = \dfrac{6}{36} = \dfrac{1}{6}\) and \(\Pr(B) = \dfrac{6}{36} = \dfrac{1}{6}\). We also have \(A \cap B = \{(1,3)\}\) and so \(\Pr(A \cap B) = \dfrac{1}{36}\). Hence,

\[ \Pr(B|A) = \dfrac{\Pr(A \cap B)}{\Pr(A)} = \dfrac{\dfrac{1}{36}}{\dfrac{1}{6}} = \dfrac{1}{6} = \Pr(B). \]

Thus \(A\) and \(B\) are independent. Alternatively, we can check independence by calculating \(\Pr(A \cap B) = \dfrac{1}{36} = \dfrac{1}{6} \times \dfrac{1}{6} = \Pr(A) \times \Pr(B)\).
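
The calculation above can be checked by brute-force enumeration. The following Python sketch assumes the uniform model on the 36 ordered pairs \((x, y)\) described in the text; the helper `prob` and the event functions `A` and `B` are illustrative names chosen for this sketch, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a fair die: all 36 ordered pairs (x, y) are equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o):
    return o[1] == 3            # a three on the second roll

def B(o):
    return o[0] + o[1] <= 4     # sum of the two rolls is at most 4

p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))

print(p_a, p_b, p_ab)        # 1/6 1/6 1/36
print(p_ab / p_a == p_b)     # Pr(B|A) == Pr(B): True
print(p_ab == p_a * p_b)     # product rule: True
```

Using exact `Fraction` values rather than floats means the equality checks are exact, with no rounding error.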

Such examples are rare in practice and somewhat artificial. The key point is that physical independence implies mathematical independence.

Exercise 9

For each of the following situations, discuss whether or not the events \(A\) and \(B\) should be regarded as independent.

- \(A\) = "Powerball number is 23 this week"; \(B\) = "Powerball number is 23 next week"
- \(A\) = "Powerball number is 23 this week"; \(B\) = "Powerball number is 24 this week"
- \(A\) = "maximum temperature in my location is at least 33 \(^\circ\)C today"; \(B\) = "maximum temperature in my location is at least 33 \(^\circ\)C six months from today"
- \(A\) = "the tenth coin toss in a sequence of tosses of a fair coin is a head"; \(B\) = "the first nine tosses in the sequence each result in a head"

The previous exercise illustrates some important points. It is usually assumed that games of chance involving gambling entail independence. Some people may suspect that these games (such as lotteries and roulette) are rigged, and there are indeed some famous examples of lottery scandals. But the regulations of such games are designed to create an environment of randomness that involves independence between different instances of the same game. So knowing the Powerball outcome this week should not change the probability distribution for next week; each ball should be equally likely as usual.

On the other hand, the events "Powerball is 23 this week" and "Powerball is 24 this week" are incompatible: on any single Powerball draw there can only be one Powerball number. So these events are mutually exclusive (their intersection is empty) and the probability that they both occur is zero.

Mutually exclusive events are definitely **not** independent; they are dependent. If two events are mutually exclusive, then given that one occurs, the conditional probability of the other event is zero. That is, for mutually exclusive events \(C\) and \(D\) (both having non-zero probability), we have \(\Pr(C|D) = 0 \neq \Pr(C)\). Alternatively, we may observe that \(\Pr(C \cap D) = 0 \neq \Pr(C)\Pr(D)\).
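
A small numerical illustration, using hypothetical events on a single roll of a fair die chosen only for this sketch: \(C\) = "the roll is 1 or 2" and \(D\) = "the roll is 5 or 6" are mutually exclusive, and each has probability \(\tfrac{1}{3}\).

```python
from fractions import Fraction

# One roll of a fair die: six equally likely faces.
faces = set(range(1, 7))

def prob(event):
    """Probability of a set of faces under the uniform model."""
    return Fraction(len(event & faces), len(faces))

C = {1, 2}  # hypothetical event: "the roll is 1 or 2"
D = {5, 6}  # hypothetical event: "the roll is 5 or 6"; C and D share no face

p_c, p_d, p_cd = prob(C), prob(D), prob(C & D)
print(p_cd, p_c * p_d)   # 0 1/9  -> Pr(C and D) != Pr(C) Pr(D)
print(p_cd / p_d, p_c)   # 0 1/3  -> Pr(C | D) != Pr(C)
```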

Students often confuse the two concepts, but **mutually exclusive** and **independent** are quite different. The following table serves as a reminder.

Mutually exclusive events | Independent events |
---|---|
If one occurs, the other cannot. | Knowing that one occurs does not affect the probability of the other occurring. |
\(A \cap B = \varnothing\) and so \(\Pr(A \cap B) = 0\) | \(\Pr(A \cap B) = \Pr(A)\Pr(B)\) |

What about the maximum-temperature example from exercise 9? This is a great deal more subtle. It might seem that two days six months apart, even at the same location, are sufficiently distant in time for the maximum temperatures to be independent. But for Adelaide (say), a maximum temperature of at least 33 \(^\circ\)C could be a fairly clear indication that the day is not in winter. This would make the day six months from now not in summer, which might alter the probability of that day having a maximum temperature of at least 33 \(^\circ\)C, compared to not knowing that today's temperature is at least 33 \(^\circ\)C.

And the sequence of coin tosses? This rather depends on what we assume about the coin and the tossing mechanism. As indicated in exercises 5 and 6, there is more than one way to toss a coin. Even for the standard method — delivering the spin by a sudden flick of the thumb with the coin positioned on the index finger — successive tosses may not be truly independent if the coin tosser has learned to control the toss.

Just as we may use relative frequencies as estimates of probabilities in general, we may use conditional relative frequencies as estimates of conditional probabilities, and we may examine these to evaluate whether independence is suggested or not.

Recall the example of 302 incidents in which school children left their bag at the bus stop briefly to run home and get something. In 38 of these incidents, the bag was not there when they returned, and we used the relative frequency \(\dfrac{38}{302} \approx 0.126\), or \(12.6\%\), to estimate the chance that a bag will not be there in these circumstances.

In 29 of these incidents, the bag left at the bus stop was a Crumpler bag. Among these 29 incidents, the Crumpler bag was no longer at the bus stop upon return in 10 cases. So the relative frequency of the bag being gone, given that it was a Crumpler bag, was \(\dfrac{10}{29} \approx 0.345\), or \(34.5\%\) on the percentage scale. The difference between the conditional relative frequency of 0.345 and the unconditional one of 0.126 suggests that the bag being gone upon return is **not** independent of the bag being a Crumpler bag.
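
A conditional relative frequency is simply a ratio of counts within a subgroup. The sketch below reproduces the two figures above from the counts given in the text. The third value, the relative frequency among the \(302 - 29 = 273\) non-Crumpler incidents, is derived from the same counts and is offered only as a further informal comparison, not as a formal test of independence.

```python
# Counts taken from the bus-stop example in the text.
total_incidents = 302
total_gone = 38
crumpler_incidents = 29
crumpler_gone = 10

# Unconditional relative frequency: bag gone, among all incidents.
p_gone = total_gone / total_incidents
# Conditional relative frequency: bag gone, among Crumpler-bag incidents only.
p_gone_given_crumpler = crumpler_gone / crumpler_incidents
# Derived comparison: bag gone, among the remaining (non-Crumpler) incidents.
p_gone_other = (total_gone - crumpler_gone) / (total_incidents - crumpler_incidents)

print(f"{p_gone:.3f} {p_gone_given_crumpler:.3f} {p_gone_other:.3f}")  # 0.126 0.345 0.103
```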

We next prove an important property of independence.

###### Property

If events \(A\) and \(B\) are independent, then the events \(A\) and \(B'\) are independent.

###### Proof

Suppose that \(A\) and \(B\) are independent. Then \(\Pr(A \cap B) = \Pr(A) \times \Pr(B)\). Since \(A\) is the union of the mutually exclusive events \(A \cap B\) and \(A \cap B'\), we have \(\Pr(A) = \Pr(A \cap B) + \Pr(A \cap B')\). Therefore

\begin{align*} \Pr(A \cap B') &= \Pr(A) - \Pr(A \cap B)\\ &= \Pr(A) - \Pr(A) \times \Pr(B) \qquad\text{since \(A\) and \(B\) are independent}\\ &= \Pr(A) \bigl(1-\Pr(B)\bigr)\\ &= \Pr(A) \times \Pr(B'). \end{align*}

Hence \(A\) and \(B'\) are independent. (It follows by symmetry that \(A'\) and \(B\) are independent, and therefore that \(A'\) and \(B'\) are independent.)

\(\Box\)
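
As a sanity check on one example (not a substitute for the proof), the following sketch applies the product rule to the dice events \(A\) and \(B\) from the earlier example and to their complements, computed by enumeration under the uniform model on 36 outcomes. The names used are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))   # two rolls of a fair die

def prob(event):
    """Probability of a set of outcomes under the uniform model."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[1] == 3}           # three on the second roll
B = {o for o in outcomes if o[0] + o[1] <= 4}    # sum at most 4
A_c, B_c = outcomes - A, outcomes - B            # complements A' and B'

pairs = [(A, B, "A, B"), (A, B_c, "A, B'"), (A_c, B, "A', B"), (A_c, B_c, "A', B'")]
for X, Y, name in pairs:
    # Product rule Pr(X and Y) == Pr(X) Pr(Y): prints True for all four pairs.
    print(name, prob(X & Y) == prob(X) * prob(Y))
```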

Exercise 10

Suppose that two separate experiments with fair dice are carried out. In the first experiment, a die is rolled once. If \(X\) is the outcome of the roll, then \(Y = X+1\) is recorded. In the second experiment, two dice are rolled. The sum \(U\) of the two outcomes is recorded, and the maximum \(V\) of the two outcomes is recorded.

- What are the possible values of \(Y\), \(U\) and \(V\)?
- Which is more likely: \(U > Y\) or \(U < Y\)?
- Which is more likely: \(V > Y\) or \(V < Y\)?

#### Independence for more than two events

Our discussion of independence has so far been limited to two events. The extension to an arbitrary number of events is important.

If the events \(A_1, A_2, \dots, A_n\) are mutually independent, then

\[ \Pr(A_1 \cap A_2 \cap \dots \cap A_n) = \Pr(A_1)\Pr(A_2) \dotsb \Pr(A_n). \qquad (*) \]

This is a necessary condition for mutual independence, but it is not sufficient. As we might reasonably expect, mutual independence is actually characterised by statements about conditional probabilities.

Essentially, the idea is the natural extension of the case of two events: We say that the events \(A_1, A_2, \dots, A_n\) are **mutually independent** if, for each event \(A_i\), all of the possible **conditional** probabilities involving the other events are equal to the unconditional probability \(\Pr(A_i)\). Informally, this means that regardless of what happens among the other events, the probability of \(A_i\) is unchanged; and this must be true for each event \(A_i\), where \(i = 1,2,\dots,n\).

We can express this definition formally as follows: Events \(A_1, A_2, \dots, A_n\) are mutually independent if

\[ \Pr(A_i) = \Pr(A_i \mid A_{j_1} \cap A_{j_2} \cap \dots \cap A_{j_m}), \]

for all \(i\) and for every possible combination of indices \(j_1, j_2, \dots, j_m\) such that \(j_k \neq i\) for each \(k\). What happens, for example, when \(n=3\)? This definition says that events \(A_1\), \(A_2\) and \(A_3\) are mutually independent if

\[ \Pr(A_1) = \Pr(A_1|A_2) = \Pr(A_1|A_3) = \Pr(A_1|A_2 \cap A_3) \]

and similarly for \(A_2\) and \(A_3\). The equation \((*)\) above follows from this definition. The reason that \((*)\) is not sufficient to indicate mutual independence, however, is that it is possible for \((*)\) to be satisfied while at the same time \(A_1\) and \(A_2\) are not independent of each other.
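
The conditional-probability definition can be turned directly into a check on a finite sample space. This is a minimal sketch under two assumptions of our own: all outcomes are equally likely, and conditioning events of probability zero are skipped (their conditional probabilities are undefined). The function name `mutually_independent` is illustrative, not from any library.

```python
from fractions import Fraction
from itertools import combinations, product

def prob(event, outcomes):
    """Probability of a set of outcomes under a uniform model on `outcomes`."""
    return Fraction(len(event & outcomes), len(outcomes))

def mutually_independent(events, outcomes):
    """Check Pr(A_i | intersection of any other events) == Pr(A_i) for every i.

    Assumption: conditioning events with probability zero are skipped.
    """
    n = len(events)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for m in range(1, n):                      # every non-empty subset of the others
            for js in combinations(others, m):
                cond = set(outcomes)
                for j in js:
                    cond &= events[j]              # A_{j_1} ∩ ... ∩ A_{j_m}
                p_cond = prob(cond, outcomes)
                if p_cond == 0:
                    continue
                p_joint = prob(events[i] & cond, outcomes)
                if p_joint / p_cond != prob(events[i], outcomes):
                    return False
    return True

# Three tosses of a fair coin; A_k = "toss k is a head".
tosses = set(product("HT", repeat=3))
heads = [{t for t in tosses if t[k] == "H"} for k in range(3)]
print(mutually_independent(heads, tosses))   # True
```

The loops visit every combination of conditioning events, so the cost grows like \(n \, 2^{n-1}\); this is fine for a handful of events on a small sample space.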

#### Example: Dice games

A French gentleman called Chevalier de Méré played games of chance using dice in around 1650. He played one game where a fair die is rolled four times; it is assumed that the outcomes are independent. What is the chance of getting at least one six? De Méré reasoned as follows. On any single roll, the probability of getting a six is \(\dfrac{1}{6}\). There are four rolls, so the probability of getting a six at some stage is \(4 \times \dfrac{1}{6} = \dfrac{2}{3}\). The flaw in this reasoning should be obvious: What if there were seven rolls? The same argument would then give \(7 \times \dfrac{1}{6} = \dfrac{7}{6}\), which is greater than 1.

Another game he played was to roll two dice 24 times, and consider the chance of getting a double six at least once. He reasoned the same way for this game. On any single roll, the chance of a double six is \(\dfrac{1}{36}\). There are 24 rolls, so the probability of getting a double six at some stage in the sequence of 24 rolls is \(24 \times \dfrac{1}{36} = \dfrac{2}{3}\).

Because of the second calculation in particular, he was betting on this outcome occurring when playing the game, and losing in the long run. Why? Rather than persisting doggedly with his strategy, he posed this question to Blaise Pascal, who correctly analysed his chances, as shown below. This was the beginning of the systematic study of probability theory.

In the second game, the chance of a double six on any single roll of the two dice is \(\dfrac{1}{36}\), which we obtain by using the independence of the outcomes on the two dice:

\begin{align*} \Pr(\text{double six}) &= \Pr(\text{six on first die}) \times \Pr(\text{six on second die})\\ &= \dfrac{1}{6} \times \dfrac{1}{6} = \dfrac{1}{36}. \end{align*}

To work out the probability of at least one double six in 24 rolls, we are going to apply a general rule which we have met in a number of contexts already.

The probability of an event occurring at least once in a sequence of \(n\) repetitions is equal to one minus the probability that it does not occur at all. If \(X\) is the number of times the event occurs, then

\[ \Pr(X \geq 1) = 1 - \Pr(X = 0). \]

This rule is an application of property 4 (from the section Useful properties of probability), since "at least one" and "none" are complementary events. Note that the rule does not require the events in the sequence of repetitions to be independent (although when they are, the calculations are easier).

The application to de Méré's second game is this: On any given roll of the two dice, the probability of **not** obtaining a double six is \(\dfrac{35}{36}\). If the 24 rolls are mutually independent, the probability that each of the 24 rolls does not result in a double six can be obtained as the product of the individual probabilities, using equation \((*)\), and is therefore equal to \(\bigl(\dfrac{35}{36}\bigr)^{24} \approx 0.509\). This means that the probability of at least one roll resulting in a double six is approximately \(1 - 0.509 = 0.491\). The fact that this probability is less than \(\dfrac{1}{2}\) (and considerably less than the value \(\dfrac{2}{3}\) calculated by de Méré) explains why he was losing money on his bets.
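
The complement rule and equation \((*)\) combine into the formula \(1 - (1-p)^n\) for the probability of at least one occurrence in \(n\) independent repetitions of an event with probability \(p\). The sketch below compares this with de Méré's linear reasoning for both games. The value for the first game, about 0.518, is our own calculation with the same formula rather than a figure from the text; the helper name `prob_at_least_once` is illustrative.

```python
from fractions import Fraction

def prob_at_least_once(p, n):
    """Pr(at least one occurrence in n independent trials) = 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

# Game 1: at least one six in four rolls of one die.
g1 = prob_at_least_once(Fraction(1, 6), 4)
# Game 2: at least one double six in 24 rolls of two dice.
g2 = prob_at_least_once(Fraction(1, 36), 24)

print(f"{float(g1):.3f} vs naive {4 / 6:.3f}")    # 0.518 vs naive 0.667
print(f"{float(g2):.3f} vs naive {24 / 36:.3f}")  # 0.491 vs naive 0.667
```

Although de Méré's linear argument gives \(\dfrac{2}{3}\) for both games, the exact probabilities fall on opposite sides of \(\dfrac{1}{2}\).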

Exercise 11

- A boat offers tours to see dolphins in a partially enclosed bay. The probability of seeing dolphins on a trip is 0.7. Assuming independence between trips with regard to the sighting of dolphins, what is the probability of not seeing dolphins:
  - on two successive trips?
  - on seven trips in succession?
- A machine has four components which fail independently, with probabilities of failure 0.1, 0.01, 0.01 and 0.005. Calculate the probability of the machine failing if:
  - all components have to fail for the machine to fail
  - any single component failing leads to the machine failing.
- Opening the building of an organisation in the morning of a working day is a responsibility shared between six people, each of whom has a key. The chances that they arrive at the building before the required time are, respectively, 0.95, 0.90, 0.80, 0.75, 0.50 and 0.10. Do you think it is reasonable to assume that their arrival times are mutually independent? Assuming they are, find the chance that the building is opened on time.
- In the assessment of the safety of nuclear reactors, calculations such as the following have been made.

  In any year, for one reactor, the chance of a large loss-of-coolant accident is estimated to be \(3 \times 10^{-4}\). The probability of the failure of the required safety functions is \(2 \times 10^{-3}\). Therefore the chance of reactor meltdown via this mode is \(6 \times 10^{-7}\).

  What do you think of this argument?

The next example illustrates the somewhat strange phenomenon of events that are **not** mutually independent but nevertheless satisfy equation \((*)\).

#### Example

Consider the random procedure of tossing a fair coin three times. Define the events:

- \(A\) = "at least two heads"
- \(B\) = "the last two tosses give the same result"
- \(C\) = "the first two results are heads or the last two results are tails".

Then, using an obvious notation:

- \(A = \{\text{HHT}, \text{HTH}, \text{THH}, \text{HHH}\}\)
- \(B = \{\text{HHH}, \text{THH}, \text{HTT}, \text{TTT}\}\)
- \(C = \{\text{HHH}, \text{HHT}, \text{HTT}, \text{TTT}\}\).

Thus \(A \cap B \cap C = \{\text{HHH}\}\). Assuming independence of the tosses, there are \(2 \times 2 \times 2 = 8\) elementary outcomes, all equally probable, so the probability of each of them equals \(\dfrac{1}{8}\). Each of \(A\), \(B\) and \(C\) contains four outcomes, so \(\Pr(A) = \Pr(B) = \Pr(C) = \tfrac{1}{2}\). Hence

\[ \Pr(A \cap B \cap C) = \tfrac{1}{8} = \bigl(\tfrac{1}{2}\bigr)^3 = \Pr(A) \Pr(B) \Pr(C). \]

So equation \((*)\) is satisfied. However, \(B \cap C = \{\text{HHH}, \text{HTT}, \text{TTT}\}\) and hence

\[ \Pr(B \cap C) = \tfrac{3}{8} \neq \tfrac{1}{4} = \bigl(\tfrac{1}{2}\bigr)^2 = \Pr(B) \Pr(C). \]

It follows that \(B\) and \(C\) are not independent, and hence the events \(A\), \(B\) and \(C\) are not mutually independent.
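
Enumerating the eight outcomes confirms both calculations. This sketch represents each outcome as a string such as `"HTH"` and assumes all eight are equally likely; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

outcomes = {"".join(t) for t in product("HT", repeat=3)}   # 8 equally likely outcomes

def prob(event):
    """Probability of a set of outcomes under the uniform model."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o.count("H") >= 2}                 # at least two heads
B = {o for o in outcomes if o[1] == o[2]}                      # last two tosses agree
C = {o for o in outcomes if o[:2] == "HH" or o[1:] == "TT"}    # first two H or last two T

print(prob(A & B & C) == prob(A) * prob(B) * prob(C))   # True: (*) holds
print(prob(B & C), prob(B) * prob(C))                   # 3/8 1/4: B and C are dependent
```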

Examples like the previous one are not really of practical importance. Of much greater importance is the fact that, if events are physically independent, then they are mutually independent, and so then equation \((*)\) is true. This is the result that is useful in practice.