1 Probability theory

The primary goal of a statistician is to make sense of the world by analysing data. Typically, however, they can obtain data only on a sample from the full population of interest, and to avoid bias that sample should be random. For this reason, any course in statistics should begin with a study of probability theory, which helps us understand and manage the uncertainty that random sampling introduces.

Key definitions

In probability theory, an experiment for which all of the possible outcomes can be defined, and yet the actual outcome of any given trial cannot be known in advance, is called a random experiment. The set of all possible outcomes is called the sample space of the random experiment. Probability is assigned to individual outcomes or to sets of outcomes, called events. Probability itself is defined as the long-run relative frequency of an outcome or event occurring.

For example, in the context of rolling a fair, six-sided die:

  • the procedure of rolling the die and observing the result is a random experiment

  • each roll of the die may be referred to as a trial

  • the sample space consists of all the possible outcomes: \(\{1,2,3,4,5,6\}\)

  • the probability of a \(5\) is \(\frac{1}{6}\) since in the long run we would expect it to occur \(\frac{1}{6}\) of the time

  • a possible event of interest may be that the outcome is prime: \(\{2,3,5\}\)
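
The long-run relative frequency idea in the list above can be illustrated with a short simulation. The following is a minimal sketch in Python, not part of the original text: the names `SAMPLE_SPACE`, `PRIME_EVENT`, `relative_frequency` and `theoretical_probability` are chosen here purely for illustration, and the die is assumed to be fair, so each outcome is treated as equally likely.

```python
import random
from fractions import Fraction

# Sample space of the random experiment: one roll of a six-sided die.
SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]

# An event is a set of outcomes; here, "the outcome is prime".
PRIME_EVENT = {2, 3, 5}


def relative_frequency(event, trials, seed=0):
    """Simulate `trials` rolls and return the fraction that land in `event`."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same rolls
    hits = sum(1 for _ in range(trials) if rng.choice(SAMPLE_SPACE) in event)
    return hits / trials


def theoretical_probability(event):
    """Probability of `event`, assuming all outcomes are equally likely."""
    favourable = len(set(event) & set(SAMPLE_SPACE))
    return Fraction(favourable, len(SAMPLE_SPACE))


if __name__ == "__main__":
    # The relative frequency of a 5 should settle near 1/6 as trials increase.
    for n in (100, 10_000, 100_000):
        print(n, relative_frequency({5}, n))
    print("P(five)  =", theoretical_probability({5}))
    print("P(prime) =", theoretical_probability(PRIME_EVENT))
```

With a small number of trials the relative frequency can sit noticeably away from \(\frac{1}{6}\); it is only as the number of trials grows that it should be expected to settle close to that value.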

Notation

The probability that a fair, six-sided die lands on 5 is one in six, or \(\frac{1}{6}\). Probability notation can be used to write this sentence far more compactly as:

\[P(\text{five})=\frac{1}{6}\]
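
The same notation extends from single outcomes to events. As a worked example, assuming the six outcomes of the fair die are equally likely, the probability of the prime event \(\{2,3,5\}\) introduced earlier can be found by counting the outcomes in the event and dividing by the number of outcomes in the sample space. The label "prime" is chosen here for illustration only.

\[P(\text{prime})=P(\{2,3,5\})=\frac{3}{6}=\frac{1}{2}\]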