2 Three worlds

The three domains ("three worlds") of our statistics course are:

  1. Descriptive statistics (first semester)

  2. Probability theory (some elements were covered in the previous semester, but more will be introduced now)

  3. Statistical inference (the core of statistics and the main focus of this semester)

Descriptive statistics deals with methods for measuring and summarizing information about data sets.

Probability theory is a branch of mathematics where random variables, which take values depending on chance, play a key role.

Statistical inference applies probability theory to draw conclusions about a population or data-generating process based on sample data.

Understanding these three domains helps in navigating different areas of statistical analysis. It is important to be aware of which world one is operating in at any given moment. Let's illustrate this with the example of calculating a mean:

  1. Mean in descriptive statistics. Example: ten students of the statistics course received the following grades: 4.0, 5.0, 4.5, 3.0, 3.0, 3.5, 5.0, 4.0, 4.5, 4.0. We calculate the average grade.

  2. Mean in probability theory. Example: we toss a biased coin that lands on heads with probability 0.6 and on tails with probability 0.4. If we get heads, we win 1 dollar; if tails, we lose 1 dollar. What is our expected profit per toss?

In this case, the mean is called the expected value of the random variable.

  3. Mean in statistical inference. Example: we want to estimate the average number of children per family in our city. We randomly select 100 families and compute the sample mean (\(\bar{x}=1.97\)). Based on this, we estimate the population mean, providing a confidence interval. For instance, at a confidence level of \(1-\alpha=0.95\), we conclude that the average number of children per family (\(\mu\)) lies between 1.61 and 2.33.
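The three calculations above can be sketched in Python. This is only an illustration under stated assumptions: the helper `mean_confidence_interval` is a hypothetical name, it uses the normal approximation with z = 1.96, and the sample standard deviation `s = 1.84` is not given in the text. It is an assumed value chosen for illustration.

```python
import math
import statistics

# World 1: descriptive statistics, the mean of the observed grades.
grades = [4.0, 5.0, 4.5, 3.0, 3.0, 3.5, 5.0, 4.0, 4.5, 4.0]
average_grade = statistics.mean(grades)
print(f"{average_grade:.2f}")  # 4.05

# World 2: probability theory, the expected value of the profit per toss.
outcomes = {+1: 0.6, -1: 0.4}  # profit in dollars -> probability
expected_profit = sum(x * p for x, p in outcomes.items())
print(f"{expected_profit:.2f}")  # 0.20

# World 3: statistical inference, a confidence interval for the mean.
def mean_confidence_interval(x_bar, s, n, z=1.96):
    """Normal-approximation interval: x_bar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# ASSUMPTION: s = 1.84 is a hypothetical sample standard deviation;
# the text reports only the sample mean, n, and the resulting interval.
low, high = mean_confidence_interval(x_bar=1.97, s=1.84, n=100)
print(f"({low:.2f}, {high:.2f})")  # (1.61, 2.33)
```

Note that only the third calculation depends on a sample: the first summarizes the data at hand, and the second follows from the probability model alone.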

Similar examples can be provided for other measures such as standard deviation (see below), median, quartiles or percentiles.

  1. Standard deviation in descriptive statistics – Example: In July 2022, the Gdańsk District Examination Board published the results of the eighth-grade exam. According to the report, the standard deviation of the scores obtained by students was 18 percentage points in Polish, 29 percentage points in mathematics, and 31 percentage points in English. As you can see, the dispersion of results was greatest in English.

  2. Standard deviation of a random variable – Example: I assume that the return on Apple (AAPL) stock in the next month will average 2.41% with a standard deviation of 8.75 percentage points. In my model, I assume that the return is normally distributed with these parameters.

  3. Standard deviation in statistical inference – Example: based on a sample of 20 people, using bootstrapping, we estimate that the standard deviation of height in the population of adult men is between 7.5 cm and 10.4 cm.
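The bootstrap estimate in the last example can be sketched as follows. This is a minimal percentile-bootstrap sketch, not necessarily the procedure behind the numbers quoted above. The function name `bootstrap_sd_interval`, the number of resamples, and the simulated heights are all assumptions made for illustration, so the resulting interval will differ from 7.5 cm to 10.4 cm.

```python
import random
import statistics

def bootstrap_sd_interval(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the population standard deviation."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the standard deviation each time.
    sds = sorted(
        statistics.stdev(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lower = sds[int((alpha / 2) * n_resamples)]
    upper = sds[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# HYPOTHETICAL data: 20 simulated heights in cm, not the sample from the text.
data_rng = random.Random(42)
heights = [data_rng.gauss(178, 9) for _ in range(20)]

low, high = bootstrap_sd_interval(heights)
print(f"sample sd: {statistics.stdev(heights):.1f} cm")
print(f"95% bootstrap interval for the sd: {low:.1f} cm to {high:.1f} cm")
```

With a sample as small as 20, the percentile bootstrap tends to give intervals that are somewhat too narrow, so the result should be read as a rough indication rather than a precise statement.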

2.1 Questions

Question 2.1 Which of the three worlds are we in (or perhaps in some fourth world) when we:

  1. compute the average number of siblings among the students who attend the lecture?

  2. determine the casino's average edge on a one-dollar bet based on the rules of roulette?

  3. try to estimate the sensitivity of Covid-19 tests?

  4. try to estimate the share of Lotto players who win at least a few PLN in a single draw?