C Symbols, formulas, statistics and parameters

C.1 Symbols and standard errors

  • The following table lists the statistics used to estimate unknown population parameters.

  • The sampling distribution is given for each statistic.

  • When the sampling distribution is approximately normally distributed (under certain statistical validity conditions), this is indicated in the 'Normal?' column of the table.

  • The value of the mean of the sampling distribution (the sampling mean) is:

    • unknown, for confidence intervals.
    • assumed to be the value given in the null hypothesis, for hypothesis tests.
| Statistic | Sample statistic | Mean (parameter) | Standard error | Normal? | Ref. |
|---|---|---|---|---|---|
| Proportion | \(\hat{p}\) | \(p\) | CI: \(\sqrt{\dfrac{\hat{p} \times (1 - \hat{p})}{n}}\); HT: \(\sqrt{\dfrac{p \times (1 - p)}{n}}\) | Yes | CI: Ch. 23; HT: Ch. 30 |
| Mean | \(\bar{x}\) | \(\mu\) | \(\dfrac{s}{\sqrt{n}}\) | Yes | CI: Ch. 24; HT: Ch. 31 |
| Mean difference | \(\bar{d}\) | \(\mu_d\) | \(\dfrac{s_d}{\sqrt{n}}\) | Yes | CI: Ch. 26; HT: Ch. 33 |
| Difference between means | \(\bar{x}_1 - \bar{x}_2\) | \(\mu_1 - \mu_2\) | \(\sqrt{\text{s.e.}(\bar{x}_1)^2 + \text{s.e.}(\bar{x}_2)^2}\) | Yes | CI: Ch. 27; HT: Ch. 34 |
| Difference between proportions | \(\hat{p}_1 - \hat{p}_2\) | \(p_1 - p_2\) | CI: \(\sqrt{\text{s.e.}(\hat{p}_1)^2 + \text{s.e.}(\hat{p}_2)^2}\); HT: \(\sqrt{\text{s.e.}(\hat{p})^2 + \text{s.e.}(\hat{p})^2}\) for the common proportion \(\hat{p}\) | Yes | CI: Ch. 28; HT: Ch. 35 |
| Odds ratio | Sample OR | Pop. OR | (not given) | No | CI: Ch. 28; HT: Ch. 35 |
| Correlation | \(r\) | \(\rho\) | (not given) | No | HT: Ch. 37 |
| Regression: slope | \(b_1\) | \(\beta_1\) | \(\text{s.e.}(b_1)\) (value from software) | Yes | CI and HT: Ch. 38 |
| Regression: intercept | \(b_0\) | \(\beta_0\) | \(\text{s.e.}(b_0)\) (value from software) | Yes | CI and HT: Ch. 38 |

In the 'Standard error' and 'Ref.' columns, CI refers to confidence intervals and HT to hypothesis tests.
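The standard errors in the table can be sketched in Python. This is an illustrative sketch only; the function names are made up for this example, and the difference-of-estimates formula combines the two standard errors by squaring, adding, and taking the square root.

```python
import math

def se_proportion_ci(p_hat, n):
    """Standard error of a sample proportion, for a confidence interval."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_proportion_ht(p0, n):
    """Standard error of a sample proportion, for a hypothesis test,
    using the proportion p0 given in the null hypothesis."""
    return math.sqrt(p0 * (1 - p0) / n)

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_difference(se1, se2):
    """Standard error of a difference between two independent estimates:
    the square root of the sum of the squared standard errors."""
    return math.sqrt(se1**2 + se2**2)
```

For example, `se_mean(10, 25)` returns 2.0, and `se_difference(3, 4)` returns 5.0.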

C.2 Confidence intervals

For statistics that have an approximately normal sampling distribution, confidence intervals have the form \[ \text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic})). \]

Notes:

  • The multiplier is approximately \(2\) to create an approximate \(95\)% CI (based on the \(68\)--\(95\)--\(99.7\) rule).
  • The quantity '\(\text{multiplier} \times \text{s.e.}(\text{statistic})\)' is called the margin of error.
  • When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
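As a sketch, the approximate \(95\)% CI for a mean follows directly from this form, using a multiplier of \(2\). The sample values below are made up purely for illustration:

```python
import math

# Made-up illustrative sample summary values.
x_bar = 51.2   # sample mean
s = 8.4        # sample standard deviation
n = 49         # sample size

se = s / math.sqrt(n)        # s.e.(statistic) = s / sqrt(n) = 1.2
margin_of_error = 2 * se     # multiplier of about 2 for an approx. 95% CI
ci = (x_bar - margin_of_error, x_bar + margin_of_error)
# ci is approximately (48.8, 53.6)
```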

C.3 Hypothesis testing

For statistics that have an approximately normal sampling distribution, the test statistic has the form: \[ \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})}. \] This quantity is a \(t\)-score for most hypothesis tests in this book. However, if the standard error does not involve any sample estimate, this quantity is a \(z\)-score; this is the case for a hypothesis test for a proportion.

Notes:

  • Since \(t\)-scores are approximately like \(z\)-scores (Sect. 32.4), the \(68\)--\(95\)--\(99.7\) rule can be used to approximate \(P\)-values.
  • When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
  • A hypothesis test about odds ratios uses a \(\chi^2\) test statistic, whose value is equivalent to a \(z\)-score of \[ \sqrt{\frac{\chi^2}{\text{df}}}, \] where \(\text{df}\) is the 'degrees of freedom', as given in the software output.
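The two calculations above can be sketched as follows. The sample values are made up for illustration, and the null hypothesis is assumed to state \(\mu = 50\):

```python
import math

# Made-up illustrative sample summary values.
x_bar = 51.2   # sample mean
mu_0 = 50      # value of the mean under the null hypothesis
s = 8.4        # sample standard deviation
n = 49         # sample size

se = s / math.sqrt(n)
t = (x_bar - mu_0) / se    # a t-score, since the s.e. uses the sample s
# t is close to 1.0 for these values

def z_like_from_chi2(chi2, df):
    """Convert a chi-squared test statistic to a z-like score."""
    return math.sqrt(chi2 / df)
```

For example, `z_like_from_chi2(16, 4)` returns 2.0, which can then be interpreted using the \(68\)--\(95\)--\(99.7\) rule.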

C.4 Sample size estimation

All of the following formulas compute the approximate minimum sample size needed to produce a \(95\)% confidence interval with a specified margin of error.

  • To estimate the sample size needed for estimating a proportion (Sect. 29.3): \[ n = \frac{1}{(\text{Margin of error})^2}. \]

  • To estimate the sample size needed for estimating a mean (Sect. 29.4): \[ n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2. \]

  • To estimate the sample size needed for estimating a mean difference (Sect. 29.5): \[ n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2. \]

  • To estimate the sample size needed for estimating the difference between two means (Sect. 29.6): \[ n = 2\times \left( \frac{2 \times s}{\text{Margin of error}}\right)^2 \] for each sample, where \(s\) is an estimate of the common standard deviation in the population for both groups. This formula assumes:

    • the sample size in each group is the same; and
    • the standard deviation in each group is the same.

Notes:

  • In sample size calculations, always round up the sample size found from the above formulas.
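The formulas above, including the round-up rule from the note, can be sketched in Python. The function names are illustrative only:

```python
import math

def n_for_proportion(margin_of_error):
    """Approx. minimum n for a 95% CI for a proportion."""
    return math.ceil(1 / margin_of_error**2)

def n_for_mean(s, margin_of_error):
    """Approx. minimum n for a 95% CI for a mean;
    s is an estimate of the population standard deviation."""
    return math.ceil((2 * s / margin_of_error)**2)

def n_for_two_means(s, margin_of_error):
    """Approx. minimum n in EACH group for comparing two means,
    assuming equal sample sizes and equal standard deviations;
    s estimates the common population standard deviation."""
    return math.ceil(2 * (2 * s / margin_of_error)**2)
```

For example, `n_for_proportion(0.05)` returns 400, and `n_for_mean(10, 4)` returns 25.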

C.5 Other formulas

  • To calculate \(z\)-scores (Sect. 21.4): \[ z = \frac{\text{value of variable} - \text{mean of the distribution of the variable}}{\text{standard deviation of the distribution of the variable}}. \]
  • \(t\)-scores are like \(z\)-scores, except for small sample size.
  • When the 'variable' is a sample estimate, the 'standard deviation of the distribution' is a standard error.
  • The unstandardizing formula (Sect. 21.8): \(x = \mu + (z\times \sigma)\).
  • The interquartile range (IQR): \(Q_3 - Q_1\), where \(Q_1\) and \(Q_3\) are the first and third quartiles respectively (or, equivalently, the \(25\)th and \(75\)th percentiles).
  • The degrees of freedom for a two-way table of counts: \[ \text{df} = (\text{number of columns of data} - 1) \times (\text{number of rows of data} - 1). \]
  • The regression equation in the sample: \(\hat{y} = b_0 + b_1 x\), where \(b_0\) is the sample intercept and \(b_1\) is the sample slope.
  • The regression equation in the population: \(y = \beta_0 + \beta_1 x\), where \(\beta_0\) is the population intercept and \(\beta_1\) is the population slope.
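The \(z\)-score and unstandardizing formulas are inverses of each other, as this short sketch shows (the distribution values are made up for illustration):

```python
# Made-up illustrative distribution values.
mu = 100       # mean of the distribution of the variable
sigma = 15     # standard deviation of the distribution of the variable

x = 130
z = (x - mu) / sigma       # z-score: how many sds x lies from the mean

# Unstandardizing recovers the original value: x = mu + (z * sigma)
x_back = mu + z * sigma
```

Here `z` is 2.0 and `x_back` recovers the original value 130.0.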

C.6 Other symbols used

| Symbol | Meaning | Reference |
|---|---|---|
| \(s\) | Sample standard deviation | Sect. 11.6.2 |
| \(\sigma\) | Population standard deviation | Sect. 11.6.2 |
| \(s_d\) | Sample standard deviation of differences | Sect. 11.6.2 |
| \(\sigma_d\) | Population standard deviation of differences | Sect. 11.6.2 |
| \(R^2\) | R-squared | Sect. 16.4.2 |
| \(H_0\) | Null hypothesis | Sect. 32.2 |
| \(H_1\) | Alternative hypothesis | Sect. 32.2 |
| df | Degrees of freedom | Sect. 35.3.2 |
| CI | Confidence interval | Chap. 25 |
| s.e. | Standard error | Def. 20.3 |
| \(n\) | Sample size | |
| \(\chi^2\) | The chi-squared test statistic | Sect. 35.3.2 |