20 Nonparametric tests

The tests presented below, along with the chi-square test described in Chapter 18, are among the most widely used nonparametric tests.

Unlike parametric tests (such as those presented in Chapters 14-17 and 19), nonparametric tests rest on weaker assumptions. In particular, they do not require the population to be normally distributed. The price is that nonparametric tests usually have lower power than their parametric counterparts when the parametric assumptions do hold.

20.1 Runs test

The runs test is a simple test that checks whether dichotomous data (represented as a sequence of zeros and ones) are arranged randomly.

The null hypothesis in this test states that the data are arranged randomly. The alternative hypothesis suggests that the order of results is not random.

Randomness is checked by counting the number of runs. A run is a sequence of consecutive identical values (either all zeros or all ones). For example, in the sequence 0001100010, the runs are 000, 11, 000, 1, and 0.
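As a small illustration of this definition, the sketch below splits a sequence into its runs using `itertools.groupby` from the Python standard library. The helper name `split_into_runs` is made up for this example.

```python
from itertools import groupby

def split_into_runs(sequence):
    """Return the runs: maximal blocks of identical consecutive symbols."""
    return ["".join(group) for _, group in groupby(sequence)]

runs = split_into_runs("0001100010")
print(runs)       # ['000', '11', '000', '1', '0']
print(len(runs))  # 5
```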

The test statistic for this test is:

\[z = \frac{r - \mu_r - \text{sgn}(r - \mu_r)/2}{\sigma_r},\]

where:

\[\mu_r = \frac{2 n_0 n_1}{n} + 1,\]

\[\sigma_r = \sqrt{\frac{2 n_0 n_1 (2 n_0 n_1 - n)}{n^2 (n - 1)}},\]

\(r\) is the number of runs,

\(n_0\) is the number of zeros,

\(n_1\) is the number of ones,

\(n = n_0+n_1\) is the total number of elements in the sequence (zeros or ones).

This form of the test is intended for large samples (\(n \geq 30\) can be used as a rule of thumb), in which case the test statistic \(z\) approximately follows a standard normal distribution under the null hypothesis.
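As a worked illustration, take the coin-toss sequence used in the code examples later in this chapter, for which the software output reports \(n_0 = 29\), \(n_1 = 30\), \(n = 59\) and \(r = 38\). Substituting into the formulas above gives (values rounded):

\[\mu_r = \frac{2 \cdot 29 \cdot 30}{59} + 1 \approx 30.49,\]

\[\sigma_r = \sqrt{\frac{1740 \cdot (1740 - 59)}{59^2 \cdot 58}} \approx 3.81,\]

\[z = \frac{38 - 30.49 - 0.5}{3.81} \approx 1.84.\]

Since \(|z| < 1.96\), the null hypothesis of a random arrangement would not be rejected at the 0.05 significance level in a two-sided test.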

20.2 Mann-Whitney test

The Mann-Whitney test (also known as the Wilcoxon-Mann-Whitney test) is a nonparametric equivalent of the parametric two-sample test for means (see Chapter 16).

Assumptions: Independent random samples are drawn from two populations.

Hypotheses: \(H_0\): The distributions in both populations are identical, so the mean rank is the same in both populations. \(H_A\): The mean ranks differ between the populations. A one-sided alternative hypothesis is also possible: the ranks are systematically lower or higher in one of the populations.

Test statistic: The test statistic \(U\) is determined based on ranks and accounts for tied ranks (when multiple observations have the same value). The calculation of the test statistic is quite complex and is typically performed using statistical software.

Effect size: A useful measure of effect size in the Mann-Whitney test is AUC (short for "area under the ROC curve", a term that comes from signal detection theory). AUC can be interpreted as follows: if one observation is randomly selected from group 1 and another from group 2, then AUC is the probability that the observation from group 1 has the higher rank. When there are tied ranks, AUC is the probability that the observation from group 1 has the higher rank plus half the probability that the ranks are tied. AUC can be calculated without drawing the ROC curve, particularly if the \(U\) statistic for group 1 has already been computed (\(n_1\) and \(n_2\) denote the two sample sizes):

\[AUC = \frac{U}{n_1n_2}\]
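The interpretation above can be checked directly by counting pairs. The following is a minimal sketch, not the algorithm used by statistical software (which works with ranks): it counts the pairs in which the group 1 observation is larger, counts ties as one half, and divides by the number of pairs. The function name `mann_whitney_u` is hypothetical, and the data are the example vectors from the code section of this chapter.

```python
def mann_whitney_u(group1, group2):
    """U for group1: number of (x, y) pairs with x > y, ties counted as 1/2."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Example data reused from the code section of this chapter
p1 = [24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23]
p2 = [20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19]

u = mann_whitney_u(p1, p2)
auc = u / (len(p1) * len(p2))
print(u)              # 94.0
print(round(auc, 4))  # 0.6528
```

The value of \(U\) obtained this way agrees with the statistic reported for the same data in the software output shown later in the chapter.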

20.3 Wilcoxon signed-rank test

The Wilcoxon signed-rank test is the nonparametric counterpart of the parametric paired sample mean difference test (see 16.4).

Assumptions: A random sample of paired observations. The differences between observations have a symmetric distribution (in the population) and can be ranked.

Hypotheses: \(H_0\): The median of differences in the population is 0; \(H_A\): The median of differences in the population is not 0. One-sided alternative hypotheses are also possible.

Test statistic: The test statistic is derived from the ranks of the absolute differences between paired observations. For large samples, the test statistic can be transformed into a \(z\) statistic, which approximately follows a standard normal distribution.
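To make the rank-based construction concrete, here is a minimal sketch of one common variant of the calculation. It assumes that zero differences are dropped, tied absolute differences share an average rank, the variance is corrected for ties, and a continuity correction of 0.5 is applied. The helper names `average_ranks` and `signed_rank_test` are made up for this example, and the data are the example vectors from the code section of this chapter.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_test(x, y):
    """Return (W_plus, W_minus, z, two-sided p) via the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    # Variance of W_plus under H0, reduced for tied absolute differences
    tie_term = sum(t**3 - t for t in (abs_diffs.count(v) for v in set(abs_diffs)))
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    dev = w_plus - mean
    # Continuity correction: shrink the deviation by 0.5 towards zero
    z = (dev - math.copysign(0.5, dev)) / math.sqrt(var) if dev != 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, z, p

p1 = [24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23]
p2 = [20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19]
w_plus, w_minus, z, p = signed_rank_test(p1, p2)
print(w_plus, w_minus)  # 55.5 10.5
print(z, p)
```

Software packages differ in which rank sum they report and in whether the continuity correction is applied by default, so small differences in the statistic and p-value between programs are expected.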

20.4 Kruskal–Wallis test

The Kruskal–Wallis test is the nonparametric equivalent of analysis of variance (ANOVA, Chapter 19).

Assumptions: Independent random samples are drawn from multiple (\(r\)) populations.

Hypotheses: \(H_0\): The probability distributions are identical across all \(r\) populations. \(H_A\): The probability distributions are not identical in all populations.

Test statistic: It is based on the between-group variability of ranks in the samples. The p-value can be obtained from the chi-square distribution with \(r-1\) degrees of freedom (right-tailed test).
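A minimal sketch of this calculation is given below, assuming the usual form of the statistic (rank sums per group, with a correction for ties). The helper names `average_ranks` and `kruskal_wallis_h` are made up for this example. The p-value line covers only the two-group case (\(r - 1 = 1\) degree of freedom), where the chi-square tail probability reduces to a normal tail; for more groups a chi-square survival function such as `scipy.stats.chi2.sf` would be needed instead.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)  # ranks are assigned in the pooled sample
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum**2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    tie_term = sum(t**3 - t for t in (pooled.count(v) for v in set(pooled)))
    return h / (1 - tie_term / (n_total**3 - n_total))

# Example data reused from the code section of this chapter
a = [7, 8, 9, 9, 10, 11]
b = [12, 13, 14, 14, 15, 16]
h = kruskal_wallis_h(a, b)
p = math.erfc(math.sqrt(h / 2))  # right tail of chi-square with 1 df only
print(round(h, 4))  # 8.3662
print(round(p, 6))  # 0.003823
```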

20.5 Templates

Nonparametric tests — Google spreadsheet

Nonparametric tests — Excel template

# Runs test
# Example data
vHT <- "HHTHTTTHHTHTTHHTHTHHTHTHHHTTHTHTHTTHHTTTTHHTHHTTHTHTTHTHHHT"

# Transforming into a vector of 0s and 1s
v01 <- 1*(unlist(strsplit(vHT, ""))=="H")

# Test
DescTools::RunsTest(v01)

## 
##  Runs Test for Randomness
## 
## data:  v01
## z = 1.8413, runs = 38, m = 29, n = 30, p-value = 0.06557
## alternative hypothesis: true number of runs is not equal the expected number

# Mann-Whitney test
# Data (example):
p1<-c(24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23)
p2<-c(20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19)

# Test
# with parameters: wilcox.test(p1, p2, correct=TRUE, exact=FALSE)
wilcox.test(p1, p2)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  p1 and p2
## W = 94, p-value = 0.2114
## alternative hypothesis: true location shift is not equal to 0

# Wilcoxon paired test
# Data (example):
p1<-c(24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23)
p2<-c(20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19)

# Test:
# with parameters: wilcox.test(p1, p2, paired=TRUE, correct=TRUE, exact=FALSE, alternative="two.sided")
wilcox.test(p1, p2, paired=TRUE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  p1 and p2
## V = 55.5, p-value = 0.04898
## alternative hypothesis: true location shift is not equal to 0

# Kruskal-Wallis test
# Data (example):
df<-data.frame(A = c(7,8,9,9,10,11), B = c(12,13,14,14,15,16))
df2 <- tidyr::gather(df)

# Test:
kruskal.test(value ~ key, data = df2)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by key
## Kruskal-Wallis chi-squared = 8.3662, df = 1, p-value = 0.003823

import numpy as np
import pandas as pd
from scipy.stats import norm, wilcoxon, kruskal, mannwhitneyu

# Runs test
# Example data
vHT = "HHTHTTTHHTHTTHHTHTHHTHTHHHTTHTHTHTTHHTTTTHHTHHTTHTHTTHTHHHT"

# Transforming into a vector of 0s and 1s
v01 = np.array([1 if char == 'H' else 0 for char in vHT])

# Calculations
import itertools

def getRuns(l):
    # Each group yielded by groupby is one run of identical consecutive values
    return sum(1 for _ in itertools.groupby(l))
r = getRuns(v01)
n = len(v01)
n1 = sum(v01)
n0 = n - n1
mu_r = (2 * n0 * n1 / n) + 1
sigma_r = np.sqrt((2 * n0 * n1 * (2 * n0 * n1 - n)) / (n ** 2 * (n - 1)))
z = (r-mu_r-np.sign(r-mu_r)/2)/sigma_r
p_value = 2 * (1 - norm.cdf(abs(z)))
print("z =", z, "p-value =", p_value)

## z = 1.8413274595803757 p-value = 0.0655735860394191

# Mann-Whitney test
# Data (example):
p1 = np.array([24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23])
p2 = np.array([20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19])

# Test
mannwhitneyu(p1, p2)

## MannwhitneyuResult(statistic=94.0, pvalue=0.21138945901258455)

# Wilcoxon paired test
# Data (example):
p1 = np.array([24, 25, 21, 22, 23, 18, 17, 28, 24, 27, 21, 23])
p2 = np.array([20, 23, 21, 25, 18, 17, 18, 24, 20, 24, 23, 19])

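# Test
# Note: scipy reports the smaller of the two signed-rank sums (10.5 here; R reports V = 55.5)
# and applies no continuity correction by default, so the p-value differs slightly from R's
wilcoxon(p1, p2)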

## WilcoxonResult(statistic=10.5, pvalue=0.044065400736826854)

# Kruskal-Wallis test
# Data (example):
df = pd.DataFrame({'A': [7, 8, 9, 9, 10, 11], 'B': [12, 13, 14, 14, 15, 16]})
df2 = df.melt()

# Test
kruskal(*[group["value"].values for name, group in df2.groupby("variable")])

## KruskalResult(statistic=8.366197183098597, pvalue=0.0038226470545864484)

20.6 Exercises

Exercise 20.1 The students in a group were asked to make up a sequence of 100 coin tosses (recorded as O/R, from the Polish orzeł/reszka, i.e. heads/tails). One student, Julia, wrote down the following sequence:

OORORRROORORROOROROOROROOORRORORORROORRRROOROORRORORROROOORORROOORORORORROORRROORORROROOORORRROORORR

Using the runs test, determine whether the sequence prepared by Julia can be considered random.

Exercise 20.2 Use data from exercise 16.2 to conduct an appropriate nonparametric test.

Exercise 20.3 Use data from exercise 16.5 to conduct an appropriate nonparametric test.

Exercise 20.4 Use data from exercise 19.2 to conduct an appropriate nonparametric test.