6.9 Quantile regression

In quantile regression, the location parameters vary according to the quantile of the dependent variable. Let \(q_{\tau}(\boldsymbol{x}_i) = \boldsymbol{x}_i^{\top} \boldsymbol{\beta}_{\tau}\) denote the \(\tau\)-th quantile regression function of \(y_i\) given \(\boldsymbol{x}_i\), where \(\boldsymbol{x}_i\) is a \(K\)-dimensional vector of regressors, and \(0 < \tau < 1\). Specifically, we have the model \(y_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta}_{\tau} + \mu_i\), with the condition \(\int_{-\infty}^{0} f_{\tau}(\mu_i) \, d\mu_i = \tau\), meaning that the \(\tau\)-th quantile of \(\mu_i\) is 0.

In particular, Kozumi and Kobayashi (2011) propose the asymmetric Laplace distribution for \(f_{\tau}(\mu_i)\), given by

\[ f_{\tau}(\mu_i) = \tau(1 - \tau) \exp\left\{- \mu_i(\tau - \mathbb{1}({\mu_i < 0})) \right\}, \]

where \(\mu_i(\tau - \mathbb{1}({\mu_i < 0}))\) is the check (loss) function. These authors also propose a location-scale mixture of normals representation, given by

\[ \mu_i = \theta e_i + \psi \sqrt{e_i} z_i, \]

where \(\theta = \frac{1 - 2\tau}{\tau(1 - \tau)}\), \(\psi^2 = \frac{2}{\tau(1 - \tau)}\), \(e_i \sim E(1)\), and \(z_i \sim N(0,1)\), with \(e_i \perp z_i\).³¹ As a result of this representation and the fact that the sample is i.i.d., the likelihood function is

\[ p(\boldsymbol{y} \mid \boldsymbol{\beta}_{\tau}, \boldsymbol{e}, \boldsymbol{X}) \propto \left( \prod_{i=1}^{N} e_i^{-1/2} \right) \exp\left\{- \sum_{i=1}^{N} \frac{(y_i - \boldsymbol{x}_i^{\top} \boldsymbol{\beta}_{\tau} - \theta e_i)^2}{2 \psi^2 e_i} \right\}. \]

Assuming a normal prior for \(\boldsymbol{\beta}_{\tau}\), i.e., \(\boldsymbol{\beta}_{\tau} \sim N(\boldsymbol{\beta}_{\tau 0}, \boldsymbol{B}_{\tau 0})\), and using data augmentation for \(\boldsymbol{e}\), we can implement a Gibbs sampling algorithm for this model. The posterior distributions are as follows:

\[\begin{equation*} \boldsymbol{\beta}_{\tau} \mid \boldsymbol{e}, \boldsymbol{y}, \boldsymbol{X} \sim N(\boldsymbol{\beta}_{n\tau}, \boldsymbol{B}_{n\tau}), \end{equation*}\]

\[\begin{equation*} e_i \mid \boldsymbol{\beta}_{\tau}, \boldsymbol{y}, \boldsymbol{X} \sim \text{GIG}\left( \frac{1}{2}, \alpha_{ni}, \delta_{ni} \right), \end{equation*}\]

where

\[\begin{align*} \boldsymbol{B}_{n\tau} &= \left( \boldsymbol{B}_{\tau 0}^{-1} + \sum_{i=1}^{N} \frac{\boldsymbol{x}_i \boldsymbol{x}_i^{\top}}{\psi^2 e_i} \right)^{-1}, \\ \boldsymbol{\beta}_{n\tau} &= \boldsymbol{B}_{n\tau} \left( \boldsymbol{B}_{\tau 0}^{-1} \boldsymbol{\beta}_{\tau 0} + \sum_{i=1}^{N} \frac{\boldsymbol{x}_i (y_i - \theta e_i)}{\psi^2 e_i} \right), \\ \alpha_{ni} &= \frac{(y_i - \boldsymbol{x}_i^{\top} \boldsymbol{\beta}_{\tau})^2}{\psi^2}, \quad \delta_{ni} = 2 + \frac{\theta^2}{\psi^2}. \end{align*}\]

Example: The market value of soccer players in Europe continues

We continue the example of the market value of soccer players from Section 6.1. Now, we want to examine whether the marginal effect of having been on the national team varies with the quantile of the market value of top soccer players in Europe. Thus, we use the same regressors as in the previous example, but analyze the effects at the 0.5-th and 0.9-th quantiles of NatTeam.

The following Algorithm shows how to estimate quantile regression models in our GUI. Our GUI uses the command MCMCquantreg from the package MCMCpack. The following code demonstrates how to perform this analysis using the package.

The results show that at the median market value (0.5-th quantile), the 95% credible interval for the coefficient associated with national team is (0.34, 1.02), with a posterior mean of 0.69. At the 0.9-th quantile, these values are (0.44, 1.59) and 1.03, respectively. It appears that being on the national team increases the market value of more expensive players more significantly on average, although there is some overlap in the credible intervals.

Algorithm: Quantile regression

Select Univariate Models on the top panel
Select Quantile model using the left radio button
Upload the dataset by first selecting whether there is a header in the file, and the kind of separator in the csv file of the dataset (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend. You should see a preview of the dataset
Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
Select dependent and independent variables using the Formula builder table
Click the Build formula button to generate the formula in R syntax. You can modify the formula in the Main equation box using valid arguments of the formula command structure in R
Set the quantile to be analyzed, by default it is 0.5
Set the hyperparameters: mean vector and covariance matrix. This step is not necessary as by default our GUI uses non-informative priors
Click the Go! button
Analyze results
Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

rm(list = ls()); set.seed(010101)
Data <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/1ValueFootballPlayers.csv", sep = ",", header = TRUE, quote = "")
attach(Data)

## The following objects are masked from Data (pos = 3):
## 
##     Age, Age2, Assists, Exp, Exp2, Goals, Goals2, NatTeam, Perf, Perf2,
##     Player, Value, ValueCens

## The following objects are masked from Data (pos = 4):
## 
##     Age, Age2

## The following objects are masked from mydata:
## 
##     Age, Age2

## The following objects are masked from Data (pos = 6):
## 
##     Age, Age2, Assists, Exp, Exp2, Goals, Goals2, NatTeam, Perf, Perf2,
##     Player, Value, ValueCens

y <- log(ValueCens) 
X <- cbind(1, Perf, Age, Age2, NatTeam, Goals, Exp, Exp2)
k <- dim(X)[2]; N <- dim(X)[1]
# Hyperparameters
b0 <- rep(0, k); c0 <- 1000; B0 <- c0*diag(k); B0i <- solve(B0)
# MCMC parameters
mcmc <- 50000; burnin <- 10000
tot <- mcmc + burnin; thin <- 1
# Quantile
q <- 0.5
posterior05  <- MCMCpack::MCMCquantreg(y~X-1, tau = q, b0=b0, B0 = B0i, burnin = burnin, mcmc = mcmc, thin = thin, below = 13.82, above = Inf)
summary(coda::mcmc(posterior05))

## 
## Iterations = 1:50000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 50000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean       SD  Naive SE Time-series SE
## X         7.380881 2.922415 1.307e-02      2.305e-02
## XPerf     0.029380 0.005972 2.671e-05      5.269e-05
## XAge      0.550080 0.242058 1.083e-03      1.911e-03
## XAge2    -0.012009 0.004607 2.061e-05      3.665e-05
## XNatTeam  0.681238 0.170806 7.639e-04      1.564e-03
## XGoals    0.010642 0.002415 1.080e-05      1.958e-05
## XExp      0.091802 0.085777 3.836e-04      6.821e-04
## XExp2    -0.002962 0.003874 1.733e-05      2.930e-05
## 
## 2. Quantiles for each variable:
## 
##               2.5%       25%       50%        75%     97.5%
## X         1.721153  5.421451  7.355579  9.3156087 13.189612
## XPerf     0.017526  0.025434  0.029410  0.0333849  0.040987
## XAge      0.070621  0.389228  0.552940  0.7121539  1.020032
## XAge2    -0.020951 -0.015103 -0.012073 -0.0089386 -0.002859
## XNatTeam  0.341558  0.568023  0.682516  0.7949333  1.017583
## XGoals    0.005849  0.009081  0.010586  0.0122120  0.015472
## XExp     -0.068241  0.032809  0.088512  0.1472835  0.269448
## XExp2    -0.010951 -0.005446 -0.002856 -0.0003978  0.004471

q <- 0.9
posterior09  <- MCMCpack::MCMCquantreg(y~X-1, tau = q, b0=b0, B0 = B0i, burnin = burnin, mcmc = mcmc, thin = thin, below = 13.82, above = Inf)
summary(coda::mcmc(posterior09))

## 
## Iterations = 1:50000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 50000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##               Mean       SD  Naive SE Time-series SE
## X         8.871556 5.825738 2.605e-02      6.605e-02
## XPerf     0.019617 0.010110 4.521e-05      1.127e-04
## XAge      0.531550 0.473922 2.119e-03      5.320e-03
## XAge2    -0.012324 0.008689 3.886e-05      9.523e-05
## XNatTeam  1.039941 0.295468 1.321e-03      3.453e-03
## XGoals    0.008703 0.004295 1.921e-05      4.799e-05
## XExp      0.136201 0.176328 7.886e-04      2.064e-03
## XExp2    -0.002835 0.007613 3.405e-05      8.450e-05
## 
## 2. Quantiles for each variable:
## 
##                2.5%       25%       50%       75%     97.5%
## X        -2.4546594  4.890189  8.838894 12.807759 20.296222
## XPerf    -0.0001067  0.012859  0.019580  0.026298  0.039655
## XAge     -0.3889773  0.210175  0.531298  0.850672  1.462811
## XAge2    -0.0295151 -0.018080 -0.012278 -0.006447  0.004438
## XNatTeam  0.4437112  0.845176  1.045862  1.241973  1.600731
## XGoals    0.0018928  0.005572  0.008122  0.011280  0.018537
## XExp     -0.2241036  0.019995  0.142557  0.258915  0.461877
## XExp2    -0.0163570 -0.008121 -0.003296  0.001949  0.013474

References

Kozumi, Hideo, and Genya Kobayashi. 2011. “Gibbs Sampling Methods for Bayesian Quantile Regression.” Journal of Statistical Computation and Simulation 81 (11): 1565--1578.

\(E\) denotes an exponential density. ↩︎