6.6 Ordered probit model

The ordered probit model is used when there is a natural order in the categorical response variable. In this case, there is a latent variable \(y_i^* = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \mu_i\), where \(\mathbf{x}_i\) is a \(K\)-dimensional vector of regressors, \(\mu_i \stackrel{i.i.d.}{\sim} N(0,1)\), such that \(y_i = l\) if and only if \(\alpha_{l-1} < y_i^* \leq \alpha_l\), for \(l \in \{1, 2, \dots, L\}\), where \(\alpha_0 = -\infty\), \(\alpha_1 = 0\), and \(\alpha_L = \infty\).29 Then, the probability of observing \(y_i = l\) is given by:

\[ p(y_i = l) = \Phi(\alpha_l - \mathbf{x}_i^{\top}\boldsymbol{\beta}) - \Phi(\alpha_{l-1} - \mathbf{x}_i^{\top}\boldsymbol{\beta}), \]

and the likelihood function is:

\[ p(\mathbf{y} \mid \boldsymbol{\beta}, \boldsymbol{\alpha}, \mathbf{X}) = \prod_{i=1}^{N} p(y_i \mid \boldsymbol{\beta}, \boldsymbol{\alpha}, \mathbf{x}_i), \]

where each factor is the category probability above evaluated at the observed category \(y_i\).
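
As a quick illustration, the following sketch evaluates this log-likelihood in R for given values of \(\boldsymbol{\beta}\) and the free cut-offs (the function name OrdProbitLogLik and its arguments are ours, for illustration only, not part of any package):

# Sketch: ordered probit log-likelihood for categories y in {1, ..., L},
# given beta and the free cut-offs alpha_2, ..., alpha_{L-1};
# alpha_0 = -Inf, alpha_1 = 0, and alpha_L = Inf are appended internally
OrdProbitLogLik <- function(beta, alphaFree, y, X) {
  alpha <- c(-Inf, 0, alphaFree, Inf)  # full cut-off vector
  xb <- drop(X %*% beta)
  # p(y_i = l) = Phi(alpha_l - x_i'beta) - Phi(alpha_{l-1} - x_i'beta)
  probs <- pnorm(alpha[y + 1] - xb) - pnorm(alpha[y] - xb)
  sum(log(probs))
}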

The priors in this model are independent, i.e., \(\pi(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \pi(\boldsymbol{\beta}) \times \pi(\boldsymbol{\gamma})\), where \(\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \boldsymbol{B}_0)\) and \(\boldsymbol{\gamma} \sim N(\boldsymbol{\gamma}_0, \boldsymbol{\Gamma}_0)\), and \(\boldsymbol{\gamma} = \left[ \gamma_2 \ \gamma_3 \ \dots \ \gamma_{L-1} \right]^{\top}\), such that:

\[ \boldsymbol{\alpha} = \left[ \exp\left\{\gamma_2\right\} \ \sum_{l=2}^{3} \exp\left\{\gamma_l\right\} \ \dots \ \sum_{l=2}^{L-1} \exp\left\{\gamma_l\right\} \right]^{\top}. \]

This structure imposes the ordinal condition \(0 < \alpha_2 < \alpha_3 < \dots < \alpha_{L-1}\) on the cut-offs, since each increment \(\exp\left\{\gamma_l\right\}\) is strictly positive.
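
For instance, a minimal sketch of this transformation in R (the function name GammaToAlpha is ours):

GammaToAlpha <- function(gamma) cumsum(exp(gamma))  # positive increments ensure ordering
GammaToAlpha(c(-0.3, -1.4, -0.9))  # 0.74 0.99 1.39: strictly increasing cut-offs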

This model does not have a standard conditional posterior distribution for \(\boldsymbol{\gamma}\) (and hence \(\boldsymbol{\alpha}\)), but it does have standard conditional posterior distributions for \(\boldsymbol{\beta}\) and \(\boldsymbol{y}^*\) once data augmentation is used. We can therefore use a Metropolis-within-Gibbs sampling algorithm. In particular, we use Gibbs steps to draw \(\boldsymbol{\beta}\) and \(\boldsymbol{y}^*\), where:

\[ \boldsymbol{\beta} \mid \boldsymbol{y}^*, \boldsymbol{\alpha}, \boldsymbol{X} \sim N(\boldsymbol{\beta}_n, \boldsymbol{B}_n), \]

with \(\boldsymbol{B}_n = (\boldsymbol{B}_0^{-1} + \boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\), \(\boldsymbol{\beta}_n = \boldsymbol{B}_n(\boldsymbol{B}_0^{-1}\boldsymbol{\beta}_0 + \boldsymbol{X}^{\top}\boldsymbol{y}^*)\), and:

\[ y_i^* \mid \boldsymbol{\beta}, \boldsymbol{\alpha}, \boldsymbol{y}, \boldsymbol{X} \sim TN_{(\alpha_{y_i-1}, \alpha_{y_i})}(\boldsymbol{x}_i^{\top}\boldsymbol{\beta}, 1). \]

We use a random-walk Metropolis–Hastings algorithm for \(\boldsymbol{\gamma}\) with a Gaussian proposal distribution whose mean is the current value and whose covariance matrix is \(s^2(\boldsymbol{\Gamma}_0^{-1} + \hat{\boldsymbol{\Sigma}}_{\gamma}^{-1})^{-1}\), where \(s > 0\) is a tuning parameter and \(\hat{\boldsymbol{\Sigma}}_{\gamma}\) is the estimated covariance matrix of \(\boldsymbol{\gamma}\) obtained from maximum likelihood estimation.
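
To make the sampler concrete, the following is a minimal didactic sketch of this Metropolis-within-Gibbs algorithm. It is our own illustrative code, not the implementation in bayesm::rordprobitGibbs used below; in particular, we replace \(\hat{\boldsymbol{\Sigma}}_{\gamma}\) with an identity matrix so that the sketch is self-contained, and the function name OrdProbitMwG and its arguments are ours.

OrdProbitMwG <- function(y, X, b0, B0, g0, G0, s, S) {
  # y: categories coded 1, ..., L; S: number of MCMC iterations
  N <- nrow(X); K <- ncol(X); L <- max(y)
  Bn <- solve(solve(B0) + crossprod(X))  # B_n does not depend on y*
  CholBn <- t(chol(Bn))
  # Proposal covariance s^2 (Gamma0^{-1} + SigmaHat^{-1})^{-1}; an identity
  # matrix stands in for SigmaHat here, purely for illustration
  CholProp <- t(chol(s^2 * solve(solve(G0) + diag(L - 2))))
  logLik <- function(beta, gamma) {
    alpha <- c(-Inf, 0, cumsum(exp(gamma)), Inf)
    xb <- drop(X %*% beta)
    sum(log(pnorm(alpha[y + 1] - xb) - pnorm(alpha[y] - xb)))
  }
  beta <- rep(0, K); gamma <- rep(0, L - 2)
  draws <- list(beta = matrix(NA, S, K), gamma = matrix(NA, S, L - 2))
  for (it in 1:S) {
    # 1. Data augmentation: y*_i | beta, alpha, y_i ~ TN(x_i'beta, 1)
    alpha <- c(-Inf, 0, cumsum(exp(gamma)), Inf)
    xb <- drop(X %*% beta)
    lo <- pnorm(alpha[y] - xb); hi <- pnorm(alpha[y + 1] - xb)
    ystar <- xb + qnorm(lo + runif(N) * (hi - lo))  # inverse-CDF draw
    # 2. beta | y* ~ N(beta_n, B_n)
    betan <- Bn %*% (solve(B0, b0) + crossprod(X, ystar))
    beta <- drop(betan + CholBn %*% rnorm(K))
    # 3. Random-walk Metropolis-Hastings step for gamma, marginal of y*
    gammaP <- drop(gamma + CholProp %*% rnorm(L - 2))
    dPrior <- -0.5 * (crossprod(gammaP - g0, solve(G0, gammaP - g0)) -
                        crossprod(gamma - g0, solve(G0, gamma - g0)))
    if (log(runif(1)) < logLik(beta, gammaP) - logLik(beta, gamma) + drop(dPrior))
      gamma <- gammaP
    draws$beta[it, ] <- beta; draws$gamma[it, ] <- gamma
  }
  draws
}

For instance, OrdProbitMwG(y, X, b0, B0, gamma0, Gamma0, s = 0.5, S = 1000) would mimic the setting of the application below on a small scale.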

Example: Determinants of preventive health care visits

We used the file 2HealthMed.csv in this application. The dependent variable is MedVisPrevOr, an ordered variable equal to 1 if the individual did not visit a physician for preventive reasons, 2 if the individual visited once in that year, and so on, up to 6 for visiting five or more times; this last category represents 1.6% of the sample. Thus, the dependent variable has six categories.

In this example, the set of regressors is given by SHI, an indicator of being in the subsidized health care system (1 means being in the system); sex (Female); age in linear and quadratic form (Age and Age2); socioeconomic stratum indicators (Est2 and Est3), with the lowest stratum as the baseline category; self-perceived health status (Fair, Good, and Excellent), with Bad as the baseline; and education level (PriEd, HighEd, VocEd, and UnivEd), with no education as the baseline category.

We ran this application with 50,000 MCMC iterations plus 10,000 as burn-in, and a thinning parameter equal to 5; this setting yields 10,000 effective posterior draws. We set \(\boldsymbol{\beta}_0 = \boldsymbol{0}_{13}\), \(\boldsymbol{B}_0 = 1000\boldsymbol{I}_{13}\), \(\boldsymbol{\gamma}_0 = \boldsymbol{0}_4\), \(\boldsymbol{\Gamma}_0 = \boldsymbol{I}_4\), and the tuning parameter equal to 1.

We can run ordered probit models in our GUI following the steps in the Algorithm below.

Algorithm: Ordered probit models

  1. Select Univariate Models on the top panel
  2. Select Ordered Probit model using the left radio button
  3. Upload the dataset by first selecting whether there is a header in the file, and the kind of separator in the csv file of the dataset (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend. You should see a preview of the dataset
  4. Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
  5. Select dependent and independent variables using the Formula builder table
  6. Click the Build formula button to generate the formula in R syntax. Remember that this formula must have -1 to omit the intercept in the specification.
  7. Set the hyperparameters: mean vector and covariance matrix of the location parameters. This step is not necessary, as our GUI uses non-informative priors by default
  8. Select the number of ordered alternatives
  9. Set the hyperparameters: mean vector and covariance matrix of the cut-offs. This step is not necessary, as our GUI uses non-informative priors by default
  10. Select the tuning parameter for the Metropolis-Hastings algorithm
  11. Click the Go! button
  12. Analyze results
  13. Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

The following R code shows how to perform inference in this model using the command rordprobitGibbs from the bayesm library, which is the command that our GUI uses.

rm(list = ls())
set.seed(010101)
Data <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/2HealthMed.csv", sep = ",", header = TRUE, quote = "")
attach(Data)
y <- MedVisPrevOr 
# MedVisPrevOr: Ordered variable for preventive visits to doctors in one year: 1 (none), 2 (once), ..., 6 (five or more)
X <- cbind(SHI, Female, Age, Age2, Est2, Est3, Fair, Good, Excellent, PriEd, HighEd, VocEd, UnivEd)
k <- dim(X)[2]
L <- length(table(y))
# Hyperparameters
b0 <- rep(0, k); c0 <- 1000; B0 <- c0*diag(k)
gamma0 <- rep(0, L-2); Gamma0 <- diag(L-2)
# MCMC parameters
mcmc <- 60000+1; thin <- 5; tuningPar <- 1/(L-2)^0.5 # s = 1/sqrt(L-2) = 0.5
DataApp <- list(y = y, X = X, k = L) # in bayesm, k is the number of ordered categories
Prior <- list(betabar = b0, A = solve(B0), dstarbar = gamma0, Ad = Gamma0) # A and Ad are prior precision matrices
mcmcpar <- list(R = mcmc, keep = thin, s = tuningPar, nprint = 0)
PostBeta <- bayesm::rordprobitGibbs(Data = DataApp, Prior = Prior, Mcmc = mcmcpar)
##  
## Starting Gibbs Sampler for Ordered Probit Model
##    with  12975 observations
##  
## Table of y values
## y
##    1    2    3    4    5    6 
## 1935 2837 1241 2043 4711  208 
##  
## Prior Parms: 
## betabar
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0
##  
## A
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
##  [1,] 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [2,] 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [3,] 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [4,] 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [5,] 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [6,] 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
##  [7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000
##  [8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000
##  [9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000
## [10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000
## [11,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000
## [12,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001
## [13,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##       [,13]
##  [1,] 0.000
##  [2,] 0.000
##  [3,] 0.000
##  [4,] 0.000
##  [5,] 0.000
##  [6,] 0.000
##  [7,] 0.000
##  [8,] 0.000
##  [9,] 0.000
## [10,] 0.000
## [11,] 0.000
## [12,] 0.000
## [13,] 0.001
##  
## dstarbar
## [1] 0 0 0 0
##  
## Ad
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
##  
## MCMC parms: 
## R=  60001  keep=  5  nprint=  0 s=  0.5
## 
BetasPost <- coda::mcmc(PostBeta[["betadraw"]])
colnames(BetasPost) <- c("SHI", "Female", "Age", "Age2", "Est2", "Est3", "Fair", "Good", "Excellent", "PriEd", "HighEd", "VocEd", "UnivEd")
summary(BetasPost)      
## 
## Iterations = 1:12000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 12000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean        SD  Naive SE Time-series SE
## SHI        0.0654824 2.281e-02 2.082e-04      3.357e-04
## Female    -0.0374788 1.908e-02 1.742e-04      1.742e-04
## Age        0.0190336 1.869e-03 1.706e-05      4.576e-05
## Age2      -0.0002328 2.438e-05 2.225e-07      6.690e-07
## Est2       0.0949445 2.226e-02 2.032e-04      4.659e-04
## Est3      -0.1383965 3.411e-02 3.114e-04      3.459e-04
## Fair       0.6451828 5.375e-02 4.907e-04      3.924e-03
## Good       0.7343932 4.955e-02 4.523e-04      4.491e-03
## Excellent  0.9826531 6.393e-02 5.836e-04      5.261e-03
## PriEd      0.0309418 2.376e-02 2.169e-04      2.221e-04
## HighEd    -0.1805753 2.910e-02 2.656e-04      3.456e-04
## VocEd      0.1395760 9.640e-02 8.800e-04      9.291e-04
## UnivEd    -0.2218120 1.189e-01 1.086e-03      1.086e-03
## 
## 2. Quantiles for each variable:
## 
##                 2.5%        25%        50%        75%      97.5%
## SHI        0.0209045  0.0499525  0.0654041  0.0808572  0.1102130
## Female    -0.0746364 -0.0504288 -0.0377778 -0.0245643  0.0002350
## Age        0.0155088  0.0178114  0.0190228  0.0202305  0.0226864
## Age2      -0.0002804 -0.0002479 -0.0002328 -0.0002168 -0.0001864
## Est2       0.0514963  0.0800442  0.0948246  0.1096800  0.1393326
## Est3      -0.2055927 -0.1614479 -0.1381575 -0.1156348 -0.0717986
## Fair       0.5579955  0.6129584  0.6414878  0.6726800  0.7439563
## Good       0.6669005  0.7080863  0.7303217  0.7540621  0.8106430
## Excellent  0.8891998  0.9477048  0.9783661  1.0102615  1.0846084
## PriEd     -0.0158405  0.0149388  0.0310167  0.0471892  0.0773202
## HighEd    -0.2378212 -0.2003574 -0.1802174 -0.1607329 -0.1243538
## VocEd     -0.0491123  0.0747424  0.1381100  0.2041479  0.3333107
## UnivEd    -0.4538119 -0.3023902 -0.2219313 -0.1414801  0.0086323
# Convergence diagnostics
coda::geweke.diag(BetasPost)
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##       SHI    Female       Age      Age2      Est2      Est3      Fair      Good 
##    1.9824    0.1488    1.6564   -1.5988    0.9782   -1.9159    1.2891    1.2890 
## Excellent     PriEd    HighEd     VocEd    UnivEd 
##    1.3675    2.2458   -1.3570    1.0199   -0.6709
coda::raftery.diag(BetasPost, q = 0.5, r = 0.05, s = 0.95)
## 
## Quantile (q) = 0.5
## Accuracy (r) = +/- 0.05
## Probability (s) = 0.95 
##                                                  
##            Burn-in  Total Lower bound  Dependence
##            (M)      (N)   (Nmin)       factor (I)
##  SHI       2        393   385          1.020     
##  Female    2        382   385          0.992     
##  Age       2        401   385          1.040     
##  Age2      2        399   385          1.040     
##  Est2      2        409   385          1.060     
##  Est3      1        385   385          1.000     
##  Fair      6        1227  385          3.190     
##  Good      12       1792  385          4.650     
##  Excellent 6        1233  385          3.200     
##  PriEd     2        392   385          1.020     
##  HighEd    2        392   385          1.020     
##  VocEd     2        390   385          1.010     
##  UnivEd    2        393   385          1.020
coda::heidel.diag(BetasPost)
##                                         
##           Stationarity start     p-value
##           test         iteration        
## SHI       passed       1201      0.8026 
## Female    passed          1      0.5041 
## Age       passed       1201      0.0812 
## Age2      passed       1201      0.0928 
## Est2      passed       1201      0.1970 
## Est3      passed       1201      0.4047 
## Fair      failed         NA      0.0360 
## Good      failed         NA      0.0156 
## Excellent failed         NA      0.0403 
## PriEd     passed          1      0.1479 
## HighEd    passed       1201      0.0870 
## VocEd     passed          1      0.5223 
## UnivEd    passed          1      0.6614 
##                                        
##           Halfwidth Mean      Halfwidth
##           test                         
## SHI       passed     0.065098 4.19e-04 
## Female    passed    -0.037479 3.41e-04 
## Age       passed     0.018970 3.38e-05 
## Age2      passed    -0.000232 4.40e-07 
## Est2      passed     0.094472 4.10e-04 
## Est3      passed    -0.138067 6.42e-04 
## Fair      <NA>             NA       NA 
## Good      <NA>             NA       NA 
## Excellent <NA>             NA       NA 
## PriEd     passed     0.030942 4.35e-04 
## HighEd    passed    -0.180255 5.47e-04 
## VocEd     passed     0.139576 1.82e-03 
## UnivEd    passed    -0.221812 2.13e-03
# Cut-offs (posterior draws; the first one is fixed at 0)
Cutoffs <- PostBeta[["cutdraw"]]
summary(Cutoffs)
## Summary of Posterior Marginal Distributions 
## Moments 
##   mean std dev   num se rel eff sam size
## 1 0.00   0.000 -1.0e+04   -9999    -9999
## 2 0.71   0.013  1.3e-03     108       99
## 3 0.96   0.014  1.3e-03      96      112
## 4 1.37   0.015  1.3e-03      85      127
## 5 3.20   0.030  1.8e-03      38      277
## 
## Quantiles 
##   2.5%   5%  50%  95% 97.5%
## 1 0.00 0.00 0.00 0.00  0.00
## 2 0.68 0.69 0.71 0.73  0.73
## 3 0.93 0.94 0.96 0.98  0.98
## 4 1.33 1.34 1.37 1.39  1.39
## 5 3.14 3.15 3.20 3.25  3.26
##    based on 10800 valid draws (burn-in=1200)
coda::geweke.diag(Cutoffs)
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##   var1   var2   var3   var4   var5 
##    NaN 0.5305 1.2222 1.2512 1.6850
coda::heidel.diag(Cutoffs)
##                                    
##      Stationarity start     p-value
##      test         iteration        
## [,1] failed         NA          NA 
## [,2] passed       1201      0.2060 
## [,3] passed       1201      0.1192 
## [,4] passed       1201      0.1004 
## [,5] passed       1201      0.0681 
##                              
##      Halfwidth Mean Halfwidth
##      test                    
## [,1] <NA>        NA      NA  
## [,2] passed    0.71 0.00399  
## [,3] passed    0.96 0.00349  
## [,4] passed    1.37 0.00300  
## [,5] passed    3.20 0.00295
coda::raftery.diag(Cutoffs[, -1], q = 0.5, r = 0.05, s = 0.95)
## 
## Quantile (q) = 0.5
## Accuracy (r) = +/- 0.05
## Probability (s) = 0.95 
##                                        
##  Burn-in  Total Lower bound  Dependence
##  (M)      (N)   (Nmin)       factor (I)
##  390      49320 385          128.0     
##  340      43214 385          112.0     
##  224      28504 385           74.0     
##  64       7432  385           19.3

The results suggest that individuals in the subsidized health care program, older individuals (at a decreasing rate), those in the second socioeconomic stratum, and those with better self-perceived health status have a higher probability of visiting a physician for preventive reasons, whereas having high school as the highest education degree reduces this probability. Convergence diagnostics look good, except for the self-perceived health status draws.

We also obtained the posterior estimates of the cut-offs in the ordered probit model. These estimates are necessary to calculate the probability that an individual is in a specific category of physician visits. Due to the identification restriction, the first cut-off is set equal to 0; this is why we observe NaN values in the Geweke (1992) and Heidelberger and Welch (1983) tests, and why only the four free cut-offs appear in the Raftery and Lewis (1992) test. The latter diagnostic points to convergence issues in the cut-off draws: the required run lengths are large and the dependence factors are very high.
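
As an illustration, the following sketch (our own code, reusing PostBeta and X from above) computes the posterior mean probability of each category of physician visits for an "average" individual, averaging the category probabilities across draws rather than plugging in posterior means:

cutDraws <- PostBeta[["cutdraw"]]  # columns: alpha_1 = 0 and the four free cut-offs
betaDraws <- PostBeta[["betadraw"]]
xNew <- colMeans(X)  # an "average" individual (illustrative choice of regressors)
xb <- drop(betaDraws %*% xNew)  # x'beta for each posterior draw
alphaMat <- cbind(-Inf, cutDraws, Inf)  # append alpha_0 = -Inf and alpha_L = Inf
ProbCat <- sapply(1:6, function(l)
  mean(pnorm(alphaMat[, l + 1] - xb) - pnorm(alphaMat[, l] - xb)))
round(ProbCat, 3)  # estimated probabilities for l = 1, ..., 6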

References

Geweke, J. 1992. “Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments.” In Bayesian Statistics 4, edited by J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 169–93. Oxford, UK: Clarendon Press.
Heidelberger, P., and P. D. Welch. 1983. “Simulation Run Length Control in the Presence of an Initial Transient.” Operations Research 31 (6): 1109–44.
Raftery, A. E., and S. M. Lewis. 1992. “One Long Run with Diagnostics: Implementation Strategies for Markov Chain Monte Carlo.” Statistical Science 7: 493–97.

  29. Identification issues necessitate setting the variance of \(\mu_i\) equal to 1 and \(\alpha_1 = 0\). Observe that multiplying \(y_i^*\) and all of the cut-offs by the same positive constant, or adding a constant to all of the cut-offs and to the intercept, leaves \(y_i\) unchanged.↩︎