7.4 Multivariate probit model

In the multivariate probit model (Edwards and Allenby 2003), the response variable \(y_{il} \in \{0, 1\}\) indicates whether individual \(i\) chooses alternative \(l\), where the \(L\) binary choices are not mutually exclusive, \(l = 1, 2, \dots, L\) and \(i = 1, 2, \dots, N\). Specifically,

\[ y_{il} = \begin{cases} 0, & \quad y_{il}^* \leq 0 \\ 1, & \quad y_{il}^* > 0 \end{cases} \]

where \(\boldsymbol{y}_i^* = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{\mu}_i\) with \(\boldsymbol{\mu}_i \stackrel{\text{i.i.d.}}{\sim} N(\boldsymbol{0}, \boldsymbol{\Sigma})\). Here, \(\boldsymbol{y}_i^*\) is an unobserved latent \(L\)-dimensional vector, and \(\boldsymbol{X}_i = \boldsymbol{x}_i^\top \otimes \mathbf{I}_L\) is an \(L \times K\) design matrix of regressors, with \(K = L \times k\), where \(k\) is the number of regressors (i.e., the length of \(\boldsymbol{x}_i\)). In addition, \(\boldsymbol{\beta} = \left[\boldsymbol{\beta}_1^\top \ \boldsymbol{\beta}_2^\top \dots \boldsymbol{\beta}_k^\top\right]^\top\), where \(\boldsymbol{\beta}_j\) is an \(L\)-dimensional vector of coefficients for \(j = 1, 2, \dots, k\).
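As a concrete illustration, the design matrix \(\boldsymbol{X}_i\) can be built in R with the kronecker() function; the following is a minimal sketch with hypothetical numbers:

# Build X_i = x_i' (Kronecker) I_L for L = 2 binary choices and k = 3 regressors
L <- 2; k <- 3
xi <- c(1, 0.5, -1.2)            # hypothetical regressor values for individual i
Xi <- kronecker(t(xi), diag(L))  # L x K design matrix, K = L * k = 6
Xi

Each \(L \times L\) block of Xi is \(x_{ij} \mathbf{I}_L\), so \(\boldsymbol{X}_i \boldsymbol{\beta} = \sum_{j=1}^k x_{ij} \boldsymbol{\beta}_j\), consistent with the stacking of \(\boldsymbol{\beta}\) above.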

The likelihood function for this model is given by

\[ p(\boldsymbol{y} \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}, \boldsymbol{X}) = \prod_{i=1}^N P(y_{i1}^* \in A_{i1}, \dots, y_{iL}^* \in A_{iL}), \]

where \(A_{il} = (0, \infty)\) if \(y_{il} = 1\) and \(A_{il} = (-\infty, 0]\) if \(y_{il} = 0\); that is, each individual contributes the probability that the latent vector \(\boldsymbol{y}_i^* \sim N(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{\Sigma})\) falls in the rectangle determined by the observed choices.
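For instance, a single individual's likelihood contribution is a multivariate normal rectangle probability, which can be evaluated with the mvtnorm package; the following is a minimal sketch with hypothetical values:

library(mvtnorm)
# Hypothetical values for one individual with L = 2 binary responses
yi    <- c(1, 0)                       # observed choices
mui   <- c(0.3, -0.4)                  # X_i %*% beta
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2)  # latent covariance matrix
# Integration region: (0, Inf) when y_il = 1, (-Inf, 0] when y_il = 0
lower <- ifelse(yi == 1, 0, -Inf)
upper <- ifelse(yi == 1, Inf, 0)
mvtnorm::pmvnorm(lower = lower, upper = upper, mean = mui, sigma = Sigma)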

Observe that \(p({y}_{il}^* > 0) = p({\lambda}_{ll}{y}_{il}^* > 0)\) for any \(\lambda_{ll} > 0\). This generates identification issues because only the correlation matrix can be identified, similar to the univariate probit model, where the variance is fixed at 1. We follow the post-processing strategy proposed by Edwards and Allenby (2003) to obtain identified parameters, that is, \(\tilde{\boldsymbol{\beta}}=\text{vec}\left\{\boldsymbol{\Lambda}\mathbf{B}\right\}\) and the correlation matrix \(\boldsymbol{R}=\boldsymbol{\Lambda}\boldsymbol{\Sigma}\boldsymbol{\Lambda}\), where \(\boldsymbol{\Lambda}=\text{diag}\left\{\sigma_{11},\dots,\sigma_{LL}\right\}^{-1/2}\) and \(\mathbf{B}=\left[\boldsymbol{\beta}_1 \ \boldsymbol{\beta}_2 \dots \boldsymbol{\beta}_k\right]\) is the \(L \times k\) matrix of coefficients.39
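In R, this post-processing reduces to a couple of matrix products applied to each posterior draw; the following sketch uses a hypothetical draw of \(\mathbf{B}\) and \(\boldsymbol{\Sigma}\) (the application below applies the same idea draw by draw):

# Hypothetical (unidentified) posterior draw with L = 2 choices and k = 3 regressors
Sigma  <- matrix(c(2.0, 0.6, 0.6, 1.5), 2, 2)
B      <- matrix(rnorm(2 * 3), nrow = 2)  # L x k coefficient matrix
Lambda <- diag(1 / sqrt(diag(Sigma)))     # Lambda = diag{sigma_ll}^{-1/2}
R      <- Lambda %*% Sigma %*% Lambda     # identified correlation matrix
btilde <- c(Lambda %*% B)                 # identified coefficients, vec(Lambda B)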

We assume independent priors: \(\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, \boldsymbol{B}_0)\) and \(\boldsymbol{\Sigma}^{-1} \sim W(\alpha_0, \boldsymbol{\Psi}_0)\). We can apply Gibbs sampling to this model, as it is a standard Bayesian linear regression model when data augmentation in \(\boldsymbol{y}^*\) is used.

The posterior conditional distributions are \[\begin{equation} \boldsymbol{\beta}\mid \boldsymbol{\Sigma},\boldsymbol{y}^* \sim N(\boldsymbol{\beta}_n,\boldsymbol{B}_n), \end{equation}\] \[\begin{equation} \boldsymbol{\Sigma}^{-1} \mid \boldsymbol{\beta},\boldsymbol{y}^* \sim W(\alpha_n,\boldsymbol{\Psi}_n), \end{equation}\] \[\begin{equation} y_{il}^* \mid \boldsymbol{y}_{i,-l}^*,\boldsymbol{\beta},\boldsymbol{\Sigma}^{-1},\boldsymbol{y}_i \sim TN_{I_{il}}(m_{il},\tau_{ll}^2), \end{equation}\]

where \(TN_{I}\) denotes the normal distribution truncated to the interval \(I\), \(\boldsymbol{B}_n=(\boldsymbol{B}_0^{-1}+\boldsymbol{X}^{*\top}\boldsymbol{X}^*)^{-1}\), \(\boldsymbol{\beta}_n=\boldsymbol{B}_n(\boldsymbol{B}_0^{-1}\boldsymbol{\beta}_0+\boldsymbol{X}^{*\top}\boldsymbol{y}^{**})\), \(\boldsymbol{C}\) satisfies \(\boldsymbol{\Sigma}^{-1}=\boldsymbol{C}^{\top}\boldsymbol{C}\) (e.g., \(\boldsymbol{C}\) is the Cholesky factor of \(\boldsymbol{\Sigma}^{-1}\)), \(\boldsymbol{X}_i^{*}=\boldsymbol{C}\boldsymbol{X}_i\), \(\boldsymbol{y}_i^{**}=\boldsymbol{C}\boldsymbol{y}_i^*\), \(\alpha_n=\alpha_0+N\), \(\boldsymbol{\Psi}_n=\left(\boldsymbol{\Psi}_0^{-1}+\sum_{i=1}^N (\boldsymbol{y}_i^*-\boldsymbol{X}_i\boldsymbol{\beta})(\boldsymbol{y}_i^*-\boldsymbol{X}_i\boldsymbol{\beta})^{\top}\right)^{-1}\),

\[ m_{il}=\boldsymbol{x}_{il}^{\top}\boldsymbol{\beta}+\boldsymbol{f}_l^{\top}(\boldsymbol{y}_{i,-l}^*-\boldsymbol{X}_{i,-l}\boldsymbol{\beta}), \]

where \(\boldsymbol{y}_{i,-l}^*\) is the \((L-1)\)-dimensional vector of all components of \(\boldsymbol{y}_i^*\) excluding \(y_{il}^*\), \(\boldsymbol{x}_{il}^{\top}\) is the \(l\)-th row of \(\boldsymbol{X}_i\), and \(\boldsymbol{X}_{i,-l}\) is \(\boldsymbol{X}_{i}\) with its \(l\)-th row deleted,

\[ \boldsymbol{f}_l^{\top}=\boldsymbol{\omega}_{l,-l}^{\top}\boldsymbol{\Sigma}_{-l,-l}^{-1}, \]

where \(\boldsymbol{\omega}_{l,-l}^{\top}\) is the \(l\)-th row of \(\boldsymbol{\Sigma}\) with its \(l\)-th element removed, and \(\boldsymbol{\Sigma}_{-l,-l}\) is the sub-matrix of \(\boldsymbol{\Sigma}\) obtained by deleting its \(l\)-th row and column, and

\[ \tau_{ll}^2=\sigma_{ll}-\boldsymbol{\omega}_{l,-l}^{\top}\boldsymbol{\Sigma}_{-l,-l}^{-1}\boldsymbol{\omega}_{-l,l}, \]

and

\[ \boldsymbol{X}^*= \begin{bmatrix} \boldsymbol{X}_1^*\\ \boldsymbol{X}_2^*\\ \vdots\\ \boldsymbol{X}_N^* \end{bmatrix}, \quad I_{il}= \begin{cases} (0,\infty), & y_{il}=1\\ (-\infty,0], & y_{il}=0 \end{cases}, \quad \boldsymbol{\Sigma}= \begin{bmatrix} \boldsymbol{\omega}_1^{\top} \\ \boldsymbol{\omega}_2^{\top} \\ \vdots \\ \boldsymbol{\omega}_{L}^{\top} \end{bmatrix}. \]
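A minimal sketch of one iteration of this Gibbs sampler, written directly from the conditionals above, is shown next. This is illustrative code, not the implementation used by our GUI or by bayesm; the function names (rtnorm1, mvpGibbsStep) and all inputs are hypothetical:

# One pass of the data-augmented Gibbs sampler for the multivariate probit model.
# Assumed inputs: y (N x L binary matrix), Xlist (list of N design matrices, L x K),
# current values ystar (N x L), beta (length K), Sigma (L x L), and hyperparameters
# b0, B0i = B0^{-1}, a0, Psi0 as defined in the text.
rtnorm1 <- function(mu, sd, lower, upper) {
  # Inverse-CDF draw from a univariate truncated normal
  u <- runif(1, pnorm(lower, mu, sd), pnorm(upper, mu, sd))
  qnorm(u, mu, sd)
}
mvpGibbsStep <- function(y, Xlist, ystar, beta, Sigma, b0, B0i, a0, Psi0) {
  N <- nrow(y); L <- ncol(y); K <- length(beta)
  # 1. Latent utilities: univariate truncated normals, one component at a time
  for (i in 1:N) {
    for (l in 1:L) {
      fl <- Sigma[l, -l] %*% solve(Sigma[-l, -l])
      m  <- Xlist[[i]][l, ] %*% beta +
            fl %*% (ystar[i, -l] - Xlist[[i]][-l, , drop = FALSE] %*% beta)
      t2 <- Sigma[l, l] - fl %*% Sigma[-l, l]
      lo <- if (y[i, l] == 1) 0 else -Inf
      up <- if (y[i, l] == 1) Inf else 0
      ystar[i, l] <- rtnorm1(as.numeric(m), sqrt(as.numeric(t2)), lo, up)
    }
  }
  # 2. Location parameters: whiten with C such that Sigma^{-1} = C'C, then
  #    update as in a standard normal linear regression
  C <- chol(solve(Sigma))               # upper triangular, t(C) %*% C = Sigma^{-1}
  XtX <- matrix(0, K, K); Xty <- rep(0, K)
  for (i in 1:N) {
    Xs  <- C %*% Xlist[[i]]
    XtX <- XtX + crossprod(Xs)
    Xty <- Xty + crossprod(Xs, C %*% ystar[i, ])
  }
  Bn   <- solve(B0i + XtX)
  bn   <- Bn %*% (B0i %*% b0 + Xty)
  beta <- as.numeric(bn + t(chol(Bn)) %*% rnorm(K))
  # 3. Precision matrix: Wishart with updated degrees of freedom and scale
  S <- matrix(0, L, L)
  for (i in 1:N) {
    e <- ystar[i, ] - Xlist[[i]] %*% beta
    S <- S + tcrossprod(e)
  }
  Sigmai <- rWishart(1, a0 + N, solve(solve(Psi0) + S))[, , 1]
  list(ystar = ystar, beta = beta, Sigma = solve(Sigmai))
}

Iterating mvpGibbsStep and post-processing each draw as described above yields samples of the identified parameters \(\tilde{\boldsymbol{\beta}}\) and \(\boldsymbol{R}\).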

The setting in our GUI uses the same regressors in each binary decision. Note, however, that the multivariate probit model is essentially a SUR model in the latent variables, so different regressors in each equation are possible. Exercise 9 asks you to implement a Gibbs sampling algorithm for a multivariate probit model with different regressors in each equation.

Example: Self-selection in hospitalization due to a subsidized health care program

We use the dataset 7HealthMed.csv, where the dependent variable is \(y = \left[\text{Hosp} \ \text{SHI}\right]^{\top}\), with \(\text{Hosp} = 1\) if an individual was hospitalized in the year prior to the survey (0 otherwise), and \(\text{SHI} = 1\) if the individual had subsidized health insurance (0 otherwise).

Recall that our application of binary response models aimed to uncover the determinants of hospitalization in Medellín (Colombia), where one of the regressors was a binary indicator of participation in a subsidized health care program (Section 6.3). We can use a bivariate probit model if we suspect there is dependence between the decisions regarding these two variables. A priori, we would expect that being in a subsidized health care program increases the probability of hospitalization ceteris paribus, due to reduced costs for the patient. However, if an individual expects to be hospitalized in the future, and the factors influencing this decision are unobserved by the modeler, a feedback effect may exist from hospitalization to enrollment in the subsidized health care program.

We considered seven regressors: a constant, gender (female), age, self-perception of health status (with categories fair, good, and excellent, using bad as the reference category), and the proportion of the individual’s age spent living in their neighborhood. The last variable attempts to account for social capital, which can affect enrollment in the subsidized health insurance program, as the target population is identified by the local government (Ramírez-Hassan and Guerra-Urzola 2021). The dataset includes 12,975 individuals who can “choose” two options: hospitalization and enrollment in the subsidized health insurance regime.

The following Algorithm shows how to run a multivariate probit model using our GUI.

Algorithm: Multivariate Probit Model

  1. Select Multivariate Models on the top panel
  2. Select Multivariate Probit model using the left radio button
  3. Upload the dataset: first indicate whether the file has a header and the separator used in the csv file (comma, semicolon, or tab), then use the Browse button under the Choose File legend
  4. Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
  5. Write down the number of cross-sectional units in the Number of individuals: n box
  6. Write down the number of exogenous variables in the Number of exogenous variables: k box
  7. Write down the number of choices in the Number of choices: l box
  8. Set the hyperparameters: mean vectors, covariance matrix, degrees of freedom, and the scale matrix. This step is not necessary as by default our GUI uses non-informative priors
  9. Click the Go! button
  10. Analyze results
  11. Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons

We set 20,000 MCMC iterations with a thinning parameter equal to 5. The hyperparameters are \(\boldsymbol{\beta}_0 = \boldsymbol{0}_{14}\), \(\boldsymbol{B}_0 = 1000\boldsymbol{I}_{14}\), \(\alpha_0 = 4\), and \(\boldsymbol{\Psi}_0 = 4\boldsymbol{I}_2\).40

rm(list = ls()); set.seed(010101)
Data <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/7HealthMed.csv", sep = ",", header = TRUE, quote = "")
attach(Data); str(Data)
## 'data.frame':    25950 obs. of  9 variables:
##  $ id       : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ y        : int  0 1 0 1 0 1 0 1 0 0 ...
##  $ Constant : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Female   : int  0 0 1 1 1 1 1 1 0 0 ...
##  $ Age      : int  7 7 39 39 23 23 15 15 8 8 ...
##  $ Fair     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Good     : int  1 1 1 1 1 1 1 1 0 0 ...
##  $ Excellent: int  0 0 0 0 0 0 0 0 1 1 ...
##  $ PTL      : num  0.43 0.43 0 0 0 0 0 0 0 0 ...
p <- 2; nd <- 7; y <- Data$y; N <- length(y)/p  # two choices, seven regressors per equation
Xd <- as.matrix(Data[seq(1, p*N, 2), 3:9])      # distinct regressors, one row per individual (not used below)
XcreateMP <- function(p, nxs, nind, Data){
    pandterm <- function(message) {
        stop(message, call. = FALSE)
    }
    if (missing(nxs))
        pandterm("requires number of regressors: include intercept if required")
    if (missing(nind))
        pandterm("requires number of units (individuals)")
    if (missing(Data))
        pandterm("requires dataset")
    if (nrow(Data) != nind*p)
        pandterm("check dataset! number of units times number of alternatives should be equal to dataset rows")
    XXDat <- array(0, c(p, 1 + nxs, nind))  # per individual: response plus regressors
    XX <- array(0, c(p, nxs*p, nind))       # per individual: block design matrix (one block per equation)
    YY <- array(0, c(p, 1, nind))           # per individual: stacked binary responses
    is <- seq(p, nind*p, p)                 # last data row of each individual's block
    cis <- seq(nxs, nxs*p, nxs)             # last column of each equation's block
    for(i in is){
        j <- which(i == is)
        XXDat[,,j] <- as.matrix(Data[c((i-(p-1)):i), -1])  # drop the id column
        YY[,,j] <- XXDat[, 1, j]
        for(l in 1:p){
            # place equation l's regressors in its own block of columns
            XX[l, ((cis[l]-(nxs-1)):cis[l]), j] <- XXDat[l, -1, j]
        }
    }
    return(list(y = YY, X = XX))
}
Dat <- XcreateMP(p = p, nxs = nd, nind = N, Data = Data)
y <- NULL; X <- NULL
# Stack individuals' responses and design matrices row-wise, as bayesm expects
for(i in 1:dim(Dat$y)[3]){
    y <- c(y, Dat$y[,,i])
    X <- rbind(X, Dat$X[,,i])
}
DataMP = list(p = p, y = y, X = X)
# Hyperparameters
k <- dim(X)[2]; b0 <- rep(0, k); c0 <- 1000
B0 <- c0*diag(k); B0i <- solve(B0)
a0 <- p - 1 + 3; Psi0 <- a0*diag(p)
Prior <- list(betabar = b0, A = B0i, nu = a0, V = Psi0)
# MCMC parameters
mcmc <- 20000; thin <- 5; 
Mcmc <- list(R = mcmc, keep = thin, nprint = 0)
Results <- bayesm::rmvpGibbs(Data = DataMP, Mcmc = Mcmc, Prior = Prior)
## Table of y values
## y
##     0     1 
## 15653 10297 
##  
## Starting Gibbs Sampler for MVP
##    12975  obs of  2  binary indicators;  14  indep vars (including intercepts)
##  
## Prior Parms:
## betabar
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## A
##        [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
##  [1,] 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [2,] 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [3,] 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [4,] 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [5,] 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##  [6,] 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
##  [7,] 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000
##  [8,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000
##  [9,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000
## [10,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000
## [11,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.000
## [12,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001
## [13,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## [14,] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
##       [,13] [,14]
##  [1,] 0.000 0.000
##  [2,] 0.000 0.000
##  [3,] 0.000 0.000
##  [4,] 0.000 0.000
##  [5,] 0.000 0.000
##  [6,] 0.000 0.000
##  [7,] 0.000 0.000
##  [8,] 0.000 0.000
##  [9,] 0.000 0.000
## [10,] 0.000 0.000
## [11,] 0.000 0.000
## [12,] 0.000 0.000
## [13,] 0.001 0.000
## [14,] 0.000 0.001
## nu
## [1] 4
## V
##      [,1] [,2]
## [1,]    4    0
## [2,]    0    4
##  
## MCMC Parms:
##    20000  reps; keeping every  5 th draw  nprint=  0
## initial beta=  0 0 0 0 0 0 0 0 0 0 0 0 0 0
## initial sigma= 
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## 
# Identified location parameters, eq. 1: divide by sqrt(sigma_11)
# (sigmadraw stores vec(Sigma) per draw: columns 1-4 are sigma_11, sigma_21, sigma_12, sigma_22)
betatilde1 <- Results$betadraw[,1:7] / sqrt(Results$sigmadraw[,1])
summary(coda::mcmc(betatilde1))
## 
## Iterations = 1:4000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean      SD  Naive SE Time-series SE
## [1,] -0.973678 0.12756 0.0020170      2.414e-03
## [2,]  0.122745 0.04872 0.0007704      1.326e-03
## [3,]  0.002941 0.00110 0.0000174      2.577e-05
## [4,] -0.522298 0.11435 0.0018080      1.781e-03
## [5,] -1.234641 0.11131 0.0017599      1.859e-03
## [6,] -1.093871 0.13165 0.0020816      2.591e-03
## [7,] -0.064123 0.05942 0.0009395      1.619e-03
## 
## 2. Quantiles for each variable:
## 
##            2.5%       25%       50%       75%     97.5%
## var1 -1.2247602 -1.059257 -0.970870 -0.889741 -0.733411
## var2  0.0269885  0.090098  0.121863  0.155585  0.220662
## var3  0.0007652  0.002207  0.002925  0.003685  0.005049
## var4 -0.7477149 -0.598898 -0.522549 -0.445173 -0.296718
## var5 -1.4520842 -1.309922 -1.234633 -1.160468 -1.018396
## var6 -1.3503717 -1.182381 -1.092511 -1.005472 -0.837939
## var7 -0.1791758 -0.103506 -0.064418 -0.024499  0.051849
# Identified location parameters, eq. 2: divide by sqrt(sigma_22)
betatilde2 <- Results$betadraw[,8:14] / sqrt(Results$sigmadraw[,4])
summary(coda::mcmc(betatilde2))
## 
## Iterations = 1:4000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean        SD  Naive SE Time-series SE
## [1,]  0.567199 0.1346170 2.128e-03      2.409e-03
## [2,]  0.306014 0.0242202 3.830e-04      3.672e-04
## [3,]  0.009125 0.0006758 1.068e-05      1.109e-05
## [4,] -0.221882 0.1347631 2.131e-03      2.447e-03
## [5,] -0.421087 0.1307593 2.067e-03      2.364e-03
## [6,] -0.435578 0.1370951 2.168e-03      2.449e-03
## [7,]  0.224500 0.0304865 4.820e-04      4.829e-04
## 
## 2. Quantiles for each variable:
## 
##           2.5%       25%       50%       75%    97.5%
## var1  0.306343  0.477265  0.564698  0.656616  0.82932
## var2  0.258347  0.289819  0.305902  0.322116  0.35284
## var3  0.007848  0.008656  0.009124  0.009591  0.01045
## var4 -0.488810 -0.313532 -0.218459 -0.130373  0.04144
## var5 -0.677686 -0.511529 -0.418139 -0.332415 -0.17322
## var6 -0.703355 -0.527642 -0.433378 -0.341355 -0.16989
## var7  0.164388  0.203623  0.224533  0.245306  0.28513
# Identified correlation: sigma_12 / sqrt(sigma_11 * sigma_22)
sigmadraw12 <- Results$sigmadraw[,3] / (Results$sigmadraw[,1]*Results$sigmadraw[,4])^0.5
summary(coda::mcmc(sigmadraw12))
## 
## Iterations = 1:4000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean             SD       Naive SE Time-series SE 
##      -0.003338       0.033076       0.000523       0.001288 
## 
## 2. Quantiles for each variable:
## 
##      2.5%       25%       50%       75%     97.5% 
## -0.070515 -0.025009 -0.002895  0.018432  0.060986

The previous R code shows how to obtain the posterior draws using the rmvpGibbs function from the bayesm package. The results suggest that females, older individuals, and those who perceive their health as bad are more likely to be hospitalized. Furthermore, females, older individuals, those with a bad or fair self-perception of health, and those who have lived a larger proportion of their life in their current neighborhood are more likely to be enrolled in the subsidized health care system. However, the results indicate no evidence of correlation between the unobserved components of the two equations, as the 95% credible interval for the correlation is (-0.07, 0.06), which contains zero.
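As a quick complement, using the sigmadraw12 object created above, the posterior probability of a positive correlation can be computed directly from the draws:

# Share of posterior draws with positive correlation; values near 0.5
# are consistent with no dependence between the two equations
mean(sigmadraw12 > 0)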

References

Edwards, Y. D., and G. M. Allenby. 2003. “Multivariate Analysis of Multiple Response Data.” Journal of Marketing Research 40: 321–34.
Ramírez-Hassan, A., and R. Guerra-Urzola. 2021. “Bayesian Treatment Effects Due to a Subsidized Health Program: The Case of Preventive Health Care Utilization in Medellín (Colombia).” Empirical Economics 60: 1477–1506.

  39. In a Bayesian setting, a model can be non-identified; however, the posterior distribution of the model parameters exists as long as a proper prior distribution is specified (Edwards and Allenby 2003).↩︎

  40. Note that the order of the location coefficients in our GUI follows the equations, not the order of the regressors as in the theoretical setting presented in this section. This distinction is important for correctly setting the hyperparameters and interpreting the results of the location parameters.↩︎