Chapter 9 Algorithm to fit a stump


Algorithm type: Metropolis-within-Gibbs for a hierarchical BART model

Reason: We have closed-form posteriors for most parameters, but not for the tree structure

Data: Target variable \(y\), groups \(j = 1,\dots, J\), set of features \(X\)

Result: Posterior distributions for all parameters


Initialisation;

Hyper-parameter values for \(\alpha, \beta, k_1, k_2\);

Number of groups \(J\);

Number of trees \(P\);

Number of observations \(N =\sum_{j = 1}^{J} n_j\);

Number of iterations \(I\);

  • define \(\mu_{\mu} = 0\), \(\mu^{0}\), \(\tau^{0}\), and \(\mu_j^{0}, j = 1,\dots, J\) as the initial parameter values

  • for i from 1 to I do:

  • for p from 1 to P do:

  • sample \(\mu_p\) from the posterior \(N\left(\frac{\mathbf{1}^{T} \Psi^{-1} R_p }{\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + (k_2/P)^{-1}},\; \tau^{-1} \left(\mathbf{1}^{T} \Psi^{-1} \mathbf{1} + (k_2/P)^{-1}\right)^{-1}\right)\)

  • for j from 1 to J do:

    • sample \(\mu_{jp}\) from the posterior \(N\left(\frac{P \mu_p /k_1 + n_j \bar R_{pj}}{n_j + P/k_1},\; \tau^{-1} \left(n_j + P/k_1\right)^{-1}\right)\)
  • end

  • set \(R_{ijp} = Y_{ij} - \sum_{l = 1}^{p} \mu_{jl}\); in other words, the residuals for tree \(p\) are the vector \(y\) minus the sum of the \(\mu_{jl}\) values up to tree \(p\)

  • end
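The per-tree conditional updates above can be sketched in NumPy as follows. This is a minimal illustration, not the full algorithm: the function names (`sample_mu_p`, `sample_mu_jp`) are my own, `Psi_inv` is assumed to be the precomputed \(\Psi^{-1}\) matrix for tree \(p\), and the posterior variances use the inverted-precision form of the normal full conditionals.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mu_p(R_p, Psi_inv, tau, k2, P):
    """Draw the overall tree mean mu_p from its normal full conditional."""
    ones = np.ones(len(R_p))
    prec = ones @ Psi_inv @ ones + P / k2          # posterior precision, up to tau
    mean = (ones @ Psi_inv @ R_p) / prec
    return rng.normal(mean, np.sqrt(1.0 / (tau * prec)))

def sample_mu_jp(R_pj, mu_p, tau, k1, P):
    """Draw the group-level mean mu_jp for group j in tree p."""
    n_j = len(R_pj)                                # observations in group j
    prec = n_j + P / k1                            # posterior precision, up to tau
    mean = (P * mu_p / k1 + n_j * R_pj.mean()) / prec
    return rng.normal(mean, np.sqrt(1.0 / (tau * prec)))
```

In a full sampler these draws would sit inside the loops over iterations, trees, and groups, with the partial residuals `R_p` recomputed after each tree is updated.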

  • Define \(\hat f_{lj}\) as the current overall prediction for observation \(y_{lj}\), which will be \(\hat f_{lj} = \sum_{p = 1}^{P} \mu_{jp}\), where \(l\) represents the observation index

  • sample \(\tau\) from the posterior \(Ga\left(\frac{N + JP + P}{2} + \alpha,\; \frac{\sum_{l, j} (y_{lj} - \hat f_{lj})^2}{2} + \frac{\sum_{j, p} (\mu_{jp} - \mu_{p})^2}{2} + \frac{\sum_{p} \mu_{p}^2}{2} + \beta\right)\)
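The Gamma draw for \(\tau\) can be sketched as below. The function name and array layout (`mu_jp` as a \(J \times P\) matrix, `mu_p` as a length-\(P\) vector) are assumptions for illustration; the shape parameter counts one half per squared term in the rate, matching the three sums of squares.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_tau(y, f_hat, mu_jp, mu_p, alpha, beta):
    """Draw the precision tau from its Gamma full conditional.

    y, f_hat : flat arrays of the N observations and their fitted values.
    mu_jp    : (J, P) array of group-level means; mu_p : (P,) array of tree means.
    """
    N = y.size
    J, P = mu_jp.shape
    shape = (N + J * P + P) / 2 + alpha
    rate = (np.sum((y - f_hat) ** 2) / 2
            + np.sum((mu_jp - mu_p) ** 2) / 2     # mu_p broadcasts across groups
            + np.sum(mu_p ** 2) / 2
            + beta)
    return rng.gamma(shape, 1.0 / rate)           # NumPy takes scale = 1/rate
```

Note that NumPy's `Generator.gamma` is parameterised by shape and *scale*, so the rate must be inverted.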

  • propose \(k_1^{new}\) from a Uniform(0, 20)

    • calculate \(p_{old}\) as the conditional distribution evaluated at the current \(k_1\)

    • calculate \(p_{new}\) as the conditional distribution evaluated at the proposed \(k_1^{new}\)

    • compute the acceptance ratio \(a = p_{new}/p_{old}\)

    • sample \(U\) from a Uniform(0, 1)

    • if \(a > U\), accept \(k_1^{new}\)

  • end
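The Metropolis step for \(k_1\) can be sketched as a single update with an independent Uniform(0, 20) proposal. The name `metropolis_k1` and the `log_cond` callback (the log of \(k_1\)'s full conditional, whose exact form the algorithm leaves implicit) are assumptions; working on the log scale avoids underflow when comparing \(p_{new}/p_{old}\) with \(U\).

```python
import numpy as np

rng = np.random.default_rng(2)

def metropolis_k1(k1_current, log_cond, lower=0.0, upper=20.0):
    """One Metropolis step for k1 with an independent Uniform(lower, upper) proposal.

    log_cond : callable returning the log full conditional at a given k1
               (assumed to be supplied by the surrounding sampler).
    """
    k1_new = rng.uniform(lower, upper)
    # Acceptance ratio a = p_new / p_old, compared on the log scale:
    # accept when log(U) < log p_new - log p_old.
    log_a = log_cond(k1_new) - log_cond(k1_current)
    u = rng.uniform(0.0, 1.0)
    return k1_new if np.log(u) < log_a else k1_current
```

With a symmetric or independent proposal inside a bounded support like this, no Jacobian or proposal-density correction is needed beyond the ratio of conditionals, since the Uniform proposal density cancels.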