A short course on Survival Analysis applied to the Financial Industry

4.2 Estimation of the Joint Model

In this section we are going to present the joint modelling framework motivated by the time-to-event point of view, that is, we want to add a time-dependent covariate measured with error in a survival model.

Let \(T_i\) denote the observed failure time for the \(i\)-th subject \((i = 1,...,n)\), which is taken as the minimum of the true event time \(T_i\) and the censoring time \(C_i\), i.e., \(\widetilde T_i = \min(T_i,C_i)\). Furthermore, we define the event indicator as \(\Delta_i = I(T_i \le C_i)\), where \(I\) is the indicator function that takes the value 1 if the condition \(T_i \le C_i\) is satisfied, and 0 otherwise. So, the observed data for the time-to-event outcome consist of the pairs \(\{(\widetilde T_i, \Delta_i), i = 1, . . . , n\}\).

For the longitudinal responses, let \(y_i(t)\) denote the value of the longitudinal outcome at time point \(t\) for the \(i\)-th subject. Note that we do not actually observe \(y_i(t)\) at all time points, but only at the very specific occasions \(t_{ij}\) at which measurements were taken. Thus, the observed longitudinal data consist of the measurements \(y_{ij} = \{yi(t_{ij}),j = 1,...,n_i\}\).

The objective is to associate the true and unobserved value of the longitudinal outcome at time \(t\), denoted by \(m_i(t)\), with the event outcome \(\widetilde T_i\). Note that \(m_i(t)\) is different from \(y_i(t)\) because this last is contaminated with measurement error value of the longitudinal outcome at time \(t\).

In order to quantify the effect of \(m_i(t)\) on the risk of an event, we can use a relative risk model of the form:

\[\begin{equation} h_i(t|\mathcal{M}_i(t),w_i) = h_0(t) \exp \{ \gamma^t w_i + \alpha m_i(t) \} \tag{4.1} \end{equation}\]

where \(\mathcal{M}_i(t)=\{ m_i(s), 0 \le s < t\}\) denotes the denotes the history of the true unobserved longitudinal process up to time point \(t\), \(h_0(\cdot)\) denotes the baseline risk function, and \(w_i\) is a vector of baseline covariates (such as a treatment indicator, history of diseases, etc.) with a corresponding vector of regression coefficients \(\gamma\). Similarly, parameter \(\alpha\) quantifies the effect of the underlying longitudinal outcome to the risk for an event.

The interpretation of \(\gamma\) and \(\alpha\) is exactly the same as we have seen in Chapter 3. In particular, \(exp(\gamma_j)\) denotes the ratio of hazards for one unit change in \(w_{ij}\) at any time \(t\), whereas \(exp(\alpha)\) denotes the relative increase in the risk for an event at time \(t\) that results from one unit increase in \(m_i(t)\) at the same time point.

In order to complete the specification of the above model, we need to think about the choice for the baseline risk function \(h_0(\cdot)\). In standard survival analysis it is customary to leave \(h_0(\cdot)\) completely unspecified in order to avoid the impact of misspecifying the distribution of survival times. However, within the joint modeling framework, it turns out that following such a route may lead to an underestimation of the standard errors of the parameter estimates (Hsieh, Tseng, and Wang 2006). To avoid such problems we will need to explicitly define \(h_0(\cdot)\), for example with a known parametric distribution or alternatively, and even more preferably, we can opt for a parametric but flexible specification of the baseline risk function. Several approaches are implemented in the JM package under the argument method.

The longitudinal submodel

In the model above we use \(m_i(t)\) to denote the true value of the underlying longitudinal covariate at time point \(t\). However, and as mentioned earlier, longitudinal information is actually collected intermittently and with error at a set of a few time points \(t_{ij}\) for each subject. So, to messure the effect of the longitudinal variable on the risk dor an event, we need to estimate \(m_i(t)\). To do this, we are going to use the linear mixed models of the form

\[ y_i(t) = m_i(t) + \varepsilon_i(t), \]

\[ m_i(t) = x_i^T(t)\beta + z_i^T(t)b_i + \varepsilon_i(t), \]

\[ b_i \sim N(0, D), \quad \varepsilon_i(t) \sim N(0, \sigma^2), \] where \(\beta\) denotes the vector of the unknown fixed effects parameters, \(b_i\) denotes a vector of random effects, \(x_i(t)\) and \(z_i(t)\) denote row vectors of the design matrices for the fixed and random effects, respectively, and \(\varepsilon_i(t)\) is the measument error term, which is assumed independent of \(b_i\).

The main estimation methods for joint models are based on (semiparametric) maximum likelihood and Bayes using MCMM techniques. The JM package that we are going to use is based on maximum likelihood. The idea is the maximization of the log-likelihood corresponding to the joint distribution of the time-to-event and longitudinal out-comes \(\{\widetilde T_i,\Delta_i,y_i\}\). Standard numerical integration techniques such as Gaussian quadrature and Monte Carlo have been successfully applied in the joint modelling framework. See Section 4.3 of Rizopoulos (2012) for details.

References

Hsieh, Fushing, Yi-Kuan Tseng, and Jane-Ling Wang. 2006. “Joint Modeling of Survival and Longitudinal Data: Likelihood Approach Revisited.” Biometrics 62 (4). Blackwell Publishing Inc: 1037–43. doi:10.1111/j.1541-0420.2006.00570.x.

Rizopoulos, Dimitris. 2012. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. 1st ed. Chapman & Hall/Crc Biostatistics Series. Chapman; Hall/CRC. http://gen.lib.rus.ec/book/index.php?md5=99C084286F029FD36E10FE18AA587A7C.