2 Logistic Models and Deviance

When fitting a logistic regression model to a data set, R outputs several key metrics, including the null deviance and the residual deviance, along with their respective degrees of freedom. This chapter aims to clarify the calculation methods and meaning of these deviance values. We will use the first 20 observations from a heart disease data set, fit a logistic regression model to this data, and manually compute the deviance values to illustrate how they are actually calculated.

# Read the data and only retain the first 20 observations
heart <- readr::read_csv(file = "data/heart.xls", show_col_types = FALSE)
heart <- head(heart, 20)

2.1 Models

To understand what these deviance values are, we must first understand the following three models:

  • the null model (\(M_0\)),
  • the proposed model (\(M_p\)),
  • and the saturated model (\(M_s\)).

These three models are used to calculate the two deviance values outputted by R, i.e., the null deviance (the deviance of \(M_0\)) and the residual deviance(the deviance of \(M_p\)).

2.1.1 Proposed Model

The proposed model \(M_p\) is the model of interest. In our example, it is used to estimate the probability of heart disease for each patient, conditioning on that patient’s age. The model is represented as follows:

\[ logit(\pi_i) = \beta_0 + \beta_1 \times Age_i + \epsilon_i \]

with \(\pi_i = P(HD_i = 1 | Age_i)\) the conditional probability of a heart disease for patient \(i\), \(Age_i\) the age of patient \(i\), and \(\epsilon_i\) the random error of the model for the \(i\)’th observation. We can fit the proposed model as follows:

fit_p <- glm(
  formula = HeartDisease ~ Age,
  data = heart,
  family = binomial(link = "logit")
)
## 
## Call:
## glm(formula = HeartDisease ~ Age, family = binomial(link = "logit"), 
##     data = heart)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.48884    2.87127  -0.867    0.386
## Age          0.04571    0.06191   0.738    0.460
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.920  on 19  degrees of freedom
## Residual deviance: 26.365  on 18  degrees of freedom
## AIC: 30.365
## 
## Number of Fisher Scoring iterations: 4

We obtain the null deviance \(D_0 = 26.9\) and the residual deviance \(D_p = 26.4\).

2.1.2 Null Model

The null model \(M_0\) is a model without any predictors. It only contains an intercept \(\beta_0\):

\[ logit(\pi_i) = \beta_0 + \epsilon_i \]

The estimated probability of heart disease is the same for each patient. The null model \(M_0\) is fitted as follows:

fit_0 <- glm(
  formula = HeartDisease ~ 1,
  data = heart,
  family = binomial(link = "logit")
)
## 
## Call:
## glm(formula = HeartDisease ~ 1, family = binomial(link = "logit"), 
##     data = heart)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.4055     0.4564  -0.888    0.374
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.92  on 19  degrees of freedom
## Residual deviance: 26.92  on 19  degrees of freedom
## AIC: 28.92
## 
## Number of Fisher Scoring iterations: 4

R outputs the null deviance \(D_0\) and the residual deviance \(D_p\), which both equal 26.9. They are the same because \(M_p\) is the same as \(M_0\) (neither of them contains a predictor, only an intercept).

2.1.3 Saturated Model

The saturated model \(M_s\) is a model that perfectly fits data. This model perfectly predicts the outcome for each patient in the data set. If patient \(i\) has a heart disease (\(HD_i = 1\)), then \(\hat{\pi_i} = 1\). If patient \(i\) does not have a heart disease (\(HD_i = 0\)), then \(\hat{\pi_i} = 0\). We will not fit a saturated model \(M_s\) as such to the data, but continue our explanation knowing the fact that the model perfectly predicts the outcome for each observation \(i\).

2.1.4 Models and Deviance

Since the saturated model \(M_s\) perfectly fits the data, it serves as our reference point for assessing goodness-of-fit. The proposed model \(M_p\) and the null model \(M_0\) are compared to the saturated model in terms of their goodness-of-fit. The result of these comparisons is a deviance value for each model. The deviance quantifies the degree to which the null model and the proposed model deviate from the saturated model in terms of goodness-of-fit. To calculate the actual deviance, we utilize the likelihoods and log-likelihoods of the models (see below for further details).

2.2 Likelihood and Log-Likelihood

The likelihood is a probability that indicates how likely it is that the parameters take on particular values given a particular sample. As the sample is given, the values for the outcome and predictor are not variables. The parameter values are the variables. We can represent the likelihood function as follows:

\[ L(\theta | (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) = \prod_{i = 1}^{n} f_Y(y_i; x_i, \theta) \]

with \(\theta\) the parameters of interest, \(n\) the sample size, and \(f_Y(y; x, \theta)\) the probability mass function for the response. Calculating the likelihood involves taking the product of many probabilities (i.e., values between 0 and 1), and therefore results in a very small value. For computational reasons, often the log-likelihood is calculated instead of the likelihood.

\[ \begin{aligned} l(\theta | (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) &= log[\prod_{i = 1}^{n} f_Y(y_i; x_i, \theta)] \\ &= \sum_{i = 1}^{n} log(f_Y(y_i; x_i, \theta)) \end{aligned} \]

2.2.1 Proposed Model

We start by calculating the likelihood and log-likelihood of \(M_p\).

\[ L(\theta | (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) = \prod_{i = 1}^{n} f_Y(y_i; x_i, \theta) \]

with \(\theta\) representing the parameters that are the variables and \(n\) the sample size. To define the likelihood function, we must first determine the probability mass function for the response. The PMF defines the probability distribution of a discrete random variable. Remember that a logistic regression model is used to predict the outcome for a discrete random variable, and more specifically a binary random variable that follows a Bernoulli distribution:

\[ HD \sim Bernoulli(\pi) \]

The probability mass function (PMF) of a Bernoulli distribution is the following:

\[ f_Y(y; \pi) = \pi^{y} \times (1 - \pi)^{1 - y} \]

Therefore, we use the PMF of the Bernoulli distribution for the likelihood function:

\[ L(\theta | (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) = \prod_{i = 1}^{n} \pi_{i}^{y_i} \times (1 - \pi_i)^{1 - y_i} \]

with \(\theta\) representing the regression coefficients of the logistic regression model. We can now also define the log-likelihood function:

\[ \begin{aligned} l(\theta | (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) &= log[\prod_{i = 1}^{n} \pi_{i}^{y_i} \times (1 - \pi_i)^{1 - y_i}] \\ &= \sum_{i = 1}^{n} [y_i \times log(\pi_i) + (1 - y_i) \times log(1 - \pi_i)] \end{aligned} \]

Using our fitted model \(M_p\), how can we now obtain the log-likelihood? We start by obtaining the estimated probability of observing the actual outcome for each observation in the data set. We focus on the third observation in the data set. We see that \(Age_3 = 37\) and \(HD_3 = 0\). We can use our model to obtain the probability of a heart disease:

# Estimate the probability of a heart disease for i = 3
hd_prob_003 <- predict(fit_p, type = "response")[3]
hd_prob_003
##         3 
## 0.3105577

We see that this probability is \(P(HD_3 = 1 | Age_i = 37) = \pi_i = 0.31\). Assume that our sample only comprises this one patient, then the likelihood would be:

\[ \begin{aligned} L(\theta | (Age_i, HD_i)) &= \prod_{i = 1}^{n} f_Y(y_i; x_i, \theta) \\ &= \prod_{i = 1}^{n} \pi_{i}^{y_i} \times (1 - \pi_i)^{1 - y_i} \\ &= 0.3105577^{0} \times (1 - 0.3105577)^{1 - 0} \\ &= 1 \times 0.6894423^{1} \\ &= 0.6894423 \end{aligned} \]

So, to calculate the likelihood, first, we calculate the estimated probability with which the actual outcome occurs.

# Create a copy of the data frame as to not overwrite the original data
heart_p <- heart

# Calculate the probability of a patient having or not having a heart disease 
# using the proposed model (i.e., given the patient's age)
heart_p$EstProb_HD_1 <- predict(fit_p, type = "response")
heart_p$EstProb_HD_0 <- 1 - predict(fit_p, type = "response")

heart_p <- heart_p %>%
  dplyr::mutate(
    ProbOutcome = EstProb_HD_1^HeartDisease * EstProb_HD_0^(1 - HeartDisease),
    LogProbOutcome = log(ProbOutcome)
  )
Age Heart Disease P(HD = 1) P(HD = 0) P(Outcome) Log(P(Outcome))
40 0 0.3406546 0.6593454 0.6593454 -0.4165078
49 1 0.4380759 0.5619241 0.4380759 -0.8253630
37 0 0.3105577 0.6894423 0.6894423 -0.3718723
48 1 0.4268570 0.5731430 0.4268570 -0.8513061
54 0 0.4948960 0.5051040 0.5051040 -0.6829909
39 0 0.3304634 0.6695366 0.6695366 -0.4011694
45 0 0.3936917 0.6063083 0.6063083 -0.5003666
54 0 0.4948960 0.5051040 0.5051040 -0.6829909
37 1 0.3105577 0.6894423 0.3105577 -1.1693854
48 0 0.4268570 0.5731430 0.5731430 -0.5566201
37 0 0.3105577 0.6894423 0.6894423 -0.3718723
58 1 0.5405183 0.4594817 0.5405183 -0.6152267
39 0 0.3304634 0.6695366 0.6695366 -0.4011694
49 1 0.4380759 0.5619241 0.4380759 -0.8253630
42 0 0.3614779 0.6385221 0.6385221 -0.4485990
54 0 0.4948960 0.5051040 0.5051040 -0.6829909
38 1 0.3204289 0.6795711 0.3204289 -1.1380949
43 0 0.3720940 0.6279060 0.6279060 -0.4653649
60 1 0.5631245 0.4368755 0.5631245 -0.5742545
36 1 0.3008561 0.6991439 0.3008561 -1.2011233

The above data show, given the fitted model, the probability with which each actual outcome occurs, given the patient’s age, and also the log-transformed probability. To obtain the log-likelihood of the fitted model \(LL_p\), we can now just sum these log-transformed probabilities:

# Obtain the log-likelihood for the proposed model
ll_p <- sum(heart_p$LogProbOutcome)
ll_p
## [1] -13.18263

2.2.2 Null Model

We can also do this for the null model. Remember that the fitted probability of a heart disease will be the same for each patient, as we do not take into account any patient information. We can then compute the log-likelihood of the null model as follows:

# Create a copy of the data frame as to not overwrite the original data
heart_0 <- heart

# Calculate the probability of a patient having or not having a heart disease 
# using the null model (i.e., not taking into account any patient information)
heart_0$EstProb_HD_1 <- predict(fit_0, type = "response")
heart_0$EstProb_HD_0 <- 1 - predict(fit_0, type = "response")

heart_0 <- heart_0 %>%
  dplyr::mutate(
    ProbOutcome = EstProb_HD_1^HeartDisease * EstProb_HD_0^(1 - HeartDisease),
    LogProbOutcome = log(ProbOutcome)
  )
Age Heart Disease P(HD = 1) P(HD = 0) P(Outcome) Log(P(Outcome))
40 0 0.4 0.6 0.6 -0.5108256
49 1 0.4 0.6 0.4 -0.9162907
37 0 0.4 0.6 0.6 -0.5108256
48 1 0.4 0.6 0.4 -0.9162907
54 0 0.4 0.6 0.6 -0.5108256
39 0 0.4 0.6 0.6 -0.5108256
45 0 0.4 0.6 0.6 -0.5108256
54 0 0.4 0.6 0.6 -0.5108256
37 1 0.4 0.6 0.4 -0.9162907
48 0 0.4 0.6 0.6 -0.5108256
37 0 0.4 0.6 0.6 -0.5108256
58 1 0.4 0.6 0.4 -0.9162907
39 0 0.4 0.6 0.6 -0.5108256
49 1 0.4 0.6 0.4 -0.9162907
42 0 0.4 0.6 0.6 -0.5108256
54 0 0.4 0.6 0.6 -0.5108256
38 1 0.4 0.6 0.4 -0.9162907
43 0 0.4 0.6 0.6 -0.5108256
60 1 0.4 0.6 0.4 -0.9162907
36 1 0.4 0.6 0.4 -0.9162907

We obtain the log-likelihood of the null model \(LL_0\) as follows:

ll_0 <- sum(heart_0$LogProbOutcome)
ll_0
## [1] -13.46023

2.2.3 Saturated Model

We can follow the same procedure for \(M_s\). The saturated model perfectly fits the data, so the estimated probability of heart disease reflects the actual outcome. In other words, if \(HD_i = 1\), then \(P(HD_i = 1) = 1\). And if \(HD_i = 0\), then \(P(HD_i = 1) = 0\).

# Create a copy of the data frame as to not overwrite the original data
heart_s <- heart

# Calculate the probability of a patient having or not having a heart disease 
# using the saturated model (i.e., the model perfectly fits the data)
heart_s$EstProb_HD_1 <- heart_s$HeartDisease
heart_s$EstProb_HD_0 <- 1 - heart_s$HeartDisease

heart_s <- heart_s %>%
  dplyr::mutate(
    ProbOutcome = EstProb_HD_1^HeartDisease * EstProb_HD_0^(1 - HeartDisease),
    LogProbOutcome = log(ProbOutcome)
  )
Age Heart Disease P(HD = 1) P(HD = 0) P(Outcome) Log(P(Outcome))
40 0 0 1 1 0
49 1 1 0 1 0
37 0 0 1 1 0
48 1 1 0 1 0
54 0 0 1 1 0
39 0 0 1 1 0
45 0 0 1 1 0
54 0 0 1 1 0
37 1 1 0 1 0
48 0 0 1 1 0
37 0 0 1 1 0
58 1 1 0 1 0
39 0 0 1 1 0
49 1 1 0 1 0
42 0 0 1 1 0
54 0 0 1 1 0
38 1 1 0 1 0
43 0 0 1 1 0
60 1 1 0 1 0
36 1 1 0 1 0

Finally, we obtain the log-likelihood for the saturated model \(LL_s\):

ll_s <- sum(heart_s$LogProbOutcome)
ll_s
## [1] 0

2.3 Deviance

As stated earlier, the saturated model \(M_s\) perfectly fits the data, and therefore serves as our reference point for goodness-of-fit. We now use the log-likelihood values of these models to calculate the deviance. For the residual deviance \(D_p\), we have:

\[ \begin{aligned} D_p &= 2 \times (LL_s - LL_p) \\ &= 2 \times (0 - (-13.18263)) \\ &= 2 \times 13.18263 \\ &= 26.36526 \end{aligned} \]

and for the null deviance \(D_0\), we have:

\[ \begin{aligned} D_0 &= 2 \times (LL_s - LL_0) \\ &= 2 \times (0 - (-13.46023)) \\ &= 2 \times 13.46023 \\ &= 26.92047 \end{aligned} \]

As we can see, the deviance is two times the difference between the log-likelihood of the saturated model and the log-likelihood of the proposed model for the residual deviance or the log-likelihood of the null model for the null deviance. The deviance quantifies how much the null and proposed models differ from the saturated model in terms of goodness-of-fit.

In a following chapter, we will explain how we can use these deviance value to perform statistical hypothesis tests to formally compare the proposed model to the null model.