Chapter 19 Comparing Two Measures of Centrality


A common question in medical research is whether one group had a better outcome than another. Outcomes can be dichotomous, like death or hospitalization, but continuous outcomes like systolic blood pressure, endoscopic score, or ejection fraction are more commonly available, provide more statistical power, and usually require a smaller sample size.
There is a tendency in clinical research to focus on dichotomous outcomes, even to the point of converting continuous measures to dichotomous ones (aka “dichotomania”; see Frank Harrell’s comments), for fear of detecting and acting upon a small change in a continuous outcome that is not clinically meaningful.
While this can be a concern, especially in very large, over-powered studies, it can be addressed by designing the study to detect a continuous difference that is at least as large as one that many clinicians agree (a priori) is clinically important (the MCID, or Minimum Clinically Important Difference).
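As a rough illustration of planning around an MCID, the sketch below uses base R's `power.t.test()` to estimate the sample size per group needed to detect a difference equal to the MCID. The planning values (an MCID of 5 mmHg and a standard deviation of 10 mmHg) are hypothetical and chosen only for illustration.

```r
# Hypothetical planning values (assumptions, not taken from any study)
mcid <- 5        # smallest clinically important difference, in mmHg
sd_guess <- 10   # assumed common standard deviation, in mmHg

# Solve for n per group in a two-sample, two-sided t-test
plan <- power.t.test(delta = mcid, sd = sd_guess,
                     sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")

# Round up to whole participants per group
n_per_group <- ceiling(plan$n)
n_per_group
```

A study sized this way is powered for the MCID itself, so a statistically significant result is less likely to reflect a difference too small to matter clinically.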
The most common comparison of two groups with a continuous outcome is to look at the means or medians, and determine whether the available evidence is consistent with these being equal (the null hypothesis). This can be done for means with Student’s t-test.
Let’s start by looking at the cytomegalovirus data set. This includes data on 64 patients who received a bone marrow stem cell transplant, and looks at their time to activation of CMV (cytomegalovirus). In the code chunk below, we group the data by a categorical variable (here, sex), and look at the mean time to CMV activation (the time.to.cmv variable). Run the code (using the green arrow at the top right of the code chunk below) to see the difference in time to CMV activation in months between groups.

Try out some other grouping variables in the group_by statement, in place of sex. Consider variables like donor.cmv (donor CMV status), race, and recipient.cmv. Edit the code and run it again with the green arrow at the top right.

# load libraries in each chunk so that each chunk runs independently
library(tidyverse)
library(medicaldata)

# mean time to CMV activation (months) within each group
summ <- cytomegalovirus %>%
  group_by(sex) %>%
  summarize(mean_time2cmv = mean(time.to.cmv))

summ
## # A tibble: 2 × 2
##     sex mean_time2cmv
##   <dbl>         <dbl>
## 1     0          13.7
## 2     1          12.7

The output shown above groups by sex, with mean times to CMV activation of 13.7 months and 12.7 months. When you group by donor.cmv instead, it makes theoretical sense that having a CMV positive donor is more likely to be associated with early activation of CMV in the recipient. But is an observed difference a significant one, meaning a difference that would be very unlikely to happen by chance alone? That depends on things like the number of people in each group, and the standard deviation in each group. That is the kind of question you can answer with a t-test, or for particularly skewed data like hospital length of stay or medical charges, a Wilcoxon test.
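Because the answer depends on group size and spread, it can help to look at those alongside the means. The sketch below extends the earlier summary with the count and standard deviation in each group; the column names `n` and `sd_time2cmv` are chosen here for illustration.

```r
library(dplyr)
library(medicaldata)

# count, mean, and standard deviation of time to CMV activation by group
group_summary <- cytomegalovirus %>%
  group_by(sex) %>%
  summarize(
    n = n(),
    mean_time2cmv = mean(time.to.cmv),
    sd_time2cmv = sd(time.to.cmv)
  )

group_summary
```

If the standard deviations are large relative to the gap between the means, or a group is small, the same difference in means provides weaker evidence against the null hypothesis.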

19.0.1 Applying the t-test

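The t-test is a simple test that compares the means of two groups, and tells you how likely it would be to see a difference at least as large as the one you observed if the true means were equal. Student’s t-test assumes that the data in each group are approximately normally distributed, and that the variances of the two groups are equal. If the variances are not equal, Welch’s t-test relaxes that assumption (it is the default in rstatix::t_test, which uses var.equal = FALSE). If the data are far from normally distributed, you can use a non-parametric test like the Wilcoxon test.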
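One way to get a rough sense of whether these assumptions are plausible is sketched below, using base R's `shapiro.test()` within each group and `var.test()` across groups. It groups by `cgvhd`, the variable used in the t-test in this section. These formal checks are only a guide: they have little power in small samples, and a histogram or Q-Q plot is often more informative.

```r
library(medicaldata)

# Shapiro-Wilk normality test p-value within each cgvhd group
normality <- tapply(cytomegalovirus$time.to.cmv,
                    cytomegalovirus$cgvhd,
                    function(x) shapiro.test(x)$p.value)
normality

# F test comparing the variances of the two groups
variance_check <- var.test(time.to.cmv ~ cgvhd, data = cytomegalovirus)
variance_check$p.value
```

Small p-values here suggest departures from normality or from equal variances, which would favor Welch's t-test or a Wilcoxon test over Student's t-test.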

library(medicaldata)

# compare mean time to CMV activation by chronic GVHD status (cgvhd)
cytomegalovirus |>
  rstatix::t_test(time.to.cmv ~ cgvhd,
                  detailed = TRUE)
## # A tibble: 1 × 15
##   estimate estimate1 estimate2 .y.       group1 group2    n1
## *    <dbl>     <dbl>     <dbl> <chr>     <chr>  <chr>  <int>
## 1    -13.8      7.18      21.0 time.to.… 0      1         36
## # ℹ 8 more variables: n2 <int>, statistic <dbl>, p <dbl>,
## #   df <dbl>, conf.low <dbl>, conf.high <dbl>,
## #   method <chr>, alternative <chr>
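To see how much the choice of test matters for this comparison, the sketch below runs Student's t-test, Welch's t-test, and the Wilcoxon rank-sum test on the same data with base R functions and collects the p-values side by side. The object names (`student`, `welch`, `rank_sum`, `comparison`) are illustrative.

```r
library(medicaldata)

# Student's t-test: assumes equal variances in the two groups
student <- t.test(time.to.cmv ~ cgvhd, data = cytomegalovirus,
                  var.equal = TRUE)

# Welch's t-test: does not assume equal variances
welch <- t.test(time.to.cmv ~ cgvhd, data = cytomegalovirus,
                var.equal = FALSE)

# Wilcoxon rank-sum test: non-parametric; exact = FALSE uses the
# normal approximation, which is appropriate when there are ties
rank_sum <- wilcox.test(time.to.cmv ~ cgvhd, data = cytomegalovirus,
                        exact = FALSE)

# Collect the p-values in one table
comparison <- data.frame(
  test = c("Student t", "Welch t", "Wilcoxon rank-sum"),
  p_value = c(student$p.value, welch$p.value, rank_sum$p.value)
)
comparison
```

When the three tests agree, the conclusion does not hinge on the distributional assumptions; when they disagree, it is worth looking more closely at the shape and spread of the data in each group.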