1 Introduction

pacman::p_load(dplyr, simstudy, summarytools, kableExtra,
               brms, bayestestR, emmeans)
set.seed(1087) # for reproducible simulations
st_options(plain.ascii = F, footnote = NA)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
options(knitr.kable.NA = '')
options(mc.cores = parallel::detectCores(),
        brms.backend = "rstan")

Data is simulated using the clustering structure (vignettes clustered in respondents) of the study. The detailed factorial survey experiment design is available here.

The dependent variable is simulated as a normal distribution (\(\text{Justice} \sim N(3, 1)\)) which is a function of each factor described below and random noise. This distribution is roughly one that spans from 1 to 6, emulating a rating scale for the vignettes where each answer represents two months (~60.8 days) from December-November (1) to February-January (6). The effects of the regression coefficients in a linear (mixed) model can be regarded as the change in such a scale where, e.g., in the gender factor (1 = female, 0 = male) a change of 0.115 represents a justice evaluation that the vaccine should be delivered to women approximately one week before compared to men.

We assume the six factors of the study with the following effects:

Gender: no practical difference between the just priority of vaccinating men or women.
Age: a just priority of ~.4 is set for a unit increase of age group.
Work status: a just priority of ~.3 favoring health workers – compared to transport, higher education and unemployed workers – is set. No practical differences are present between the rest of the comparisons.
Caring duties: a just priority of ~.15 for those with caring responsibilities of elder people – compared to no caring responsibilities – is set.
Country of birth: a just priority of ~.25 favoring Chileans – compared to people born in Venezuela, Haiti and Spain – is set. No practical differences are present between the rest of the comparisons.
Comorbidity: just priority of ~.2 for those with diabetes type 2 – compared to those without comorbidities – is set.

Furthermore, for the null hypotheses a smallest effect size of interest is defined as an interval equivalent to approximately one week before or after (-0.115, 0.115). This follows the rationale that a minimally interesting practical difference is an effect of at least a one-week change considering how inoculation strategies and logistics of scheduling vaccinations operate. In other words, an estimated priority smaller than one week would be difficult to implement in inoculation plans.

# Simulate respondent data
gen.resp <- defData(varname = "female_r", dist = "binary", 
                    formula = .6, id = "id_resp")
gen.resp <- defData(gen.resp, varname = "age_r", dist = "negBinomial", 
                    formula = 44, variance = .02)
dt.resp <- genData(1000, gen.resp)

# Add vignette's factors
gen.vign <- defDataAdd(varname = "work_status_v", dist = "uniformInt", 
                       formula = "1;4")
gen.vign <- defDataAdd(gen.vign, varname = "caring_v", dist = "binary", 
                       formula = ".5")
gen.vign <- defDataAdd(gen.vign, varname = "female_v", dist = "binary", 
                       formula = ".5")
gen.vign <- defDataAdd(gen.vign, varname = "age_v", dist = "uniformInt", 
                       formula = "1;8")
gen.vign <- defDataAdd(gen.vign, varname = "diabetes_v", dist = "binary", 
                       formula = ".5")
gen.vign <- defDataAdd(gen.vign, varname = "country_v", dist = "uniformInt", 
                       formula = "1;4")

# Simulate dependent variable
## Define conditional dependence on other factors with fomula argument
# gen.vign <- defDataAdd(gen.vign, varname = "justice", dist = "categorical", 
#                       formula = "1;5")
formula <- "3 + .4 * age_v + .2 * diabetes_v + .15 * caring_v + 
            (country_v != 1)*-.25 + (work_status_v != 1)*-.3"
## Simulate justice evaluations as normal distributed response
gen.vign <- defDataAdd(gen.vign, varname = "justice", dist = "normal", 
                       formula = formula, 
                       variance = 1)
dt.vign <- genCluster(dt.resp, "id_resp", numIndsVar = 4, 
                      level1ID = "id_vign")
dt.vign <- addColumns(gen.vign, dt.vign)

cors <- dt.vign %>% 
  select(ends_with("_v")) %>% 
  cor(.) %>% 
  round(3)

dt.vign <- dt.vign %>% 
  mutate(country_v = case_when(country_v == 1 ~ "Chile",
                               country_v == 2 ~ "Venezuela",
                               country_v == 3 ~ "Haiti",
                               country_v == 4 ~ "Spain"),
         work_status_v = case_when(work_status_v == 1 ~ "Health",
                                   work_status_v == 2 ~ "Transport",
                                   work_status_v == 3 ~ "Higher ed",
                                   work_status_v == 4 ~ "Unemployed"),
         across(-c(id_resp, id_vign, justice, starts_with("age")), ~factor(.)))
print(dfSummary(dt.vign, graph.magnif = 0.75), method = 'render',
      valid.col = FALSE, na.col = FALSE)

Data Frame Summary

dt.vign

Dimensions: 4000 x 11
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

id_resp [integer]

Mean (sd) : 500.5 (288.7) min < med < max: 1 < 500.5 < 1000 IQR (CV) : 499.5 (0.6)

1000 distinct values

female_r [factor]

1. 0 2. 1

1740	(	43.5%	)
2260	(	56.5%	)

age_r [integer]

Mean (sd) : 44 (9.3) min < med < max: 21 < 43.5 < 79 IQR (CV) : 14 (0.2)

51 distinct values

id_vign [integer]

Mean (sd) : 2000.5 (1154.8) min < med < max: 1 < 2000.5 < 4000 IQR (CV) : 1999.5 (0.6)

4000 distinct values (Integer sequence)

work_status_v [factor]

1. Health 2. Higher ed 3. Transport 4. Unemployed

980	(	24.5%	)
1077	(	26.9%	)
967	(	24.2%	)
976	(	24.4%	)

caring_v [factor]

1. 0 2. 1

2049	(	51.2%	)
1951	(	48.8%	)

female_v [factor]

1. 0 2. 1

2000	(	50.0%	)
2000	(	50.0%	)

age_v [integer]

Mean (sd) : 4.4 (2.3) min < med < max: 1 < 4 < 8 IQR (CV) : 4 (0.5)

1	:	519	(	13.0%	)
2	:	518	(	13.0%	)
3	:	502	(	12.6%	)
4	:	527	(	13.2%	)
5	:	482	(	12.0%	)
6	:	521	(	13.0%	)
7	:	482	(	12.0%	)
8	:	449	(	11.2%	)

diabetes_v [factor]

1. 0 2. 1

1989	(	49.7%	)
2011	(	50.3%	)

country_v [factor]

1. Chile 2. Haiti 3. Spain 4. Venezuela

968	(	24.2%	)
1020	(	25.5%	)
1046	(	26.2%	)
966	(	24.1%	)

justice [numeric]

Mean (sd) : 4.5 (1.4) min < med < max: -0.1 < 4.5 < 9.3 IQR (CV) : 2 (0.3)

4000 distinct values

cors[lower.tri(cors)] <- NA
diag(cors) <- NA
cors[-nrow(cors), -1] %>% 
    kbl(caption = "Pearson correlations between factors") %>% 
    kable_styling(bootstrap_options = c("striped", "bordered"))

Table 1.1: Pearson correlations between factors
	caring_v	female_v	age_v	diabetes_v	country_v
work_status_v	0.04	0.024	0.019	-0.016	-0.016
caring_v		-0.024	0.010	-0.010	-0.014
female_v			-0.002	0.007	0.012
age_v				0.008	-0.024
diabetes_v					-0.001

As expected, the experimental factors are practically independent. Also, they have small imbalances representing the expected result from a factorial survey experiment in an empirical application where this tends to occur.

2 Analysis

The model for testing the hypotheses group 1 (HG1) is defined as all main effects from the six factors. Weakly regularizing priors are used.

## Model HG1
f_hg1 <- as.formula("justice ~ female_v + age_v + work_status_v + 
                    country_v + caring_v + diabetes_v +
                    (1 | id_resp)")
## Priors
priors_hg1 <- c(set_prior("student_t(3, 3.6, 1)", class = "Intercept"),
                set_prior("student_t(3, 0, 10)", class = "b"),
                set_prior("student_t(3, 0, 3)", class = "sd"))

In order to make comparisons from the factors with more than two levels, we set orthonormal factor coding for the contrasts.

contrasts(dt.vign$work_status_v) <- contr.bayes
contrasts(dt.vign$country_v) <- contr.bayes

Fit the model using MCMC (20,000 iterations are used to compute Bayes Factors later).

m1 <- brm(f_hg1, 
          data = dt.vign,
          prior = priors_hg1, 
          iter = 20000,
          control = list(adapt_delta = .9),
          save_pars = save_pars(all = T),
          refresh = 0)

summary(m1)

## Warning: There were 91 divergent transitions after warmup. Increasing
## adapt_delta above 0.9 may help. See http://mc-stan.org/misc/
## warnings.html#divergent-transitions-after-warmup

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: justice ~ female_v + age_v + work_status_v + country_v + caring_v + diabetes_v + (1 | id_resp) 
##    Data: dt.vign (Number of observations: 4000) 
## Samples: 4 chains, each with iter = 20000; warmup = 10000; thin = 1;
##          total post-warmup samples = 40000
## 
## Group-Level Effects: 
## ~id_resp (Number of levels: 1000) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.07      0.05     0.00     0.17 1.00     5344     9774
## 
## Population-Level Effects: 
##                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          2.57      0.04     2.48     2.66 1.00    80349    29088
## female_v1         -0.01      0.03    -0.07     0.05 1.00    76474    28786
## age_v              0.41      0.01     0.40     0.42 1.00    79298    26913
## work_status_v1    -0.07      0.03    -0.13    -0.00 1.00    80497    27230
## work_status_v2     0.28      0.03     0.21     0.34 1.00    74424    27489
## work_status_v3     0.04      0.03    -0.02     0.10 1.00    76468    26652
## country_v1        -0.02      0.03    -0.08     0.04 1.00    78687    27587
## country_v2         0.22      0.03     0.16     0.28 1.00    79310    28721
## country_v3        -0.02      0.03    -0.09     0.04 1.00    72506    26858
## caring_v1          0.08      0.03     0.02     0.15 1.00    73773    25719
## diabetes_v1        0.21      0.03     0.15     0.27 1.00    76068    27117
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.00      0.01     0.98     1.03 1.00    32556    22888
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

2.1 Bayes factors

sesoi <- c(-.115, .115) # one-week for bimonthly answers
bf_m1a <- bayesfactor_parameters(m1, null = sesoi, direction = "right")
bf_m1b <- bayesfactor_parameters(m1, null = sesoi, direction = "two-sided")
### Country
pairs_country_v <- pairs(emmeans(m1, ~country_v))
bf_m1_cntry_v <- bayesfactor_parameters(pairs_country_v, m1, 
                                      null = sesoi, direction = "two-sided")
pairs_work_v <- pairs(emmeans(m1, ~work_status_v))
bf_m1_work_v <- bayesfactor_parameters(pairs_work_v, m1, 
                                       null = sesoi, direction = "right")

# Custom BF table
bf1_alt1 <- as.data.frame(bf_m1a) %>% 
  filter(!grepl("tercept|country|female|work", Parameter)) %>% 
  mutate(Test = "Greater than")
bf1_alt2 <- as.data.frame(bf_m1_work_v) %>% 
  mutate(Test = "Greather than")
bf1_nul1 <- as.data.frame(bf_m1b) %>% 
  filter(grepl("female", Parameter)) %>% 
  mutate(Test = "Two-sided")
bf1_nul2 <- as.data.frame(bf_m1_cntry_v) %>% 
  mutate(Test = "Two-sided")

bf_sum <- bf1_alt1 %>% 
  bind_rows(bf1_nul1) %>% 
  select(Parameter, BF, Test) %>% 
  bind_rows(bf1_alt2, bf1_nul2) %>% 
  mutate(Null = "-0.115 - 0.115")

bf_sum <- bf_sum %>% 
  mutate(Result = case_when(
    grepl("than", Test) & BF > 10 ~ "Evidence for Ha",
    grepl("than", Test) & BF < 0.01 ~ "Evidence for H0",
    grepl("sided", Test) & BF < 0.01 ~ "Evidence for H0 ",
    grepl("sided", Test) & BF > 10 ~ "Evidence for Ha",
    BF >= 0.01 & BF  <= 10 ~ "Inconclusive",
    TRUE ~ NA_character_),
    Conclusion = case_when(
      grepl("than", Test) & BF > 10 ~ "Support for HG1",
      grepl("than", Test) & BF < 0.01 ~ "Contrary to HG1",
      grepl("sided", Test) & BF < 0.01 ~ "Support for HG1 ",
      grepl("sided", Test) & BF > 10 ~ "Contrary to HG1",
      BF >= 0.01 & BF  <= 10 ~ "--",
      TRUE ~ NA_character_)) %>% 
  mutate(BF = as.character(signif(BF, 2)))

bf_sum %>% 
  kbl(format = "html", digits = 3, escape = FALSE,
      caption = "Bayes factors and conclusion summary") %>%
  kable_styling(bootstrap_options = c("striped", "bordered")) %>% 
  footnote(general = paste0("BF: Bayes Factor; criteria for assessing evidence is", "$BF_{10} > 10$", " or ", "$BF_{10} < 0.01$; comparisons between levels are shown for factors with more than two levels."), 
           footnote_as_chunk = T, escape=F)

Table 2.1: Bayes factors and conclusion summary
Parameter	BF	Test	Null	Result	Conclusion
b_age_v	4.1e+54	Greater than	-0.115 - 0.115	Evidence for Ha	Support for HG1
b_caring_v1	0.0033	Greater than	-0.115 - 0.115	Evidence for H0	Contrary to HG1
b_diabetes_v1	9.3	Greater than	-0.115 - 0.115	Inconclusive	–
b_female_v1	7.4e-06	Two-sided	-0.115 - 0.115	Evidence for H0	Support for HG1
Health - Higher ed	48	Greather than	-0.115 - 0.115	Evidence for Ha	Support for HG1
Health - Transport	240	Greather than	-0.115 - 0.115	Evidence for Ha	Support for HG1
Health - Unemployed	30000	Greather than	-0.115 - 0.115	Evidence for Ha	Support for HG1
Higher ed - Transport	0.00045	Greather than	-0.115 - 0.115	Evidence for H0	Contrary to HG1
Higher ed - Unemployed	0.0085	Greather than	-0.115 - 0.115	Evidence for H0	Contrary to HG1
Transport - Unemployed	0.0026	Greather than	-0.115 - 0.115	Evidence for H0	Contrary to HG1
Chile - Haiti	14	Two-sided	-0.115 - 0.115	Evidence for Ha	Contrary to HG1
Chile - Spain	0.81	Two-sided	-0.115 - 0.115	Inconclusive	–
Chile - Venezuela	7.1	Two-sided	-0.115 - 0.115	Inconclusive	–
Haiti - Spain	0.00028	Two-sided	-0.115 - 0.115	Evidence for H0	Support for HG1
Haiti - Venezuela	5.8e-05	Two-sided	-0.115 - 0.115	Evidence for H0	Support for HG1
Spain - Venezuela	0.00025	Two-sided	-0.115 - 0.115	Evidence for H0	Support for HG1
Note: BF: Bayes Factor; criteria for assessing evidence is\(BF_{10} > 10\) or \(BF_{10} < 0.01\); comparisons between levels are shown for factors with more than two levels.

Who should get a Covid-19 vaccine first?

Data and analysis simulation

Cristóbal Moya

April 5, 2021

1 Introduction

Data Frame Summary

dt.vign

2 Analysis

2.1 Bayes factors