Bayesian design and prespecified interim analyses
A Bayesian trial design was proposed that allows prespecified interim analyses in addition to leveraging pilot study data. For SPYRAL HTN-OFF MED Pivotal, assuming 15% attrition, interim analyses are expected to occur first when approximately 210 and second when approximately 240 subjects have evaluable data, with a maximum sample size of 300 evaluable subjects if the trial does not stop at either the first or second interim look.
The time from randomization of the first cohort to the second and, if applicable, final cohort is lengthy because of stringent eligibility criteria and the resulting slow randomization rates.
For SPYRAL HTN-ON MED Expansion, the expected sample sizes are 149 and 187 evaluable subjects, with a maximum sample size of 221 evaluable subjects. Actual numbers of evaluable subjects will be determined by the actual attrition, which is currently expected to be 15%. At each prespecified interim analysis, enrollment may be stopped for efficacy or expected futility.
For SPYRAL HTN-OFF MED Pivotal, the primary and secondary efficacy endpoints will be evaluated during these prespecified interim looks, and enrollment will only stop at an interim analysis if both endpoints meet the prespecified stopping criteria. For SPYRAL HTN-ON MED Expansion, the primary efficacy endpoint will be evaluated at the prespecified interim looks. A distinction between stopping for efficacy and stopping for futility is that the former is based solely on the observed/evaluable data, whereas the latter involves the observed/evaluable data as well as imputation for subjects without evaluable data or not yet randomized.
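To make the efficacy/futility distinction concrete, here is a minimal R sketch of one interim look. It is not the trial's algorithm: the flat-prior normal approximation, the plug-in imputation scheme, the helper name `interim_look`, and the thresholds `eff_cut` and `fut_cut` are all illustrative assumptions.

```r
## Hedged sketch: efficacy uses only observed data; futility also imputes outcomes
## for subjects who are not yet evaluable, then checks how often the completed
## trial would still cross the efficacy threshold. Thresholds are placeholders.
interim_look <- function(y_t, y_c, n_t_remaining, n_c_remaining,
                         eff_cut = 0.975, fut_cut = 0.10, n_imp = 1000) {
  post_diff <- function(yt, yc) {
    list(est = mean(yt) - mean(yc),
         se  = sqrt(var(yt) / length(yt) + var(yc) / length(yc)))
  }

  # Efficacy: based solely on the observed/evaluable data
  obs   <- post_diff(y_t, y_c)
  p_eff <- pnorm(0, mean = obs$est, sd = obs$se)   # P(treatment - control < 0)

  # Futility: impute the not-yet-evaluable subjects (plug-in approximation to a
  # posterior predictive draw) and recompute the efficacy criterion each time
  success <- replicate(n_imp, {
    yt_full <- c(y_t, rnorm(n_t_remaining, mean(y_t), sd(y_t)))
    yc_full <- c(y_c, rnorm(n_c_remaining, mean(y_c), sd(y_c)))
    full    <- post_diff(yt_full, yc_full)
    pnorm(0, mean = full$est, sd = full$se) > eff_cut
  })

  list(p_efficacy           = p_eff,
       predictive_p_success = mean(success),
       stop_for_efficacy    = p_eff > eff_cut,
       stop_for_futility    = mean(success) < fut_cut)
}

## Example call with simulated observed data (illustrative values only)
set.seed(42)
interim_look(y_t = rnorm(140, -4, 10), y_c = rnorm(70, 0, 10),
             n_t_remaining = 60, n_c_remaining = 30)
```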
The underlying model is a Bayesian analogue of the analysis of covariance (ANCOVA) model for SBP change from baseline, y_i, adjusted for baseline blood pressure and treatment arm. Because a Bayesian power prior approach is used, a non-standard parameterization of the ANCOVA model allows informative prior distributions to be placed separately on the RDN and control arm effects:
Here's a breakdown of the Bayesian clinical trial methodology described in the article, along with guidance on how to build an R simulation to demonstrate it.
Model
y_i = mu_t * I(i ∈ t) + mu_c * I(i ∈ c) + x_i * beta + epsilon_i

y_i: Change from baseline in blood pressure at follow-up for subject i.
I(i ∈ t): Indicator of whether subject i belongs to the treatment group (1 if yes, 0 if no).
I(i ∈ c): Indicator of whether subject i belongs to the control group.
mu_t: Baseline-adjusted treatment effect for the renal denervation group.
mu_c: Baseline-adjusted treatment effect for the sham control group.
x_i: Mean-centered baseline blood pressure for subject i.
beta: Regression coefficient adjusting for baseline blood pressure.
epsilon_i: Error term, assumed to be normally distributed.
alpha_t and alpha_c: Power parameters that control how heavily the pilot data are weighted; these can be fixed or estimated dynamically.
Priors: informative power priors on mu_t and mu_c, a non-informative prior for beta, and a flat prior for log(sigma).

This simulation will reflect the Bayesian ANCOVA model structure you provided. We'll simulate a treatment effect analysis incorporating both baseline and treatment effects, reflecting your method of adjusting for mean-centered baseline blood pressure.
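As a starting point, here is a minimal sketch that simulates one trial's data from this model structure; the parameter values (arm sizes, true effects, baseline mean/SD, residual SD) are illustrative assumptions, not the trial's design values.

```r
## Simulate one trial's data from the ANCOVA model above.
## All numeric values are illustrative assumptions.
set.seed(123)
n_t <- 166; n_c <- 80                      # assumed arm sizes
mu_t <- -4.0; mu_c <- 0.0                  # true baseline-adjusted arm effects (mm Hg)
beta  <- 0.3                               # effect of mean-centered baseline SBP
sigma <- 10                                # residual SD of the SBP change

arm      <- c(rep("t", n_t), rep("c", n_c))
baseline <- rnorm(n_t + n_c, mean = 160, sd = 10)
x        <- baseline - mean(baseline)      # mean-centered baseline, x_i
y        <- mu_t * (arm == "t") + mu_c * (arm == "c") + beta * x +
            rnorm(n_t + n_c, 0, sigma)     # y_i = SBP change from baseline

dat <- data.frame(y = y, arm = factor(arm, levels = c("c", "t")), x = x)
head(dat)
```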
[1] 0.6324573
Type I Error Assessment
Discount Function
R Simulation: Key Steps
1. Specify the model parameters: mu_t, mu_c, beta, and log(sigma).
2. Fit the Bayesian ANCOVA model to the simulated data (e.g., with rstanarm or brms).
3. Base the efficacy decision on the posterior probability that mu < 0, as sketched below.

Let me know if you'd like specific R code examples for any of these steps. Building this simulation can be quite involved, so it's best to break it down into parts.
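For example, here is a sketch of the fitting and decision steps, continuing from the simulated dat above. It uses rstanarm's default priors rather than the trial's informative power priors, and the cell-means formula (no intercept) mirrors the parameterization with separate mu_t and mu_c.

```r
## Fit the Bayesian ANCOVA with separate arm means (no intercept) and summarize
## the posterior treatment difference. Priors are rstanarm defaults here, not the
## trial's power priors.
library(rstanarm)

fit <- stan_glm(y ~ 0 + arm + x, data = dat, family = gaussian(),
                chains = 2, iter = 2000, refresh = 0)

draws      <- as.data.frame(fit)            # posterior draws of armc, armt, x, sigma
diff_draws <- draws$armt - draws$armc       # posterior of mu_t - mu_c

mean(diff_draws)                            # posterior mean treatment difference
quantile(diff_draws, c(0.025, 0.975))       # 95% credible interval
mean(diff_draws < 0)                        # P(RDN lowers SBP more than sham)
```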
The posterior distribution for \(\mu\), after observing data with sample mean \(\bar{x}\) and sample variance \(s^2\) from \(n\) observations, and assuming a normal prior \(\mu \sim N(\mu_0, \tau^2)\), remains normal: \[ \mu \mid \text{data} \sim N\left(\frac{\frac{\bar{x}}{s^2/n} + \frac{\mu_0}{\tau^2}}{\frac{1}{s^2/n} + \frac{1}{\tau^2}},\ \frac{1}{\frac{1}{s^2/n} + \frac{1}{\tau^2}}\right) \]
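For reference, this update can be computed directly; the helper below and its example inputs are illustrative only.

```r
## Conjugate normal-normal update for mu, matching the formula above.
## xbar, s2, n summarize the observed data; mu0 and tau2 define the N(mu0, tau2) prior.
posterior_normal <- function(xbar, s2, n, mu0, tau2) {
  prec_data  <- 1 / (s2 / n)      # precision of the sample mean
  prec_prior <- 1 / tau2          # precision of the prior
  post_var   <- 1 / (prec_data + prec_prior)
  post_mean  <- post_var * (xbar * prec_data + mu0 * prec_prior)
  c(mean = post_mean, var = post_var)
}

## Illustrative inputs only
posterior_normal(xbar = -4.6, s2 = 9.5^2, n = 112, mu0 = -5.3, tau2 = 9.8^2 / 35)
```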
One-armed bdp normal
data:
Current treatment: mu_t = -3.6, sigma_t = 18.9466468512361, N_t = 204
Historical treatment: mu0_t = -4.9, sigma0_t = 19.9906104312602, N0_t = 70
Stochastic comparison (p_hat) - treatment (current vs. historical data): 0.4575
Discount function value (alpha) - treatment: 0.5352
95 percent CI:
-6.1963 -1.272
posterior sample estimate:
mean of treatment group
-3.7796
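Output in this format can be produced with the bayesDP package. The call below is a sketch assuming bdpnormal()'s summary-statistic interface, with the Weibull discount-function parameters left at package defaults rather than the trial's prespecified values.

```r
## One-armed Bayesian discount-prior analysis: current trial vs. historical (pilot)
## treatment summaries as printed above; discount parameters are package defaults.
library(bayesDP)

fit1 <- bdpnormal(mu_t = -3.6, sigma_t = 18.95, N_t = 204,
                  mu0_t = -4.9, sigma0_t = 19.99, N0_t = 70)
summary(fit1)
```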
Two-armed bdp normal
data:
Current treatment: mu_t = -4.6, sigma_t = 9.49966570358137, N_t = 112
Current control: mu_c = -0.8, sigma_c = 7.53739117484396, N_c = 95
Historical treatment: mu0_t = -5.3, sigma0_t = 9.76153164211437, N0_t = 35
Historical control: mu0_c = -0.74, sigma0_c = 9.72, N0_c = 36
Stochastic comparison (p_hat) - treatment (current vs. historical data): 0.4663
Stochastic comparison (p_hat) - control (current vs. historical data): 0.4961
Discount function value (alpha) - treatment: 0.5557
Discount function value (alpha) - control: 0.6235
alternative hypothesis: two.sided
95 percent CI:
-6.0515 -1.7134
posterior sample estimates:
treatment group control group
-4.69 -0.79
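The two-armed case is analogous, again assuming the bdpnormal() interface and default discount settings.

```r
## Two-armed analysis: current and historical summaries for both treatment and
## control arms, mirroring the values printed above.
library(bayesDP)

fit2 <- bdpnormal(mu_t  = -4.6,  sigma_t  = 9.50, N_t  = 112,
                  mu_c  = -0.8,  sigma_c  = 7.54, N_c  = 95,
                  mu0_t = -5.3,  sigma0_t = 9.76, N0_t = 35,
                  mu0_c = -0.74, sigma0_c = 9.72, N0_c = 36)
summary(fit2)
```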
In a Bayesian adaptive trial, the "power parameter" is not related to statistical power. Instead, it's a term used to describe how much the data from a prior study (like a pilot study) should influence the analysis of the current trial. It's a measure of "borrowing strength" from the prior data.
If the pilot and current pivotal data are very similar, you would use more of the pilot data in your analysis (power parameter closer to 1).
If they are quite different, you would use less of the pilot data (power parameter closer to 0).
(1) Adaptive allocation rule: change in the randomization procedure to modify the allocation proportion.
(2) Adaptive sampling rule: change in the number of study subjects (sample size) or change in the study population (the entry criteria for patients change).
(3) Adaptive stopping rule: during the course of the trial, a data-dependent rule dictates whether and when to stop for harm/futility/efficacy.
(4) Adaptive enrichment: during the trial, treatments are added or dropped.
The significance level in a clinical trial is the threshold at which the results of the trial are deemed statistically significant. It represents the probability of rejecting the null hypothesis when it is actually true (a Type I error, or false positive). In conventional frequentist statistics, common significance levels are 0.05 or 0.01. For the SPYRAL HTN-OFF MED Pivotal trial, a one-sided Type I error rate of 2.9% is specified, which means they use a significance level of 0.029 for their primary efficacy endpoint.
In the Bayesian context, the significance level can be used to set the decision thresholds for interim analyses. For instance, a posterior probability of benefit below a certain threshold may indicate futility, while a posterior probability above a threshold such as 1 minus the significance level may indicate efficacy.
1. **Power Prior**: In a Bayesian framework, prior information (like results from pilot studies) is typically incorporated into the analysis through the prior distribution. A power prior is a method to adjust the influence of this prior information on the current analysis.
2. **Similarity Statistic**: Before the pilot data is fully incorporated into the pivotal trial analysis, a measure of similarity between the pilot and pivotal data is calculated. This helps to understand how comparable the two datasets are.
3. **Power Parameter**: The power parameter (denoted typically by Ī± in Bayesian statistics) adjusts how much weight is given to the pilot data in the analysis. If the pilot and pivotal data are very similar, the power parameter is closer to 1, meaning that more information from the pilot study is retained. If the data are dissimilar, the power parameter is closer to 0, meaning the pilot data is down-weighted.
4. **Transformation using Weibull Function**: The similarity statistic is transformed into a power parameter using a Weibull function. The shape and scale of this function determine how the similarity statistic is translated into the power parameter. This transformation is based on predefined parameters that are determined through simulations to optimize the trial's design.
The purpose of this is to dynamically borrow information from the pilot study based on the level of consistency with the ongoing pivotal trial data. This process allows the analysis to be adaptive and more informative by integrating the amount of evidence that is considered appropriate from the prior study into the current analysis.
In summary, the significance level in the context of the SPYRAL trials is used as a decision-making threshold during interim analyses, while the similarity statistic and its transformation into a power parameter determine the degree to which prior data informs the current analysis.
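As a concrete sketch of step 4, the transformation is a Weibull CDF applied to the similarity statistic. The shape and scale below are inferred because they reproduce the alpha values printed in the bdp output above when applied to the two-sided comparison 2 * min(p_hat, 1 - p_hat); they are not quoted from the trial's protocol, which prespecifies these parameters via simulation.

```r
## Weibull discount function: map the similarity statistic p_hat to a power
## parameter alpha in [0, alpha_max]. The shape/scale values are inferred, not
## the trial's documented settings.
discount_weibull <- function(p_hat, shape = 3, scale = 1, alpha_max = 1) {
  alpha_max * pweibull(2 * pmin(p_hat, 1 - p_hat), shape = shape, scale = scale)
}

## p_hat values from the one- and two-armed bdp output above
discount_weibull(c(0.4575, 0.4663, 0.4961))   # ~0.535, 0.556, 0.624
```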
[1] 0.753125
[1] 0
Power Estimation (94% to detect a -4.0 mm Hg difference): This means that when they simulated 8,000 trials assuming the true effect is -4.0 mm Hg, in 94% of those simulations they were able to reject the null hypothesis of no effect (an effect of 0 mm Hg) at the 2.9% significance level.

Type I Error Estimation (2.9% for the primary endpoint): This is the probability of rejecting the null hypothesis when it is true, estimated using 15,000 simulations under the null hypothesis of no effect. In only 2.9% of those simulations should the null hypothesis have been rejected, given that there was actually no difference.
They assumed a true treatment effect and simulated data accordingly.
They used a statistical method (likely a Bayesian approach, given the context) to analyze the simulated data. They checked against a significance level (the 2.9% one-sided Type I error rate) to see if the null hypothesis could be rejected. They repeated this process across many simulations and calculated what percentage of those simulations resulted in rejecting the null hypothesis. If it's around 94%, that's the power estimate.
In both cases, each simulation is binary in outcome: either the null is rejected (counted as a success for power, or as a false positive for Type I error) or it is not. The final reported percentages (power of 94%, Type I error of 2.9%) are the proportions of these binary outcomes across all simulations.
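Here is a simplified sketch of that simulation loop. It uses a fixed-sample t-test with a one-sided 0.029 threshold as a stand-in for the trial's adaptive Bayesian analysis, so the sample sizes, SD, and resulting estimates are illustrative only and will not match the reported 94% and 2.9%.

```r
## Estimate power (true difference = -4 mm Hg) and Type I error (true difference = 0)
## by repeated simulation. Fixed-sample, frequentist stand-in for the adaptive
## Bayesian design; arm sizes and SD are assumptions.
simulate_rejections <- function(n_sim, true_diff, n_t = 166, n_c = 80,
                                sd_y = 10, alpha_one_sided = 0.029) {
  reject <- replicate(n_sim, {
    y_t <- rnorm(n_t, true_diff, sd_y)
    y_c <- rnorm(n_c, 0, sd_y)
    t.test(y_t, y_c, alternative = "less")$p.value < alpha_one_sided
  })
  mean(reject)   # proportion of simulated trials that reject the null
}

set.seed(1)
simulate_rejections(2000, true_diff = -4)   # power estimate
simulate_rejections(2000, true_diff = 0)    # Type I error estimate
```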
sample_size power
1 210 0.772375
2 240 0.827000
3 300 0.882125
[1] 0.564875