My analysis of the trial dataset

Data import

library(here)
trial <- read.csv2(here("data/trial.csv"))

Description

Characteristic    Overall, N = 200¹  Drug A, N = 98¹  Drug B, N = 102¹  p-value²
age               47.24 (14.31)      47.01 (14.71)    47.45 (14.01)     0.8
  missing values  11                 7                4
marker            0.92 (0.86)        1.02 (0.89)      0.82 (0.83)       0.12
  missing values  10                 6                4
stage
  T1              26.50%             28.57%           24.51%
  T2              27.00%             25.51%           28.43%
  T3              21.50%             22.45%           20.59%
  T4              25.00%             23.47%           26.47%
grade
  I               34.00%             35.71%           32.35%
  II              34.00%             32.65%           35.29%
  III             32.00%             31.63%           32.35%
response          31.61%             29.47%           33.67%            0.5
  missing values  7                  3                4
death             56.00%             53.06%           58.82%            0.4

¹ Mean (SD); %
² Two Sample t-test

Modeling

Univariable and multivariable analyses

##                Estimate Std. Error    z value   Pr(>|z|)
## (Intercept) -1.70335908  0.7132940 -2.3880184 0.01693950
## age          0.01911857  0.0118930  1.6075474 0.10793433
## marker       0.32829134  0.1956681  1.6777969 0.09338676
## stageT2     -0.78271069  0.4691961 -1.6681953 0.09527696
## stageT3     -0.13355845  0.4822316 -0.2769592 0.78181145
## stageT4     -0.42915078  0.4729451 -0.9074008 0.36419491
## gradeII      0.04267335  0.4273544  0.0998547 0.92045968
## gradeIII     0.05046867  0.4073378  0.1238988 0.90139538
Dependent: response  0            1            OR (univariable)           OR (multivariable)
age       Mean (SD)  45.9 (14.4)  49.8 (14.2)  1.02 (1.00-1.04, p=0.095)  1.02 (1.00-1.04, p=0.108)
marker    Mean (SD)  0.8 (0.8)    1.1 (0.9)    1.35 (0.94-1.93, p=0.100)  1.39 (0.94-2.05, p=0.093)
stage     T1         34 (65.4)    18 (34.6)    -                          -
          T2         39 (75.0)    13 (25.0)    0.63 (0.27-1.46, p=0.285)  0.46 (0.18-1.13, p=0.095)
          T3         25 (62.5)    15 (37.5)    1.13 (0.48-2.68, p=0.775)  0.87 (0.34-2.25, p=0.782)
          T4         34 (69.4)    15 (30.6)    0.83 (0.36-1.92, p=0.668)  0.65 (0.25-1.63, p=0.364)
grade     I          46 (68.7)    21 (31.3)    -                          -
          II         44 (69.8)    19 (30.2)    0.95 (0.45-2.00, p=0.884)  1.04 (0.45-2.42, p=0.920)
          III        42 (66.7)    21 (33.3)    1.10 (0.52-2.29, p=0.808)  1.05 (0.47-2.35, p=0.901)
Number in dataframe = 200, Number in model = 173, Missing = 27, AIC = 222.6, C-statistic = 0.648, H&L = Chi-sq(8) 5.13 (p=0.743)

Final model

Characteristic  log(OR)¹  95% CI¹      p-value
age             0.02      0.00, 0.04   0.11
marker          0.28      -0.09, 0.64  0.14

¹ OR = Odds Ratio, CI = Confidence Interval
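
The table above reports coefficients on the log-odds scale, so exponentiating them gives odds ratios. As a rough illustration using the rounded estimates (the results are therefore approximate):

\[ \begin{aligned} \operatorname{OR}_{\operatorname{age}} &= e^{0.02} \approx 1.02 \\ \operatorname{OR}_{\operatorname{marker}} &= e^{0.28} \approx 1.32 \end{aligned} \]

Read this way, each additional year of age multiplies the estimated odds of response by about 1.02, and each one-unit increase in marker by about 1.32. Neither effect reaches statistical significance in this model.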

The equation of the final model is:

\[ \begin{aligned} \log\left[ \frac { \widehat{P( \operatorname{response} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{response} = \operatorname{1} )} } \right] &= -1.95 + 0.02(\operatorname{age}) + 0.28(\operatorname{marker}) \end{aligned} \]
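
As a worked illustration of the equation above, consider a hypothetical patient aged 50 with a marker level of 1. These values are chosen only for the example and are not taken from the data:

\[ \begin{aligned} \log\left[ \frac { \widehat{P( \operatorname{response} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{response} = \operatorname{1} )} } \right] &= -1.95 + 0.02(50) + 0.28(1) = -0.67 \\ \widehat{P( \operatorname{response} = \operatorname{1} )} &= \frac{1}{1 + e^{0.67}} \approx 0.34 \end{aligned} \]

Because the coefficients are rounded to two decimals, the predicted probability of about 0.34 should be read as an approximation.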

The results can be viewed below:

Results

## We fitted a logistic model (estimated using ML) to predict response with age and marker (formula: response ~ age + marker). The model's explanatory power is weak (Tjur's R2 = 0.03). The model's intercept, corresponding to age = 0 and marker = 0, is at -1.95 (95% CI [-3.22, -0.77], p = 0.002). Within this model:
## 
##   - The effect of age is statistically non-significant and positive (beta = 0.02, 95% CI [-3.86e-03, 0.04], p = 0.109; Std. beta = 0.27, 95% CI [-0.06, 0.62])
##   - The effect of marker is statistically non-significant and positive (beta = 0.28, 95% CI [-0.09, 0.64], p = 0.138; Std. beta = 0.24, 95% CI [-0.08, 0.56])
## 
## Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using