Chapter 12 Basic Statistics
NOTE FOR ANDREW Merge content from Module06.Rmd. I added a ton of resources about interactions there.
12.1 Preamble
This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory measures. Please understand that these stats should NOT be what is included in your final manuscript. They are simply a quick and dirty way to gain an understanding of your data.
12.2 Running your models
For starters I am a big fan of using something that allows me to run through several models to get an overall feel for the data. This usually involves either broom or rstatix package which allows for pipe friendly modelling. I will commonly do this in order to evaluate if random factors should be included in my final model (covered in the mixed model chapter).
# Question: Is there a subject effect in our data?
# if there is then we introduce it as a random effect.
<- df.nirs %>%
tbl.SubjectEffect group_by(metric) %>%
do(tidy(lm(value ~ Subject, .))) # requires broom
<- df.nirs %>%
tbl.SubjectEffect group_by(metric) %>%
::anova_test(value~Subject)
rstatix# answer is yes. There is a subject effect that we must accomodate
For a very complete guide on running ANOVAs in R for your first time. This post is immaculate.
You can see some exercises here
12.3 Modeling your data
To test the normality of your data you can use a few different methods
- http://www.sthda.com/english/wiki/normality-test-in-r
- https://rstudio-pubs-static.s3.amazonaws.com/2002_1f803b2bc84c46008d3599a07867a95a.html
- https://www.datanovia.com/en/lessons/anova-in-r/
- Plot your data
- Check skewness and kurtosis
- Shapiro test.
In order to visualize your linear model I use 3 packages/BlandAltmanLeh/vignettes/Intro
- jtools
- interactions
- ggeffects (used less)
- To visualize
lmer
mixed models see here - To follow up a mixed-model here
If you want a very quick way to look at your data, using rstatix is my #1 recommendation.
ggstatsplot is a nice package that gets around a few problems you may run into.
The broom
package for visualising models
12.4 Statistics - Resources
12.4.1 Correlations in R/random_slopes
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/
https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html
12.4.2 Correlation Plots:
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
12.4.4 Learn about Residuals:
http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/
https://rpubs.com/iabrady/residual-analysis
http://www.learnbymarketing.com/tutorials/linear-regression-in-r/
12.4.5 Regression Diagnostics
https://www.statmethods.net/stats/rdiagnostics.html
tidymodels : https://www.tidymodels.org/
12.4.6 Bland-Altman Plots
https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html
12.4.8 Mixed Models
Good introduction: https://ourcodingclub.github.io/tutorials/mixed-models/ More intro: https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r https://dynamicecology.wordpress.com/2015/02/05/how-many-terms-in-your-model-before-statistical-machismo/ https://m-clark.github.io/mixed-models-with-R/random_slopes.html Package that gives pseudo-R^2 for mixed models –> https://cran.r-project.org/web/packages/jtools/vignettes/summ.html#report_robust_standard_errors https://rinterested.github.io/statistics/mixed_effects_comparison.html
12.4.9 Cluster Analysis (see bottom of the page for more links)
https://www.datanovia.com/en/product/practical-guide-to-cluster-analysis-in-r/
12.5 Looking at interactions
There are a few packages I like to use to visualize interactions. In no particular order they are:
- jtools
- interactions
- sjPlot
- ggeffects
- modelbased (vignette here)
- ggstatsplot
#Example from my microstates project see scripts/tbl-testing.R
::plot_model(model4, type = "pred", terms = c("Age", "Sex"), show.data = TRUE) +
sjPlottheme(legend.position = "top") +
labs(title = "Predicted values of MeanOccurrence (What the model says)",
y = "Mean Microstate Occurrence")
I am going to be using the interactions
package here but later in the document I show a few other examples. There are 2 main functions
cat_plot
for categorical predictorsinteract_plot
when you have continuous predictors
12.5.1 Packages
mertool is a gui for modelling –> https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/
To assess model performance I use the performance package. You can visualize your model using this vignette
# Running an ANOVA ----------------
# testing to see if the type of service has an effect on the Journey Time
#model <- lm(journey_time_avg ~ service, data=df)
<- aov(journey_time_avg ~ service, data=df)
model <- summary(model)
modsum #anova(model) #show me the answer in an ANOVA table
modsum
# ANOVA w/ Type III SS --------------
# requires the "car" library
<- Anova(lm(Measure ~ Group + Brain_Region + Group*Brain_Region, data = df, contrasts=list(Group=contr.sum, Brain_Region=contr.sum), type=3))
model report(model) #requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) #requires apastats
options(contrasts = c("contr.sum", "contr.poly"))
Anova(model, type="III") # Type III tests
# Post-Hoc according to Field's Book ---------
contrasts(df$Group)=contr.poly(3)
#model.1=aov(dv~covariate+factorvariable, data=dataname)
.1=aov(Measure ~ Baseline + Group + Brain_Region + Group*Brain_Region, data = df)
modelAnova(model.1, type="III")
# Make sure you use capital "A" Anova here and not anova. This will give results using type III SS.
=glht(model.1, linfct=mcp(Group="Tukey")) ##gives the post-hoc Tukey analysis
posthocsummary(posthoc) ##shows the output in a nice format.
confint(posthoc) # Give me the confidence intervals for the post-hoc tests
effect("Group", model.1) # Give me the group effects
Below is another loop
# Run Several models using sapply --------------------
# IVs and covariates
<- c("Neurological_Infection")
ivs <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")
covariates
<- sapply(covariates,
univ_formulas function(x) as.formula(paste('Neurological_Infection~', x)))
<- lapply( univ_formulas, function(x){glm(x, ,family=binomial(link='logit'),data=df1)})
univ_models # Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
= levels(erp$Condition)
cond = levels(erp$ERP_component)
comp $NumUndiagConc <- as.factor(erp$NumUndiagConc)
erp$NumPriorConc <- as.factor(erp$NumPriorConc)
erp
<- data.frame()
res for (yy in (1:(length(cond)))) {
<- subset(erp, Condition == cond[yy])
data if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
<- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
comp else if ((cond[yy]!="rGoC") | (cond[yy]!="rNgW")){
} = c( "N2_Amp", "N2_Lat", "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
comp
}<- data.frame() #reset after each condition loop
res1 for (zz in (1:(length(comp)))) {
<- subset(data,ERP_component == comp[zz])
compData = compData$electNum
electNum if (length(unique(electNum))==1){
<- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
fm1 else if (length(unique(electNum))>=1){
} <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time + (1|id) + (1|electNum),data=compData)
fm1
}assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
<- as.data.frame(summary(fm1)$coefficients)
tmod <- round(tmod[-c(1), ],3)
tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
tmod if (nrow(tmod)==0){
<- 1
a else if (nrow(tmod)>=0){
} <- rownames(tmod)
Test <- cbind(Test,tmod)
tmod rownames(tmod) <- NULL
#Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
<- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
Component <- cbind(Component,tmod)
tmod <- rbind(res,tmod)
res $`Pr(>|t|)`[ tmod$`Pr(>|t|)` <0.001] <- "<0.001"
tmod$`Pr(>|t|)` <- sub("[0].", ".", tmod$`Pr(>|t|)`)
tmod#tmod$`t value` <- round(tmod$`t value`,digits=2)
$Result <- paste0("t(", round(tmod$df,0), ")=", sprintf("%.2f", tmod$`t value`), ", p=", tmod$`Pr(>|t|)`)
tmod<- subset(tmod,select=c("Component", "Test","Result"))
tmod $Test <- gsub("AthleteType.L","Athlete Type",tmod$Test)
tmod$Test <- gsub(":","$\\times$", tmod$Test, fixed=TRUE)
tmod$Component <- gsub("_"," ",tmod$Component)
tmod$Test <- gsub("Time","Time ",tmod$Test)
tmod<- rbind(res1,tmod)
res1
}assign(paste0(cond[yy],"statsTable"), res1) #save it in the form of modsum_GoC_N2Amp
} }