This section covers preliminary screening. The purpose of the preliminary screening stage was to determine which studies to include in the main study-coding stage. There were three judges at this stage: myself and two other doctoral students. I walk through two rounds of screening here: Round 1 and Round 2.
In the first round of preliminary screening, I computed agreement and kappa statistics on the three judges’ screening decisions before discussing any discrepancies. The code for the analysis is below.
library("psych")
library("irr")
## Loading required package: lpSolve
There are two spreadsheets of data (i.e., screening decisions) you will need to read in to run the analyses: “TMScreen.csv” and “AYScreen.csv”. “TMScreen.csv” contains my screening decisions, and “AYScreen.csv” contains the other two judges’ screening decisions. Each of the other two judges screened a percentage of the total studies included in preliminary screening.
Data1 <- read.csv("data/TMScreen.csv", sep = ",", header = T)
Data2 <- read.csv("data/AYScreen.csv", sep = ",", header = T)
The next step is to extract just the decision column from each dataset and combine the two columns for analysis.
AgreeData <- cbind(Data1[, 11], Data2[, 11]) # Grab just the "Decision" column
head(AgreeData)
## [,1] [,2]
## [1,] 0 0
## [2,] 1 1
## [3,] 0 0
## [4,] 0 0
## [5,] 0 0
## [6,] 1 0
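As an optional sanity check, you can cross-tabulate the two decision vectors to see where the disagreements fall before computing any statistics:

# Optional sanity check: cross-tabulate the two judges' decisions
table(TM = AgreeData[, 1], AY = AgreeData[, 2], useNA = "ifany")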
Now we can compute agreement and kappa statistics for the two sets of decisions. These were the values reported in the manuscript.
agree(AgreeData)
## Percentage agreement (Tolerance=0)
##
## Subjects = 1090
## Raters = 2
## %-agree = 86.6
kappa2(AgreeData)
## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 1090
## Raters = 2
## Kappa = 0.68
##
## z = 24.1
## p-value = 0
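If you want to see where the kappa value comes from, it can be computed by hand from a cross-tabulation of the decisions: kappa = (observed agreement - chance agreement) / (1 - chance agreement). Here is a minimal sketch, assuming complete (non-missing) decisions:

# Hand computation of Cohen's kappa (illustrative only; kappa2() produced the reported value)
tab <- table(AgreeData[, 1], AgreeData[, 2])
p_o <- sum(diag(tab)) / sum(tab) # Observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2 # Chance-expected agreement
(p_o - p_e) / (1 - p_e) # Should be about 0.68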
Because we disagreed about whether to include or exclude some studies, I got together with one of the other two judges to discuss the discrepancies. After resolving them, we re-computed agreement and kappa statistics for the screening decisions.
In addition to the psych and irr packages, you will need the tidyverse package for the Round-2 analyses.
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.2 ✔ purrr 0.3.4
## ✔ tibble 3.0.3 ✔ dplyr 1.0.2
## ✔ tidyr 1.1.2 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%() masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
The only spreadsheet you need here is “ALLScreen.csv”, which contains the screening decisions from all judges in wide format.
AllData <- read.csv("data/ALLScreen.csv", sep = ",", header = T)
This spreadsheet contains a few new columns. Two of them hold the judges’ “NewDecision” values, which we arrived at after discussing disagreements. The next step is to create a new column indicating which studies should be included (those both judges gave a “1” in their “NewDecision” columns) and which should be excluded (those given a “0”).
AllData2 <- AllData %>%
  mutate(FinalDecision = # Create a new column to indicate studies to include (1) and exclude (0)
           if_else(NewDecision.TM == 1 & NewDecision.AY == 1, 1, 0))
Included <- AllData2 %>% filter(FinalDecision == 1) # Included studies
Excluded <- AllData2 %>% filter(FinalDecision == 0) # Excluded studies
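As a quick check, the included and excluded subsets should partition the full set of screened studies (assuming neither “NewDecision” column has missing values, since filter() drops NAs):

nrow(Included) + nrow(Excluded) == nrow(AllData2) # Expect TRUE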
The following code writes out the spreadsheets of included and excluded studies for my website.
Included2 <- Included %>% select(c(1:16, 21, 24:25)) # Keep only the columns needed for the shared files
Excluded2 <- Excluded %>% select(c(1:16, 21, 24:25))
write.csv(Included2, "Included.csv")
write.csv(Excluded2, "Excluded.csv")
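Note that write.csv() adds a column of row numbers by default; passing row.names = FALSE writes the same files without it, if you prefer:

write.csv(Included2, "Included.csv", row.names = FALSE) # Optional: drop R's row-number column
write.csv(Excluded2, "Excluded.csv", row.names = FALSE)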
This function is what I used to tally individual coders’ decisions.
Count_Binary <- function(df){
  Data <- df[, 2:3] # The two coders' decisions (columns 2 and 3)
  D1 <- Data %>% count(.[, 1]) # Tally the first coder's decisions
  D2 <- Data %>% count(.[, 2]) # Tally the second coder's decisions
  colnames(D1) <- c("Decision", "n")
  colnames(D2) <- c("Decision", "n")
  D3 <- left_join(D1, D2, by = "Decision") # One row per decision code
  colnames(D3) <- c("Decision", "TM", "AY")
  print(D3)
}
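The function assumes the two coders’ decisions sit in columns 2 and 3 of whatever data frame it receives. A toy example with made-up names and decisions shows the expected input shape:

# Hypothetical input: an identifier column followed by the two decision columns
Toy <- data.frame(Short.Name = c("A", "B", "C", "D"),
                  Crit.TM = c(1, 0, 33, 1),
                  Crit.AY = c(1, 0, 0, 1))
Count_Binary(Toy) # One row per decision code (0, 1, 33), one column per coder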
This function is what I used to tally decisions from either coder. It yields counts of “Exclude” decisions made by either coder 1 (me) or the other two coders.
Count_Either <- function(df) {
  Data <- df[, 2:3] # The two coders' decisions (columns 2 and 3)
  Data %>%
    count(.[, 1] == 0 | .[, 2] == 0) # TRUE = excluded by at least one coder
}
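Applied to the same toy data, this function reports how many studies at least one coder marked for exclusion:

Count_Either(Toy) # TRUE = at least one coder gave a 0 ("Exclude")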
This section tallies which studies were included or excluded on each preliminary screening criterion. A “33” was the code for “Not sure!”
# Language: "Was the study published in English?"
Lang <- Excluded2 %>%
select(Short.Name, Language.TM, Language.AY, FinalDecision)
Count_Binary(Lang)
## Decision TM AY
## 1 0 35 36
## 2 1 752 753
## 3 NA 2 NA
Count_Either(Lang)
## .[, 1] == 0 | .[, 2] == 0 n
## 1 FALSE 750
## 2 TRUE 37
## 3 NA 2
# Domain: "Is the study about language education?"
Domain <- Excluded2 %>%
select(Short.Name, Domain.TM, Domain.AY, FinalDecision)
Count_Binary(Domain)
## Decision TM AY
## 1 0 5 1
## 2 1 746 755
## 3 33 35 33
## 4 NA 3 NA
Count_Either(Domain)
## .[, 1] == 0 | .[, 2] == 0 n
## 1 FALSE 781
## 2 TRUE 5
## 3 NA 3
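The tables keep the “33” code as its own row. If you instead wanted to fold the “not sure” decisions into the missing values before tallying, one option is dplyr’s na_if(); this is a sketch of an alternative, not what I did above:

# Alternative: recode "not sure" (33) to NA before tallying
DomainNA <- Domain %>%
  mutate(across(c(Domain.TM, Domain.AY), ~ na_if(.x, 33)))
Count_Binary(DomainNA)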
# Empirical: "Is this an empirical study?"
Empirical <- Excluded2 %>%
select(Short.Name, Empirical.TM, Empirical.AY, FinalDecision)
Count_Binary(Empirical)
## Decision TM AY
## 1 0 172 178
## 2 1 581 571
## 3 33 35 40
## 4 NA 1 NA
Count_Either(Empirical)
## .[, 1] == 0 | .[, 2] == 0 n
## 1 FALSE 575
## 2 TRUE 213
## 3 NA 1
# Use: "Was a C-test actually used?"
Use <- Excluded2 %>%
select(Short.Name, Use.TM, Use.AY, FinalDecision)
Count_Binary(Use)
## Decision TM AY
## 1 0 482 488
## 2 1 266 257
## 3 33 40 44
## 4 NA 1 NA
Count_Either(Use)
## .[, 1] == 0 | .[, 2] == 0 n
## 1 FALSE 260
## 2 TRUE 529
# Correlation: "Were C-test scores correlated with something else?"
Corr <- Excluded2 %>%
select(Short.Name, Corr.TM, Corr.AY, FinalDecision)
Count_Binary(Corr)
## Decision TM AY
## 1 0 717 691
## 2 1 25 46
## 3 33 47 52
Count_Either(Corr)
## .[, 1] == 0 | .[, 2] == 0 n
## 1 FALSE 47
## 2 TRUE 742
Here is the code to compute agreement and kappa statistics after we had resolved our disagreements about screening decisions (and settled the studies we were not sure about).
TMDec <- AllData[, c(1, 3, 22)] # My "NewDecision" column (plus identifiers)
AYDec <- AllData[, c(1, 3, 23)] # The other judges' "NewDecision" column (plus identifiers)
FinalD <- cbind(TMDec[, 3], AYDec[, 3]) # Just the two decision columns
agree(FinalD) # Perfect agreement! (Big surprise.)
## Percentage agreement (Tolerance=0)
##
## Subjects = 1090
## Raters = 2
## %-agree = 100
kappa2(FinalD)
## Cohen's Kappa for 2 Raters (Weights: unweighted)
##
## Subjects = 1090
## Raters = 2
## Kappa = 1
##
## z = 33
## p-value = 0