Overview

This section covers the preliminary screening stage, the purpose of which was to determine which studies to include in the main study-coding stage. There were three judges at this stage: myself and two other doctoral students. I describe two rounds of screening here: Round 1 and Round 2.

Round 1 (R1)

In the first round of preliminary screening, I computed agreement and kappa statistics on the three judges’ screening decisions before we discussed any discrepancies. The code for the analysis is below.

Packages

library("psych")
library("irr")
## Loading required package: lpSolve
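
If the psych and irr packages are not already installed, they can be installed once beforehand (this step is not part of the original script):

install.packages(c("psych", "irr")) # Only needed the first time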

Data

There are two spreadsheets of data (i.e., screening decisions) you will need to read in to run the analyses: “TMScreen.csv” and “AYScreen.csv”. “TMScreen.csv” contains my screening decisions, and “AYScreen.csv” contains the other two judges’ screening decisions. Each of the other two judges screened a portion of the total studies included in preliminary screening.

Data1 <- read.csv("data/TMScreen.csv", sep = ",", header = T)
Data2 <- read.csv("data/AYScreen.csv", sep = ",", header = T) 

Subset decision data

The next step is to pull just the decision column from each dataset and bind the two together for analysis.

AgreeData <- cbind(Data1[, 11], Data2[, 11]) # Grab just the "Decision" column
head(AgreeData)
##      [,1] [,2]
## [1,]    0    0
## [2,]    1    1
## [3,]    0    0
## [4,]    0    0
## [5,]    0    0
## [6,]    1    0

Agreement and Kappa

Now we can compute agreement and kappa statistics for the two sets of decisions. These were the values reported in the manuscript.

agree(AgreeData)
##  Percentage agreement (Tolerance=0)
## 
##  Subjects = 1090 
##    Raters = 2 
##   %-agree = 86.6
kappa2(AgreeData)
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 1090 
##    Raters = 2 
##     Kappa = 0.68 
## 
##         z = 24.1 
##   p-value = 0
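
If you want to see where these numbers come from, percentage agreement is simply the proportion of matching decisions, and Cohen’s kappa corrects that proportion for chance agreement, kappa = (po - pe) / (1 - pe), where pe is computed from each judge’s marginal rate of “include” decisions. The following is a minimal sketch (not part of the original analysis) that recomputes both values from the decision matrix built above.

po <- mean(AgreeData[, 1] == AgreeData[, 2])  # observed proportion of agreement

p1 <- mean(AgreeData[, 1] == 1)               # judge 1's rate of "include" decisions
p2 <- mean(AgreeData[, 2] == 1)               # judge 2's rate of "include" decisions
pe <- p1 * p2 + (1 - p1) * (1 - p2)           # agreement expected by chance

c(percent_agree = 100 * po, kappa = (po - pe) / (1 - pe))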

Round 2

Because we disagreed about whether to include or exclude some studies, I met with one of the other two judges to discuss the discrepancies. After resolving them, we re-computed agreement and kappa statistics for the screening decisions.

Packages

In addition to the psych and irr packages, you will need the tidyverse package for the Round 2 analyses.

library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.2     ✔ purrr   0.3.4
## ✔ tibble  3.0.3     ✔ dplyr   1.0.2
## ✔ tidyr   1.1.2     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ ggplot2::%+%()   masks psych::%+%()
## ✖ ggplot2::alpha() masks psych::alpha()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()

Data

The only spreadsheet you need for Round 2 is “ALLScreen.csv”. It contains the screening decisions from all judges in wide format.

AllData <- read.csv("data/ALLScreen.csv", sep = ",", header = T)

Get just the “NewDecision” data

This spreadsheet contains a few new columns. Two of them hold the judges’ “NewDecision” values, which we arrived at after discussing disagreements. The next step is to create a new column indicating which studies should be included (i.e., those both judges gave a “1” in their “NewDecision” columns, coded “1”) and which should be excluded (i.e., everything else, coded “0”).

AllData2 <- AllData %>%
  mutate(FinalDecision = # Create a new column to indicate studies to include (1) and exclude (0)
           if_else(NewDecision.TM == 1 & NewDecision.AY == 1, 1, 0))

Included <- AllData2 %>% filter(FinalDecision == 1) # Included studies
Excluded <- AllData2 %>% filter(FinalDecision == 0) # Excluded studies
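
As an optional check (not part of the original script), you can tally how many studies fall into each final-decision category before writing anything out:

AllData2 %>% count(FinalDecision) # number of excluded (0) and included (1) studies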

Lists of included and excluded studies for website

These are the spreadsheets of included and excluded studies for my website.

Included2 <- Included %>% select(c(1:16, 21, 24:25))
Excluded2 <- Excluded %>% select(c(1:16, 21, 24:25))

write.csv(Included2, "Included.csv")
write.csv(Excluded2, "Excluded.csv")
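
Note that write.csv() includes R’s row numbers as an unnamed first column by default. If you do not want that column in the website spreadsheets, a variant of the calls above is:

write.csv(Included2, "Included.csv", row.names = FALSE) # drop the row-number column
write.csv(Excluded2, "Excluded.csv", row.names = FALSE)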

Short function to calculate individual coder decisions

This is the function I used to tally each coder’s individual decisions on a given screening criterion.

Count_Binary <- function(df){

 Data <- df[, 2:3] # Keep only the two coders' decision columns

 D1 <- Data %>% count(.[, 1]) # Tally the first coder's decisions
 D2 <- Data %>% count(.[, 2]) # Tally the second coder's decisions

 colnames(D1) <- c("Decision", "n")
 colnames(D2) <- c("Decision", "n")

 D3 <- left_join(D1, D2, by = "Decision") # Line up the two tallies by decision value

 colnames(D3) <- c("Decision", "TM", "AY")

 print(D3)

}
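
To make the expected input explicit, here is a toy call on made-up data (not from the screening spreadsheets): the function assumes a data frame whose first column is an identifier and whose second and third columns are the two coders’ 0/1 decisions.

Toy <- data.frame(Short.Name = paste0("Study", 1:4),
                  Dec.TM = c(1, 0, 1, NA),
                  Dec.AY = c(1, 0, 0, 1))

Count_Binary(Toy) # one row per decision value, with a count column for each coder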

Short function to calculate decisions by either coder

This is the function I used to count decisions involving either coder. The function yields results that count “Exclude” decisions made by either coder 1 (me) or the other two coders.

Count_Either <- function(df) {

 Data <- df[, 2:3] # Keep only the two coders' decision columns

 Data %>%
 count(.[, 1] == 0 | .[, 2] == 0) # TRUE = at least one coder gave an "Exclude" (0) decision

}
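
One detail to keep in mind when reading the Count_Either() tables below: when one coder’s decision is missing, R’s three-valued logic for | determines whether the study lands in the TRUE row or in an NA row. A quick illustration (not study data):

NA == 0 | TRUE   # TRUE: the other coder gave a 0, so the study still counts as excluded by either coder
NA == 0 | FALSE  # NA: the other coder gave a 1, so the combined decision is unknown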

Decisions for each of the preliminary-screening criteria

This section reports which studies were included/excluded on each preliminary screening criterion. A decision of “33” means “Not sure!”

# Language: "Was the study published in English?"

Lang <- Excluded2 %>%
 select(Short.Name, Language.TM, Language.AY, FinalDecision)

Count_Binary(Lang)
##   Decision  TM  AY
## 1        0  35  36
## 2        1 752 753
## 3       NA   2  NA
Count_Either(Lang)
##   .[, 1] == 0 | .[, 2] == 0   n
## 1                     FALSE 750
## 2                      TRUE  37
## 3                        NA   2
# Domain: "Is the study about language education?"

Domain <- Excluded2 %>%
 select(Short.Name, Domain.TM, Domain.AY, FinalDecision)

Count_Binary(Domain)
##   Decision  TM  AY
## 1        0   5   1
## 2        1 746 755
## 3       33  35  33
## 4       NA   3  NA
Count_Either(Domain)
##   .[, 1] == 0 | .[, 2] == 0   n
## 1                     FALSE 781
## 2                      TRUE   5
## 3                        NA   3
# Empirical: "Is this an empirical study?"

Empirical <- Excluded2 %>%
 select(Short.Name, Empirical.TM, Empirical.AY, FinalDecision)

Count_Binary(Empirical)
##   Decision  TM  AY
## 1        0 172 178
## 2        1 581 571
## 3       33  35  40
## 4       NA   1  NA
Count_Either(Empirical)
##   .[, 1] == 0 | .[, 2] == 0   n
## 1                     FALSE 575
## 2                      TRUE 213
## 3                        NA   1
# Use: "Was a C-test actually used?"

Use <- Excluded2 %>%
 select(Short.Name, Use.TM, Use.AY, FinalDecision)

Count_Binary(Use)
##   Decision  TM  AY
## 1        0 482 488
## 2        1 266 257
## 3       33  40  44
## 4       NA   1  NA
Count_Either(Use)
##   .[, 1] == 0 | .[, 2] == 0   n
## 1                     FALSE 260
## 2                      TRUE 529
# Correlation: "Were C-test scores correlated with something else?"

Corr <- Excluded2 %>%
 select(Short.Name, Corr.TM, Corr.AY, FinalDecision)

Count_Binary(Corr)
##   Decision  TM  AY
## 1        0 717 691
## 2        1  25  46
## 3       33  47  52
Count_Either(Corr)
##   .[, 1] == 0 | .[, 2] == 0   n
## 1                     FALSE  47
## 2                      TRUE 742

Agreement and Kappa after resolving disagreements

Here is the code to compute agreement and kappa statistics on the screening decisions after we had resolved our disagreements (and decided on the studies we were not sure about).

TMDec <- AllData[, c(1, 3, 22)] # My (TM) decisions after resolving disagreements
AYDec <- AllData[, c(1, 3, 23)] # The other judges' (AY) decisions after resolving disagreements

FinalD <- cbind(TMDec[, 3], AYDec[, 3])

agree(FinalD) # Perfect agreement! (Big surprise.)
##  Percentage agreement (Tolerance=0)
## 
##  Subjects = 1090 
##    Raters = 2 
##   %-agree = 100
kappa2(FinalD)
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 1090 
##    Raters = 2 
##     Kappa = 1 
## 
##         z = 33 
##   p-value = 0