Chapter 7 Chi Squared
7.1 There are two ways to run a Chi Square test. This walk-through will go through both with a couple of examples.
Say we had two cohorts, “Control” and “Experimental”:
set.seed(1)
Cohorts <- sample(c("Control", "Experimental"), size = 1000, replace = TRUE)
Each cohort will also have the binary variable of sex assigned to them, as well as a categorical variable of age ranges.
set.seed(2)
Sex <- sample(c("Male", "Female"), size = 1000, replace = TRUE)
set.seed(3)
Age.Range <- sample(c("20-29", "30-39", "40-49", "50-59"), size = 1000, replace = TRUE)
Data.Frame <- as.data.frame(cbind(Cohorts, Sex, Age.Range))
rm(Cohorts, Age.Range, Sex)
To test whether our two cohorts differ significantly from one another with regard to sex or age range, we use the chi-square test of independence. The first step is to create a contingency table from our data (i.e. a summary table with the number of males and females in each cohort). We print the table first and then save it for the test.
table(Data.Frame$Cohorts, Data.Frame$Sex)
##
## Female Male
## Control 249 253
## Experimental 255 243
Cont.Sex <- table(Data.Frame$Cohorts, Data.Frame$Sex)
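Before testing a contingency table, it can help to look at its marginal totals and the within-cohort proportions. The sketch below rebuilds the table from the counts printed above rather than from Data.Frame, so the object names (Cont.Example, With.Totals, Row.Props) are illustrative only.

```r
# Rebuild the cohort-by-sex table from the counts printed above.
# Cont.Example is an illustrative name, not an object from the walk-through.
Cont.Example <- as.table(matrix(
  c(249, 255, 253, 243),   # filled column by column: Female, then Male
  nrow = 2,
  dimnames = list(Cohort = c("Control", "Experimental"),
                  Sex = c("Female", "Male"))
))

# Add row and column totals (labelled "Sum")
With.Totals <- addmargins(Cont.Example)
With.Totals

# Proportion of each sex within each cohort (each row sums to 1)
Row.Props <- prop.table(Cont.Example, margin = 1)
round(Row.Props, 3)
```

If the two cohorts have a similar sex split, the rows of Row.Props will look alike, which is what the chi-square test then assesses formally.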
Now we run the chi-square test of independence. Note that a continuity correction (Yates' correction, which chisq.test applies by default to 2x2 tables unless you set correct = FALSE) is typically recommended when any of the counts in the contingency table are below 10 (some say 5). Fisher’s exact test is another option in that situation. Neither is a concern for our current sample, so we turn the correction off.
chisq.test(Cont.Sex, correct = FALSE)
##
## Pearson's Chi-squared test
##
## data: Cont.Sex
## X-squared = 0.25705, df = 1, p-value = 0.6122
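To see what the continuity correction and Fisher's exact test mentioned above do when counts are small, here is a minimal sketch on a made-up 2x2 table. The counts in Small.Table are hypothetical and chosen only so that some cells are small; they are not part of the walk-through data.

```r
# Hypothetical 2x2 table with small counts (made up for illustration)
Small.Table <- matrix(
  c(3, 7, 8, 2),
  nrow = 2,
  dimnames = list(Cohort = c("Control", "Experimental"),
                  Sex = c("Female", "Male"))
)

# suppressWarnings() hides R's warning that the chi-squared approximation
# may be unreliable for counts this small
Uncorrected <- suppressWarnings(chisq.test(Small.Table, correct = FALSE))
Corrected   <- suppressWarnings(chisq.test(Small.Table, correct = TRUE))

# Fisher's exact test does not rely on the chi-squared approximation
Fisher <- fisher.test(Small.Table)

# Compare the three p-values side by side
c(uncorrected = Uncorrected$p.value,
  yates       = Corrected$p.value,
  fisher      = Fisher$p.value)
```

The corrected p-value is never smaller than the uncorrected one, because the correction shrinks the test statistic.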
We can do the same thing for age range as we did for sex, even though age range has more than two categories. The continuity correction only applies to 2x2 tables, so the correct argument is not needed here.
Cont.Age <- table(Data.Frame$Cohorts, Data.Frame$Age.Range)
chisq.test(Cont.Age)
##
## Pearson's Chi-squared test
##
## data: Cont.Age
## X-squared = 2.2822, df = 3, p-value = 0.5159
The alternative is to give the chisq.test function two vectors and let it build the contingency table behind the scenes. This takes less code, but I find it less transparent, and you will typically want to see the contingency table anyway.
chi.vectors <- chisq.test(Data.Frame$Cohorts, Data.Frame$Sex, correct = FALSE)
chi.vectors$observed
## Data.Frame$Sex
## Data.Frame$Cohorts Female Male
## Control 249 253
## Experimental 255 243
chi.vectors$expected
## Data.Frame$Sex
## Data.Frame$Cohorts Female Male
## Control 253.008 248.992
## Experimental 250.992 247.008
chi.vectors$p.value
## [1] 0.6121572
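For comparison, running the test on the saved contingency table gives the same observed counts, expected counts, and p-value:
chi.df <- chisq.test(Cont.Sex, correct = FALSE)
chi.df$observed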
##
## Female Male
## Control 249 253
## Experimental 255 243
chi.df$expected
##
## Female Male
## Control 253.008 248.992
## Experimental 250.992 247.008
chi.df$p.value
## [1] 0.6121572
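The expected counts and p-value shown above can be reproduced by hand, which makes clear what chisq.test computes when the correction is off. This is a minimal sketch that re-enters the observed counts from the output above; the object names (Observed, Expected, Statistic, Df, P.Value) are illustrative.

```r
# Observed counts, re-entered from the output above
Observed <- matrix(
  c(249, 255, 253, 243),   # filled column by column: Female, then Male
  nrow = 2,
  dimnames = list(c("Control", "Experimental"), c("Female", "Male"))
)

# Expected count for each cell = row total * column total / grand total
Expected <- outer(rowSums(Observed), colSums(Observed)) / sum(Observed)
Expected

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
Statistic <- sum((Observed - Expected)^2 / Expected)

# Degrees of freedom = (rows - 1) * (columns - 1)
Df <- (nrow(Observed) - 1) * (ncol(Observed) - 1)

# Upper-tail probability of the chi-squared distribution
P.Value <- pchisq(Statistic, df = Df, lower.tail = FALSE)
c(statistic = Statistic, df = Df, p.value = P.Value)
```

The same formula for the degrees of freedom gives 3 for the 2x4 age-range table, matching the df reported earlier.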
Finally, let’s say that you already have the contingency table in Excel and just want to run the chi-square test as quickly and easily as possible. You don’t want to manually enter the data into R, and you don’t want to deal with Excel’s statistics functionality.
Easy! Copy the contingency table into your clipboard (just exclude the row names):
Then, run the following (I’ve commented it out because it throws an error, but trust me, it works). Note that pbpaste is a macOS command, so this works as written only on a Mac:
#Cont.From.Xl <- read.delim(pipe("pbpaste"))
#chisq.test(Cont.From.Xl, correct = F)
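Since the clipboard call cannot run unattended, here is a minimal sketch of the same idea, with a tab-separated string standing in for the clipboard contents. The names Clipboard.Text and Cont.From.Text are assumptions for illustration, and the counts are the cohort-by-sex counts from earlier in the chapter.

```r
# Stand-in for what Excel puts on the clipboard: tab-separated cells,
# one table row per line, header row first, no row names
Clipboard.Text <- "Female\tMale\n249\t253\n255\t243\n"

# read.delim() expects tab-separated text with a header row by default;
# text = reads from the string instead of a file or pipe
Cont.From.Text <- read.delim(text = Clipboard.Text)
Cont.From.Text

# chisq.test() converts the data frame to a matrix before testing
Result <- chisq.test(Cont.From.Text, correct = FALSE)
Result$p.value
```

Because the row names were left out when copying, the rows come back numbered 1 and 2, which has no effect on the test.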