By Alyssa Simon and Mark LaCour
Calculations of standard errors, confidence intervals, and hypothesis tests all rest on certain assumptions. For instance, many statistical tests assume that your data are normally distributed. When you violate assumptions like these, your analyses become inaccurate: your p-values aren’t what they should be and/or don’t mean what they’re intended to mean.
Bootstrapping is a statistical method that can help to get around these issues.
Bootstrapping works by taking the data from your study and re-sampling it many times to create many simulated samples. Each simulated sample consists of data points drawn at random from the original sample. Because the drawing is done with replacement, any original data point can appear in a simulated sample any number of times; the only requirement is that each simulated sample has the same number of data points as the original sample.
Why do such a thing? Well, remember from Chapter 1, where it was pointed out that sample means don’t reflect the real central tendency of a distribution if that distribution is skewed? What if you’re comparing two samples and both of their distributions are skewed? If only we could do a statistical test on the medians (or differences between medians) instead of the means. Normally, this would just be wishful thinking: to do statistical tests you need sampling distributions, and those are hard to derive for medians.
But that’s the beautiful thing about bootstrapping: You create your own sampling distribution! (Sort of…)
Let’s say you want a 95% confidence interval around a median rather than a mean because your data are skewed and, thus, the mean is a bad summary of your data.
For instance, say you have a distribution of reaction times like the one pictured below:
[Figure: a histogram of the reaction-time data]
There are 26 observations in these data. To help make this example clearer, I’ve named each of them after a letter of the alphabet and given each one (more or less) a different color.
[Figure: “The data”, the 26 labeled observations]
Each of these observations/people has a reaction time that falls within one of the bins: ‘0 - 99’, ‘100 - 199’, etc. When you create a simulated sample, you randomly draw one of the data points from the real data, then put it back (sampling with replacement) so that you can potentially draw it again. You want a simulated sample that’s the same size as your original sample; here, that number is 26. And while all 26 of your real data points are unique, your simulated data sets could potentially contain duplicates.
Take the example below:
The original sample of 26 unique data points is on top. My first simulated sample is below: I drew from the original 26 at random, 26 times. Since I’m sampling with replacement, I can potentially draw the same original data point multiple times. Indeed, this happens in my simulated sample: the “W” was drawn twice and the “M” was drawn four times. It doesn’t matter. We just get a simulated sample of 26 data points and don’t worry about duplicates.
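If you want to try one of these draws yourself, here is a minimal R sketch. The reaction-time values below are made up (the actual 26 numbers aren’t listed in the text), and the letter names simply mirror the example above.

```r
# 26 made-up, positively skewed "reaction times", named A through Z
set.seed(7)
rt <- setNames(round(rexp(26, rate = 1/500)), LETTERS)

# One simulated (bootstrap) sample: 26 draws WITH replacement from the original 26
boot_sample <- sample(rt, size = 26, replace = TRUE)

# Count how often each original observation was drawn;
# some letters will appear several times, others not at all
sort(table(names(boot_sample)), decreasing = TRUE)
```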
If we create a histogram of this simulated data, we get the following:
If you compare this simulated data set to the original, you can see some similarities. One of the peaks is roughly in the same place, though there’s a new peak in the middle. There’s also still a positive skew.
The fundamental idea behind bootstrapping is to create thousands and thousands of these random simulated data sets and, for each one, calculate the statistic you care about. Since it’s hard to come up with a mathematically derived sampling distribution for the median, we can use bootstrapping to come up with a simulated sampling distribution of the median…
We know that the mean doesn’t do a good job of representing the central tendency of a positively skewed distribution. The skew in the distribution drags the mean away from where it should be. For instance, in the original data up above, the “average” of all the reaction times was 566.62 and the median was 445.50. Thus, if we wanted to calculate a 95% confidence interval around our mean, we’d be calculating a confidence interval for a statistic that’s not that useful.
Under normal circumstances, you can’t calculate a 95% confidence interval around a sample median. With bootstrapping, though, you sort of can. I went to my favorite stats program, R, and had it draw 1000 bootstrapped samples from the reaction-time data above. For each simulated bootstrap sample, I calculated that sample’s mean and that sample’s median. Check out the distributions of simulated means and medians below.
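The raw data and the original code aren’t reproduced in the text, but the procedure described looks roughly like the sketch below. The reaction times here are again made up, so the numbers you get won’t match the ones reported in this chapter.

```r
set.seed(42)
rt <- round(rexp(26, rate = 1/500))   # 26 made-up, skewed "reaction times"

n_boot <- 1000
boot_means   <- numeric(n_boot)
boot_medians <- numeric(n_boot)

for (i in 1:n_boot) {
  sim <- sample(rt, size = length(rt), replace = TRUE)  # one simulated sample
  boot_means[i]   <- mean(sim)                          # its mean...
  boot_medians[i] <- median(sim)                        # ...and its median
}

# The two simulated sampling distributions:
hist(boot_means,   main = "Bootstrapped sample means")
hist(boot_medians, main = "Bootstrapped sample medians")
```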
[Figure: histograms of the 1000 bootstrapped sample means (top) and sample medians (bottom)]
The simulated sampling distribution of sample means on top is centered at 568.98, and 95% of all the simulated means fall between 430.48 and 730.03. This simulated sampling distribution is normally distributed (i.e., symmetrical), which is what you would expect from the central limit theorem.
The simulated sampling distribution of medians on the bottom is quite different. It’s centered at 445.5, and 95% of the simulated medians fall between 333 and 607. So, as you can hopefully see, making inferences from the sample mean versus the sample median (whether with traditional confidence intervals or bootstrapped ones) creates diverging pictures of what our data look like.
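Those “95% of the simulated values fall between…” figures are just the 2.5th and 97.5th percentiles of the bootstrapped statistics. Here is a small, self-contained sketch of a general percentile-interval helper; the function name `boot_ci` and its defaults are my own choices for illustration, not something from the text or from a package.

```r
# Percentile bootstrap confidence interval for any statistic
boot_ci <- function(x, stat = median, n_boot = 1000, level = 0.95) {
  boot_stats <- replicate(n_boot, stat(sample(x, length(x), replace = TRUE)))
  alpha <- 1 - level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2))
}

# Made-up skewed data, as before
set.seed(2)
rt <- round(rexp(26, rate = 1/500))

boot_ci(rt, stat = median)  # interval around the median
boot_ci(rt, stat = mean)    # compare with the interval around the mean
```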
In general, bootstrapping is useful any time the assumptions of your statistical test are violated, or when you want to use statistics that don’t have traditional formulas for calculating confidence intervals. Writing the code for this simple example was pretty easy, and it wouldn’t be much harder to scale it up to slightly more complicated tests.
For instance, what if you have two skewed distributions? You want to know if one of them has a peak (i.e., central tendency) that is probably higher than the other in the general population. Normally, you would do an independent samples t-test. However, the t-test assumes that both population distributions are normally distributed. No worries! With bootstrapping, you can create pairs of independent SIMULATED samples from your two skewed distributions and record the difference in medians for each pair.
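Here is one way that two-sample version might look in R. The two groups below are invented skewed data, and using 10,000 resamples with a percentile interval around the difference in medians is my choice for the sketch, not something prescribed by the text.

```r
set.seed(3)
group_a <- round(rexp(30, rate = 1/450))  # made-up skewed group
group_b <- round(rexp(30, rate = 1/550))  # made-up skewed group

n_boot <- 10000
boot_diffs <- replicate(n_boot, {
  # Resample each group independently, with replacement,
  # and record the difference between the two simulated medians
  median(sample(group_a, length(group_a), replace = TRUE)) -
    median(sample(group_b, length(group_b), replace = TRUE))
})

# Percentile 95% interval for the difference in medians;
# an interval that excludes 0 suggests the population medians differ
quantile(boot_diffs, probs = c(0.025, 0.975))
```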
The term “bootstrapping” comes from the phrase “pulling yourself up by your own bootstraps”. The name describes how the original sample, through resampling, provides the material for creating all of the other samples, something that seems like it should be impossible!
Bootstrapping is almost too good to be true, but the method can produce meaningful results.
Still not getting it? Try watching this video