Chapter 9 Additional Resources
In this chapter, we collect some extra resources that you may find useful if you’re keen to learn more about the intersection of machine learning and behavioral science. These resources include: the slides and videos for the companion course to these tutorials, on Machine Learning and Causal Inference; publicly available datasets from randomized experiments and observational studies; a report that applies some of the methods from this tutorial to applications in behavioral science and social impact; useful software packages; and a tool that can be used to simulate simple bandit experiments and to help plan adaptive experiments.
9.1 Machine Learning & Causal Inference: An Introductory Course
This course by Susan Athey, Jann Spiess and Stefan Wager is the companion course to this tutorial.
The course consists of a series of videos and slides covering a quarter-long course taught at Stanford. It is designed for students and researchers looking to learn more about how machine learning can be used to measure the effects of interventions, understand the heterogeneous impact of interventions, and design targeted treatment assignment policies.
The topics include:
- a high-level overview contrasting traditional econometrics with off-the-shelf machine learning
- an introduction to the topics of supervised machine learning, from the perspective of an economist
- an introduction to estimation of average treatment effects, with a focus on how machine learning methods can improve upon traditional methods for estimation
- an introduction to using machine learning to estimate conditional average treatment effects using causal trees and forests
- general principles for the design of robust, machine learning-based algorithms for treatment heterogeneity
- loss functions for causal inference
9.2 Data from Experiments
This GitHub repository includes data from randomized experiments and observational studies where the tools described in this tutorial may be applicable. It is regularly updated, so please feel free to suggest new datasets to add!
The datasets were selected for being publicly available, having more than a few hundred observations, and including multiple covariates. Some of the datasets have instrumental variables designs, and some involve more than one treatment arm.
An important caution, however, is that very few of the datasets show consistent, replicable treatment effect heterogeneity. Possible explanations for the prevalence of null results are discussed in the report described in more detail below.
9.4 Useful Software Packages
These tutorials make extensive use of the GRF package and the functionality it provides, including several types of random forests (regression forests for prediction, and causal forests for settings with unconfoundedness, instrumental variables, or survival data), tree-based policy learning, forest-based average treatment effect estimation, visualization, and techniques for assessing the presence of treatment effect heterogeneity using the RATE (rank-weighted average treatment effect). See the GRF home page.
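As a concrete illustration, here is a minimal sketch (using simulated data and default settings; it is not taken from the tutorials themselves) of the grf workflow mentioned above: fitting a causal forest, estimating the average treatment effect, and assessing heterogeneity with the RATE:

```r
library(grf)

set.seed(1)
n <- 2000; p <- 10
X <- matrix(rnorm(n * p), n, p)        # covariates
W <- rbinom(n, 1, 0.5)                 # randomized binary treatment
tau <- pmax(X[, 1], 0)                 # true heterogeneous effect (simulated)
Y <- X[, 2] + tau * W + rnorm(n)       # outcome

# Split the data: one half to learn a prioritization rule, the other to
# evaluate it, as recommended when computing the RATE.
train <- sample(n, n / 2)
cf.train <- causal_forest(X[train, ], Y[train], W[train])
cf.eval  <- causal_forest(X[-train, ], Y[-train], W[-train])

# Doubly robust (AIPW) estimate of the average treatment effect.
average_treatment_effect(cf.eval, target.sample = "all")

# Rank-weighted average treatment effect: do units prioritized by the
# estimated CATEs actually have larger treatment effects?
priorities <- predict(cf.train, X[-train, ])$predictions
rank_average_treatment_effect(cf.eval, priorities)
```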
Other software packages are referenced throughout these tutorials.
In addition to the packages used here, a few other projects provide related software. For example:
- EconML (Python) is an open-source library developed by the ALICE team at Microsoft Research.
- PyWhy is a project whose mission is to build an open-source ecosystem for causal machine learning; DoWhy (Python) is its library for causal inference.
- GenericML (R) implements the method proposed in Chernozhukov, Demirer, Duflo and Fernández-Val (2020), Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.
9.5 Bandit Experiment Simulation Tool
The aim of this tool is to help researchers plan adaptive experiments.
Adaptive experiments vary the proportion of observations assigned to each treatment arm over the course of the experiment. This makes experiments more efficient by assigning more participants to better-performing treatments, so that the researcher learns which treatment is most effective more quickly and with fewer resources. Ideally, the experiment identifies a higher-value arm by the end. Adaptive experiments can also be used in pilot experiments to narrow down the set of potential treatments before deciding which arms to include in a larger non-adaptive experiment.
This tool can be used to gain intuition about how different algorithms work, as well as what can go wrong (e.g., instability when arms have similar means).
It can also be used to compare the performance of different approaches and to select tuning parameters, such as the period of initial pure randomization, when planning experiments.
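To build intuition for what an adaptive assignment rule does, here is a minimal sketch (our own illustration, not the tool’s code) of Thompson sampling for a two-armed experiment with a binary outcome and Beta(1, 1) priors; the better-performing arm gradually receives a larger share of participants:

```r
# Thompson sampling for a two-armed Bernoulli bandit (illustrative sketch;
# the simulation tool itself may use different defaults and algorithms).
set.seed(42)
true_rates <- c(0.10, 0.15)     # hypothesized success rate of each arm
horizon <- 1000                 # experiment length (number of participants)

successes <- c(0, 0)
failures  <- c(0, 0)
assignments <- integer(horizon)

for (t in seq_len(horizon)) {
  # Draw one sample from each arm's Beta posterior and assign the largest.
  draws <- rbeta(2, 1 + successes, 1 + failures)
  arm <- which.max(draws)
  assignments[t] <- arm

  # Observe a binary outcome and update that arm's posterior counts.
  outcome <- rbinom(1, 1, true_rates[arm])
  successes[arm] <- successes[arm] + outcome
  failures[arm]  <- failures[arm] + (1 - outcome)
}

# Share of participants assigned to each arm over the experiment.
table(assignments) / horizon
```

Re-running this simulation with arms whose success rates are very close (e.g., 0.10 and 0.11) illustrates the instability mentioned above: the share of participants assigned to each arm can vary substantially across runs.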
Here are the instructions on how to use the tool:
- General Configuration - select the experiment length and the number of simulations:
  - Experiment length: how long the experiment runs
  - Number of simulations: how many times the simulated experiment is run
- Arm Configuration - select the number of treatment arms and the hypothesized outcomes for each arm:
  - Average success rate (between 0 and 1): the hypothesized mean of the binary outcome for each arm
  - There must be at least two arms
- Algorithm Configuration - choose the data-collection and decision method:
  - Collection: the algorithm that determines how treatment assignment probabilities change over time
    - If the main objective is to maximize outcomes during the experiment, it is best to use a UCB or Thompson Sampling algorithm
    - If the main objective is to find a good treatment at the end of the experiment, it is best to use an Exploration Sampling or Epsilon-Greedy algorithm
  - Decision: the selection rule applied at the end of the experiment to choose a treatment arm
  - Floor: the exponent ‘alpha’ in the assignment probability lower bound (1/K)*t^(-alpha), where K is the number of treatments and t is the time period
    - A higher value means faster decay and more aggressive adaptivity (see the short sketch after this list)
  - Unif. fraction: the fraction of observations assigned via non-adaptive (uniform) randomization
  - Number of Batches: the number of times the assignment probabilities are updated over the course of the experiment
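To see how the floor behaves in practice, here is a short sketch (our own illustration; the function and variable names are not the tool’s) that computes the lower bound (1/K)*t^(-alpha) for two values of alpha:

```r
# Assignment-probability floor (1/K) * t^(-alpha): a larger alpha makes the
# floor shrink faster, so assignment probabilities can become more extreme.
floor_bound <- function(t, K, alpha) (1 / K) * t^(-alpha)

K <- 3                      # number of treatment arms
t <- c(1, 5, 10, 50, 100)   # time periods
round(rbind("alpha = 0.3" = floor_bound(t, K, 0.3),
            "alpha = 0.7" = floor_bound(t, K, 0.7)), 3)
```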