In these lectures we will discuss methods for causal inference, with particular attention to recent contributions from the machine learning literature. We will start with a general overview of the statistical literature on causality from the potential outcomes perspective. We will review the analysis of randomised experiments, studies with unconfounded treatment assignment, instrumental variables, and other methods for selection on unobservables.
Second, we will review some of the machine learning literature on prediction methods. The focus in this literature is typically on methods that 'work', more than on deriving asymptotic (large sample) results of the type that are common in the econometrics and mathematical statistics literature. There is also less emphasis on confidence intervals and standard errors. On the other hand, there is a heavy emphasis on out-of-sample comparisons, in particular cross-validation. Here we discuss some of the most prominent methods. These include regularised regression, in which least squares parameter estimates are shrunk towards zero; the most popular such method is the LASSO.
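To make the two ideas mentioned above concrete, the following sketch fits a LASSO whose shrinkage penalty is chosen by cross-validation, using scikit-learn and synthetic data (the data-generating process here is purely illustrative, not from the lectures):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical synthetic data: 200 observations, 50 covariates,
# of which only the first 3 truly affect the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
beta = np.zeros(50)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=200)

# LassoCV picks the penalty level by 5-fold cross-validation
# (out-of-sample comparison), then refits on the full sample.
model = LassoCV(cv=5).fit(X, y)

# The L1 penalty shrinks many coefficients exactly to zero,
# so the fitted model is sparse.
n_selected = int(np.sum(model.coef_ != 0))
```

The point of the example is the shrinkage: unlike ordinary least squares, which would estimate all 50 coefficients, the LASSO sets most of the irrelevant ones exactly to zero.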
We also look at regression trees and their extensions, including random forests. In the third part of the course we discuss where these two literatures meet, looking at machine learning methods for causal inference. This includes the estimation of average treatment effects in data-rich settings with many covariates and selection on observables, as well as the estimation of heterogeneous treatment effects in settings with experimental and non-experimental data.
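As a flavour of how prediction methods can be turned towards heterogeneous treatment effects, here is a minimal sketch of one simple approach (a so-called T-learner) on simulated experimental data: fit separate random forests on treated and control units, and take the difference of their predictions as an estimate of the conditional effect. The data-generating process and tuning choices below are illustrative assumptions, not the specific methods covered in the course:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical experiment with a heterogeneous effect:
# the treatment effect is 1 + x0, varying with the first covariate.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
W = rng.integers(0, 2, size=n)           # randomised treatment indicator
tau = 1.0 + X[:, 0]                      # true heterogeneous effect
y = X[:, 1] + W * tau + rng.normal(scale=0.5, size=n)

# T-learner: separate outcome models for treated and control units,
# with min_samples_leaf set to limit overfitting of individual trees.
m1 = RandomForestRegressor(min_samples_leaf=25, random_state=0).fit(X[W == 1], y[W == 1])
m0 = RandomForestRegressor(min_samples_leaf=25, random_state=0).fit(X[W == 0], y[W == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# Averaging the estimated conditional effects recovers
# (approximately) the true average treatment effect of 1.
ate_hat = cate_hat.mean()
```

The estimated effects should vary systematically with the first covariate, which is the heterogeneity that tree-based methods are able to detect without the researcher specifying the interaction in advance.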
This masterclass is supported by the Cambridge-INET Institute.