centre for microdata methods and practice

Program evaluation with high-dimensional data

Authors: Alexandre Belloni, Victor Chernozhukov, Iván Fernández-Val and Christian Hansen
Date: 31 December 2013
Type: cemmap Working Papers, CWP77/13
doi: 10.1920/wp.cem.2013.7713

Abstract

In the first part of the paper, we consider estimation and inference on policy-relevant treatment effects, such as local average and local quantile treatment effects, in a data-rich environment where there may be many more control variables available than there are observations. In addition to allowing many control variables, the setting we consider allows endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. To make informative inference possible, we assume that some reduced form predictive relationships are approximately sparse. That is, we require that the relationship between the control variables and the outcome, treatment status, and instrument status can be captured up to a small approximation error using a small number of the control variables whose identities are unknown to the researcher. This condition allows estimation and inference for a wide variety of treatment parameters to proceed after selection of an appropriate set of controls formed by selecting control variables separately for each reduced form relationship and then appropriately combining these reduced form relationships. We provide conditions under which post-selection inference is uniformly valid across a wide range of models and show that a key condition underlying the uniform validity of post-selection inference allowing for imperfect model selection is the use of approximately unbiased estimating equations. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) participation on accumulated assets.
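
To make the role of approximately unbiased estimating equations more concrete, the following is a sketch of one standard way such an equation can be written for a reduced-form mean, and of how the local average treatment effect can be assembled from those means. The notation is an assumption introduced here for illustration and does not appear in the abstract: W = (Y, D, Z, X) collects the outcome, treatment, binary instrument and controls, V stands for either Y or D, g_V(z, x) = E[V | Z = z, X = x], and m_Z(z, x) = P(Z = z | X = x).

```latex
% Sketch under assumed notation: an orthogonal score for the reduced-form mean alpha_V(z)
\[
\psi_{V,z}(W;\alpha,g,m)
  = \frac{1\{Z=z\}\,\bigl(V-g(z,X)\bigr)}{m(z,X)} + g(z,X) - \alpha,
\qquad
\mathrm{E}\bigl[\psi_{V,z}\bigl(W;\alpha_V(z),g_V,m_Z\bigr)\bigr]=0 .
\]
% The local average treatment effect as a ratio of reduced-form contrasts
\[
\mathrm{LATE}
  = \frac{\alpha_Y(1)-\alpha_Y(0)}{\alpha_D(1)-\alpha_D(0)} .
\]
```

A score of this form is insensitive, to first order, to small errors in g and m. This is the sense in which moderate selection mistakes in the separate reduced-form steps need not invalidate inference on the target parameter.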

In the second part of the paper, we present a generalization of the treatment effect framework to a much richer setting, where possibly a continuum of target parameters is of interest and Lasso-type or post-Lasso-type methods are used to estimate a continuum of high-dimensional nuisance functions. This framework encompasses the analysis of local treatment effects as a leading special case and also covers a wide variety of classical and modern moment-condition problems in econometrics. We establish a functional central limit theorem for the continuum of the target parameters, and show that it holds uniformly in a wide range of data-generating processes P, with continua of approximately sparse nuisance functions. We also establish the validity of the multiplier bootstrap for resampling the first-order approximations to the standardized continuum of the estimators, again uniformly in P. We propose a notion of the functional delta method for finding the limit distribution and the multiplier bootstrap of smooth functionals of the target parameters that is valid uniformly in P. Finally, we establish rate and consistency results for continua of Lasso or post-Lasso type methods for estimating continua of the (nuisance) regression functions, also providing practical, theoretically justified penalty choices. Each of these results is new and could be of independent interest.
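
As a rough illustration of the multiplier bootstrap idea mentioned above, the sketch below applies it to the simplest possible target, a sample mean, rather than to the continuum of parameters treated in the paper. The function name `multiplier_bootstrap_mean`, the choice of standard normal multipliers, and the toy data are all assumptions made for this example. The point is only that the first-order (influence function) terms are reweighted by random multipliers while the data stay fixed.

```python
import random
import statistics


def multiplier_bootstrap_mean(x, n_draws=2000, seed=0):
    """Multiplier bootstrap for a sample mean (illustrative sketch only).

    Each draw perturbs the estimate by the average of the estimated
    influence-function terms, reweighted by i.i.d. mean-zero,
    unit-variance multipliers that are independent of the data.
    """
    rng = random.Random(seed)
    n = len(x)
    mean = statistics.fmean(x)
    # Estimated influence function of the sample mean: psi_i = x_i - mean.
    psi = [xi - mean for xi in x]
    draws = []
    for _ in range(n_draws):
        # Assumed multiplier distribution: standard normal.
        multipliers = [rng.gauss(0.0, 1.0) for _ in range(n)]
        perturbation = sum(m * p for m, p in zip(multipliers, psi)) / n
        draws.append(mean + perturbation)
    return mean, draws


if __name__ == "__main__":
    data = [float(i) for i in range(1, 51)]  # hypothetical toy data
    estimate, draws = multiplier_bootstrap_mean(data)
    draws.sort()
    lower = draws[int(0.025 * len(draws))]
    upper = draws[int(0.975 * len(draws)) - 1]
    print(f"estimate={estimate:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

Because only the multipliers are redrawn, nothing has to be re-estimated in each bootstrap replication, which is part of what makes this kind of resampling attractive when the nuisance functions are themselves estimated by Lasso-type methods.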

New version: Alexandre Belloni, Victor Chernozhukov, Iván Fernández-Val and Christian Hansen, August 2014, Program evaluation with high-dimensional data, cemmap Working Papers, CWP33/14, Cemmap
Previous version: Alexandre Belloni, Victor Chernozhukov, Iván Fernández-Val and Christian Hansen, November 2013, Program evaluation with high-dimensional data, cemmap Working Papers, CWP57/13, Cemmap
