Working Paper

Optimal data collection for randomized control trials


Pedro Carneiro, Sokbae (Simon) Lee, Daniel Wilhelm

Published Date

1 April 2016


Working Paper (CWP15/16)

In a randomized control trial, the precision of an average treatment effect estimator can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator’s mean squared error, subject to the researcher’s budget constraint. We rely on an orthogonal greedy algorithm that is conceptually simple, easy to implement (even when the number of potential covariates is very large), and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to substantial gains of up to 58%, either in terms of reductions in data collection costs or in terms of improvements in the precision of the treatment effect estimator, respectively.
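
As a rough illustration of the idea in the abstract, the Python sketch below runs an orthogonal greedy selection on pre-experimental data and then trades covariates against sample size under a budget. It is not the authors' implementation. The function names, the cost model (a fixed cost per person plus a constant cost per covariate), and the objective (residual variance divided by the affordable sample size, used here as a stand-in for the estimator's mean squared error) are all simplifying assumptions, and the data are synthetic.

```python
import numpy as np


def greedy_covariate_path(X, y, max_steps):
    """Orthogonal greedy selection on pre-experimental data (illustrative).

    At each step, pick the unused covariate most correlated with the current
    residual, then refit least squares on all selected covariates.
    Returns a list of (selected_indices, residual_variance), one per step.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # centre covariates
    yc = y - y.mean()                    # centre outcome
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = np.inf           # constant columns score zero
    available = np.ones(p, dtype=bool)
    selected = []
    residual = yc.copy()
    path = [((), float(residual.var()))]
    for _ in range(min(max_steps, p)):
        scores = np.abs(Xc.T @ residual) / norms
        scores = np.where(available, scores, -np.inf)
        j = int(np.argmax(scores))
        available[j] = False
        selected.append(j)
        # Orthogonal step: project the outcome on all selected covariates.
        beta, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        residual = yc - Xc[:, selected] @ beta
        path.append((tuple(selected), float(residual.var())))
    return path


def choose_design(path, budget, cost_per_person, cost_per_covariate):
    """Pick the covariate set and sample size minimising residual_var / n.

    The linear cost model is an assumption made for this sketch.
    Returns (objective, selected_indices, n), or None if nothing is affordable.
    """
    best = None
    for selected, resid_var in path:
        unit_cost = cost_per_person + cost_per_covariate * len(selected)
        n = int(budget // unit_cost)     # largest affordable sample size
        if n < 2:
            continue
        objective = resid_var / n
        if best is None or objective < best[0]:
            best = (objective, selected, n)
    return best


# Synthetic "pre-experimental survey": 500 people, 20 candidate covariates,
# of which only columns 3 and 7 actually predict the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = 2.0 * X[:, 3] + 0.5 * X[:, 7] + rng.normal(size=500)

path = greedy_covariate_path(X, y, max_steps=5)
design = choose_design(path, budget=5000.0, cost_per_person=10.0,
                       cost_per_covariate=1.0)
```

Each extra covariate lowers the residual variance but raises the cost per respondent, so fewer respondents fit within the budget. In this sketch, `choose_design` simply scans the greedy path and keeps the point where that trade-off is most favourable.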

The original version of the working paper was posted on 1 April 2016.

Latest version

Optimal data collection for randomized control trials
Daniel Wilhelm, Pedro Carneiro, Sokbae (Simon) Lee