centre for microdata methods and practice

Optimal data collection for randomized control trials

Authors: Pedro Carneiro, Sokbae (Simon) Lee and Daniel Wilhelm
Date: 27 March 2017
Type: cemmap Working Paper, CWP15/17
DOI: 10.1920/wp.cem.2017.1517


In a randomized control trial, the precision of an average treatment effect estimator and the power of the corresponding t-test can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data, such as other similar studies, a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator’s mean squared error, or to maximize the corresponding t-test’s power, subject to the researcher’s budget constraint. We rely on a modification of an orthogonal greedy algorithm that is conceptually simple, easy to implement in the presence of a large number of potential covariates, and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to reductions of up to 58% in the costs of data collection, or improvements of the same magnitude in the precision of the treatment effect estimator.
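
The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure; it is a plain orthogonal greedy selection combined with a simple budget rule, written under assumptions that do not come from the paper: the cost per individual is taken to be a fixed base cost plus the sum of the costs of the selected covariates, and the mean squared error is approximated by the residual variance in the pre-experimental data divided by the affordable sample size. The function name `greedy_design` and all of its parameters are hypothetical.

```python
import numpy as np


def greedy_design(X, y, cov_costs, base_cost, budget):
    """Illustrative greedy choice of covariates and sample size under a budget.

    Assumptions (not taken from the paper): cost per individual equals
    base_cost plus the summed costs of the selected covariates, and the
    MSE proxy is (residual variance of y given the covariates) / sample size.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_costs = np.asarray(cov_costs, dtype=float)
    if budget < base_cost:
        raise ValueError("budget does not cover a single individual")

    # Centre the data so that no explicit intercept is needed.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Baseline design: collect no covariates, spend everything on sample size.
    n_afford = int(budget // base_cost)
    best = {"covariates": [], "sample_size": n_afford,
            "mse_proxy": yc.var() / n_afford}

    selected = []
    remaining = set(range(X.shape[1]))
    residual = yc.copy()
    while remaining:
        # Greedy step: pick the covariate most correlated with the residual.
        cand = sorted(remaining)
        norms = np.linalg.norm(Xc[:, cand], axis=0)
        safe_norms = np.where(norms > 0, norms, np.inf)
        scores = np.abs(Xc[:, cand].T @ residual) / safe_norms
        j = cand[int(np.argmax(scores))]
        remaining.discard(j)
        selected.append(j)

        # Budget step: how many individuals can be afforded with this set?
        per_person = base_cost + cov_costs[selected].sum()
        n_afford = int(budget // per_person)
        if n_afford < 1:
            break

        # Orthogonal step: refit by least squares on all selected covariates.
        coef, *_ = np.linalg.lstsq(Xc[:, selected], yc, rcond=None)
        residual = yc - Xc[:, selected] @ coef

        mse_proxy = residual.var() / n_afford
        if mse_proxy < best["mse_proxy"]:
            best = {"covariates": list(selected), "sample_size": n_afford,
                    "mse_proxy": mse_proxy}
    return best
```

In this sketch, each added covariate lowers the residual variance but raises the cost per individual, so fewer individuals can be surveyed. The function walks along the greedy path and returns the point at which the assumed proxy is smallest.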

Previous version:
Pedro Carneiro, Sokbae (Simon) Lee and Daniel Wilhelm, April 2016, Optimal data collection for randomized control trials, cemmap Working Paper, CWP15/16, Institute for Fiscal Studies
