centre for microdata methods and practice

ESRC centre

cemmap is an ESRC research centre



Optimal data collection for randomized control trials

Authors: Pedro Carneiro, Sokbae (Simon) Lee and Daniel Wilhelm
Date: 23 October 2017
Type: cemmap Working Paper, CWP45/17
DOI: 10.1920/wp.cem.2017.4517


In a randomized control trial, the precision of an average treatment effect estimator and the power of the corresponding t-test can be improved either by collecting data on additional individuals, or by collecting additional covariates that predict the outcome variable. We propose the use of pre-experimental data such as other similar studies, a census, or a household survey, to inform the choice of both the sample size and the covariates to be collected. Our procedure seeks to minimize the resulting average treatment effect estimator's mean squared error and/or maximize the corresponding t-test's power, subject to the researcher's budget constraint. We rely on a modification of an orthogonal greedy algorithm that is conceptually simple and easy to implement in the presence of a large number of potential covariates, and does not require any tuning parameters. In two empirical applications, we show that our procedure can lead to reductions of up to 58% in the costs of data collection, or improvements of the same magnitude in the precision of the treatment effect estimator.
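The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' modified orthogonal greedy algorithm; it is a plain cost-aware greedy forward search, written under stated assumptions. We assume a fraction `p` of individuals is treated, a homogeneous treatment effect, and that the variance of a regression-adjusted estimator is approximately the residual variance of the outcome given the chosen covariates divided by `n * p * (1 - p)`. We also assume each individual costs a hypothetical `fixed_cost` plus the per-person cost of every collected covariate, so adding covariates shrinks the affordable sample size. The names `y` and `X` stand for outcome and candidate covariates in the pre-experimental data (e.g. a household survey); all function names, costs and the simulated data are hypothetical.

```python
import numpy as np


def implied_mse(resid_var, n, p=0.5):
    """Approximate MSE (variance) of a regression-adjusted ATE estimator.

    Assumes n individuals, a share p treated and a homogeneous effect:
    Var ~= resid_var / (n * p * (1 - p)).
    """
    return resid_var / (n * p * (1.0 - p))


def residual_variance(y, X, selected):
    """Residual variance of an OLS fit (orthogonal projection) of y on an
    intercept plus the selected columns of X."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in selected])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.var(y - Z @ beta))


def sample_size(budget, fixed_cost, costs, selected):
    """Largest affordable n when each person costs fixed_cost plus the
    per-person cost of every selected covariate (hypothetical cost model)."""
    per_person = fixed_cost + sum(costs[j] for j in selected)
    return int(budget // per_person)


def greedy_design(y, X, costs, budget, fixed_cost, p=0.5):
    """Hypothetical greedy forward selection under a budget constraint.

    Each step refits OLS on the selected set plus one candidate, evaluates
    the implied MSE at the affordable sample size, and keeps the candidate
    that lowers the MSE most. Stops when no candidate lowers it.
    """
    selected = []
    n = sample_size(budget, fixed_cost, costs, selected)
    best = implied_mse(residual_variance(y, X, selected), n, p)
    remaining = set(range(X.shape[1]))
    while remaining:
        scores = {}
        for j in remaining:
            cand = selected + [j]
            n_j = sample_size(budget, fixed_cost, costs, cand)
            if n_j < 2:
                continue  # the budget cannot cover this design
            scores[j] = implied_mse(residual_variance(y, X, cand), n_j, p)
        if not scores:
            break
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining covariate is worth its cost
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
        n = sample_size(budget, fixed_cost, costs, selected)
    return selected, n, best


# Simulated "pre-experimental" data: covariate 0 is a strong predictor,
# covariate 1 a weak one, covariate 2 pure noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(2000)

selected, n, mse = greedy_design(
    y, X, costs=[1.0, 1.0, 1.0], budget=1000.0, fixed_cost=1.0
)
print(selected, n, mse)
```

In this simulated example, collecting the strong predictor halves the affordable sample size but cuts the residual variance by roughly a factor of four, so it is selected; the weak predictor does not justify a further drop in `n` and is left out.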

New version:
Pedro Carneiro, Sokbae (Simon) Lee and Daniel Wilhelm (May 2019), Optimal Data Collection for Randomized Control Trials, cemmap Working Paper, Institute for Fiscal Studies
Previous version:
Pedro Carneiro, Sokbae (Simon) Lee and Daniel Wilhelm (April 2016), Optimal data collection for randomized control trials, cemmap Working Paper, Institute for Fiscal Studies

Contact cemmap

Centre for Microdata Methods and Practice

Tel: +44 (0)20 7291 4800

E-mail us