Honest confidence regions for a regression parameter in logistic regression with a large number of controls

This paper considers inference in logistic regression models with high-dimensional data. We propose new methods for estimating, and constructing confidence regions for, a regression parameter of primary interest α0: the coefficient on the regressor of interest, such as a treatment or policy variable. These methods allow α0 to be estimated at the root-n rate even when the total number p of other regressors, called controls, exceeds the sample size n, provided that a sparsity assumption holds. The sparsity assumption means that only s unknown controls are needed to accurately approximate the nuisance part of the regression function, where s is smaller than n. Importantly, the estimators and the resulting confidence regions are ‘honest’ in the formal sense that their properties hold uniformly over s-sparse models. Moreover, these procedures do not rely on traditional ‘consistent model selection’ arguments for their validity; in fact, they are robust to ‘moderate’ mistakes in the variable selection steps. Finally, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models considered in this paper.
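
As a rough sketch of the setting described above, the model can be written as follows. The notation is assumed here for illustration and is not taken from the paper: y_i is the binary outcome, d_i the regressor of interest, x_i the p-vector of controls, β0 the nuisance coefficient vector, and Λ the logistic link. Exact sparsity of β0 is used as a simplification; the abstract only requires that s controls approximate the nuisance part accurately.

```latex
\[
  \mathrm{E}[\, y_i \mid d_i, x_i \,] = \Lambda\!\left( d_i \alpha_0 + x_i'\beta_0 \right),
  \qquad
  \Lambda(t) = \frac{e^{t}}{1 + e^{t}},
\]
\[
  \|\beta_0\|_0 \le s, \qquad s < n, \qquad p \ \text{possibly larger than} \ n .
\]
```

In this notation, ‘honest’ means that the stated properties of the estimator of α0 and its confidence regions hold uniformly over all such s-sparse β0, not only at a fixed β0.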

Working Paper
