We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse least absolute deviation/median regression model. The setting is one where the number of regressors p could be large in comparison to the sample size n, but only s ≪ n of them are needed to accurately describe the regression function. Our new methods are based on the instrumental median regression estimator, which assembles the optimal estimating equation from the output of the post ℓ1-penalized median regression and post ℓ1-penalized least squares in an auxiliary equation. The estimating equation is immunized against non-regular estimation of the nuisance part of the median regression function, in the sense of Neyman. We establish that in a homoscedastic regression model, the instrumental median regression estimator of a single regression coefficient is asymptotically root-n normal uniformly with respect to the underlying sparse model. The resulting confidence regions are valid uniformly with respect to the underlying model. We illustrate the value of uniformity with Monte Carlo experiments, which demonstrate that standard/naive post-selection inference breaks down over large parts of the parameter space, whereas the proposed method does not. We then generalize our method to the case where p1 ≫ n regression coefficients are of interest in a non-smooth Z-estimation framework with approximately sparse nuisance functions, containing median regression with a single target regression coefficient as a very special case. We construct simultaneous confidence bands on all p1 coefficients, and establish their uniform validity over the underlying approximately sparse model.
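The immunization property mentioned in the abstract can be written out for the single-coefficient, homoscedastic case. The notation below (outcome y_i, target regressor d_i, controls x_i, coefficients α_0, β_0, θ_0, auxiliary residual v_i, error ε_i with density f_ε) is assumed here for illustration and does not appear in the abstract. The display is a sketch under the assumption that ε_i is independent of (d_i, x_i) and continuously distributed, not a statement of the paper's exact conditions.

```latex
% Assumed notation: y_i outcome, d_i target regressor, x_i controls (dimension p).
% Main median regression equation and auxiliary least squares equation:
\begin{align*}
y_i &= d_i\alpha_0 + x_i'\beta_0 + \epsilon_i, & \operatorname{median}(\epsilon_i \mid d_i, x_i) &= 0, \\
d_i &= x_i'\theta_0 + v_i, & \mathrm{E}[v_i \mid x_i] &= 0.
\end{align*}

% Estimating equation for alpha_0, using v_i as the instrument:
\[
\mathrm{E}\Bigl[\bigl(\tfrac{1}{2} - 1\{y_i \le d_i\alpha_0 + x_i'\beta_0\}\bigr)\, v_i\Bigr] = 0 .
\]

% Neyman orthogonality: the moment is first-order insensitive to the
% nuisance parameters beta and theta at their true values.
\[
\frac{\partial}{\partial\beta}\,
\mathrm{E}\Bigl[\bigl(\tfrac{1}{2} - 1\{y_i \le d_i\alpha_0 + x_i'\beta\}\bigr)\, v_i\Bigr]\Big|_{\beta=\beta_0}
= -f_\epsilon(0)\,\mathrm{E}[x_i v_i] = 0 ,
\]
\[
\frac{\partial}{\partial\theta}\,
\mathrm{E}\Bigl[\bigl(\tfrac{1}{2} - 1\{\epsilon_i \le 0\}\bigr)\,(d_i - x_i'\theta)\Bigr]\Big|_{\theta=\theta_0}
= -\mathrm{E}\Bigl[\bigl(\tfrac{1}{2} - 1\{\epsilon_i \le 0\}\bigr)\, x_i\Bigr] = 0 .
\]
```

In this sketch both derivatives vanish, so first-order errors in the post-selection estimates of β_0 and θ_0 do not perturb the estimating equation for α_0. This is one way to read the abstract's claim that the equation is immunized in the sense of Neyman.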
Uniform post selection inference for LAD regression and other Z-estimation problems
30 December 2013
Working Paper (CWP74/13)