We develop a theory for estimating a high-dimensional sparse parameter θ defined as a minimizer of a population loss function L_D(θ, g_0) which, in addition to θ, depends on a potentially infinite-dimensional nuisance parameter g_0. Our approach estimates θ via an ℓ1-regularized minimization of a sample analog of L_S(θ, ĝ), plugging in a first-stage estimate ĝ computed on a hold-out sample. We define a population loss to be (Neyman) orthogonal if the gradient of the loss with respect to θ has a pathwise derivative with respect to g equal to zero when evaluated at the true parameter and nuisance component. We show that orthogonality implies a second-order impact of the first-stage nuisance error on the second-stage target parameter estimate. Our approach applies to both convex and non-convex losses, although the latter case requires a small adaptation of our method with a preliminary estimation step for the target parameter. Our result enables oracle convergence rates for θ under assumptions on the first-stage rates, typically of the order of n^{-1/4}.
We show how such an orthogonal loss can be constructed via a novel orthogonalization process for a general model defined by conditional moment restrictions. We apply our theory to high-dimensional versions of standard estimation problems in statistics and econometrics, such as estimation of conditional moment models with missing data, estimation of structural utilities in games of incomplete information, and estimation of treatment effects in regression models with non-linear link functions.
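
The estimator and the orthogonality condition described in words above can be written out as a short sketch. The notation here is an assumption made for illustration and is not fixed by the text: θ denotes the target parameter with true value θ_0, g the nuisance component with true value g_0, λ a regularization weight, and 𝒢 the space in which the nuisance component lives.

```latex
% Assumed notation: \theta target parameter, g nuisance component,
% \lambda \ge 0 regularization weight, \mathcal{G} nuisance space.

% Second stage: l1-regularized minimization of the sample loss,
% with the first-stage estimate \hat{g} plugged in.
\[
  \hat{\theta} \in \arg\min_{\theta} \; L_S(\theta, \hat{g}) + \lambda \|\theta\|_1
\]

% (Neyman) orthogonality: the pathwise derivative with respect to g of the
% gradient in \theta vanishes at the true parameter and nuisance component.
\[
  \frac{d}{dt}\, \nabla_\theta L_D\bigl(\theta_0,\; g_0 + t\,(g - g_0)\bigr)\Big|_{t=0} = 0
  \qquad \text{for all } g \in \mathcal{G}.
\]
```

Read heuristically, a second-order impact means the first-stage error enters the second stage through its square, so a first-stage rate of order n^{-1/4} contributes a term of order (n^{-1/4})^2 = n^{-1/2}. The precise norms and regularity conditions under which this holds are not stated in the text above and are left unspecified here.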