First, we establish that the K-ML estimator is consistent and asymptotically normal for any K. This complements findings in Aguirregabiria and Mira (2007), who focus on K = 1 and K large enough to induce convergence of the estimator. Furthermore, we show that the asymptotic variance of the K-ML estimator can exhibit arbitrary patterns as a function of K.

Second, we establish that the K-MD estimator is consistent and asymptotically normal for any K. For a specific weight matrix, the K-MD estimator has the same asymptotic distribution as the K-ML estimator. Our main result provides an optimal sequence of weight matrices for the K-MD estimator and shows that the optimally weighted K-MD estimator has an asymptotic distribution that is invariant to K. This new result is especially unexpected given the findings in Aguirregabiria and Mira (2007) for K-ML estimators. Our main result implies two new and important corollaries about the optimal 1-MD estimator (derived by Pesendorfer and Schmidt-Dengler (2008)). First, the optimal 1-MD estimator is optimal in the class of K-MD estimators for all K. In other words, additional policy iterations do not provide asymptotic efficiency gains relative to the optimal 1-MD estimator. Second, the optimal 1-MD estimator is at least as asymptotically efficient as any K-ML estimator for all K.

The approach relies on moment conditions that have an additional orthogonality property with respect to nuisance parameters. Moreover, estimation of high-dimensional nuisance parameters is carried out via new pivotal procedures. In order to achieve simultaneously valid confidence regions, we use a multiplier bootstrap procedure to compute critical values and establish its validity.

We propose two distinct QGMs. First, Conditional Independence Quantile Graphical Models (CIQGMs) characterize conditional independence at each quantile index, revealing the distributional dependence structure. Second, Prediction Quantile Graphical Models (PQGMs) characterize the best linear predictor under asymmetric loss functions. A key difference between those models is the (non-vanishing) misspecification between the best linear predictor and the conditional quantile functions.
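To make the phrase "best predictor under asymmetric loss" concrete, the following minimal sketch uses the standard check (pinball) loss and the simplest possible predictor, a constant. It is an illustration only, not the PQGM estimator: the function names and the toy data are assumptions introduced here.

```python
def check_loss(u, tau):
    """Asymmetric (pinball) loss: weight tau on positive errors, 1 - tau on negative ones."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant_predictor(values, tau):
    """Search the observed values for the constant that minimizes average check loss."""
    def average_loss(c):
        return sum(check_loss(v - c, tau) for v in values) / len(values)
    return min(values, key=average_loss)


# Hypothetical toy sample, chosen so the minimizers are unique.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median_fit = best_constant_predictor(sample, 0.5)
lower_quartile_fit = best_constant_predictor(sample, 0.25)
```

With a constant predictor the minimizer is a sample quantile; replacing the constant with a linear index gives a quantile-regression-type objective.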

We also propose estimators for those QGMs. Due to high-dimensionality, the two distinct QGMs require different estimators. The estimators are based on high-dimensional techniques including (a continuum of) L1-penalized quantile regressions (and low biased equations), which allow us to handle the potentially large number of variables. We build upon a recent literature to obtain new results for valid choice of the penalty parameters, rates of convergence, and confidence regions that are simultaneously valid.

We illustrate how to use QGMs to quantify tail interdependence (instead of mean dependence) between a large set of variables, which is relevant in applications concerned with extreme events. We show that the associated tail risk network can be used for measuring systemic risk contributions. We also apply the framework to study international financial contagion and the impact of market downside movement on the dependence structure of assets' returns.

The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the component (or function) of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the component (or function) of θ, not θ itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive.

Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately. The algorithm is of independent interest for inference on optimal values of stochastic nonlinear programs. We establish its convergence under conditions satisfied by canonical examples in the moment (in)equalities literature.

Our assumptions and those used in the leading alternative approach (a profiling-based method) are not nested. An extensive Monte Carlo analysis confirms the accuracy of the solution algorithm and the good statistical as well as computational performance of calibrated projection, including in comparison to other methods.

We provide a small Monte Carlo experiment to study the estimators' finite sample properties and an application to the estimation of gasoline demand functions.

It is shown that whenever g is Lipschitz, though not necessarily differentiable, the posterior distribution of g(theta) and the bootstrap distribution of theta_n coincide asymptotically. One implication is that Bayesians can interpret bootstrap inference for g(theta) as approximately valid posterior inference in a large sample. Another implication, built on known results about bootstrap inconsistency, is that credible sets for a nondifferentiable parameter g(theta) cannot be presumed to be approximately valid confidence sets (even when this relation holds true for theta).

may improve life expectancy, but also impose serious short term risks; reducing class sizes may improve performance of good students, but not help weaker ones or vice versa. Quantile regression methods can help to explore these heterogeneous effects. Some recent developments in quantile regression methods are surveyed below.

One of the main objectives of empirical analysis of experiments and quasi-experiments is to inform policy decisions that determine the allocation of treatments to individuals with different observable covariates. We study the properties and implementation of the Empirical Welfare Maximization (EWM) method, which estimates a treatment assignment policy by maximizing the sample analog of average social welfare over a class of candidate treatment policies. The EWM approach is attractive in terms of both statistical performance and practical implementation in realistic settings of policy design. Common features of these settings include: (i) feasible treatment assignment rules are constrained exogenously for ethical, legislative, or political reasons, (ii) a policy maker wants a simple treatment assignment rule based on one or more eligibility scores in order to reduce the dimensionality of individual observable characteristics, and/or (iii) the proportion of individuals who can receive the treatment is a priori limited due to a budget or a capacity constraint. We show that when the propensity score is known, the average social welfare attained by EWM rules converges at least at n^(-1/2) rate to the maximum obtainable welfare uniformly over a minimally constrained class of data distributions, and this uniform convergence rate is minimax optimal. We examine how the uniform convergence rate depends on the richness of the class of candidate decision rules, the distribution of conditional treatment effects, and the lack of knowledge of the propensity score. We offer easily implementable algorithms for computing the EWM rule and an application using experimental data from the National JTPA Study.
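As a rough illustration of the idea of maximizing a sample analog of welfare, the sketch below searches a grid of single-score threshold rules ("treat if x <= t") using inverse-propensity weighting with a known, constant propensity score. Everything here is an assumption made for illustration: the simulated data, the threshold policy class, the grid, and the function names are not taken from the paper or its algorithms.

```python
import random


def empirical_welfare(data, threshold, propensity):
    """Sample analog of average welfare when units with x <= threshold are treated."""
    total = 0.0
    for y, d, x in data:
        if x <= threshold:
            total += y * d / propensity              # weighted outcome of treated units
        else:
            total += y * (1 - d) / (1 - propensity)  # weighted outcome of control units
    return total / len(data)


def ewm_threshold(data, propensity, grid):
    """Pick the grid threshold with the highest empirical welfare."""
    return max(grid, key=lambda t: empirical_welfare(data, t, propensity))


def simulate(n, seed=0):
    """Hypothetical experiment: the treatment effect 1 - 2x is positive only for x < 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < 0.5 else 0
        y = (1.0 - 2.0 * x) * d + rng.gauss(0.0, 0.5)
        data.append((y, d, x))
    return data


data = simulate(2000)
# The grid runs from "treat nobody" (t = -0.01) to "treat everybody" (t = 1.0).
grid = [i / 100 for i in range(-1, 101)]
best_threshold = ewm_threshold(data, 0.5, grid)
```

Restricting the search to a small policy class mirrors settings (i) to (iii) above, where the set of feasible assignment rules is constrained from the outset.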

This April 2017 version is an updated version of the January 2017 version. The original version of the working paper is available here.

only under dense graph sequences.

an incomplete model of English auctions, improving on the pointwise bounds available till now. Application of many of the results of the paper requires no familiarity with random set theory.

We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski’s method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.

We apply our method to analyze the distributional impact of insurance coverage on health care utilization and to provide a distributional decomposition of the racial test score gap. Our analysis generates new interesting findings, and complements previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete, rendering standard inference methods invalid for obtaining uniform confidence bands for quantile and quantile effects functions.

moment conditions continue to hold when one first step component is incorrect. Locally robust moment conditions also have smaller bias that is flatter as a function of first step smoothing leading to improved small sample properties. Series first step estimators confer local robustness on any moment conditions and are doubly robust for affine moments, in the direction of the series approximation. Many new locally and doubly robust estimators are given here, including for economic structural models. We give simple asymptotic theory for estimators that use cross-fitting in the first step, including machine learning.

The original version of the working paper, posted on 01 April, 2016, is available here.

Moreover, the data suggest that the wife and the husband retire at the same time for a nonnegligible fraction of couples. Our approach takes as a starting point a stylized economic model that leads to a univariate generalized accelerated failure time model. The covariates of that generalized accelerated failure time model act as utility-flow shifters in the economic model. We introduce simultaneity by allowing the utility flow in retirement to depend on the retirement status of the spouse. The econometric model is then completed by assuming that the observed outcome is the Nash bargaining solution in that simple economic model. The advantage of this approach is that it includes independent realizations from the generalized accelerated failure time model as a special case, and deviations from this special case can be given an economic interpretation. We illustrate the model by studying the joint retirement decisions in married couples using the Health and Retirement Study. We provide a discussion of relevant identifying variation and estimate our model using indirect inference. The main empirical finding is that the simultaneity seems economically important. In our preferred specification the indirect utility associated with being retired increases by approximately 5% when one's spouse retires. The estimated model also predicts that the marginal effect of a change in the husbands' pension plan on wives' retirement dates is about 3.3% of the direct effect on the husbands'.

The critical level is by construction smaller (in finite sample) than the one used if projecting confidence regions designed to cover the entire parameter vector. Hence, our confidence interval is weakly shorter than the projection of established confidence sets (Andrews and Soares, 2010), if one holds the choice of tuning parameters constant. We provide simple conditions under which the comparison is strict. Our inference method controls asymptotic coverage uniformly over a large class of data-generating processes. Our assumptions and those used in the leading alternative approach (a profiling-based method) are not nested. We explain why we employ some restrictions that are not required by other methods and provide examples of models for which our method is uniformly valid but profiling-based methods are not.

satisfy standard norm bounds, and (3) functions with unbounded domains. In all three cases we provide two kinds of results, compact embedding and closedness, which together allow one to show that parameter spaces defined by a ||·||

Using differential geometry and functional delta methods, we establish that the estimated sorted effects are consistent for the true sorted effects, and derive asymptotic normality and bootstrap approximation results, enabling construction of pointwise confidence bands (pointwise with respect to percentile indices). We also derive functional central limit theorems and bootstrap approximation results, enabling construction of simultaneous confidence bands (simultaneous with respect to percentile indices). The derived statistical results in turn rely on establishing Hadamard differentiability of the multivariate sorting operator, a result of independent mathematical interest.

Leading important special cases encompassed by the framework we study include: (i) Tests of shape restrictions for infinite dimensional parameters; (ii) Confidence regions for functionals that impose shape restrictions on the underlying parameter; (iii) Inference for functionals in semiparametric and nonparametric models defined by conditional moment (in)equalities; and (iv) Uniform inference in possibly nonlinear and severely ill-posed problems.

Supplementary material for this paper is available here.

Supplementary material for this paper is available here.

We also analyze the properties of fixed effects estimators of functions of the data, parameters and individual and time effects including average partial effects. Here, we uncover that the incidental parameter bias is asymptotically of second order, because the rate of the convergence of the fixed effects estimators is slower for average partial effects than for model parameters. The bias corrections are still effective to improve finite-sample properties.

View the supplementary document for this paper here.

For the case of discretely-valued covariates we present analog estimators and characterize their large sample properties. When the number of time periods (*T*) exceeds the number of random coefficients (*P*), identification is regular, and our estimates are *√N*-consistent. When *T* = *P*, our identification results make special use of the subpopulation of stayers - units whose regressor values change little over time - in a way which builds on the approach of Graham and Powell (2012). In this just-identified case we study asymptotic sequences which allow the frequency of stayers in the population to shrink with the sample size. One purpose of these “discrete bandwidth asymptotics” is to approximate settings where covariates are continuously-valued and, as such, there is only an infinitesimal fraction of exact stayers, while keeping the convenience of an analysis based on discrete covariates. When the mass of stayers shrinks with *N*, identification is irregular and our estimates converge at a slower than *√N* rate, but continue to have limiting normal distributions.

We apply our methods to study the effects of collective bargaining coverage on earnings using the National Longitudinal Survey of Youth 1979 (NLSY79). Consistent with prior work (e.g., Chamberlain, 1982; Vella and Verbeek, 1998), we find that using panel data to control for unobserved worker heterogeneity results in sharply lower estimates of union wage premia. We estimate a median union wage premium of about 9 percent, but, in a more novel finding, with substantial heterogeneity across workers. The 0.1 quantile of union effects is insignificantly different from zero, whereas the 0.9 quantile effect is over 30 percent. Our empirical analysis further suggests that, on net, unions have an equalizing effect on the distribution of wages.

Supporting material is available in a supplementary appendix here.

Technical supporting material is available in a supplementary appendix here.

Supplementary material for this paper is available here.

Supplementary material for this paper is available here.

We also analyze the properties of fixed effects estimators of functions of the data, parameters and individual and time effects including average partial effects. Here, we uncover that the incidental parameter bias is asymptotically of second order, because the rate of the convergence of the fixed effects estimators is slower for average partial effects than for model parameters. The bias corrections are still useful to improve finite-sample properties.

The commands clrbound, clr2bound, and clr3bound provide bound estimates that can be used directly for estimation or to construct asymptotically valid confidence sets. clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero. The command clrbound provides bound estimates for one-sided lower or upper intersection bounds on a parameter, while clr2bound and clr3bound provide two-sided bound estimates based on both lower and upper intersection bounds. clr2bound uses Bonferroni’s inequality to construct two-sided bounds that can be used to perform asymptotically valid inference on the identified set or the parameter of interest, whereas clr3bound provides a generally tighter confidence interval for the parameter by inverting the hypothesis test performed by clrtest. More broadly, inversion of this test can also be used to construct confidence sets based on conditional moment inequalities as described in Chernozhukov et al. (2013). The commands include parametric, series, and local linear estimation procedures, and can be installed from within STATA by typing “ssc install clrbound”.

the first stage and then the preference parameters in the second stage based on Manski's (1975, 1985) maximum score estimator using the choice data and first stage estimates. This setting can be extended to maximum score estimation with nonparametrically generated regressors. The paper establishes consistency and derives the rate of convergence of the two-stage maximum score estimator. Moreover, the paper also provides sufficient conditions under which the two-stage estimator is asymptotically equivalent in distribution to the corresponding single-stage estimator that assumes the first stage input is known. The paper also presents some Monte Carlo simulation results for finite-sample behavior of the two-stage estimator.

We propose two new specification tests, denoted Tests RS and RC, that achieve uniform asymptotic size control and dominate Test BP in terms of power in any finite sample and in the asymptotic limit. Test RC is particularly convenient to implement because it requires little additional work beyond the confidence set construction. Test RS requires a separate procedure to compute, but has the best power. The separate procedure is computationally easier than confidence set construction in typical cases.

In the second part of the paper, we present a generalization of the treatment effect framework to a much richer setting, where possibly a continuum of target parameters is of interest and the Lasso-type or post-Lasso type methods are used to estimate a continuum of high-dimensional nuisance functions. This framework encompasses the analysis of local treatment effects as a leading special case and also covers a wide variety of classical and modern moment-condition problems in econometrics. We establish a functional central limit theorem for the continuum of the target parameters, and also show that it holds uniformly in a wide range of data-generating processes *P*, with continua of approximately sparse nuisance functions. We also establish validity of the multiplier bootstrap for resampling the first order approximations to the standardized continuum of the estimators, and also establish uniform validity in *P*. We propose a notion of the functional delta method for finding limit distribution and multiplier bootstrap of the smooth functionals of the target parameters that is valid uniformly in *P*. Finally, we establish rate and consistency results for continua of Lasso or post-Lasso type methods for estimating continua of the (nuisance) regression functions, also providing practical, theoretically justified penalty choices. Each of these results is new and could be of independent interest.

These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number k of approximating functions used in series estimation from the typical k^2/n → 0 to k/n → 0, up to log factors, which was available only for spline and local polynomial partition series before. Second, under the same weak conditions we derive L2 rates and pointwise central limit theorem results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold whether or not the approximation error vanishes. That is, we derive the strong approximation for the entire estimate of the nonparametric function. Finally, we derive uniform rates and inference results for linear functionals of interest of the conditional expectation function such as its partial derivative or conditional average partial derivative.

We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Furthermore, our approach is asymptotically honest at a polynomial rate - namely, the error in coverage level converges to zero at a fast, polynomial speed (with respect to the sample size). In sharp contrast, the approach based on extreme value theory is asymptotically honest only at a logarithmic rate - the error converges to zero at a slow, logarithmic speed. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.
https://www.cemmap.ac.uk/publications/7031
*p* of other regressors, called controls, exceed the sample size *n*, using the sparsity assumptions. The sparsity assumption means that only *s* unknown controls are needed to accurately approximate the nuisance part of the regression function, where *s* is smaller than *n*. Importantly, the estimators and these resulting confidence regions are 'honest' in the formal sense that their properties hold uniformly over *s*-sparse models. Moreover, these procedures do not rely on traditional 'consistent model selection' arguments for their validity; in fact, they are robust with respect to 'moderate' model selection mistakes in variable selection steps. Moreover, the estimators are semi-parametrically efficient in the sense of attaining the semi-parametric efficiency bounds for the class of models in this paper.

*p*, is possibly much larger than the sample size *n*. There are a variety of economic applications where the problem of testing many moment inequalities appears; a notable example is the entry model of Ciliberto and Tamer (2009) where *p* = 2^(*m*+1) with *m* being the number of firms. We consider the test statistics given by the maximum of *p* Studentized (or *t*-type) statistics, and analyze various ways to compute critical values for the test. Specifically, we consider critical values based upon (i) the union (Bonferroni) bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier bootstrap. We also consider two-step variants of (i) and (ii) by incorporating moment selection. We prove validity of these methods, showing that under mild conditions, they lead to tests with error in size decreasing polynomially in *n* while allowing for *p* being much larger than *n*; indeed *p* can be of order exp(*n*^*c*) for some *c* > 0. Importantly, all these results hold without any restriction on correlation structure between *p* Studentized statistics, and also hold uniformly with respect to suitably wide classes of underlying distributions.
We also show that all the tests developed in this paper are asymptotically minimax optimal when *p* grows with *n*.
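The test based on the maximum of Studentized statistics with a multiplier bootstrap critical value can be sketched as follows. This is a simplified one-step version without moment selection, written under the assumption that the null hypothesis is E[X_j] <= 0 for every j; the function name, the simulated data, and the number of bootstrap draws are illustrative choices, not the paper's.

```python
import math
import random


def max_studentized_test(rows, alpha=0.05, draws=200, seed=0):
    """Test H0: E[X_j] <= 0 for all j with the maximum Studentized sample mean."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    stat = max(math.sqrt(n) * means[j] / sds[j] for j in range(p))
    # Centered and scaled observations, reused in every bootstrap draw.
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    boot = []
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian multipliers
        boot.append(max(sum(e[i] * z[i][j] for i in range(n)) for j in range(p)) / math.sqrt(n))
    boot.sort()
    index = min(draws - 1, math.ceil((1 - alpha) * draws) - 1)
    crit = boot[index]  # bootstrap (1 - alpha)-quantile of the maximum
    return stat, crit, stat > crit


# Hypothetical data: 100 observations on 20 moments, one of which is clearly violated.
gen = random.Random(1)
sample = [[gen.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
violated = [[row[0] + 5.0] + row[1:] for row in sample]
stat, crit, reject = max_studentized_test(violated)
```

Because each bootstrap draw applies the same multipliers to all coordinates, the critical value reflects the correlation among the statistics rather than relying on a worst-case union bound.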

A supplement to this paper can be downloaded here.

*(A typo on page 27 that erroneously resulted in the OLS estimator instead of the 2SLS estimator was corrected in July 2015).*

Supplementary material for this paper is available here.

Supplementary material relating to this working paper can be viewed here.

This paper is forthcoming in the Journal of Multivariate Analysis.

An online appendix to accompany this publication is available here.

The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We also present a simple generalisation of our method to a fully heterogeneous model with a binary treatment variable. We illustrate the use of the developed methods with numerical simulations and an application that considers the effect of abortion on crime rates.

A supplement to this article, which outlines theoretical properties underpinning the methodology and provides a proof of theorem, can be viewed here.

This article is accompanied by a web appendix in which we present omitted discussions, an algorithm to implement the proposed method for the sharp RSS and proofs for the main results.

As part of developing the main results, we introduce distribution regression as a comprehensive and flexible tool for modelling and estimating the *entire* conditional distribution. We show that distribution regression encompasses the Cox duration regression and represents a useful alternative to quantile regression. We establish functional central limit theorems and bootstrap validity results for the empirical distribution regression process and various related functionals.

This is a revision of CWP05/12 and CWP09/09

2SLS has the advantage of providing an easy to compute point estimator of a slope coefficient which can be interpreted as a local average treatment effect (LATE). However, the 2SLS estimator does not measure the value of other useful treatment effect parameters without invoking untenable restrictions.
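For the simplest case of one binary instrument, one binary treatment, and no covariates, the 2SLS slope reduces to a ratio of covariances, which coincides with the Wald ratio of differences in means. The sketch below checks this on simulated data; the complier shares, effect sizes, and function names are assumptions made purely for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def iv_slope(y, d, z):
    """2SLS slope with a single instrument and no covariates: Cov(z, y) / Cov(z, d)."""
    return cov(z, y) / cov(z, d)


def wald_ratio(y, d, z):
    """Difference in mean outcomes over difference in mean treatment, by instrument value."""
    y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
    y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])
    d1 = mean([di for di, zi in zip(d, z) if zi == 1])
    d0 = mean([di for di, zi in zip(d, z) if zi == 0])
    return (y1 - y0) / (d1 - d0)


# Hypothetical population: 60% compliers (effect 2), 20% always-takers (effect 5),
# 20% never-takers. The complier effect, 2, is the LATE in this design.
rng = random.Random(0)
y, d, z = [], [], []
for _ in range(5000):
    zi = 1 if rng.random() < 0.5 else 0
    u = rng.random()
    if u < 0.6:
        di, effect = zi, 2.0
    elif u < 0.8:
        di, effect = 1, 5.0
    else:
        di, effect = 0, 0.0
    y.append(effect * di + rng.gauss(0.0, 1.0))
    d.append(di)
    z.append(zi)

late_estimate = iv_slope(y, d, z)
```

In this design the estimate targets the complier effect only; the always-taker effect of 5 is not recovered, which illustrates why other treatment effect parameters need further restrictions.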

The nonparametric instrumental variable (IV) model has the advantage of being weakly restrictive, so more generally applicable, but it usually delivers set identification. Nonetheless it can be used to consistently estimate bounds on many parameters of interest including, for example, average treatment effects. We illustrate using data from Angrist & Evans (1998) and study the effect of family size on female employment.

This October 2015 version corrects an error in the paper, as explained in footnote 1. The original version of the working paper is available here.

We propose two hypothesis tests that use the infimum of the sample criterion function over the parameter space as the test statistic together with two different critical values. We obtain two main results. First, we show that the two tests we propose are asymptotically size correct in a uniform sense. Second, we show that our tests are more powerful than the test that checks whether the confidence set for the parameters of interest is empty or not.

This paper is a revised version of CWP13/09.

The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates.

This paper is a revision of CWP42/11.

This is a revision of CWP09/09.

Various methods have been used to overcome the point identification problem inherent in the linear age-period-cohort model. This paper presents a set-identification result for the model and then considers the use of the maximum-entropy principle as a vehicle for achieving point identification. We present two substantive applications (US female mortality data and UK female labor force participation) and compare the results from our approach to some of the solutions in the literature.
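The identification problem arises because cohort equals period minus age, so the three linear terms are exactly collinear. The minimal sketch below shows that shifting the three slopes by any amount delta in the pattern (+delta, -delta, +delta) leaves every fitted value unchanged; the slope values and function names are hypothetical.

```python
def linear_predictor(age, period, slopes):
    """Linear age-period-cohort index, with cohort defined as period minus age."""
    age_slope, period_slope, cohort_slope = slopes
    cohort = period - age
    return age_slope * age + period_slope * period + cohort_slope * cohort


def shift_slopes(slopes, delta):
    """Move along the direction in which the data cannot distinguish the slopes."""
    age_slope, period_slope, cohort_slope = slopes
    return (age_slope + delta, period_slope - delta, cohort_slope + delta)


base = (0.5, -0.25, 1.0)          # hypothetical slopes
alternative = shift_slopes(base, 2.0)
cells = [(age, period) for age in (20, 30, 40, 50) for period in (1980, 1990, 2000)]
largest_gap = max(
    abs(linear_predictor(a, p, base) - linear_predictor(a, p, alternative))
    for a, p in cells
)
```

Since the data pin down the slopes only up to this one-dimensional shift, the model delivers a set of observationally equivalent slope vectors unless an additional principle is used to select one point.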

We show that the model delivers set identification of the latent utility functions and we characterize sharp bounds on those functions. We develop easy-to-compute outer regions which in parametric models require little more calculation than what is involved in a conventional maximum likelihood analysis. The results are illustrated using a model which is essentially the parametric conditional logit model of McFadden (1974) but with potentially endogenous explanatory variables and instrumental variable restrictions. The method employed has wide applicability and for the first time brings instrumental variable methods to bear on structural models in which there are multiple unobservables in a structural equation.

The paper studies the partial identifying power of structural single equation threshold crossing models for binary responses when explanatory variables may be endogenous. The paper derives the sharp identified set of threshold functions for the case in which explanatory variables are discrete and provides a constructive proof of sharpness. There is special attention to a widely employed semiparametric shape restriction which requires the threshold crossing function to be a monotone function of a linear index involving the observable explanatory variables. It is shown that the restriction brings great computational benefits, allowing direct calculation of the identified set of index coefficients without calculating the nonparametrically specified threshold function. With the restriction in place the methods of the paper can be applied to produce identified sets in a class of binary response models with mis-measured explanatory variables.

This is a further revised version (Oct 7th 2011) of CWP23/09 "Single equation endogenous binary response models"

Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the empirical QR coefficient process, namely we obtain uniform strong approximations to the empirical QR coefficient process by conditionally pivotal and Gaussian processes, as well as by gradient and weighted bootstrap processes.

We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence, large sample distributions, and inference methods based on strong pivotal and Gaussian approximations and on gradient and weighted bootstraps. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and in the covariate value, and covering the pointwise case as a by-product. If the function of interest is monotone, we show how to use monotonization procedures to improve estimation and inference. We demonstrate the practical utility of these results with an empirical example, where we estimate the price elasticity function of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.

]]>We examine the "home bias" of knowledge spillovers (the idea that knowledge spreads more slowly over international boundaries than within them) as measured by the speed of patent citations. We present econometric evidence that the geographical localization of knowledge spillovers has fallen over time, as we would expect from the dramatic fall in communication and travel costs. Our proposed estimator controls for correlated fixed effects and censoring in duration models and we apply it to data on over two million patent citations between 1975 and 1999. Home bias is exaggerated in models that do not control for fixed effects. The fall in home bias over time is weaker for the pharmaceuticals and information/communication technology sectors where agglomeration externalities may remain strong.

]]>We show that the model delivers set, not point, identification of the latent utility functions and we characterize sharp bounds on those functions. We develop easy-to-compute outer regions which in parametric models require little more calculation than what is involved in a conventional maximum likelihood analysis. The results are illustrated using a model which is essentially the parametric conditional logit model of McFadden (1974) but with potentially endogenous explanatory variables and instrumental variable restrictions.

The method employed has wide applicability and for the first time brings instrumental variable methods to bear on structural models in which there are multiple unobservables in a structural equation.

This paper has now been revised and the new version is available as CWP39/11.

]]>This paper is a revised version of cemmap working paper CWP33/07.

]]>I present an application to the study of segregation in school friendship networks, using data from Add Health containing the actual social networks of students in a representative sample of US schools. My results suggest that for white students, the value of a same-race friend decreases with the fraction of whites in the school. The opposite is true for African American students.

The model is used to study how different desegregation policies may affect the structure of the network in equilibrium. I find an inverted U-shaped relationship between the fraction of students belonging to a racial group and the expected equilibrium segregation levels. These results suggest that desegregation programs may decrease the degree of interracial interaction within schools.

]]>Optimal instruments are conditional expectations; and in developing the IV results, we also establish a series of new results for LASSO and Post-LASSO estimators of non-parametric conditional expectation functions which are of independent theoretical and practical interest. Specifically, we develop the asymptotic theory for these estimators that allows for non-Gaussian, heteroscedastic disturbances, which is important for econometric applications. By innovatively using moderate deviation theory for self-normalized sums, we provide convergence rates for these estimators that are as sharp as in the homoscedastic Gaussian case under the weak condition that log p = o(n^{1/3}). Moreover, as a practical innovation, we provide a fully data-driven method for choosing the user-specified penalty that must be provided in obtaining LASSO and Post-LASSO estimates and establish its asymptotic validity under non-Gaussian, heteroscedastic disturbances.

We provide a tractable characterization of the sharp identification region of the parameters θ in a broad class of incomplete econometric models. Models in this class have set valued predictions that yield a convex set of conditional or unconditional moments for the observable model variables. In short, we call these models with convex moment predictions. Examples include static, simultaneous move finite games of complete and incomplete information in the presence of multiple equilibria; best linear predictors with interval outcome and covariate data; and random utility models of multinomial choice in the presence of interval regressors data. Given a candidate value for θ, we establish that the convex set of moments yielded by the model predictions can be represented as the Aumann expectation of a properly defined random set. The sharp identification region of θ, denoted Θ_{1}, can then be obtained as the set of minimizers of the distance from a properly specified vector of moments of random variables to this Aumann expectation. Algorithms in convex programming can be exploited to efficiently verify whether a candidate θ is in Θ_{1}. We use examples analyzed in the literature to illustrate the gains in identification and computational tractability afforded by our method.

This paper is a revised version of CWP27/09.

]]>We illustrate the approach using scanner data on food purchases to estimate bounds on willingness to pay for the organic characteristic. We combine these estimates with information on households' stated preferences and beliefs to show that on average quality is the most important factor affecting bounds on household willingness to pay for organic, with health concerns coming second, and environmental concerns lagging far behind.

]]>Social experiments are powerful sources of information about the effectiveness of interventions. In practice, initial randomization plans are almost always compromised. Multiple hypotheses are frequently tested. "Significant" effects are often reported with p-values that do not account for preliminary screening from a large candidate pool of possible effects. This paper develops tools for analyzing data from experiments as they are actually implemented.

We apply these tools to analyze the influential HighScope Perry Preschool Program. The Perry program was a social experiment that provided preschool education and home visits to disadvantaged children during their preschool years. It was evaluated by the method of random assignment. Both treatments and controls have been followed from age 3 through age 40.

Previous analyses of the Perry data assume that the planned randomization protocol was implemented. In fact, as in many social experiments, the intended randomization protocol was compromised. Accounting for compromised randomization, multiple-hypothesis testing, and small sample sizes, we find statistically significant and economically important program effects for both males and females. We also examine the representativeness of the Perry study.

]]>This paper is a revised version of CWP18/09.

]]>In this paper we study post-penalized estimators which apply ordinary, unpenalized linear regression to the model selected by first-step penalized estimators, typically LASSO. It is well known that LASSO can estimate the regression function at nearly the oracle rate, and is thus hard to improve upon. We show that post-LASSO performs at least as well as LASSO in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the LASSO-based model selection 'fails' in the sense of missing some components of the 'true' regression model. By the 'true' model we mean here the best s-dimensional approximation to the regression function chosen by the oracle. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the 'true' model as a subset and also achieves a sufficient sparsity. In the extreme case, when LASSO perfectly selects the 'true' model, the post-LASSO estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by LASSO which guarantees that this dimension is at most of the same order as the dimension of the 'true' model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the LASSO estimator in the first step, but also applies to other estimators, for example, the trimmed LASSO, Dantzig selector, or any other estimator with good rates and good sparsity. Our analysis covers both traditional trimming and a new practical, completely data-driven trimming scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. 
The latter scheme has theoretical guarantees similar to those of LASSO or post-LASSO, but it dominates these procedures as well as traditional trimming in a wide variety of experiments.
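
As a minimal illustration of the bias advantage described above, the following sketch considers the special case of an orthonormal design (an assumption made here for simplicity, not a setting taken from the paper). In that case, LASSO with penalty level `lam` reduces to soft-thresholding each least-squares coefficient, and post-LASSO restores the unshrunk least-squares value on the selected support. The function names and the numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """LASSO solution for a single coefficient when X'X/n is the identity.

    Minimizes 0.5 * (z - b)**2 + lam * abs(b) over b.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_and_post_lasso(ols_coefs, lam):
    """Return (LASSO, post-LASSO) coefficients under an orthonormal design.

    ols_coefs: least-squares coefficients, one per regressor.
    Post-LASSO refits by least squares on the regressors LASSO keeps,
    which under orthonormality is the original least-squares coefficient.
    """
    lasso = [soft_threshold(b, lam) for b in ols_coefs]
    post = [b if shrunk != 0.0 else 0.0 for b, shrunk in zip(ols_coefs, lasso)]
    return lasso, post


# Hypothetical least-squares coefficients: two large, two small.
ols = [2.0, -1.5, 0.3, 0.05]
lasso, post = lasso_and_post_lasso(ols, lam=0.5)
print(lasso)  # selected coefficients are shrunk toward zero by lam
print(post)   # same support, but without the shrinkage
```

Both estimators select the same two regressors here; only LASSO carries the shrinkage of size `lam` on the retained coefficients.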

]]>This paper develops a formal language for study of treatment response with social interactions, and uses it to obtain new findings on identification of potential outcome distributions. Defining a person's treatment response to be a function of the entire vector of treatments received by the population, I study identification when shape restrictions and distributional assumptions are placed on response functions. An early key result is that the traditional assumption of individualistic treatment response (ITR) is a polar case within the broad class of constant treatment response (CTR) assumptions, the other pole being unrestricted interactions. Important non-polar cases are interactions within reference groups and distributional interactions. I show that established findings on identification under assumption ITR extend to assumption CTR. These include identification with assumption CTR alone and when this shape restriction is strengthened to semi-monotone response. I next study distributional assumptions using instrumental variables. Findings obtained previously under assumption ITR extend when assumptions of statistical independence (SI) are posed in settings with social interactions. However, I find that random assignment of realized treatments generically has no identifying power when some persons are leaders who may affect outcomes throughout the population. Finally, I consider use of models of endogenous social interactions to derive restrictions on response functions. I emphasize that identification of potential outcome distributions differs from the longstanding econometric concern with identification of structural functions.

This paper is a revised version of CWP01/10.

]]>We develop a general class of nonparametric tests for treatment effects conditional on covariates. We consider a wide spectrum of null and alternative hypotheses regarding conditional treatment effects, including (i) the null hypothesis of the conditional stochastic dominance between treatment and control groups; (ii) the null hypothesis that the conditional average treatment effect is positive for each value of covariates; and (iii) the null hypothesis of no distributional (or average) treatment effect conditional on covariates against a one-sided (or two-sided) alternative hypothesis. The test statistics are based on L1-type functionals of uniformly consistent nonparametric kernel estimators of conditional expectations that characterize the null hypotheses. Using the Poissonization technique of Giné et al. (2003), we show that suitably studentized versions of our test statistics are asymptotically standard normal under the null hypotheses and also show that the proposed nonparametric tests are consistent against general fixed alternatives. Furthermore, it turns out that our tests have non-negligible power against some local alternatives that are n^{-1/2} different from the null hypotheses, where n is the sample size. We provide a more powerful test for the case when the null hypothesis may be binding only on a strict subset of the support and also consider an extension to testing for quantile treatment effects. We illustrate the usefulness of our tests by applying them to data from a randomized job training program (LaLonde, 1986) and by carrying out Monte Carlo experiments based on this dataset.

]]>In this paper we consider endogenous regressors in the binary choice model under a weak median exclusion restriction, but without further specification of the distribution of the unobserved random components. Our reduced form specification with heteroscedastic residuals covers various heterogeneous structural binary choice models. As a particularly relevant example of a structural model where no semiparametric estimator has as yet been analyzed, we consider the binary random utility model with endogenous regressors and heterogeneous parameters. We employ a control function IV assumption to establish identification of a slope parameter β by the mean ratio of derivatives of two functions of the instruments. We propose an estimator based on direct sample counterparts, and discuss the large sample behavior of this estimator. In particular, we show √n-consistency and derive the asymptotic distribution. In the same framework, we propose tests for heteroscedasticity, overidentification and endogeneity. We analyze the small sample performance through a simulation study. An application of the model to discrete choice demand data concludes this paper.

]]>This paper gives identification and estimation results for quantile and average effects in nonseparable panel models, when the distribution of period specific disturbances does not vary over time. Bounds are given for interesting effects with discrete regressors that are strictly exogenous or predetermined. We allow for location and scale time effects and show how monotonicity can be used to shrink the bounds. We derive rates at which the bounds tighten as the number T of time series observations grows and give an empirical illustration.

]]>This paper is a revised version of cemmap working paper CWP15/08.

]]>An updated version is available as CWP31/11.

]]>We study the identification of panel models with linear individual-specific coefficients, when T is fixed. We show identification of the variance of the effects under conditional uncorrelatedness. Identification requires restricted dependence of errors, reflecting a trade-off between heterogeneity and error dynamics. We show identification of the density of individual effects when errors follow an ARMA process under conditional independence. We discuss GMM estimation of moments of effects and errors, and introduce a simple density estimator of a slope effect in a special case. As an application we estimate the effect of a mother's smoking during pregnancy on her child's birth weight.

]]>This paper considers semiparametric efficient estimation of conditional moment models with possibly nonsmooth residuals in unknown parametric components (Θ) and unknown functions (h) of endogenous variables. We show that: (1) the penalized sieve minimum distance (PSMD) estimator (Θ̂, ĥ) can simultaneously achieve root-n asymptotic normality of Θ̂ and nonparametric optimal convergence rate of ĥ, allowing for noncompact function parameter spaces; (2) a simple weighted bootstrap procedure consistently estimates the limiting distribution of the PSMD Θ̂; (3) the semiparametric efficiency bound formula of Ai and Chen (2003) remains valid for conditional models with nonsmooth residuals, and the optimally weighted PSMD estimator achieves the bound; (4) the centered, profiled optimally weighted PSMD criterion is asymptotically chi-square distributed. We illustrate our theories using a partially linear quantile instrumental variables (IV) regression, a Monte Carlo study, and an empirical estimation of the shape-invariant quantile IV Engel curves.

This is an updated version of CWP09/08.

]]>This paper develops methodology for nonparametric estimation of a polarization measure due to Anderson (2004) and Anderson, Ge, and Leo (2006) based on kernel estimation techniques. We give the asymptotic distribution theory of our estimator, which in some cases is nonstandard due to a boundary value problem. We also propose a method for conducting inference based on estimation of unknown quantities in the limiting distribution and show that our method yields consistent inference in all cases we consider. We investigate the finite sample properties of our methods by simulation methods. We give an application to the study of polarization within China in recent years.

]]>This is a substantial revision of "Semiparametric identification of structural dynamic optimal stopping time models", CWP06/07.

]]>We investigate a method for extracting nonlinear principal components. These principal components maximize variation subject to smoothness and orthogonality constraints; but we allow for a general class of constraints and densities, including densities without compact support and even densities with algebraic tails. We provide primitive sufficient conditions for the existence of these principal components. We also characterize the limiting behavior of the associated eigenvalues, the objects used to quantify the incremental importance of the principal components. By exploiting the theory of continuous-time, reversible Markov processes, we give a different interpretation of the principal components and the smoothness constraints. When the diffusion matrix is used to enforce smoothness, the principal components maximize long-run variation relative to the overall variation subject to orthogonality constraints. Moreover, the principal components behave as scalar autoregressions with heteroskedastic innovations. Finally, we explore implications for a more general class of stationary, multivariate diffusion processes.

]]>This paper extends the method of local instrumental variables developed by Heckman and Vytlacil (1999, 2001, 2005) to the estimation of not only means, but also distributions of potential outcomes. The newly developed method is illustrated by applying it to changes in college enrollment and wage inequality using data from the National Longitudinal Survey of Youth of 1979. Increases in college enrollment cause changes in the distribution of ability among college and high school graduates. This paper estimates a semiparametric selection model of schooling and wages to show that, for fixed skill prices, a 14% increase in college participation (analogous to the increase observed in the 1980s), reduces the college premium by 12% and increases the 90-10 percentile ratio among college graduates by 2%.

]]>Single equation instrumental variable models for discrete outcomes are shown to be set, not point, identifying for the structural functions that deliver the values of the discrete outcome. Identified sets are derived for a general nonparametric model and sharp set identification is demonstrated. Point identification is typically not achieved by imposing parametric restrictions. The extent of an identified set varies with the strength and support of instruments and typically shrinks as the support of a discrete outcome grows. The paper extends the analysis of structural quantile functions with endogenous arguments to cases in which there are discrete outcomes.

This paper is a revised version of the original issued in December 2008.

]]>In this paper we introduce a new flexible mixed model for multinomial discrete choice where the key individual- and alternative-specific parameters of interest are allowed to follow an assumption-free nonparametric density specification while other alternative-specific coefficients are assumed to be drawn from a multivariate normal distribution which eliminates the independence of irrelevant alternatives assumption at the individual level. A hierarchical specification of our model allows us to break down a complex data structure into a set of submodels with the desired features that are naturally assembled in the original system. We estimate the model using a Bayesian Markov Chain Monte Carlo technique with a multivariate Dirichlet Process (DP) prior on the coefficients with nonparametrically estimated density. We employ a "latent class" sampling algorithm which is applicable to a general class of models including non-conjugate DP base priors. The model is applied to supermarket choices of a panel of Houston households whose shopping behavior was observed over a 24-month period in years 2004-2005. We estimate the nonparametric density of two key variables of interest: the price of a basket of goods based on scanner data, and driving distance to the supermarket based on their respective locations. Our semi-parametric approach allows us to identify a complex multi-modal preference distribution which distinguishes between inframarginal consumers and consumers who strongly value either lower prices or shopping convenience.

]]>**Please note:** This paper is a revised version of cemmap working paper CWP09/07.

This paper studies nonparametric estimation of conditional moment models in which the residual functions could be nonsmooth with respect to the unknown functions of endogenous variables. It is a problem of nonparametric nonlinear instrumental variables (IV) estimation, and a difficult nonlinear ill-posed inverse problem with an unknown operator. We first propose a penalized sieve minimum distance (SMD) estimator of the unknown functions that are identified via the conditional moment models. We then establish its consistency and convergence rate (in strong metric), allowing for possibly non-compact function parameter spaces, possibly non-compact finite or infinite dimensional sieves with flexible lower semicompact or convex penalty, or finite dimensional linear sieves without penalty. Under relatively low-level sufficient conditions, and for both mildly and severely ill-posed problems, we show that the convergence rates for the nonlinear ill-posed inverse problems coincide with the known minimax optimal rates for the nonparametric mean IV regression. We illustrate the theory by two important applications: root-n asymptotic normality of the plug-in penalized SMD estimator of a weighted average derivative of a nonparametric nonlinear IV regression, and the convergence rate of a nonparametric additive quantile IV regression. We also present a simulation study and an empirical estimation of a system of nonparametric quantile IV Engel curves.

]]>This paper develops a broad theme about policy choice under ambiguity through study of a particular decision criterion. The broad theme is that, where feasible, choice between a status quo policy and an innovation is better framed as selection of a treatment allocation than as a binary decision. Study of the static minimax-regret criterion and its adaptive extension substantiates the theme. When the optimal policy is ambiguous, the static minimax-regret allocation always is fractional absent large fixed costs or deontological considerations. In dynamic choice problems, the adaptive minimax-regret criterion treats each cohort as well as possible, given the knowledge available at the time, and maximizes intertemporal learning about treatment response.
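
The fractional nature of the static minimax-regret allocation can be sketched in a stylized two-treatment problem. The setup below is an assumption made for illustration rather than the paper's general framework: the status quo has a known mean outcome `a`, the innovation has an unknown mean outcome known only to lie in `[b_low, b_high]`, welfare is linear in the share `delta` assigned to the innovation, and there are no fixed costs. All names are hypothetical.

```python
def regret(delta, a, b):
    """Welfare loss from assigning share delta to the innovation when its true mean is b."""
    best = max(a, b)                          # welfare of the ideal all-or-nothing choice
    achieved = (1.0 - delta) * a + delta * b  # welfare of the fractional allocation
    return best - achieved


def max_regret(delta, a, b_low, b_high):
    """Worst-case regret over b in [b_low, b_high].

    Regret is convex and piecewise linear in b, so the maximum is at an endpoint.
    """
    return max(regret(delta, a, b_low), regret(delta, a, b_high))


def minimax_regret_share(a, b_low, b_high):
    """Share assigned to the innovation that minimizes worst-case regret."""
    if b_high <= a:   # innovation cannot beat the status quo
        return 0.0
    if b_low >= a:    # innovation cannot be worse than the status quo
        return 1.0
    # Ambiguous case: equalize (b_high - a) * (1 - delta) and (a - b_low) * delta.
    return (b_high - a) / (b_high - b_low)


# Hypothetical numbers: status quo succeeds 50% of the time,
# the innovation somewhere between 20% and 90%.
share = minimax_regret_share(0.5, 0.2, 0.9)
print(share, max_regret(share, 0.5, 0.2, 0.9))
```

In the ambiguous case the resulting share lies strictly between 0 and 1, so neither binary choice minimizes worst-case regret in this stylized problem.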

]]>We propose a new method of testing stochastic dominance which improves on existing tests based on bootstrap or subsampling. Our test requires estimation of the contact sets between the marginal distributions. Our tests have asymptotic sizes that are exactly equal to the nominal level uniformly over the boundary points of the null hypothesis and are therefore valid over the whole null hypothesis. We also allow the prospects to be indexed by infinite as well as finite dimensional unknown parameters, so that the variables may be residuals from nonparametric and semiparametric models. Our simulation results show that our tests are indeed more powerful than the existing subsampling and recentered bootstrap.

]]>Consider an observed binary regressor D and an unobserved binary variable D*, both of which affect some other variable Y. This paper considers nonparametric identification and estimation of the effect of D on Y, conditioning on D* = 0. For example, suppose Y is a person's wage, the unobserved D* indicates if the person has been to college, and the observed D indicates whether the individual claims to have been to college. This paper then identifies and estimates the difference in average wages between those who falsely claim college experience versus those who tell the truth about not having college. We estimate these average returns to lying to be about 7% to 20%. Nonparametric identification without observing D* is obtained either by observing a variable V that is roughly analogous to an instrument for ordinary measurement error, or by imposing restrictions on model error moments.

]]>In this paper, we document whether and how much the equalizing force of earnings mobility has changed in France in the 1990s. For this purpose, we use a representative three-year panel, the French Labour Force Survey. We develop a model of earnings dynamics that combines a flexible specification of marginal earnings distributions (to fit the large cross-sectional dimension of the data) with a tight parametric representation of the dynamics (adapted to the short time-series dimension). Log earnings are modelled as the sum of a deterministic component, an individual fixed effect, and a transitory component which is assumed first-order Markov. The transition probability of the transitory component is modelled as a one-parameter Plackett copula. We estimate this model using a sequential EM algorithm.
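
For readers unfamiliar with the one-parameter Plackett copula mentioned above, the sketch below evaluates its standard closed form. It is offered only as background on the copula family, not as the paper's estimation code; the function name and the example values are hypothetical. The parameter `theta > 0` is the constant cross-product (odds) ratio, with `theta = 1` corresponding to independence.

```python
import math


def plackett_cdf(u, v, theta):
    """Plackett copula C(u, v; theta) for u, v in [0, 1] and theta > 0."""
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    if theta == 1.0:
        return u * v  # independence copula
    t = theta - 1.0
    s = 1.0 + t * (u + v)
    root = math.sqrt(s * s - 4.0 * u * v * theta * t)
    return (s - root) / (2.0 * t)


def odds_ratio(u, v, theta):
    """Cross-product ratio implied by the copula at (u, v); equals theta by construction."""
    c = plackett_cdf(u, v, theta)
    return c * (1.0 - u - v + c) / ((u - c) * (v - c))


# Hypothetical parameter value: theta > 1 gives positive dependence,
# i.e. persistence in the rank of the transitory component.
print(plackett_cdf(0.5, 0.5, 4.0))
print(odds_ratio(0.3, 0.7, 4.0))
```

A single parameter thus governs how likely a rank in one period is to persist into the next, which is what makes the family convenient for a short panel.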

We exploit the estimated model to study employment/earnings inequality in France over the 1990-2002 period. We show that, in phase with business cycle fluctuations (a recession in 1993 and two peaks in 1990 and 2000), earnings mobility decreases when cross-section inequality and unemployment risk increase. We simulate individual earnings trajectories and compute present values of lifetime earnings over various horizons. Inequality presents a hump-shaped evolution over the period, with a 9% increase between 1990 and 1995 and a decrease afterwards. Accounting for unemployment yields an increase of 11%. Moreover, this increase is persistent, as it translates into a 12% increase in the variance of log present values. The ratio of inequality in present values to inequality in one-year earnings, a natural measure of immobility or of the persistence of inequality, remains remarkably constant over the business cycle.

]]>In this paper, we construct a nonparametric estimator of the distributions of latent factors in linear independent multi-factor models under the assumption that factor loadings are known. Our approach allows us to estimate the distributions of up to L(L+1)/2 factors given L measurements. The estimator works through empirical characteristic functions. We show that it is consistent, and derive asymptotic convergence rates. Monte-Carlo simulations show good finite-sample performance, less so if distributions are highly skewed or leptokurtic. We finally apply the generalized deconvolution procedure to decompose individual log earnings from the PSID into permanent and transitory components.

]]>We study linear factor models under the assumptions that factors are mutually independent and independent of errors, and errors can be correlated to some extent. Under factor non-Gaussianity, second to fourth-order moments are shown to yield full identification of the matrix of factor loadings. We develop a simple algorithm to estimate the matrix of factor loadings from these moments. We run Monte Carlo simulations and apply our methodology to British data on cognitive test scores.

]]>We provide nonparametric estimators of derivative ratio-based average marginal effects of an endogenous cause, X, on a response of interest, Y, for a system of recursive structural equations. The system need not exhibit linearity, separability, or monotonicity. Our estimators are local indirect least squares estimators analogous to those of Heckman and Vytlacil (1999, 2001) who treat a latent index model involving a binary X. We treat the traditional case of an observed exogenous instrument (OXI) and the case where one observes error-laden proxies for an unobserved exogenous instrument (PXI). For PXI, we develop and apply new results for estimating densities and expectations conditional on mismeasured variables. For both OXI and PXI, we use infinite order flat-top kernels to obtain uniformly convergent and asymptotically normal nonparametric estimators of instrument-conditioned effects, as well as root-n consistent and asymptotically normal estimators of average effects.

]]>This paper presents a nonparametric analysis of the canonical habits model. The approach is based on the combinatorial/revealed preference framework of Samuelson (1948), Houthakker (1950), Afriat (1967) and Varian (1982) and the extension and application of these ideas to intertemporal models in Browning (1989). It provides a simple finitely computable test of the model which does not require a parameterisation of the underlying (hypothesised) preferences. It also yields set identification of important features of the canonical habits model including the consumer's rate of time preference and the welfare effects of habit-formation. The ideas presented are illustrated using Spanish panel data.

]]>This paper considers parametric estimation problems with independent, identically, non-regularly distributed data. It focuses on rate-efficiency, in the sense of maximal possible convergence rates of stochastically bounded estimators, as an optimality criterion, largely unexplored in parametric estimation. Under mild conditions, the Hellinger metric, defined on the space of parametric probability measures, is shown to be an essentially universally applicable tool to determine maximal possible convergence rates. These rates are shown to be attainable in general classes of parametric estimation problems.
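
The link between the Hellinger metric and convergence rates can be illustrated with two textbook cases chosen here for illustration (they are not taken from the paper): the non-regular model U(0, θ) and the regular normal location model N(θ, 1). The sketch uses the convention H² = 1 − ∫√(pq); the function names are hypothetical. In the uniform case H² is roughly linear in the parameter shift h, while in the normal case it is roughly quadratic, which is the usual heuristic behind the rates n and √n respectively.

```python
import math


def hellinger_sq_uniform(theta, h):
    """Squared Hellinger distance between U(0, theta) and U(0, theta + h), h >= 0."""
    # The densities overlap on [0, theta]; the integral of sqrt(p * q) there is
    # theta / sqrt(theta * (theta + h)) = sqrt(theta / (theta + h)).
    return 1.0 - math.sqrt(theta / (theta + h))


def hellinger_sq_normal(h):
    """Squared Hellinger distance between N(0, 1) and N(h, 1)."""
    # The affinity of two unit-variance normals is exp(-h**2 / 8).
    return -math.expm1(-h * h / 8.0)


h = 0.001
# Doubling the shift roughly doubles H^2 in the uniform model ...
print(hellinger_sq_uniform(1.0, 2 * h) / hellinger_sq_uniform(1.0, h))
# ... and roughly quadruples it in the normal model.
print(hellinger_sq_normal(2 * h) / hellinger_sq_normal(h))
```

Setting H² to the order of 1/n then gives shifts h of order 1/n in the uniform case and 1/√n in the normal case.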

]]>In this paper, the regression discontinuity design (RDD) is generalized to account for differences in observed covariates X in a fully nonparametric way. It is shown that the treatment effect can be estimated at the rate for one-dimensional nonparametric regression irrespective of the dimension of X. It thus extends the analysis of Hahn, Todd and van der Klaauw (2001) and Porter (2003), who examined identification and estimation without covariates, requiring assumptions that may often be too strong in applications. In many applications, individuals to the left and right of the threshold differ in observed characteristics. Houses may be constructed in different ways across school attendance district boundaries. Firms may differ around a threshold that implies certain legal changes, etc. Accounting for these differences in covariates is important to reduce bias. In addition, accounting for covariates may also reduce variance. Finally, estimation of quantile treatment effects (QTE) is also considered.

]]>We model attitudes as latent variables that induce stochastic dominance relations in (item) responses. Observable characteristics that affect attitudes can be incorporated into the analysis to improve the measurement of the attitudes; the measurements are posterior distributions that condition on the responses and characteristics of each respondent. Methods to use these measurements to characterize the relation between attitudes and behaviour are developed and implemented.

]]>Decision processes among couples depend on the balance of power between the partners, determining the welfare of household members as well as household outcomes. However, little is known about the determinants of power. The collective model of household behavior gives an operational definition of decision power. We argue that important aspects of this concept of power are measurable through self-assessments of partners' say. Using such a measure, we model balance of power as an outcome of the interplay between both partners' demographic, socioeconomic, and health characteristics. Advancing flexible, yet parsimonious empirical models is crucial for the analysis, as both absolute status and relative position in the couple might affect the balance of power, and gender asymmetries may be important. Accordingly, we advance semiparametric double index models that feature one separate index for each spouse, which interact nonparametrically in the determination of power. Based on data from the Mexican Health and Aging Study (MHAS), we find education and employment status to be associated with more individual decision power, especially for women. Moreover, health and income have independent effects on the distribution of power. We also show that contextual factors are important determinants of decision power, with women in urban couples featuring more decision power than their rural counterparts.

]]>

However, Hahn, Hausman, and Kuersteiner (HHK 2004) demonstrated that the "problem" of LIML led to undesirable estimates in this situation. Morimune (1983) analyzed both the bias in 2SLS and the lack of moments in LIML. While it was long known that LIML did not have finite sample moments, it was less known that this lack of moments led to the undesirable property of considerable dispersion in the estimates, e.g. the inter-quartile range was much larger than that of 2SLS. HHK developed a jackknife 2SLS (J2SLS) estimator that attenuated the 2SLS bias problem and had good dispersion properties. They found in their empirical results that the J2SLS estimator and the Fuller estimator, which modifies LIML to have moments, both did well on the bias and dispersion criteria. Since the Fuller estimator had smaller second order MSE, HHK recommended using the Fuller estimator. However, Bekker and van der Ploeg (2005) and Hausman, Newey and Woutersen (HNW 2005) recognized that both Fuller and LIML are inconsistent with heteroscedasticity as the number of instruments becomes large in the Bekker (1994) sequence. Since econometricians recognize that heteroscedasticity is often present, this finding presents a problem. Hausman, Newey, Woutersen, Chao and Swanson (HNWCS 2007) solve this problem by proposing jackknife LIML (HLIML) and jackknife Fuller (HFull) estimators that are consistent in the presence of heteroscedasticity. HLIML does not have moments, so HNWCS (2007) recommend using HFull, which does have moments. However, a problem remains. If serial correlation or clustering exists, neither HLIML nor HFull is consistent.

The continuous updating estimator (CUE), which is the GMM-like generalization of LIML introduced by Hansen, Heaton, and Yaron (1996), would solve this problem. CUE also allows treatment of non-linear specifications, which the above estimators need not allow for, and also allows for general non-spherical disturbances. However, CUE suffers from the moment problem and exhibits wide dispersion. GMM does not suffer from the no moments problem, but like 2SLS, GMM has finite sample bias that grows with the number of moments.

In this paper we modify CUE to solve the no moments/large dispersion problem. We consider the dual formulation of CUE and we modify the CUE first order conditions by adding a term of order 1/T. To first order the variance of the estimator is the same as GMM or CUE, so no large sample efficiency is lost. The resulting estimator has moments up to the degree of overidentification and demonstrates considerably reduced bias relative to GMM and reduced dispersion relative to CUE. Thus, we expect the new estimator will be useful for empirical research. We next consider a similar approach but use a class of functions which permits us to specify an estimator with all integral moments existing. Lastly, we demonstrate how this approach can be extended to the entire family of Maximum Empirical Likelihood (MEL) Estimators, so these estimators will have integral moments of all orders.

]]>It is common practice in econometrics to correct for heteroskedasticity. This paper corrects instrumental variables estimators with many instruments for heteroskedasticity. We give heteroskedasticity robust versions of the limited information maximum likelihood (LIML) and Fuller (1977, FULL) estimators, as well as heteroskedasticity consistent standard errors thereof. The estimators are based on removing the own observation terms in the numerator of the LIML variance ratio. We derive asymptotic properties of the estimators under many and many weak instruments setups. Based on a series of Monte Carlo experiments, we find that the estimators perform as well as LIML or FULL under homoskedasticity, and have much lower bias and dispersion under heteroskedasticity, in nearly all cases considered.
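The idea of removing own observation terms can be illustrated with a short sketch. The abstract does not give the formula, so the code below is one common reading of the construction, stated as an assumption: the diagonal of the instrument projection matrix is set to zero before forming the LIML variance ratio and the estimating equations. The function name `hliml_sketch` and its arguments are hypothetical, and the Fuller-type correction and the robust standard errors are not shown.

```python
import numpy as np

def hliml_sketch(y, X, Z):
    """Jackknife-type LIML sketch (assumed form, not taken from the abstract).

    y : (n,) outcome, X : (n, k) endogenous regressors, Z : (n, m) instruments.
    """
    y = np.asarray(y, dtype=float).reshape(-1)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    Z = np.asarray(Z, dtype=float)

    # Projection matrix onto the column space of the instruments.
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    # Remove the own-observation terms: zero out the diagonal P_ii.
    P_off = P - np.diag(np.diag(P))

    # Variance ratio: smallest eigenvalue of (W'W)^{-1} W' P_off W, W = [y, X].
    W = np.column_stack([y, X])
    A = np.linalg.solve(W.T @ W, W.T @ P_off @ W)
    alpha = np.min(np.linalg.eigvals(A).real)

    # k-class style estimating equations with the jackknifed projection.
    lhs = X.T @ P_off @ X - alpha * (X.T @ X)
    rhs = X.T @ P_off @ y - alpha * (X.T @ y)
    return np.linalg.solve(lhs, rhs)
```

For large samples, forming the full n-by-n projection matrix is wasteful; the diagonal can be computed from the leverage values directly, which this sketch omits for readability.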

]]>]]>

The most common approach to estimating conditional quantile curves is to fit a curve, typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties. The resulting fits, however, may not respect a logical monotonicity requirement that the quantile curve be increasing as a function of probability. This paper studies the natural monotonization of these empirical curves induced by sampling from the estimated non-monotone model, and then taking the resulting conditional quantile curves that by construction are monotone in the probability.
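As a rough illustration of the monotonization described above: when the fitted quantile curve is evaluated on a grid of probabilities, drawing from the estimated model and then taking quantiles of the draws amounts to sorting the fitted values along the probability axis. The sketch below assumes an equally spaced probability grid; the function name `rearrange_quantiles` and the example numbers are hypothetical.

```python
import numpy as np

def rearrange_quantiles(fitted):
    """Monotonize fitted quantile values by sorting along the last axis.

    fitted : array of shape (..., n_taus); entry [..., j] is the fitted
             conditional quantile at the j-th probability of an increasing grid.
    Returns an array of the same shape that is nondecreasing in the
    probability index.
    """
    return np.sort(np.asarray(fitted, dtype=float), axis=-1)

# Fitted values for two covariate points on the grid tau = 0.2, 0.4, 0.6, 0.8.
# Both rows cross: they are not increasing in tau.
fitted = np.array([[0.1, 0.5, 0.4, 0.9],
                   [1.0, 0.8, 1.2, 1.1]])
monotone = rearrange_quantiles(fitted)
```

Sorting leaves an already monotone curve unchanged, so the step can be applied to all fitted curves without checking for crossings first.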

]]>The system GMM estimator for dynamic panel data models combines moment conditions for the model in first differences with moment conditions for the model in levels. It has been shown to improve on the GMM estimator in the first differenced model in terms of bias and root mean squared error. However, we show in this paper that in the covariance stationary panel data AR(1) model the expected values of the concentration parameters in the differenced and levels equations for the cross-section at time *t* are the same when the variances of the individual heterogeneity and idiosyncratic errors are the same. This indicates a weak instrument problem also for the equation in levels. We show that the 2SLS biases relative to the OLS biases are then similar for the equations in differences and levels, as are the size distortions of the Wald tests. These results are shown in a Monte Carlo study to extend to the panel data system GMM estimator.

For a simplified structural equation/IV regression model with one right-side endogenous variable, we obtain the exact conditional distribution function for Moreira's (2003) conditional likelihood ratio (CLR) test. This is then used to obtain the critical value function needed to implement the CLR test, and reasonably comprehensive graphical versions of the function are provided for practical use. The analogous functions are also obtained for the case of testing more than one right-side endogenous coefficient, but only for an approximation to the true likelihood ratio test. We then go on to provide an exact analysis of the power functions of the CLR test, the Anderson-Rubin test, and the LM test suggested by Kleibergen (2002). The CLR test is shown to clearly conditionally dominate the other two tests for virtually all parameter configurations, but none of these tests is either inadmissible or uniformly superior to the other two.

]]>This paper is a revised version of CWP01/06.]]>
https://www.cemmap.ac.uk/publications/3748
*n*. The test is bi-directional and applicable to non-nested models which are both possibly misspecified. The asymptotic distribution of the test statistic is derived. The proposed test is general regardless of whether the optimization criteria for estimation of competing models are the same as the SMSEP criterion used for model selection. An empirical application using timber auction data from Oregon is used to illustrate the usefulness and generality of the proposed testing procedure.]]>

It satisfies mild regularity conditions but is otherwise unknown. The paper presents a test of the hypothesis that g is the mean of a random variable Y conditional on a covariate X. The need to test this hypothesis arises frequently in economics. The test does not require nonparametric instrumental-variables (IV) estimation of g and is not subject to the ill-posed inverse problem that nonparametric IV estimation entails. The test is consistent whenever g differs from the conditional mean function of Y on a set of non-zero probability. Moreover, the power of the test is arbitrarily close to 1 uniformly over a set of functions g whose distance from the conditional mean function is O(n^{-1/2}), where n is the sample size.]]>
https://www.cemmap.ac.uk/publications/3205

An iterated covariation condition with a weak monotonicity restriction is shown to have set identifying power.

]]>a just-identifying, non-falsifiable model permitting nonseparability of latent variates and devoid of parametric restrictions is obtained. The model requires the endogenous variable to be continuously distributed. It is shown that relaxing this restriction results in loss of point identification but set identification is possible if an additional covariation restriction is introduced. Relaxing other restrictions presents significant challenges.

The A.W.H. Phillips Lecture, presented at the Australasian Meetings of the Econometric Society, July 7th 2004.

]]>The three special estimators have the same asymptotic variance. They achieve the semiparametric efficiency bound in the homoskedastic Gaussian case. We show that the Jackknife method can be used to consistently estimate the asymptotic variance. Our model and estimators are defined with a view to avoid the curse of dimensionality, that severely limits the applicability of existing methods. The empirical likelihood method is developed. It is shown that when missing responses are imputed using the semiparametric regression method the empirical log-likelihood is asymptotically a scaled chi-square variable. An adjusted empirical log-likelihood ratio, which is asymptotically standard chi-square, is obtained. Also, a bootstrap empirical log-likelihood ratio is derived and its distribution is used to approximate that of the imputed empirical log-likelihood ratio. A simulation study is conducted to compare the adjusted and bootstrap empirical likelihood with the normal approximation based method in terms of coverage accuracies and average lengths of confidence intervals. Based on biases and standard errors, a comparison is also made by simulation between the proposed estimators and the related estimators.

]]>The programme can be downloaded here [zip file, 435KB].]]>
https://www.cemmap.ac.uk/publications/2656
^{-2/5}. This is true regardless of the (finite) dimension of the explanatory variable. Thus, in contrast to the existing asymptotically normal estimator, the new estimator has no curse of dimensionality. Moreover, the asymptotic distribution of each additive component is the same as it would be if the other components were known with certainty.

It develops a control function approach to accounting for endogeneity in triangular and fully simultaneous binary response models. An application is given to the case of estimating the income effect in a labor market participation problem using a large micro data set from the British FES. The semiparametric estimator is found to perform well, detecting a significant attenuation bias. The proposed estimator is contrasted with the corresponding Probit and Linear Probability specifications.]]>
https://www.cemmap.ac.uk/publications/2677