## Efficient Semiparametric Estimation via Moment Restrictions

by Whitney K. Newey
Citation
Title:
Efficient Semiparametric Estimation via Moment Restrictions
Author:
Whitney K. Newey
Year:
2004
Publication:
Econometrica
Volume:
72
Issue:
6
Start Page:
1877
End Page:
1897
Language:
English
Abstract:


Conditional moment restrictions can be combined through GMM estimation to construct more efficient semiparametric estimators. This paper is about attainable efficiency for such estimators. We define and use a moment tangent set, the directions of departure from the truth allowed by the moments, to characterize when the semiparametric efficiency bound can be attained. The efficiency condition is that the moment tangent set equals the model tangent set. We apply these results to transformed, censored, and truncated regression models, e.g., finding that the conditional moment restrictions from Powell's (1986) censored regression quantile estimators can be combined to approximate efficiency when the disturbance is independent of regressors.

KEYWORDS: GMM, semiparametric efficiency, tangent set, moments.

1. INTRODUCTION

GENERALIZED METHOD OF MOMENTS (GMM, Hansen (1982)) provides a useful way of constructing efficient estimators, by combining moment restrictions. This approach is parsimonious and has good small sample properties in many cases (see Chamberlain (1987) and Newey (1988, 1993)). It is particularly useful in models where the efficiency bound is complicated, so that direct construction of an efficient estimator is difficult, but there are relatively simple moment conditions that can be used for estimation. There are many important examples of such models, including several considered in this paper.

The purpose of this paper is to consider efficiency with an infinite sequence of conditional moment restrictions depending on nuisance parameters. We show that the limit of the GMM asymptotic variance equals the semiparametric bound when the moment conditions characterize the semiparametric model in a certain local sense, discussed below. This result enables one to check whether a particular sequence of moment conditions has "complete information" about parameters of interest, in the sense that they lead to full efficiency. For example, we find that, in the censored regression model with disturbance independent of regressors, the conditional moment restrictions from Powell's (1986) quantile estimators can be combined to achieve efficiency, despite their regressor trimming. We also show that in truncated regression models, moment restrictions like those of Newey (1987) can be combined for efficiency. In both of these examples it is relatively simple to check efficiency, despite the complicated nature of the bounds.

Our efficiency results are based on the set of directions of departure from the truth that are allowed by the moment conditions, which we refer to as the moment tangent set. Our spanning condition for efficiency is that this moment tangent set is equal to the set of directions allowed by the model, which is the model tangent set familiar from the literature on semiparametric efficiency (e.g., see Bickel et al. (1993)). This condition is the dual of that used by Chamberlain (1987), Newey (1988, 1993), and Hahn (1997), which is based on approximation by linear combinations of the moment conditions. The dual approach via the moment tangent set is useful because often the tangent sets are much simpler than the set of linear combinations of moment functions.

¹This work was partially completed as a Fellow at the Center for Advanced Study in the Behavioral Sciences. Also, the NSF provided financial support. This paper was presented at the 2000 World Congress of the Econometric Society. Helpful comments were provided by R. Blundell, L. Hansen, a co-editor, two referees, and seminar participants at USC and Wisconsin.

It is now well established that, when a spanning condition is satisfied, it is generally possible to find rates of growth for the moment conditions as a function of sample size to achieve asymptotic efficiency. Examples include Newey (1988, 1993), Hahn (1997), Koenker and Machado (1999), and Donald, Imbens, and Newey (2003). Unfortunately, the specifics often depend crucially on the model and the particular form of the moment conditions. In contrast, the efficiency condition we consider is quite general and can be applied in a straightforward way to many models, as illustrated by the examples we give. For this reason we focus on the efficiency condition in this paper, and do not give rate results.

Section 2 describes the models and estimators. Section 3 derives the efficiency condition. Section 4 applies it to censored and truncated regression models.

2. COMBINING MOMENT RESTRICTIONS

To describe the general type of estimator we consider, let $z$ denote a single data observation, $\beta$ a $q \times 1$ parameter vector, $\gamma = (\gamma_1, \gamma_2, \dots)$ a sequence of scalar parameters, and $(\rho_1(z, \beta, \gamma), \rho_2(z, \beta, \gamma), \dots)$ a sequence of functions, each of which depends only on a finite number of elements of $\gamma$. Also, let $x$ denote a vector of conditioning variables and $\beta_0$ and $\gamma_0$ denote true values. The estimators we consider are based on the conditional moment restrictions

(2.1)  $E[\rho_j(z, \beta_0, \gamma_0) \mid x] = 0 \quad (j = 1, 2, \dots).$

The case of unconditional moment restrictions is included as a special case where x =1.

A finite number of these moment conditions can be used to form a GMM estimator. Let $J$ denote a positive integer, $\gamma_J$ the $r \times 1$ subvector of $\gamma$ that enters the first $J$ functions, $\theta = (\beta', \gamma_J')'$, and $\rho(z, \theta) = (\rho_1(z, \beta, \gamma), \dots, \rho_J(z, \beta, \gamma))'$, where indexing by $J$ of $\gamma_J$, $\theta$, and $\rho$ is suppressed for notational convenience. Also, let $A(x)$ be a matrix of functions of the conditioning variables with $J$ columns. Then equation (2.1) implies the unconditional moment restrictions $E[A(x)\rho(z, \theta_0)] = 0$. Let $(z_1, \dots, z_n)$ denote the data, $g_i(\theta) = A(x_i)\rho(z_i, \theta)$, and $\hat g_n(\theta) = \sum_{i=1}^n g_i(\theta)/n$. The unconditional restrictions can be combined to form an estimator $\hat\theta$ in the now familiar way given by Hansen (1982), as

$\hat\theta = \arg\min_\theta \hat g_n(\theta)'\hat W \hat g_n(\theta),$

where $\hat W$ is a positive semidefinite matrix. An alternative estimator that would have smaller bias for large numbers of moment restrictions is the empirical likelihood estimator, obtained as

$\hat\theta = \arg\max_\theta \max_{\pi_1, \dots, \pi_n} \sum_{i=1}^n \ln \pi_i \quad \text{subject to } \sum_{i=1}^n \pi_i = 1, \ \sum_{i=1}^n \pi_i g_i(\theta) = 0;$

see Qin and Lawless (1994), Imbens (1997), and Newey and Smith (2004).
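The two-step GMM construction just described can be sketched numerically. Below is a minimal sketch under assumed, illustrative ingredients not taken from the paper: a toy linear model with instruments $(1, x, x^2)'$; because these moments are linear in the parameter, the minimizer has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x*beta0 + u with three unconditional moments
# E[a(x)(y - x*beta0)] = 0 for a(x) = (1, x, x^2)' (hypothetical example).
n, beta0 = 5000, 2.0
x = rng.normal(size=n)
y = x * beta0 + rng.normal(size=n)
A = np.column_stack([np.ones(n), x, x**2])    # instruments a(x_i)

def gmm_linear(y, x, A, W):
    """Minimize gbar(b)' W gbar(b) with gbar(b) = A'(y - x*b)/n.

    Writing gbar(b) = m - d*b for m = A'y/n and d = A'x/n, the first-order
    condition gives b = (d'W d)^{-1} d'W m.
    """
    m = A.T @ y / len(y)
    d = A.T @ x / len(y)
    return (d @ W @ m) / (d @ W @ d)

# Two-step GMM: identity weight first, then the efficient weight
# W = Omega^{-1} estimated from first-step moment contributions.
b1 = gmm_linear(y, x, A, np.eye(A.shape[1]))
g = A * (y - x * b1)[:, None]                 # g_i(b1), an n x 3 array
W2 = np.linalg.inv(g.T @ g / n)
b2 = gmm_linear(y, x, A, W2)
```

The same two-step logic carries over when the moments are nonlinear in $\theta$; only the closed-form minimization would be replaced by a numerical one.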


In this paper we will focus on the case where $A(x)$ is efficient, i.e., minimizes the asymptotic variance among all possible $A(x)$. We maintain this focus because the efficient $A(x)$ can generally be estimated without affecting efficiency. To describe the efficient instruments, let

$\Omega(x) = E[\rho(z, \theta_0)\rho(z, \theta_0)' \mid x], \quad D(x) = \partial E[\rho(z, \beta, \gamma_0) \mid x]/\partial\beta|_{\beta=\beta_0},$
$H(x) = \partial E[\rho(z, \beta_0, \gamma) \mid x]/\partial\gamma_J|_{\gamma=\gamma_0},$

and $G(x) = [D(x), H(x)]$. Then, as shown by Chamberlain (1987) (and Newey (2001) in the singular $\Omega(x)$ case), the choice of $A(x)$ that minimizes the asymptotic variance of $\hat\theta$ is

$A^*(x) = G(x)'\Omega(x)^-,$

where for a matrix $B$, $B^-$ denotes any generalized inverse, satisfying $BB^-B = B$. In this paper we will give conditions for a fixed subvector of $\hat\theta$ to be asymptotically efficient in a semiparametric model as $J$ grows. These efficiency results will apply to a fixed subvector of $\gamma$ as well as to the parameters $\beta$ that are common to the moment conditions. Specifically, we will consider the asymptotic efficiency of $\hat\delta = [I_p, 0]\hat\theta$, where the dimension $p$ of $\hat\delta$ (and of $I_p$) remains fixed as $J$ grows.

In general the optimal function $A^*(x)$ will need to be estimated. It is well known from Hansen (1982) that this estimation does not affect the asymptotic variance of $\hat\theta$ in the unconditional case, where $x = 1$ and $A^*$ is a matrix of constants. It has also been shown in Newey (1993) that this result holds when $x$ is nontrivial and $A^*(x)$ is estimated nonparametrically. This justifies us in ignoring the estimation of $A^*(x)$ in the comparison of asymptotic variances. Furthermore, these comparisons will also be valid for the case where $\rho(z, \theta)$ has components that need to be estimated, even nonparametrically, as long as this estimation does not affect the asymptotic variance.

An example is useful for illustration. It is a semiparametric transformation model with a parametric disturbance distribution. In this model $z = (y, x)$ for a scalar dependent variable $y$, and there is an unknown, monotonic increasing function $\tau(\cdot)$ satisfying

$\tau(y) = x'\delta_0 + \varepsilon, \quad \varepsilon \text{ and } x \text{ are independent}, \quad \varepsilon \text{ has p.d.f. } g(\varepsilon, \lambda_0).$

This model includes the proportional hazards model as a special case, where $\varepsilon$ has an extreme value distribution. It also includes proportional hazards with a known distribution for the heterogeneity. In these cases $\tau(y)$ will be equal to the log of the integrated baseline hazard at $y$. Estimation of this model has been considered previously by Bickel et al. (1993), where further references are given. In general the efficiency bound for this model is complicated, as are the efficient estimators that have previously been proposed (except for certain special cases), while there are simple moment conditions that can be used for approximately efficient estimation.

Parametric conditional moment restrictions can be obtained by considering the probability that $y$ lies in intervals, as in Han and Hausman (1990). Consider a sequence $(\bar y_j)_{j=1}^\infty$ of scalars. Let $\beta = (\delta', \lambda')'$, let $G(u, \lambda) = \int_{-\infty}^{u} g(\varepsilon, \lambda)\, d\varepsilon$ be the CDF corresponding to $g(\varepsilon, \lambda)$, and

$\rho_j(z, \beta, \gamma) = 1(y \le \bar y_j) - G(\gamma_j - x'\delta, \lambda).$

These residuals will satisfy the conditional moment restrictions of equation (2.1) for $\gamma_{j0} = \tau(\bar y_j)$. Here the parameters $\gamma_j$ represent values of the transformation at various points. Therefore, $\hat\theta$ may include estimators of the transformation at certain points, corresponding to estimators of the integrated hazard in duration models. It turns out that these moment restrictions can be used to approximately attain the semiparametric bound for the transformation model, including for the estimators of the transformation values.
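The interval residuals above can be checked by simulation. The following sketch assumes an illustrative special case not taken from the paper: $\tau(y) = \log y$, scalar $x$, and a logistic disturbance with no shape parameter, so that $G$ is the logistic CDF and $\gamma_{j0} = \log \bar y_j$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated transformation model (illustrative choices, not the paper's):
# tau(y) = log(y), delta0 = 1, logistic disturbance so G is the logistic CDF.
n, delta0 = 20000, 1.0
x = rng.normal(size=n)
eps = rng.logistic(size=n)
y = np.exp(x * delta0 + eps)          # tau(y) = x*delta0 + eps

G = lambda u: 1.0 / (1.0 + np.exp(-u))

def rho(y, x, ybar, gamma, delta):
    """Interval residual rho_j = 1(y <= ybar_j) - G(gamma_j - x*delta)."""
    return (y <= ybar).astype(float) - G(gamma - x * delta)

# At the truth gamma_j0 = tau(ybar_j) = log(ybar_j), the conditional moment
# E[rho_j | x] = 0 holds, so rho_j is uncorrelated with any function of x.
ybar = 2.0
r = rho(y, x, ybar, np.log(ybar), delta0)
```

Both `r.mean()` and the sample covariance with $x$ should be near zero at the truth, and drift away from zero if $\gamma_j$ or $\delta$ is misspecified.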

The optimal GMM estimator based on these conditions will be equivalent to the maximum likelihood estimator (MLE) for the ordered choice model based on the intervals between the cutoffs $\bar y_j$. Specifically, for $\theta = (\beta', \gamma_1, \dots, \gamma_J)'$, $P_j(x, \theta) = G(\gamma_j - x'\delta, \lambda) - G(\gamma_{j-1} - x'\delta, \lambda)$ $(j = 1, \dots, J+1)$, with $\gamma_0 = -\infty$ and $\gamma_{J+1} = +\infty$, the ordered choice MLE will satisfy

$\hat\theta = \arg\max_\theta \sum_{i=1}^n \sum_{j=1}^{J+1} 1(\bar y_{j-1} < y_i \le \bar y_j) \ln P_j(x_i, \theta),$

where $\bar y_0 = -\infty$ and $\bar y_{J+1} = +\infty$.

The first-order conditions for this MLE can be written as (see Appendix B)

$\sum_{i=1}^n A(x_i)\rho(z_i, \hat\theta)/n = 0.$

This has the form of a GMM estimator where $A(x)$ has $j$th column $\partial \ln[P_j(x, \hat\theta)/P_{J+1}(x, \hat\theta)]/\partial\theta$. By efficiency of the MLE we know that this $A(x)$ must be efficient (where estimation of $A(x)$ does not affect the efficiency), making the MLE asymptotically equivalent to the GMM estimator with $A^*(x)$. Consequently, the efficiency results for GMM given below will apply to the ordered choice MLE. It will be shown that if $(\bar y_j)_{j=1}^\infty$ is dense in $\Re$, then as $J \to \infty$ the asymptotic variance of a fixed subvector of $\hat\theta$ approaches the semiparametric bound.
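The ordered-choice objective above can be sketched in code. The sketch below assumes a logistic $G$, a scalar regressor, and $\tau$ equal to the identity (so $\gamma_j = \bar y_j$); all of these are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ordered-choice representation of the interval moments (illustrative).
G = lambda u: 1.0 / (1.0 + np.exp(-u))

def interval_probs(x, delta, gammas):
    """P_j(x) = G(gamma_j - x*delta) - G(gamma_{j-1} - x*delta), using the
    conventions gamma_0 = -inf and gamma_{J+1} = +inf (J+1 intervals)."""
    cuts = np.concatenate([[-np.inf], gammas, [np.inf]])
    cdf = G(cuts[None, :] - delta * x[:, None])   # shape (n, J+2)
    return np.diff(cdf, axis=1)                   # shape (n, J+1)

def ordered_loglik(y, x, delta, gammas, ybars):
    """sum_i sum_j 1(ybar_{j-1} < y_i <= ybar_j) ln P_j(x_i, theta)."""
    edges = np.concatenate([[-np.inf], ybars, [np.inf]])
    j = np.searchsorted(edges, y, side="left") - 1  # interval index of y_i
    P = interval_probs(x, delta, gammas)
    return float(np.log(P[np.arange(len(y)), j]).sum())

# Simulated check: probabilities sum to one, likelihood is finite.
x = rng.normal(size=1000)
ybars = np.array([-1.0, 0.0, 1.5])
y_latent = x + rng.logistic(size=1000)            # tau(y) = y, delta0 = 1
ll = ordered_loglik(y_latent, x, 1.0, ybars, ybars)
```

Maximizing `ordered_loglik` over $(\delta, \gamma_1, \dots, \gamma_J)$ would give the ordered choice MLE discussed in the text.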

A second example is the conditional mean index model of Ichimura (1993), where

$E[y \mid w] = E[y \mid v(w, \beta_0)]$

for some vector of regressors $w$ and known vector of functions $v(w, \beta)$. A simple approach to efficient estimation can be based on unconditional moment restrictions. This model implies that for any function $a(w)$ with finite second moment, $v = v(w, \beta_0)$, and $\varepsilon = y - E[y \mid v]$,

(2.4)  $E[\{a(w) - E[a(w) \mid v]\}\varepsilon] = 0.$

These moment restrictions do not have the simple parametric form of equation (2.1), due to the presence of conditional expectations. However, it is possible to use nonparametric estimators for the conditional expectations without affecting the asymptotic variance, so that asymptotic variance comparisons can be made as if the conditional expectations were known.

For a sequence of functions $(a_1(w), a_2(w), \dots)$, let

$\rho_j(z, \beta) = \{a_j(w) - E[a_j(w) \mid v(w, \beta)]\}\{y - E[y \mid v(w, \beta)]\}.$

For these functions the moment conditions of equation (2.1), with $x = 1$, are equivalent to equation (2.4). Let

$\hat\rho_j(z, \beta) = \{a_j(w) - \hat E[a_j(w) \mid v(w, \beta)]\}\{y - \hat E[y \mid v(w, \beta)]\},$

where $\hat E[\cdot \mid v(w, \beta)]$ denotes some nonparametric regression estimator with regressors $v(w, \beta)$, and let $\hat\rho(z, \beta) = (\hat\rho_1(z, \beta), \dots, \hat\rho_J(z, \beta))'$. Then it is well known that $\sum_{i=1}^n \hat\rho(z_i, \beta_0)/\sqrt{n}$ and $\sum_{i=1}^n \rho(z_i, \beta_0)/\sqrt{n}$ have the same limiting distribution and that for $v_\beta = \partial v(w, \beta_0)/\partial\beta$ and $a(w) = (a_1(w), \dots, a_J(w))'$,

$G = \partial E[\rho(z, \beta)]/\partial\beta'\big|_{\beta=\beta_0} = -E\bigl[\{a(w) - E[a(w) \mid v]\}\,[\partial E[y \mid v]/\partial v]\,v_\beta'\bigr].$

Consequently, a GMM estimator based on $\hat\rho(z, \beta)$ will have the same asymptotic variance as one based on $\rho(z, \beta)$. Also, since $x = 1$ here, each $A(x)$ just corresponds to constant linear combination coefficients, with the optimal one given by $A^*(x) = A^* = G'\Omega^-$ for $\Omega = E[\rho(z, \beta_0)\rho(z, \beta_0)']$. Then an optimal GMM estimator can be constructed in the usual way, by using a preliminary estimator $\bar\beta$ to form $\hat\Omega = \sum_{i=1}^n \hat\rho(z_i, \bar\beta)\hat\rho(z_i, \bar\beta)'/n$, forming $\hat g_n(\beta) = \sum_{i=1}^n \hat\rho(z_i, \beta)/n$, and solving

$\hat\beta = \arg\min_\beta \hat g_n(\beta)'\hat\Omega^-\hat g_n(\beta).$

It will be shown that this estimator is approximately efficient as $J$ grows, if $(a_1(w), a_2(w), \dots)$ is a mean-square spanning set, meaning that finite linear combinations of these functions can approximate as closely as desired any function with finite mean square.
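The index-model moments can be illustrated numerically. The sketch below assumes a hypothetical design (linear index, $E[y \mid v] = \tanh v$) and a plain Nadaraya-Watson smoother, one possible choice for the nonparametric regression estimator the text leaves unspecified; for brevity it checks the simpler unprojected moments $E[a_j(w)\varepsilon] = 0$, which also hold at the truth since $E[\varepsilon \mid w] = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Index-model moments at the truth: eps = y - E[y|v] is mean independent of
# w, so sample moments built from any basis a_j(w) should be near zero.
n = 1500
w = rng.normal(size=(n, 2))
beta0 = np.array([1.0, -0.5])
v = w @ beta0                                   # v = v(w, beta0) = w'beta0
y = np.tanh(v) + rng.normal(scale=0.5, size=n)  # E[y|w] = tanh(v)

def kreg(y, v, v0, h=0.2):
    """Nadaraya-Watson estimate of E[y|v] at points v0 (Gaussian kernel)."""
    k = np.exp(-0.5 * ((v0[:, None] - v[None, :]) / h) ** 2)
    return (k @ y) / k.sum(axis=1)

eps_hat = y - kreg(y, v, v)                     # estimated residual
basis = np.column_stack([np.ones(n), w, w**2])  # a_1(w), ..., a_5(w)
moments = basis.T @ eps_hat / n                 # should all be close to zero
```

Stacking such basis moments for growing $J$ and weighting them efficiently is exactly the construction the text describes.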

3. THE SPANNING CONDITION

A certain spanning condition is critical for the GMM estimator $\hat\delta = [I, 0]\hat\theta$ to approximately attain the semiparametric efficiency bound. The asymptotic variance of $\hat\delta$ will be

$\Sigma_J = [I, 0]\{E[G(x)'\Omega(x)^- G(x)]\}^{-1}[I, 0]'.$

As is usual for GMM with an increasing set of moment conditions, $\Sigma_J$ will be decreasing in $J$, in the positive semidefinite sense. Consequently, $\Sigma_\infty = \lim_{J\to\infty} \Sigma_J$ will exist (see Appendix B). The GMM estimator will be approximately efficient if $\Sigma_\infty$ is equal to the semiparametric (asymptotic) variance bound. The spanning condition will be sufficient for this equality.
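The monotonicity of $\Sigma_J$ can be illustrated numerically. The sketch below uses a toy unconditional case with no nuisance parameters, so $\Sigma_J = \{G_J'\Omega_J^{-1}G_J\}^{-1}$; the Jacobian and covariance are randomly generated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy check: the efficiently weighted GMM variance based on the first J
# moments weakly decreases, in the PSD sense, as moments are added.
p, Jmax = 2, 8
G = rng.normal(size=(Jmax, p))             # stacked moment Jacobian
R = rng.normal(size=(Jmax, Jmax))
Omega = R @ R.T + Jmax * np.eye(Jmax)      # a well-conditioned covariance

def sigma(J):
    """Sigma_J for the first J moment conditions (optimal weighting)."""
    GJ, OJ = G[:J], Omega[:J, :J]
    return np.linalg.inv(GJ.T @ np.linalg.solve(OJ, GJ))

# Sigma_J - Sigma_{J+1} should be positive semidefinite for every J.
diffs = [sigma(J) - sigma(J + 1) for J in range(p, Jmax)]
min_eig = min(np.linalg.eigvalsh(d).min() for d in diffs)
```

Here `min_eig` is nonnegative up to rounding error, reflecting the PSD ordering of the $\Sigma_J$ sequence.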

Intuitively, efficiency should be closely related to whether the moment conditions characterize the semiparametric model, i.e., whether the restrictions imposed by all the moment conditions are the same as those imposed by the model. Unless this condition holds, there will be information in the model that is not exploited by the GMM estimator. When this condition holds, the GMM estimator based on many moments should pick up most of the information in the model, leaving only a small remainder when enough moments are used.

Because asymptotic efficiency is a local property, a local formulation of the efficiency condition, in terms of directions of departure from the truth, gives the easiest approach. The spanning condition will be that the set of directions allowed by the moment conditions is the same as that allowed by the model. These direction sets are referred to as tangent sets, so that the spanning condition is that the model tangent set is the same as the moment tangent set. This is a local version of the condition that the moment restrictions imply the semiparametric model.

Before stating the spanning condition we describe the tangent sets. The model tangent set is formulated in terms of scores, as in Bickel et al. (1993). Partition the data observation as $z = (y, x)$, and suppose that the model specifies that the conditional density of $y$ given $x$ is a member of a semiparametric family

$f(y \mid x, \beta, h), \quad \beta \in B, \ h \in \mathcal{H},$

where $B$ is an open subset of $\Re^q$, $h$ denotes a function, and $\mathcal{H}$ is a set of such functions. For example, in the transformation model given above the density of $y$ given $x$ has this form with $h = \tau$ and $f(y \mid x, \beta, h) = [d\tau(y)/dy]\,g(\tau(y) - x'\delta, \lambda)$ for the density $g(\varepsilon, \lambda) = \partial G(\varepsilon, \lambda)/\partial\varepsilon$. It will be assumed throughout that the marginal distribution of $x$ is unrestricted, as appropriate for evaluating efficiency with conditional moment restrictions. Define a regular parametric submodel to be a family of densities $\{f(y \mid x, \beta_0, h(\eta))\}$, where $\eta$ is a scalar parameter, with $h(\eta)$ equal to the truth at some $\eta_0$, where "regular" means that the square root of the density is mean-square differentiable with respect to $\eta$, has a nonzero Fisher information, and possibly satisfies other regularity conditions (such as boundedness of conditional second moments of $\rho(z, \theta)$). Let $S_\eta = \partial \ln f(y \mid x, \beta_0, h(\eta))/\partial\eta|_{\eta=\eta_0}$ denote the score for the parametric submodel, where a $z$ argument is suppressed for notational convenience and the scores are defined more precisely in terms of derivatives of the square root of the density (e.g., see Bickel et al. (1993)). The model tangent set $T$ is the closed linear span of the set of such scores. It represents directions of departure from the truth that are allowed by the model.

To describe the moment tangent set, consider a parametric family $\{f(y \mid x, \eta)\}$ of conditional densities satisfying the moment restrictions of equation (2.1), meaning that there exists $\gamma(\eta)$ such that

(3.1)  $\int \rho_j(z, \beta_0, \gamma(\eta)) f(y \mid x, \eta)\, dy = 0 \quad (j = 1, 2, \dots),$

identically in $\eta$. Differentiating this identity with respect to $\eta$, for $j = 1, \dots, J$, gives

(3.2)  $E[\rho t \mid x] + H(x)\, d\gamma_J(\eta_0)/d\eta = 0, \quad t = \partial \ln f(y \mid x, \eta_0)/\partial\eta,$

where $H(x) = \partial E[\rho(z, \beta_0, \gamma) \mid x]/\partial\gamma_J|_{\gamma=\gamma_0}$. This suggests a tangent set for the first $J$ moments of the form

(3.3)  $T_J = \{t : E[t^2] < \infty,\ E[t \mid x] = 0,\ E[\rho t \mid x] = H(x)c \text{ for a constant vector } c\},$

where $E[t \mid x] = 0$ holds because of the usual zero mean property of conditional scores. Then, because $T_J$ will be a decreasing sequence of sets (increasing $J$ corresponds to adding moment conditions), the tangent set for all the moments will be given by

$T_\infty = \bigcap_{J=1}^{\infty} T_J.$

Here $T_\infty$ represents the set of all directions of departure from the truth that are allowed by the moment conditions.

Assuming the moment conditions are implied by the semiparametric model, it will be the case that a score for a parametric submodel satisfies equation (3.2) for all $J$. Consequently, $T \subseteq T_\infty$. Therefore, the model and moment tangent sets will be equal if $T_\infty \subseteq T$, meaning that any direction of departure allowed by the moment conditions is also allowed by the model. This leads to the following condition:

SPANNING CONDITION: $T = T_\infty$.

Intuitively, using GMM will lead to approximate efficiency when imposing all the moment conditions restricts the density so as to only allow directions of departure that are given by the semiparametric model.

We use two regularity conditions to obtain a precise result. We define regularity of a parametric family of densities as in the discussion of the model tangent space above.

ASSUMPTION 1: With probability one, $f(y \mid x, \beta, h_0)$ is regular in $\beta$, $\partial E[\rho_j(z, \beta, \gamma_0) \mid x]/\partial\beta|_{\beta=\beta_0}$ exists, and $\rho_j(z, \beta, \gamma_0)$ is continuous at each $\beta$. Also, $\int \max_\beta \rho_j(z, \beta, \gamma_0)^2 f(y \mid x, \beta, h_0)\, dy$ is bounded.

For some of the examples it will be important that this condition allows the residual to be discontinuous in $\beta$, as long as at each $\beta$ this occurs with probability zero. The next condition allows for some of the residuals to be zero with positive probability, which is also important in the examples.

ASSUMPTION 2: For each $J$ there is $R(x)$ such that $H(x) = \Omega(x)R(x)$, there is a symmetric generalized inverse $\Omega(x)^-$ such that $E[G(x)'\Omega(x)^- G(x)]$ exists and is nonsingular, and $\beta$ has a finite and nonsingular semiparametric variance bound.

The condition $H(x) = \Omega(x)R(x)$ is easy to check in the examples we consider and should be satisfied quite generally. If there is a parametric submodel $f(y \mid x, \eta)$, as discussed above, with $d\gamma(\eta_0)/d\eta$ nonsingular, then by equation (3.2), $H(x) = E[\rho t\{-d\gamma(\eta_0)/d\eta\}^{-1} \mid x]$, so this condition holds by Lemma A.3 of the Appendix.

THEOREM 1: If Assumptions 1 and 2 and the spanning condition are satisfied, then $\lim_{J\to\infty} \Sigma_J = \Sigma_\infty$ is the semiparametric bound.

In the Appendix a projection formula for the GMM limit $\Sigma_\infty$ is derived. This formula extends Chamberlain's (1987) bound to the case where there is a countably infinite number of moment restrictions. It is compared with a corresponding formula for the semiparametric bound to obtain the proof of Theorem 1. For ease of exposition we reserve discussion of these formulae and the proofs for the Appendix.

Consider the transformation model as an example. For a parametric submodel $\tau(y, \eta)$, the score is

$S_\eta = \tau_{y\eta}(y)/\tau_y(y) + s(\tau(y) - v)\tau_\eta(y),$

where subscripts denote partial derivatives, $g(\varepsilon) = g(\varepsilon, \lambda_0)$, $s(\varepsilon) = g_\varepsilon(\varepsilon)/g(\varepsilon)$, and $v = x'\delta_0$. Therefore, the tangent set $T$ will be the closed linear span of the set of objects of this form. To compare this set with $T_\infty$, note that $\rho_j(z, \beta, \gamma)$ depends only on $\gamma_j$, so that $H_{jk}(x) = 0$ $(j \ne k)$ and $H_{jj}(x) = \partial E[\rho_j(z, \beta_0, \gamma_{j0}) \mid x]/\partial\gamma_j = -g(\tau(\bar y_j) - v)$. Then $T_\infty$ will consist of those $t(y, x)$ such that $E[t \mid x] = 0$ and

$E[1(y \le \bar y_j)t \mid x] = -g(\tau(\bar y_j) - v)c_j.$

If $(\bar y_j)_{j=1}^\infty$ is dense and $g(\varepsilon)$ is differentiable and positive everywhere, then there is a $c(y)$ such that this equation holds with $\bar y_j$ replaced by any $y \in \Re$. Differentiating with respect to $y$ and solving for $t$ then gives

$t(y, x) = -c_y(y)/\tau_y(y) - s(\tau(y) - v)c(y).$

This expression has exactly the same form as the score for a parametric submodel, with $-c(y)$ replacing $\tau_\eta(y)$. Thus, the moment tangents satisfy the same conditions as the scores for parametric submodels, and hence the spanning condition will be satisfied. Consequently, the asymptotic variance of the ordered choice MLE of the regression and distribution parameters, as well as the transformation values, will converge to the semiparametric bound as the intervals become finer.

Consider next the index model example. In this case a parametric submodel $f(y \mid w, \eta)$ must be such that $\int y f(y \mid w, \eta)\, dy$ is a function of only $v$. Therefore, its derivative will also be a function of only $v$, giving

$\partial E[y \mid w, \eta]/\partial\eta|_{\eta=\eta_0} = E[y\, \partial \ln f(y \mid w, \eta_0)/\partial\eta \mid w] = E[\varepsilon\, \partial \ln f(y \mid w, \eta_0)/\partial\eta \mid w] = E[\varepsilon S_\eta \mid w],$

where the second equality follows by the usual mean zero property of scores, $E[\partial \ln f(y \mid w, \eta_0)/\partial\eta \mid w] = 0$, and the third by $S_\eta = \partial \ln f(y \mid w, \eta_0)/\partial\eta + \partial \ln f(w \mid \eta_0)/\partial\eta$ and $E[\varepsilon \mid w] = 0$. Since $E[\varepsilon S_\eta \mid w]$ is a function of only $v$, it equals $E[\varepsilon S_\eta \mid v]$. Since the score is otherwise unrestricted, it follows that the model tangent set is $\{t : E[\varepsilon t \mid w] = E[\varepsilon t \mid v]\}$. Now suppose that finite linear combinations of $(a_1(w), a_2(w), \dots)$ can approximate any function with finite mean square arbitrarily well. For instance, the set of all integer power series in a bounded, one-to-one transformation of $w$ has this property. It is well known that this property is equivalent to any function $\delta(w)$ with $E[\delta(w)^2]$ finite and $E[a_j(w)\delta(w)] = 0$ for all $j$ being zero. Then, since $x = 1$ in this example, the moment tangent set is given by the set of $t$ with finite mean square, such that for each $j$,

$E[a_j(w)\{E[\varepsilon t \mid w] - E[\varepsilon t \mid v]\}] = 0.$

Suppose that $\mathrm{Var}(\varepsilon \mid w)$ is bounded, so that $E[\varepsilon t \mid w]$ has finite mean square. Then the mean-square spanning property and this equation imply that

$E[\varepsilon t \mid w] = E[\varepsilon t \mid v].$

Thus, the moment tangents satisfy the same conditions as the scores for parametric submodels, and hence the spanning condition will be satisfied.

The previous GMM efficiency result of Chamberlain (1987) is based on approximating by a linear combination of moment conditions. The spanning condition corresponds to the dual of this previous approach. To compare with the previous approach it needs to be generalized to allow for the nuisance parameters $\gamma$ and for conditioning on $x$, and the efficiency of estimators of $\beta$ should be considered. To do so, let $V_J$ be the block of $\Sigma_J$ corresponding to $\beta$ and let $V$ be the semiparametric variance bound for estimators of $\beta$. Also, for a set $\mathcal{A}$ consisting of random variables with finite mean square and conditional mean zero given $x$, let

$\mathcal{A}^\perp = \{t : E[t^2] < \infty,\ E[t \mid x] = 0,\ E[ts] = 0 \text{ for all } s \in \mathcal{A}\}$

denote its orthogonal complement. As is well known, under appropriate regularity conditions there is a representation $V = (E[SS'])^{-1}$, with each component of $S$ being the element of $T^\perp$ that is closest in mean square to the corresponding component of the score for $\beta$. The random vector $S$ is often referred to as the efficient score. As shown in Newey (1993) for the unconditional case without nuisance parameters, the optimal function $A^*(x)$ can be interpreted as the coefficients of a regression of the efficient score on the moment functions, so that the efficiency bound is approximately attained when linear combinations of the moments approximate the efficient score. The following result generalizes this previous one to conditional moment restrictions with nuisance parameters.

THEOREM 2: If Assumptions 1 and 2 are satisfied, then

$V^{-1} - V_J^{-1} = \min_{\pi(x):\, E[\pi(x)H(x)] = 0} E[\{S - \pi(x)\rho\}\{S - \pi(x)\rho\}'],$

and $V_J \to V$ if and only if for each $J$ there is $\pi_J(x)$ with $E[\pi_J(x)H(x)] = 0$ such that $E[\|S - \pi_J(x)\rho\|^2] \to 0$ as $J \to \infty$.

This result shows that the difference of the inverses of the semiparametric bound and the GMM variance is the variance of the residuals from approximating the efficient score by $\pi(x)\rho$ with $E[\pi(x)H(x)] = 0$. The presence of $x$ in $\pi(x)$ accounts for the conditioning on $x$, and the constraint $E[\pi(x)H(x)] = 0$ on $\pi(x)$ accounts for the presence of $\gamma$. This result specializes to that of Newey (1993) when $x = 1$ and $\gamma$ is not present. In general $V_J \to V$ when for each $J$ there is $\pi_J(x)$ such that $\pi_J(x)\rho$ can approximate the efficient score arbitrarily well for large enough $J$.

One of the positive aspects of this result is that it is constructive, with efficiency following from finding $\pi_J(x)$ where $\pi_J(x)\rho$ approximates $S$ (and $E[\pi_J(x)H(x)] = 0$). The problem is that constructing such $\pi_J(x)$ can be very hard, particularly when $\gamma$ is present. The root of this problem is that the structure of $T^\perp$ is often complicated, leading to a complicated form for the efficient score $S$. This problem leads to falling back on a more abstract sufficient condition, that any element of $T^\perp$ can be approximated by the moment conditions. Specifically, let $M$ denote the mean-square closure of the set

$\bigcup_{J=1}^{\infty} \{\pi(x)\rho : E[\pi(x)H(x)] = 0\}.$

That is, $M$ is the set of random variables that can be approximated arbitrarily closely in mean square by $\pi_J(x)\rho$ with $E[\pi_J(x)H(x)] = 0$. Then $T^\perp = M$ will be sufficient for $V_J \to V$, since the components of $S$ are in $T^\perp$. The following result shows that the spanning condition is equivalent to this sufficient condition.

THEOREM 3: If Assumptions 1 and 2 are satisfied, then $T_\infty = M^\perp$, and the spanning condition is satisfied if and only if $T^\perp = M$.

Thus we see that the spanning condition is equivalent to $T^\perp = M$, and so to the previous approach of Chamberlain (1987). Furthermore, this result also shows $T_\infty = M^\perp$, while $T = T^{\perp\perp}$ is a well-known result. Thus $T = T_\infty$ is the dual of $M = T^\perp$, i.e., the spanning condition is the dual of the Chamberlain (1987) type of condition for efficiency. The spanning condition stated above has received first priority here because the most difficult cases seem to correspond to $T^\perp$ having a complicated structure, but $T$ being relatively simple. This relative simplicity is illustrated by the transformation model example, where it is straightforward to show equality of the moment and model tangent sets, but Bickel et al. (1993) show that the orthogonal complement of the model tangent set is complicated. Other examples are provided by censored and truncated regression with an independent disturbance.
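The duality argument just given can be recorded compactly. The display below restates the reasoning of this paragraph (it is not a numbered equation of the paper), using only Theorem 3 and the fact that orthogonal complementation is one-to-one on closed linear subspaces:

```latex
% T and M are closed linear subspaces, so T = T^{\perp\perp} and M = M^{\perp\perp}.
% By Theorem 3, T_\infty = M^{\perp}. Hence
\[
  T = T_{\infty}
  \;\Longleftrightarrow\;
  T = M^{\perp}
  \;\Longleftrightarrow\;
  T^{\perp} = M^{\perp\perp} = M ,
\]
% i.e., the spanning condition holds exactly when the Chamberlain-type
% approximation condition does.
```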

4. CENSORED AND TRUNCATED REGRESSION WITH INDEPENDENT DISTURBANCE

Two important semiparametric limited dependent variable models are censored and truncated regression models with a disturbance that is independent of the regressors. There is a large literature on estimation of these models; see Powell (1994). In both of these models the efficiency bounds are complicated, but there are simple moment conditions, so that GMM may be useful for efficient estimation. In this section we show how simple moment conditions can be combined to approximately attain the semiparametric bound for each of these models.

These models can be formulated as missing data models for the latent regression

(4.1)  $y^* = x'\beta_0 + \varepsilon, \quad \varepsilon \text{ and } x \text{ are independent}, \quad \varepsilon \text{ has p.d.f. } g(\varepsilon).$

The censored regression model is one where $x$ is always observed, but only $y = \max(0, y^*)$ is observed. The truncated regression model is one where $(y, x)$ is observed only if $y^* > 0$.

To construct moment conditions in each model we consider functions $m_j(\varepsilon)$ $(j = 1, \dots, J)$ and suppose that there is $\gamma_{j0}$ such that $E^*[m_j(\varepsilon - \gamma_{j0})] = 0$, where $E^*[\cdot]$ represents the expectation for the latent data. For censored regression we require that $m_j(\varepsilon)$ be constant below some value, and let $\tau_j = \sup\{\bar\varepsilon : m_j(\varepsilon) = m_j(\bar\varepsilon) \text{ for all } \varepsilon \le \bar\varepsilon\}$. For truncated regression we require that $m_j(\varepsilon)$ be zero below some value, and let $\tau_j = \sup\{\bar\varepsilon : m_j(\varepsilon) = 0 \text{ for all } \varepsilon \le \bar\varepsilon\}$. Then for $\theta = (\beta', \gamma_1, \dots, \gamma_J)'$ and

(4.2)  $\rho_j(z, \theta) = 1(\gamma_j + x'\beta > -\tau_j)\, m_j(y - x'\beta - \gamma_j),$

the conditional moment restriction of equation (2.1) is satisfied, as shown by Newey (2001), where references and examples are given. The optimal matrix $A^*(x)$ has the same form for both censored and truncated regression. Let $\Lambda$ be the $J \times J$ matrix with $\Lambda_{jk} = E^*[m_j(\varepsilon - \gamma_{j0})m_k(\varepsilon - \gamma_{k0})]$ $(j, k = 1, \dots, J)$. Also, let $d$ be the $J \times 1$ vector with $d_j = \partial E^*[m_j(\varepsilon - \gamma_{j0} + \alpha)]/\partial\alpha|_{\alpha=0}$, and let $D = \mathrm{diag}(d_1, \dots, d_J)$ be the diagonal matrix with $j$th diagonal element $d_j$. Also, let $I(x, \theta)$ be the selection matrix that selects those $\rho_j(z, \theta)$ with $\gamma_j + x'\beta > -\tau_j$, and $I(x) = I(x, \theta_0)$. Then, as shown in Newey (2001), $G(x) = I(x)'I(x)[dx', D]$ and $\Omega(x)^- = I(x)'[I(x)\Lambda I(x)']^{-1}I(x)$, so that

(4.3)  $A^*(x) = [dx', D]'I(x)'[I(x)\Lambda I(x)']^{-1}I(x).$
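Formula (4.3) can be sketched directly in code. The sketch below specializes to the censored-quantile case of Section 4.1, where $\Lambda_{jk} = \min\{\alpha_j, \alpha_k\} - \alpha_j\alpha_k$ and $\tau_j = 0$; the standard normal disturbance and the value of $\beta$ are purely illustrative assumptions.

```python
import numpy as np

# Ingredients of A*(x) for censored quantiles (illustrative values):
# Lambda_jk = min(alpha_j, alpha_k) - alpha_j*alpha_k, d_j = g(gamma_j0),
# with g taken to be the standard normal density.
alphas = np.array([0.25, 0.50, 0.75])
gammas = np.array([-0.6745, 0.0, 0.6745])       # N(0,1) quantiles
phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
d = phi(gammas)                                  # d_j = g(gamma_j0)
D = np.diag(d)
Lam = np.minimum.outer(alphas, alphas) - np.outer(alphas, alphas)
beta = np.array([0.2, 0.3])                      # hypothetical beta_0

def a_star(x):
    """A*(x) = [d x', D]' I(x)' [I(x) Lambda I(x)']^{-1} I(x), eq. (4.3)."""
    keep = gammas + x @ beta > 0.0               # tau_j = 0 for quantiles
    I = np.eye(len(gammas))[keep]                # selection matrix I(x)
    dxD = np.hstack([np.outer(d, x), D])         # [d x', D], J x (k + J)
    return dxD.T @ I.T @ np.linalg.inv(I @ Lam @ I.T) @ I

A = a_star(np.array([1.0, 0.5]))                 # shape (k + J) x J = 5 x 3
```

Columns of $A^*(x)$ corresponding to deselected residuals are zero, reflecting the regressor trimming built into the moment conditions.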

4.1. Censored Regression

For censored regression we consider quantile estimation, where $m_j(\varepsilon) = 1(\varepsilon < 0) - \alpha_j$, $0 < \alpha_j < 1$, as in Powell (1986). Here $\tau_j = 0$ and $\gamma_{j0}$ is the $\alpha_j$th quantile of the distribution of $\varepsilon$. Also, it is straightforward to estimate the unknown components $I(x)$, $d$, and $\Lambda$ of $A^*(x)$. Here $\Lambda_{jk} = \min\{\alpha_j, \alpha_k\} - \alpha_j\alpha_k$ is known and $d_j = g(\gamma_{j0})$. Let $\hat\beta$ and $\hat\gamma_j$ be preliminary estimators of the parameters and $\hat v_i = x_i'\hat\beta$. For example, $\hat\beta$ could be obtained from some censored regression quantile estimator and each $\hat\gamma_j$ from minimizing the censored regression quantile objective function $\sum_{i=1}^n q_j(y_i - \max\{0, \hat v_i + \gamma_j\})$, where $q_j(u) = [\alpha_j - 1(u < 0)]u$. For $\hat\varepsilon_i = y_i - \hat v_i$, let $K(u)$ denote a kernel function, satisfying $\int K(u)\, du = 1$ and other regularity conditions, $h_j$ a bandwidth parameter, $K_{ji} = K((\hat\varepsilon_i - \hat\gamma_j)/h_j)1(y_i > 0)$, and $\bar K_{ji} = \int_{-\infty}^{(\hat v_i + \hat\gamma_j)/h_j} K(u)\, du$. The kernel density estimator of $d_j$ from Hall and Horowitz (1990) is $\hat d_j = \sum_{i=1}^n K_{ji}/(h_j \sum_{i=1}^n \bar K_{ji})$. Let $\hat d = (\hat d_1, \dots, \hat d_J)'$ and $\hat D = \mathrm{diag}(\hat d_1, \dots, \hat d_J)$. Then $A^*(x)$ can be estimated by

$\hat A^*(x) = [\hat d x', \hat D]'\hat I(x)'[\hat I(x)\Lambda \hat I(x)']^{-1}\hat I(x), \qquad \hat I(x) = I(x, \hat\theta).$

This estimator of the optimal instruments can be used to form an estimator as in Newey (2001).
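The kernel estimator of $d_j$ can be illustrated by simulation. The sketch below is hedged: the censoring correction in the denominator is reconstructed here rather than transcribed from Hall and Horowitz (1990), and for brevity the true $v_i$ and $\gamma_{j0}$ replace preliminary estimates.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(5)

# Simulated censored data y = max(0, v + eps), eps ~ N(0,1). We estimate
# d_j = g(gamma_j0) with a Gaussian kernel; Kbar adjusts the denominator
# for observations whose kernel window is partly censored (reconstructed
# correction, treat as illustrative).
n, h, gamma = 20000, 0.3, 0.0                # gamma_j0 = 0, the median of eps
v = rng.uniform(-1.0, 2.0, size=n)           # v_i = x_i' beta_0
y = np.maximum(0.0, v + rng.normal(size=n))
ehat = y - v                                 # eps_hat_i (meaningful when y_i > 0)

K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)          # kernel
PhiK = np.vectorize(lambda u: 0.5 * (1.0 + erf(u / 2**0.5)))    # its CDF

Kji = K((ehat - gamma) / h) * (y > 0)        # numerator terms
Kbar = PhiK((v + gamma) / h)                 # uncensored mass of the window
d_hat = Kji.sum() / (h * Kbar.sum())         # estimate of g(gamma_j0)
```

With this setup `d_hat` should be close to the standard normal density at zero, up to smoothing bias of order $h^2$.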

By comparing the model and moment tangent sets we can see why the asymptotic variance will approach the bound as the quantiles become dense on the real line. By independence of $\varepsilon$ and $x$, a parametric submodel for the conditional density of $y$ given $x$ will have the form

$f(y \mid x, \eta) = 1(y > 0)g(y - v, \eta) + 1(y = 0)\int_{-\infty}^{-v} g(u, \eta)\, du,$

where $g(\varepsilon, \eta)$ is a parametric submodel for the density of $\varepsilon$ and $v = x'\beta_0$. Then for $s(\varepsilon) = \partial \ln g(\varepsilon, \eta)/\partial\eta|_{\eta=\eta_0}$, the score will be

$S_\eta = 1(y > 0)s(y - v) + 1(y = 0)\frac{\int_{-\infty}^{-v} s(u)g(u)\, du}{\int_{-\infty}^{-v} g(u)\, du} = 1(y > 0)s(\varepsilon) - 1(y = 0)\frac{E[1(\varepsilon > -v)s(\varepsilon) \mid x]}{\Pr(\varepsilon \le -v \mid x)},$

where the second equality follows by $E[s(\varepsilon) \mid x] = 0$. Thus, the model tangent set consists of functions that depend only on $\varepsilon$ for $y > 0$, and that are determined by their values for $y > 0$. Thus, to show that the spanning condition holds, it suffices to show that the moment tangents depend only on $\varepsilon$ when $y$ is positive. As long as $(\alpha_j)_{j=1}^\infty$ are dense in $(0, 1)$, we should find that the $\gamma_{j0}$ are dense in the support of $\varepsilon$. Then all of the censored regression quantile restrictions together imply that, conditional on $v > -\gamma_{j0}$, quantile independence of $\varepsilon$ from $x$ holds at all quantiles $\gamma_{j'0} \ge \gamma_{j0}$, since the set where $v > -\gamma_{j0}$ is contained in the set where $v > -\gamma_{j'0}$. Thus, by denseness, the restrictions should imply independence of $\varepsilon$ and $x$ (and hence $t(\varepsilon, x)$ depending only on $\varepsilon$) on the set where $\varepsilon > \gamma_{j0}$ and $v > -\gamma_{j0}$. The spanning condition then holds because the set where $y = \varepsilon + v > 0$ is the union of the sets where $\varepsilon > \gamma_{j0}$ and $v > -\gamma_{j0}$ over the dense set $(\gamma_{j0})_{j=1}^\infty$. The following result gives precise conditions for efficiency.

THEOREM 4: If $g(\varepsilon)$ is positive, $v$ is continuously distributed, and $(\alpha_j)_{j=1}^\infty$ is dense in $(0, 1)$, then the asymptotic variance of the GMM estimator for censored regression quantiles converges to the semiparametric bound as $J \to \infty$.

Because the spanning condition is satisfied, as $J$ grows the asymptotic variance of the slope estimator will approach the bound derived by Cosslett (1987) and Ritov (1990), and the quantile estimators will also approach efficiency. Thus, combining moment restrictions from censored regression quantiles leads to efficiency of the regression slope and quantile estimators. This approach provides a simple alternative to the efficient estimators of Ritov (1990) and Cosslett (2004). It should also be noted that using quantiles amounts to a step function approximation of the efficient estimator, an approximation that might be improved by using $\rho_j(z, \beta, \gamma)$ that are smooth in $\varepsilon$.

4.2. Truncated Regression

For truncated regression we consider $m_j(\varepsilon) = \alpha_j 1(\varepsilon > 0) - 1(\varepsilon > \tau_j)$, $0 < \alpha_j < 1$, $\tau_j > 0$, similarly to Newey (1987). Here, for $\Pr^*$ denoting the latent probability distribution of $\varepsilon$, $\gamma_{j0}$ is the solution to $\Pr^*(\varepsilon > \gamma + \tau_j)/\Pr^*(\varepsilon > \gamma) = \alpha_j$, which will exist when the density $g(\varepsilon)$ is strictly log-concave and a boundary condition holds, as specified below. Estimating $A^*(x)$ is more difficult for truncated regression because it is not possible to form direct estimators of the constants $d$ and $\Lambda$. One can instead use a GMM estimator as in Newey (2001). Order $j$ so that $\gamma_{10} < \gamma_{20} < \cdots$ and assume that there are no ties. Let $\rho^j(z, \theta)$ denote the vector of the first $j$ elements of $\rho(z, \theta)$, $\tilde x = (1, x')'$, and

Evidently, ḡ(z, θ_0) = A(x)ρ(z, θ_0) for some matrix A(x) and ρ(z, θ) from equation (4.2). Also, as shown in Newey (2001), Bḡ(z, θ_0) = Λ*(x)ρ(z, θ_0) for a matrix B, so that the optimal GMM estimator using the moment functions from equation (4.4) is as efficient as the estimator with the best instruments. We refer the interested reader to Newey (2001) for a fuller description of this estimator, including a one-step form for the estimator.

To check equality of the moment and model tangent sets, as needed for the spanning condition, we first derive the model tangent set. By independence of ε and x, a parametric submodel for the conditional density of y given x will have the form f(y|x, η) = g(ε, η)/∫ g(u, η) du, where g(ε, η) is a parametric submodel for the density of ε. Then for s(ε) = ∂ ln g(ε, η)/∂η|_{η=η_0}, the score will be
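The score that follows can be sketched from the surrounding discussion (a hedged reconstruction: the truncation set {ε > −v} with v = x'β₀ is an assumption on the normalization):

```latex
% Score of the parametric submodel for truncated regression:
% the x part enters only through the truncated mean of s(\varepsilon).
S(z) = s(\varepsilon)
     - \frac{\int_{-v}^{\infty} s(u)\,g(u)\,du}{\int_{-v}^{\infty} g(u)\,du}
     = s(\varepsilon) - E\bigl[s(\varepsilon)\mid x\bigr],
\qquad v = x'\beta_0 .
```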

Here the score is an additively separable function of ε and x that has conditional mean zero given x. Thus, to show that the spanning condition holds, it suffices to show that the moment tangents must be additively separable functions of ε and x. Intuitively,


conditional on v > −γ_j0, the moment restrictions hold at all j' with γ_j'0 ≥ γ_j0, implying that Pr(ε > γ + τ|x)/Pr(ε > γ|x) does not depend on x for all γ > γ_j0 and τ > 0, i.e., Pr(ε > γ|x) = c(γ) Pr(ε > γ_j0|x). Differentiating with respect to γ implies that f(ε, x) is a product of functions of ε and x, so that moment tangents are additive in ε and x for v > −γ_j0 and ε > γ_j0. The spanning condition then holds because the set where y > 0 is the union of these sets, similarly to the censored case. The following result makes this intuition precise.

THEOREM 5: If g(ε) is positive, differentiable, and strictly log-concave, lim_{γ→∞}[1 − G(γ + τ)]/[1 − G(γ)] = 0 for every τ > 0, v is continuously distributed, and (α_j, τ_j)_{j=1}^∞ is dense in (0, 1) × (0, ∞), then the asymptotic variance of the truncated moment GMM estimator converges to the semiparametric bound as J → ∞.

Similarly to quantiles, it should be noted that these moment functions correspond to a step function approximation of the efficient estimator, which might be improved by using ρ_j(z, β, γ) that are smooth in ε.

Dept. of Economics, E52-262D, MIT, Cambridge, MA 02139, U.S.A.; wnewey@mit.edu.

APPENDIX A: PROOFS

Throughout the Appendix, C will denote a generic constant that may be different in different uses, and I will denote the identity matrix that gives β = [I, 0]θ. To prove Theorem 1, we derive a projection formula for the moment limit Σ_J and compare it with a well-known formula for the semiparametric variance bound. Let proj(Y|Λ) denote the vector of orthogonal projections of the elements of a random vector Y on a closed linear set Λ, in the Hilbert space of random variables with inner product ⟨Y_1, Y_2⟩ = E[Y_1 · Y_2]. Also, let S_β = ∂ ln f(y|x, β, η_0)/∂β|_{β=β_0}. The semiparametric variance bound for estimators of β is V = (Var(S_β − proj(S_β|T)))^{-1}. Consider any fixed J̄ big enough that (γ_1, ..., γ_J̄) includes the nuisance parameters that are present in β = [I, 0]θ. Let ρ̄ and Ḡ(x)

be the corresponding residual vector and derivative expectation. Consider any Ā(x) such that E[Ā(x)Ḡ(x)] is nonsingular and Var(Ā(x)ρ̄) exists, and let
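The display defining ψ̂(z) (equation (A.1)) can be sketched in the standard GMM influence-function form; the selection matrix [I, 0] and the sign are assumptions here, and are immaterial for the variance calculations that follow:

```latex
% Influence function of the GMM estimator with J = \bar{J} and A(x) = \bar{A}(x):
\hat{\psi}(z) = -\,[I,\,0]\,
   \bigl(E[\bar{A}(x)\bar{G}(x)]\bigr)^{-1}
   \bar{A}(x)\,\bar{\rho}(z,\theta_0) .
```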

Then ψ̂(z) is the influence function of a GMM estimator with J = J̄ and A(x) = Ā(x), meaning that √n(β̂ − β_0) = Σ_{i=1}^n ψ̂(z_i)/√n + o_p(1). Also let the full tangent space of the semiparametric model be Ψ = {a'S_β + t : a ∈ ℝ^q, t ∈ T}. Then, as shown in Bickel et al. (1993, Proposition 3.3.1), the semiparametric bound for the asymptotic variance of β̂ is

Let Ψ_J = {a'S_β + t : a ∈ ℝ^q, t ∈ T_J} be the corresponding space for the moment functions. Then it turns out that

as will be shown below. The proof of Theorem 1 will follow from this result. To show equation (A.3), we need several intermediate results and some additional notation. For a random vector Y let [Y] = {a'Y : a ∈ ℝ^J} and for two sets of random variables M and N

let M ⊕ N = {m + n : m ∈ M, n ∈ N}. Also, for any generalized inverse Ω(x)⁻ let m_D = D(x)'Ω(x)⁻ρ, m_H = H(x)'Ω(x)⁻ρ, T_Jρ = {t : E[t²] < ∞, E[tρ|x] = 0}, and T_JH = [m_H].

PROOF: This follows from Assumption 1 and Lemma 5.4 of Newey and McFadden (1994). Q.E.D.

LEMMA A.2: E[m'_D m_D] < ∞ and E[m'_H m_H] < ∞.

PROOF: By Lemma A.1 and Assumption 2, this follows from Lemma B.4. Q.E.D.

LEMMA A.3: T_Jρ is closed, T_J = T_Jρ ⊕ T_JH, and T_Jρ and T_JH are orthogonal.

PROOF: Consider t_k → t with t_k ∈ T_Jρ. Then

E[|E[tρ|x]|] = E[|E[(t_k − t)ρ|x]|] ≤ E[|t_k − t| |ρ|] ≤ {E[(t − t_k)²]}^{1/2}{E[ρ'ρ]}^{1/2} → 0,

implying that E[tρ|x] = 0, and hence t ∈ T_Jρ. Next, consider t ∈ T_J. Then E[ρt|x] = H(x)c. Let t_H = c'm_H and t_ρ = t − t_H. Then t_H ∈ T_JH by construction, while t_ρ ∈ T_Jρ by E[ρt_ρ|x] = E[ρt|x] − E[ρm'_H|x]c = H(x)c − Ω(x)Ω(x)⁻H(x)c = 0.

It follows that t = t_ρ + t_H ∈ T_Jρ ⊕ T_JH. Next suppose t ∈ T_Jρ ⊕ T_JH, so that t = t_ρ + t_H with t_ρ ∈ T_Jρ and t_H ∈ T_JH. Then we have, for t_H = m'_H c,

E[ρt|x] = E[ρ t_ρ|x] + E[ρ m'_H|x]c = Ω(x)Ω(x)⁻H(x)c = H(x)c,

so t ∈ T_J. Finally, for any t_ρ ∈ T_Jρ and t_H = m'_H c ∈ T_JH,

E[t_ρ t_H] = E[E[t_ρ t_H | x]] = E[c'H(x)'Ω(x)⁻E[ρ t_ρ|x]] = 0. Q.E.D.
LEMMA A.4: E[m_H m'_H] is nonsingular and, for any a with E[a²] < ∞,

proj(a|T_Jρ) = a − E[aρ'|x]Ω(x)⁻ρ,  proj(a|T_JH) = E[a m'_H](E[m_H m'_H])^{-1} m_H,

and proj(a|T_J) = proj(a|T_Jρ) + proj(a|T_JH).

PROOF: By Ω(x) symmetric, Ω(x)⁻' is also a generalized inverse and

E[m_H m'_H] = E[H(x)'Ω(x)⁻Ω(x)Ω(x)⁻'H(x)] = E[H(x)'Ω(x)⁻H(x)].

By the first hypothesis of Assumption 2, E[H(x)'Ω(x)⁻H(x)] is invariant to the g-inverse, so by the second it is nonsingular, giving the first conclusion. By Lemma B.4, a_ρ = E[aρ'|x]Ω(x)⁻ρ has finite mean square. Also,

E[ρ(a − a_ρ)|x] = E[ρa|x] − Ω(x)Ω(x)⁻E[ρa|x] = 0

by Lemma B.3, so that a − a_ρ ∈ T_Jρ, and for any t ∈ T_Jρ,

E[a_ρ t] = E[E[aρ'|x]Ω(x)⁻E[ρt|x]] = 0.

Therefore, proj(a|T_Jρ) = a − a_ρ. Next, the formula for proj(a|T_JH) is well known. Finally, the formula for proj(a|T_J) follows from the orthogonality result of Lemma A.3. Q.E.D.

Let ψ̂ be as defined in equation (A.1).

LEMMA A.5: For all J large enough and ψ_J = proj(ψ̂|[S_β] ⊕ T_J), it is the case that E[ψ_J ψ'_J] = Σ_J.

PROOF: It follows similarly to the proof of Lemma A.4 that E[m_D m'_H] = E[D(x)'Ω(x)⁻H(x)]. Therefore, by Lemmas A.1 and A.3,

where the last equality defines U.

Next, let Ω̄(x) = E[ρ̄ρ̄'|x] for ρ̄ from equation (A.1), and consider any J ≥ J̄. Then for any t ∈ T_Jρ, E[ψ̂t] is a constant matrix multiple of E[Ā(x)E[ρ̄t|x]] = 0. Hence, the components of ψ̂ are in the orthogonal complement of T_Jρ, so that proj(ψ̂|T_Jρ) = 0. It follows by Lemma A.4 that proj(ψ̂|T_J) = proj(ψ̂|T_JH). Then, since U and m_H are orthogonal, [U] ⊕ [m_H] = [(U', m'_H)'], and (U', m'_H)' is a nonsingular linear combination of m = (m'_D, m'_H)' = G(x)'Ω(x)⁻ρ, it follows by standard Hilbert space theory that

Let K̄ be the selection matrix so that ρ̄ = K̄ρ. Note that by construction, ρ̄(z, β, γ_J̄) does not depend on the components of γ that are not in γ_J̄, so that K̄G(x) = ∂E[ρ̄(z, β, γ_J̄)|x]/∂θ = [Ḡ(x), 0]. Also, by Lemmas A.1 and B.3, and Assumption 2, there is an F̄(x) such that Ḡ(x) = Ω̄(x)F̄(x). Therefore, for Σ̄ = (E[Ā(x)Ḡ(x)])^{-1},

= [I, 0]Σ̄ E[Ā(x)K̄G(x)] = [I, 0]Σ̄ [E[Ā(x)Ḡ(x)], 0] = [I, 0].

Finally, noting that (E[mm'])^{-1} is the asymptotic variance of θ̂, it follows from the last equation that

E[ψ_J ψ'_J] = E[ψ̂m'](E[mm'])^{-1}E[mψ̂'] = [I, 0](E[mm'])^{-1}[I, 0]' = Σ_J. Q.E.D.

PROOF OF THEOREM 1: By the finite semiparametric bound and the spanning condition, S_β − proj(S_β|T_J) has a nonsingular covariance matrix. By Lemma B.2, ψ_J → ψ* = proj(ψ̂|[S_β] ⊕ T), so that by Lemma A.5, Σ_J = E[ψ_J ψ'_J] → E[ψ*ψ*']. Then, as discussed above, the spanning condition implies that E[ψ*ψ*'] is the semiparametric bound. Q.E.D.

PROOF OF THEOREM 2: For simplicity, suppress the x argument in a(x), D(x), H(x), G(x), and Ω(x). Let π* = −D'Ω⁻ + E[D'Ω⁻H](E[H'Ω⁻H])^{-1}H'Ω⁻. Note that by the usual partitioned inverse calculation, and similarly to previous results in Appendix A,

Also, let K = E[ρS'|x] for the efficient score S = S_β − proj(S_β|T). By Lemma A.1, D = −E[ρS'_β|x], and by T ⊂ T_J, E[ρ proj(S_β|T)'|x] = HC for some matrix C, so that K = E[ρS'_β|x] − E[ρ proj(S_β|T)'|x] = −D − HC. Therefore,

π* = K'Ω⁻ + C'H'Ω⁻ − E[K'Ω⁻H](E[H'Ω⁻H])^{-1}H'Ω⁻ − C'E[H'Ω⁻H](E[H'Ω⁻H])^{-1}H'Ω⁻

= K'Ω⁻ − E[K'Ω⁻H](E[H'Ω⁻H])^{-1}H'Ω⁻,

since C'E[H'Ω⁻H](E[H'Ω⁻H])^{-1}H'Ω⁻ = C'H'Ω⁻.

Note that E[π*H] = 0. Also, for any π with E[πH] = 0, since Lemma B.3 implies that there is F with K = ΩF and by Assumption 2 H = ΩB, for B̃ = E[K'Ω⁻H](E[H'Ω⁻H])^{-1} we have

Then it follows that, for any π with E[πH] = 0,

Therefore,

so that for any π̄ with E[π̄H] = 0, since π = π* − π̄ also satisfies E[πH] = 0,

The conclusion then follows from the last two equations and E[πρ(πρ)'] being p.s.d. Q.E.D.

PROOF OF THEOREM 3: Consider any t ∈ T_∞. Then t will satisfy E[ρt|x] = Hc for every J. Then for every m = π(x)ρ ∈ M, E[mt] = E[π(x)E[ρt|x]] = E[π(x)H]c = 0. Thus we have T_∞ ⊂ M⊥. Now consider t ∈ M⊥. For a symmetric p.s.d. Ω⁻, let K = E[ρt|x]. By Lemma B.4, E[{K'Ω⁻ρ}²] < ∞, so by Lemma B.3, E[K'Ω⁻K] = E[{K'Ω⁻ρ}²] and E[K'Ω⁻H] = E[{K'Ω⁻ρ}m'_H] exist. Let c = (E[H'Ω⁻H])^{-1}E[H'Ω⁻K] and π = (K − Hc)'Ω⁻. Note that E[πH] = E[K'Ω⁻H] − c'E[H'Ω⁻H] = 0. Therefore π(x)ρ ∈ M, implying

0 = E[π(x)ρt] = E[π(x)K] = E[(K − Hc)'Ω⁻K] = E[(K − Hc)'Ω⁻(K − Hc)],

where the last equality follows by E[πH] = 0. It follows that (K − Hc)'Ω⁻(K − Hc) = 0. Since H = ΩB by Assumption 2 and K = ΩF by Lemma B.3, it follows that (F − Bc)'Ω(F − Bc) = 0, implying Ω^{1/2}(F − Bc) = 0 for any square root matrix, implying

Therefore, t ∈ T_∞. Since this inclusion holds for any t ∈ M⊥, it follows that M⊥ ⊂ T_∞, and hence that T_∞ = M⊥. To prove the second conclusion, note that M̄ is linear and closed, so that M̄⊥⊥ = M̄ by standard Hilbert space theory, and hence (T_∞)⊥ = M̄. Q.E.D.

PROOF OF THEOREM 4: Let 1_j^ε = 1(ε > γ_j0) and let (a_k^x) be a countable basis of bounded functions of x. Then (1_j^ε a_k^x)_{j,k=1}^∞

is a basis (the proof is available upon request), meaning that if E[r(ε, x)²] < ∞ and E[1_j^ε a_k^x r(ε, x)] = 0 for all j and k, then r(ε, x) = 0. Note that for v > −γ and y = 0 we have ε = y* − v ≤ −v = y − v < γ. Hence, 1(v > −γ)1(y − v < γ) = 1(v > −γ)1(ε < γ). Therefore,

By v continuously distributed, it follows that E[ρ_j(z, β_0, γ)|x] is differentiable at γ with probability one (w.p.1), with derivative 1(v > −γ)g(γ). Hence, H_j(x) = 1_j^v g(γ_j0) for 1_j^v = 1(v > −γ_j0).

Consider t(y, x) in the moment tangent set. Since y is a function of x and ε, we can regard t as a function of ε and x. Then by H(x) diagonal and equation (7) there is a constant c_j with

Next, consider j with P_j = E[1_j^v] > 0, and let s_j(ε) = E[1_j^v t|ε]/P_j = E[t|ε, 1_j^v = 1] and J(j) = {j' : γ_j'0 ≥ γ_j0}. Note that for any j' ∈ J(j) we have 1_j^v 1_{j'}^v = 1_j^v. Then replacing j by j' in the previous equation and multiplying through by 1_j^v gives

1_j^v E[1_{j'}^ε t(ε, x)|x] = 1_j^v S_{j'}  for all j' ∈ J(j).

Taking expectations of both sides and dividing by P_j gives E[1_{j'}^ε s_j] = E[1_j^v 1_{j'}^ε t]/P_j = S_{j'}. Let 1_j = 1_j^ε 1_j^v. Note that for any k, 1_k^ε 1_j = 1_{j'}^ε 1_j^v for some j' ∈ J(j). Then

Since this equality holds for all j' and k, it follows that 1_j(t − s_j) = 0 w.p.1. Since this equality holds for all j, a countable number of these w.p.1, we have 1_j(t − s_j) = 0 for all j with P_j > 0. Next, consider any j and j' with γ_j'0 ≥ γ_j0 and P_{j'} > 0. Then

Taking conditional expectations given ε, by independence of x and ε, P_{j'} 1_{j'}^ε (s_j − s_{j'}) = 0, implying

1_{j'}^ε (s_j − s_{j'}) = 0, so that s_j(ε) = s_{j'}(ε) w.p.1 for ε > γ_{j'0}. It then follows in a straightforward way that there is an s(ε) such that 1_j(t − s) = 0 for all j (details available upon request). Then, noting that by denseness of the quantiles ∪_{j=1}^∞ {(ε, x) : ε > γ_j0, v > −γ_j0} = {(ε, x) : ε + v > 0}, it follows that t(ε, x) = s(ε) for y > 0. Then, as noted in the text, it follows that t(ε, x) is an element of the model tangent set. Thus the spanning condition is satisfied. Q.E.D.

PROOF OF THEOREM 5: Let S(γ) = ∫_γ^∞ g(ε) dε be the survivor function for g(ε). By ln g(ε) strictly concave and Pratt (1981), it follows that ln S(γ) is strictly concave. Then d ln S(γ)/dγ = −g(γ)/S(γ) is strictly decreasing, so that for any τ > 0,

Then by lim_{γ→∞}[S(γ + τ)/S(γ)] = 0, it follows that for any τ > 0 and 0 < α < 1 there is a unique solution γ(α, τ) to S(γ + τ)/S(γ) = α, i.e., to E*[α1(ε > γ) − 1(ε > γ + τ)] = 0. By the implicit function theorem, Γ(α, τ) = (γ(α, τ), τ + γ(α, τ)) is continuous in (α, τ) and has range Γ = {(γ, ζ) : ζ > γ}. It follows that {(γ_j0, γ_j0 + τ_j)} is dense in Γ.
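The monotonicity used in this argument can be made explicit. Since ln S(γ) is strictly concave, its derivative −g(γ)/S(γ) is strictly decreasing, so for any τ > 0:

```latex
% Strict log-concavity of S makes the truncated survival ratio
% strictly decreasing in \gamma:
\frac{d}{d\gamma}\,
  \ln\frac{S(\gamma+\tau)}{S(\gamma)}
  = -\frac{g(\gamma+\tau)}{S(\gamma+\tau)} + \frac{g(\gamma)}{S(\gamma)} < 0 .
```

Together with lim_{γ→∞} S(γ + τ)/S(γ) = 0, this gives uniqueness of the solution γ(α, τ) for each α in the attainable range.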

Now let m_j^ε = α_j 1(ε > γ_j0) − 1(ε > γ_j0 + τ_j). Consider any r(ε) with E*[r(ε)²] < ∞, and suppose that 0 = E*[m_j^ε r] for all j. Then by continuity of the integral it follows that for all γ < ζ,

Differentiating with respect to ζ holding γ fixed, we see that r(ζ) = ∫_γ^∞ r(ε)g(ε) dε/S(γ) almost everywhere for all ζ > γ. By repeated application of this equality for different γ, we find that r(ε) is constant almost everywhere. Then, if r = r(ε, x) and a_k^x is as in the proof of Theorem 4, by the Fubini theorem, E*[m_j^ε a_k^x r] = 0 implies ∫ r(ε, x)a_k^x F(dx) = c_k for some


constant c_k. Taking expectations of each of these equalities and subtracting, it follows that for r̄(x) = ∫ r(ε, x)g(ε) dε, we have ∫[r(ε, x) − r̄(x)]a_k^x F(dx) = 0 for each k, which implies that r(ε, x) = r̄(x). Thus, E*[m_j^ε a_k^x r] = 0 for all j and k implies r(ε, x) = r̄(x).

Next we proceed as in the proof of Theorem 4, with notation as given there. It follows similarly to the last paragraph that if E*[1_j m_{j'}^ε a_k^x r] = 0 for all j' ∈ J(j) and all k, then r(ε, x) = r̄(x) = ∫_{γ_j0}^∞ r(ε, x)g(ε) dε/P_j for all ε > γ_j0 and v > −γ_j0. Also, it follows similarly to the proof of Theorem 4 that E[ρ_j(z, β_0, γ)|x]

is differentiable at γ with probability one (w.p.1), with derivative H_j(x) = 1_j^v d_j/S(v), where d_j = −α_j g(γ_j0) + g(γ_j0 + τ_j). Then, multiplying through by S(v), the moment tangents must satisfy

It then follows analogously to Theorem 4 that E*[1_j m_{j'}^ε a_k^x (t − s_j)] = 0 for all j' ∈ J(j). Therefore, for r(ε, x) defined from t(ε, x) and s_j as in the proof of Theorem 4, we find that

for all j with P_j > 0. Then, because r is additive in a function of ε and a function of x for 1_j = 1, it follows in a straightforward way, similarly to the proof of Theorem 4, that there are s(ε) and t̄(x) with 1_j(t − s − t̄) = 0 (details available upon request). Then t(ε, x) = s(ε) + t̄(x) for all y > 0, so that t is an element of the model tangent set, and the spanning condition is satisfied. Q.E.D.

APPENDIX B

The first result shows the formula for the first-order conditions of the ordered choice MLE in the transformation model. Let Y_ij be the indicator that y_i falls in the j-th cell and P_ij(θ) = P_j(x_i, θ). Differentiating the log-likelihood gives first-order conditions

where the second and third equalities follow by Σ_j P_ij(θ) = 1 identically in θ, and the fourth equality by Y_ij − P_ij(θ) = ρ_j(z, θ) − ρ_{j−1}(z, θ), j = 2, ..., J, and Y_i1 − P_i1(θ) = ρ_1(z, θ). The next result shows that a sequence of positive semidefinite (p.s.d.) matrices that is monotonically decreasing in the p.s.d. semiorder has a limit.

LEMMA B.1: If Σ_J is positive semidefinite and Σ_J ≥ Σ_{J+1} for each J, then lim_{J→∞} Σ_J exists.

PROOF: Let tr(M) denote the trace of a square matrix M. By Σ_J − Σ_{J+1} p.s.d., tr(Σ_J) − tr(Σ_{J+1}) = tr(Σ_J − Σ_{J+1}) ≥ 0, so tr(Σ_J) is a nonnegative, monotonic decreasing sequence, and hence converges. Therefore, tr(Σ_J) is a Cauchy sequence, implying tr(Σ_J − Σ_K) → 0 as J → ∞ and K → ∞, K ≥ J. Let ‖M‖ = √(tr(M'M)). For M p.s.d., M = B'ΛB for an orthonormal matrix B and a diagonal matrix Λ of nonnegative eigenvalues, so that by Λ_ii ≥ 0,

‖M‖² = tr(M'M) = Σ_i Λ_ii² ≤ (Σ_i Λ_ii)² = {tr(M)}².

Applying this equation with M = Σ_J − Σ_K, we obtain ‖Σ_J − Σ_K‖² ≤ {tr(Σ_J − Σ_K)}². Because ‖·‖ is just the usual Euclidean norm, it follows that each element of Σ_J is a Cauchy sequence, and hence converges. Q.E.D.

The next result is useful for the limiting arguments. For a vector h = (h_1, ..., h_q) of elements of a Hilbert space 𝓗, let [h] = {a'h : a ∈ ℝ^q} denote the linear span of h. Also, let ⊕ denote the direct sum of two linear subspaces, i.e., M ⊕ N = {m + n : m ∈ M, n ∈ N}. Also, for a closed linear subspace L, let proj(h|L) denote the vector of orthogonal projections of the elements of h on L, satisfying proj(h_k|L) ∈ L and ⟨h_k − proj(h_k|L), t⟩ = 0 for all t ∈ L.

LEMMA B.2: If L_1, L_2, ... is a sequence of closed linear subsets of a Hilbert space 𝓗 and h is a vector of elements of 𝓗 such that m = h − proj(h|∩_{j=1}^∞ L_j) has a corresponding nonsingular matrix Q = [⟨m_k, m_l⟩]_{k,l=1}^q

of inner products, then for any a ∈ 𝓗 we have proj(a|[h] ⊕ ∩_{j=1}^J L_j) → proj(a|[h] ⊕ ∩_{j=1}^∞ L_j) as J → ∞.

PROOF: Denote L^J = ∩_{j=1}^J L_j, L^∞ = ∩_{j=1}^∞ L_j, and m_J = h − proj(h|L^J). By Lemma 4.5 of Hansen and Sargent (1991), m_J → m = h − proj(h|L^∞). Then Q_J = [⟨m_{Jk}, m_{Jl}⟩]_{k,l=1}^q → Q, so that Q_J is nonsingular for large enough J. Therefore,

Then by orthogonality of [m_J] and L^J and orthogonality of [m] and L^∞, standard Hilbert space theory and Lemma 4.5 of Hansen and Sargent (1991) give

proj(a|[h] ⊕ L^J) = proj(a|[m_J] ⊕ L^J) = proj(a|[m_J]) + proj(a|L^J)

→ proj(a|[m]) + proj(a|L^∞) = proj(a|[m] ⊕ L^∞)

= proj(a|[h] ⊕ L^∞). Q.E.D.

The following result will hold with F(x) = Ω(x)^{-1}E[aρ|x] when Ω(x) is nonsingular, but requires a proof for Ω(x) singular.

LEMMA B.3: For any a with E[a²] finite, there exists F(x) such that E[aρ|x] = Ω(x)F(x).

PROOF: Consider a J × 1 random vector δ(x) such that ‖δ(x)‖ ≤ 1 and Ω(x)δ(x) = 0 with probability one. Then E[{ρ'δ(x)}²] exists, and so does E[{ρ'δ(x)}²|x] = δ(x)'Ω(x)δ(x) = 0. Then by iterated expectations, E[{ρ'δ(x)}²] = 0, and hence ρ'δ(x) = 0. It follows that E[aρ'|x]δ(x) = E[aρ'δ(x)|x] = 0. Since this equality holds for any such δ(x), it follows that with probability one E[aρ'|x] is orthogonal to the null space of Ω(x). By symmetry of Ω(x), its range and null space are orthogonal subspaces of ℝ^J, so that ℝ^J is the direct sum of the range and null space. Consequently, E[aρ|x] must be in the range of Ω(x). Q.E.D.

LEMMA B.4: For any generalized inverse Ω(x)⁻ and any a with E[a²] finite, E[{E[aρ'|x]Ω(x)⁻ρ}²] is finite.

PROOF: Since Ω(x) is symmetric, Ω(x)⁻' is also a generalized inverse. By Lemma B.3 there is F(x) such that E[aρ'|x] = Ω(x)F(x), so that for ã = E[aρ'|x]Ω(x)⁻ρ,

is invariant to the generalized inverse. Let Λ(x) denote a diagonal matrix of eigenvalues of Ω(x) and B(x) an orthonormal matrix with Ω(x) = B(x)'Λ(x)B(x). Let Λ(x)^{-1/2} denote the matrix with diagonal elements equal to the inverse square root of corresponding nonzero elements of Λ(x) and zeros where Λ(x) is zero, and let L(x) = B(x)'Λ(x)^{-1/2}B(x). Then L(x)² is a generalized inverse, and by the Cauchy-Schwarz inequality,

Then, taking expectations of this equation and the previous one, we obtain E[ã²] ≤ J E[a²] < ∞. Q.E.D.

REFERENCES

ANDREWS, D. W. K. (1991): "Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Models," Econometrica, 59, 307-345.
BICKEL, P. J., C. A. J. KLAASSEN, Y. RITOV, AND J. A. WELLNER (1993): Efficient and Adaptive Estimation for Semiparametric Models. Baltimore: Johns Hopkins University Press.
CHAMBERLAIN, G. (1987): "Asymptotic Efficiency in Estimation with Conditional Moment Restrictions," Journal of Econometrics, 34, 305-334.
COSSLETT, S. R. (1987): "Efficiency Bounds for Distribution Free Estimators of the Binary Choice and Censored Regression Models," Econometrica, 55, 559-585.
--- (2004): "Efficient Semiparametric Estimation of Censored and Truncated Regressions via a Smoothed Self-Consistency Equation," Econometrica, 72, 1277-1293.
DONALD, S. G., G. IMBENS, AND W. K. NEWEY (2003): "Empirical Likelihood Estimation and Consistent Tests with Conditional Moment Restrictions," Journal of Econometrics, 117, 55-93.
HAHN, J. (1997): "Efficient Estimation of Panel Data Models with Sequential Moment Restrictions," Journal of Econometrics, 79, 1-21.
HALL, P., AND J. L. HOROWITZ (1990): "Bandwidth Selection in Semiparametric Estimation of Censored Linear Regression," Econometric Theory, 6, 123-150.
HAN, A., AND J. A. HAUSMAN (1990): "Flexible Parametric Estimation of Duration and Competing Risks Models," Journal of Applied Econometrics, 5, 1-28.
HANSEN, L. P. (1982): "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, 50, 1029-1054.
HANSEN, L. P., AND T. J. SARGENT (1991): Rational Expectations Econometrics. San Francisco: Westview Press.
ICHIMURA, H. (1993): "Semiparametric Least Squares (SLS) and Weighted SLS Estimation of Single-Index Models," Journal of Econometrics, 58, 71-120.
IMBENS, G. W. (1997): "One-Step Estimators for Over-Identified Generalized Method of Moments Models," Review of Economic Studies, 64, 359-383.
KOENKER, R., AND J. A. F. MACHADO (1999): "GMM Inference when the Number of Moment Conditions Is Large," Journal of Econometrics, 93, 327-344.

NEWEY, W. K. (1987): "Interval Moment Estimation of the Truncated Regression Model," Mimeo, Department of Economics, MIT; presented at the 1987 Summer Meeting of the Econometric Society.
--- (1988): "Adaptive Estimation of Regression Models via Moment Restrictions," Journal of Econometrics, 38, 301-339.
--- (1993): "Efficient Estimation of Models with Conditional Moment Restrictions," in Handbook of Statistics, Volume 11: Econometrics, ed. by G. S. Maddala, C. R. Rao, and H. D. Vinod. Amsterdam: North-Holland.
--- (2001): "Conditional Moment Restrictions in Censored and Truncated Regression Models," Econometric Theory, 17, 863-888.
NEWEY, W. K., AND D. MCFADDEN (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden, Chapter 36.
NEWEY, W. K., AND R. J. SMITH (2004): "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators," Econometrica, 72, 219-255.
POWELL, J. L. (1986): "Censored Regression Quantiles," Journal of Econometrics, 32, 143-155.
--- (1994): "Estimation of Semiparametric Models," in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden, Chapter 41.
PRATT, J. W. (1981): "Concavity of the Log Likelihood," Journal of the American Statistical Association, 76, 103-106.
QIN, J., AND J. LAWLESS (1994): "Empirical Likelihood and General Estimating Equations," The Annals of Statistics, 22, 300-325.
RITOV, Y. (1990): "Estimation in a Linear Regression Model with Censored Data," The Annals of Statistics, 18, 303-328.