Panel Binary Variables and Sufficiency: Generalizing Conditional Logit

by Thierry Magnac
Citation
Title: Panel Binary Variables and Sufficiency: Generalizing Conditional Logit
Author: Thierry Magnac
Year: 2004
Publication: Econometrica
Volume: 72
Issue: 6
Start Page: 1859
End Page: 1876
Language: English
Abstract:

PANEL BINARY VARIABLES AND SUFFICIENCY:
GENERALIZING CONDITIONAL LOGIT

This paper extends the conditional logit approach (Rasch, Andersen, Chamberlain) used in panel data models of binary variables with correlated fixed effects and strictly exogenous regressors. In a two-period two-state model, necessary and sufficient conditions on the joint distribution function of the individual-and-period specific shocks are given such that the sum of individual binary variables across time is a sufficient statistic for the individual effect. By extending a result of Chamberlain, it is shown that root-n consistent regular estimators can be constructed in panel binary models if and only if the property of sufficiency holds. In applied work, the estimation method amounts to quasi-differencing the binary variables as if they were continuous variables and transforming a panel data model into a cross-section model. Semiparametric approaches can then be readily applied.

KEYWORDS: Binary models, panel data, conditional logit, sufficiency, quasi-differencing, semiparametric methods.

1. INTRODUCTION

THE ELEMENTARY BRICK of panel binary models with correlated fixed effects (see Arellano and Honoré (2001) for a recent survey) is a two-period two-state model. The pair of individual binary variables is described by a pair of latent variables that are assumed to be the sum of a linear index of explanatory variables, of the individual effect, and of the individual-and-period "specific shocks." The parameter $\beta$ of the index is the parameter of interest. Estimating it by conditional logit is a well-known semiparametric technique since it avoids specifying the distribution of individual effects conditional on covariates (Rasch (1960), Andersen (1973), Chamberlain (1984)). Its properties stem from the existence of a sufficient statistic for the individual effect which is the individual sum of the binary variables. By definition of sufficiency (or more precisely S-sufficiency; Barndorff-Nielsen (1978)), the conditional likelihood function depends on the parameter of interest only while the marginal likelihood function depends on the parameter of interest and the nuisance parameter, the individual effect (Lancaster (2000)). Conditional logit is nevertheless seen to be restrictive because of the distributional assumptions. Yet, an intriguing and important result related to conditional logit was shown by Chamberlain (1992). If individual-and-period specific shocks are independent over time and if covariates are unbounded, consistent estimation at a $\sqrt{n}$-rate of the parameter of interest is possible if and only if the distribution of individual-and-period specific shocks is logistic. The semiparametric efficiency bound is equal to zero in all other cases.

In this paper, I show that the sum of the binary variables is a sufficient statistic for the individual effect, under necessary and sufficient conditions that are much less restrictive than in the conditional logit approach. Moreover, the result of Chamberlain (1992) is generalized. If covariates are unbounded, then consistent estimation at a $\sqrt{n}$-rate is possible if and only if the sum of the binary variables is a sufficient statistic. The property of sufficiency thus characterizes all models (when variables are strictly exogenous) for which it is possible to construct $\sqrt{n}$-consistent estimators. The strict exogeneity assumption can be partially relaxed. Conditional logit is a special case. The only joint distribution function such that (i) the individual-and-period specific shocks are independent, and (ii) the sum of the binary variables is a sufficient statistic, is the logistic distribution.

¹ I thank Andrew Chesher, Miguel Delgado, Jean-Pierre Florens, two anonymous referees, the coeditor, and participants at seminars at CREST, Toulouse, Tinbergen, UC London, Bonn, CIRANO, Brown, and Quebec, and conferences EC2 in Dublin, ESEM in Lausanne, Panel Data in Berlin, and CAM in Copenhagen, for helpful comments. The usual disclaimer applies.

Furthermore, it is shown that imposing the property of sufficiency reduces the dimensionality of the model from the unknown bivariate density function of individual-and-period specific shocks into two univariate density functions satisfying additional tail conditions. The property of sufficiency implies that shocks are exchangeable and this setting is nested into the setting of Manski (1987). The reduction of dimensionality explains why $\sqrt{n}$-consistency holds. Furthermore, maximizing conditional likelihood is shown to be equivalent to using an estimating equation that relates the expectation of the difference between the pair of binary variables and the difference in the linear index of covariates, in the sample where binary variables differ across time. It thus amounts to quasi-differencing the binary variables and turning the panel estimation problem into a cross-section one. Semiparametric techniques can then be readily applied.

Section 2 presents the set-up (Assumptions R), defines the property of sufficiency (Condition S), and proves the extension of Chamberlain (1992) about $\sqrt{n}$-consistent estimation. Section 3 is the main theoretical section where joint distribution functions that satisfy sufficiency are characterized as functions of two univariate distribution functions. Necessary and sufficient conditions on these functions (Properties P) are given. We study estimation and give a parametric illustration in Section 4. Section 5 proposes extensions and concludes. All proofs are in appendices.

2. REGULARITY CONDITIONS, SUFFICIENCY, AND ROOT-N CONSISTENCY

Consider $y_1$, $y_2$, two binary variables, and $z_1$, $z_2$, two $L$-dimensional covariates.² The model is:

For $t = 1, 2$: $y_t = 1$ if and only if $z_t\beta + \varepsilon + u_t > 0$.

We adopt some regularity assumptions:

ASSUMPTION R1: (i) The difference across time of the first covariate (say), $z_1^{(1)} - z_2^{(1)}$, continuously varies over the whole real line $\mathbb{R}$ (for almost all values of other covariates) and the coefficient of the first covariate, $\beta^{(1)}$, is equal to one.

(ii) The support of $z_1 - z_2$ is not contained in any proper linear subspace of $\mathbb{R}^L$.

(iii) Random shocks $(u_1, u_2)$ have a strictly positive, continuous, and bounded density function with respect to the Lebesgue measure and are independent of $z_1$, $z_2$, and $\varepsilon$.

(iv) The individual effect $\varepsilon$ continuously varies over the whole real line $\mathbb{R}$ (for almost all values of covariates).

² We only consider random samples and we do not subscript individual observations by $i$.

Assumptions R1(i) and R1(ii) are sufficient identification restrictions borrowed from Manski (1987, Assumption 2, p. 358). In contrast to Manski (1987), random shocks are supposed to be strictly exogenous and independent of all variables by Assumption R1(iii), though we discuss how to weaken this assumption in the last section. Assumption R1(iii) also limits the discussion to the class of sufficiently smooth distribution functions in order to use the usual tools of differential calculus. Assumption R1(iv) assumes complete variation of all probabilities in the "between-individual" dimension. It is similar to Assumption R1(i), which assumes complete variation in the "within-individual" dimension.

For conditional logit, specific shocks $u_1$ and $u_2$ are furthermore assumed to be independent and logistically distributed. It implies that the sum of binary variables is S-sufficient (or sufficient from now on) for the incidental parameter, $\varepsilon$ (Barndorff-Nielsen (1978)); that is, for $K = 0$, 1, and 2:

(2.1) $\Pr(y_1, y_2 \mid z_1, z_2, \varepsilon, y_1 + y_2 = K) = \Pr(y_1, y_2 \mid z_1, z_2, y_1 + y_2 = K).$

While it trivially holds when $K = 0$ or 2, this expression yields the conditional likelihood of an observation such that $(y_1, y_2) \in \{(0, 1), (1, 0)\}$ when $K = 1$. The conditional likelihood function does not depend on the incidental parameter and the maximum conditional likelihood estimator is $\sqrt{n}$-consistent.
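As a small numerical illustration of why conditioning on the sum removes the individual effect in the logistic case, the sketch below evaluates the probability of $(y_1, y_2) = (1, 0)$ given $y_1 + y_2 = 1$ for several values of $\varepsilon$. The function names and the numerical values of `z1`, `z2`, and `beta` are illustrative assumptions and are not taken from the paper.

```python
import math

def logistic_cdf(x):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_exit_given_mover(z1, z2, beta, eps):
    """Pr(y1 = 1, y2 = 0 | y1 + y2 = 1) with independent logistic shocks."""
    p1 = logistic_cdf(z1 * beta + eps)  # Pr(y1 = 1 | z1, eps)
    p2 = logistic_cdf(z2 * beta + eps)  # Pr(y2 = 1 | z2, eps)
    p10 = p1 * (1.0 - p2)
    p01 = (1.0 - p1) * p2
    return p10 / (p10 + p01)

# Hypothetical values, chosen only for illustration.
z1, z2, beta = 0.8, -0.3, 1.0
cond_probs = [prob_exit_given_mover(z1, z2, beta, eps) for eps in (-2.0, 0.0, 3.0)]
closed_form = logistic_cdf((z1 - z2) * beta)
```

All entries of `cond_probs` coincide with `closed_form`, which depends on the covariates only through $(z_1 - z_2)\beta$.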

This approach has been criticized on the ground that the independence and logistic assumptions are overly strong. Observe that these assumptions seem indeed to be overly strong to derive $\sqrt{n}$-consistency since sufficiency alone implies that the conditional likelihood function is independent of the incidental parameter. Our first motivation is thus to prove that using sufficiency substantially generalizes conditional logit. Rewrite property (2.1) as Condition S(ufficiency), prominent in the rest of the paper:

LEMMA 1: Under Assumption R1, the sum of binary variables is sufficient for the individual effect, $\varepsilon$, if and only if there exists a real function $c(\cdot)$ such that:

(Condition S) $\forall (x_1, x_2) \in \mathbb{R}^2, \quad \dfrac{\Pr(u_1 > x_1, u_2 \le x_2)}{\Pr(u_1 \le x_1, u_2 > x_2)} = c(x_1 - x_2).$
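To make Condition S concrete, the following sketch computes the ratio $\Pr(u_1 > x_1, u_2 \le x_2) / \Pr(u_1 \le x_1, u_2 > x_2)$ for independent shocks, holding the difference $h = x_1 - x_2$ fixed and moving the level $x_2$. It is only an illustration under the assumption of independent shocks with a common marginal; the helper names are hypothetical. With logistic shocks the ratio stays at $\exp(-h)$, whereas with normal shocks it changes with the level, so the ratio is not a function of $x_1 - x_2$ alone.

```python
import math

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def condition_s_ratio(cdf, x1, x2):
    """Pr(u1 > x1, u2 <= x2) / Pr(u1 <= x1, u2 > x2) for independent shocks
    sharing the marginal distribution function `cdf`."""
    return ((1.0 - cdf(x1)) * cdf(x2)) / (cdf(x1) * (1.0 - cdf(x2)))

h = 1.0                    # fixed difference x1 - x2
levels = (-1.0, 0.0, 2.0)  # different values of x2
logit_ratios = [condition_s_ratio(logistic_cdf, x2 + h, x2) for x2 in levels]
normal_ratios = [condition_s_ratio(normal_cdf, x2 + h, x2) for x2 in levels]
```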

Another justification for using and characterizing sufficiency is that it is the weakest condition that one can find in the present set-up to construct $\sqrt{n}$-consistent estimators. In an unpublished paper, Chamberlain (1992) proved that if $u_1$ and $u_2$ are independent, the semiparametric efficiency bound of parameter $\beta$ is equal to zero unless the distribution function of random shocks is logistic. It therefore tells us that $\sqrt{n}$-consistent estimators can only be constructed under the logistic assumption. By dropping the independence assumption, Chamberlain's result³ is extended into the following theorem:

³ This theorem also leads to proof of the conjecture that the sum of binary variables is the only candidate for a sufficient statistic. Suppose there exists a statistic to which the principle of sufficiency, with respect to individual effects, could be applied. Then, using conditional likelihood methods, it would be possible to construct a $\sqrt{n}$-consistent estimator by conditioning on this statistic. By this theorem, this sufficient statistic is the sum of binary variables across time.

THEOREM 1: Under Assumption R1, the semiparametric efficiency bound for $\beta$ is equal to zero unless the sum of binary variables is sufficient for the individual effect.

3. CHARACTERIZATION OF SUFFICIENCY

Condition S is easily used in estimation (see Section 4). It is however by no means obvious that function $c(\cdot)$ is unconstrained and can be set as wanted. It is the purpose of this section to derive these conditions. We first derive the general implications of the property of sufficiency on the joint distribution function of $(u_1, u_2)$. We then prove that function $c(\cdot)$ is in a one-to-one relationship with the distribution function of the difference $u_1 - u_2$. Necessary and sufficient conditions for sufficiency are then provided. We conclude by investigating the consequence of the additional assumption of independence between $u_1$ and $u_2$.

By including two time-dummies among the explanatory variables, we can always adopt the following normalizations:

ASSUMPTION R2: (i) $F(0) = 1/2$, where $F(\cdot)$ is the marginal distribution function of $u_1$.

(ii) $c(0) = 1$.

First, we derive some necessary conditions for Condition S to hold and characterize the expression of the joint distribution function of $(u_1, u_2)$.

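THEOREM 2: Assume that Assumptions R1, R2, and Condition S hold. Then:

    1. The function $c(h)$ is strictly decreasing from $+\infty$ to 0 and is twice continuously differentiable.
    2. The marginal d.f. of $u_2$ is equal to the marginal d.f. of $u_1$: $\forall x$, $\Pr(u_2 \le x) = \Pr(u_1 \le x) = F(x)$.
    3. The joint d.f. of $(u_1, u_2)$ is given by
       (3.1) $\forall x_1 \ne x_2, \quad \Pr(u_1 \le x_1, u_2 \le x_2) = \dfrac{F(x_2) - c(x_1 - x_2)F(x_1)}{1 - c(x_1 - x_2)}.$
    4. $F(\cdot)$ is three-times continuously differentiable and $f''$ is bounded where $f$ is the density function of $u_1$.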

Claim 1 is directly derived from the limit conditions in Condition S. In claim 2, the identity of marginal distributions of $u_1$ and $u_2$ is reminiscent of the property of exchangeability at the heart of the score method developed by Manski (1987). This property is here shown to be the consequence of the sufficiency property, which is therefore stronger than exchangeability. Claim 3 of Theorem 2 gives a characterization of the joint probability function in terms of two functions $F(\cdot)$ and $c(\cdot)$ only. The sufficiency property thus reduces the dimensionality of the problem, at least, from a function of two real arguments to two functions of one real argument. We show in the Appendix how to define by continuity the joint probability distribution when $x_2 = x_1$. It is easier to continue to work with the distribution function of the difference, $u_1 - u_2$, which is in a one-to-one mapping with function $c(\cdot)$.

In Chamberlain (1992) there is another result about identification when regressors are bounded. It uses a very similar technique of proof. It is a conjecture that a generalization of that result also holds.
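The reduction to the pair $(F, c)$ can be checked numerically in the logistic special case. Under Condition S with a common marginal $F$, writing $\Pr(u_1 > x_1, u_2 \le x_2) = F(x_2) - G(x_1, x_2)$ and $\Pr(u_1 \le x_1, u_2 > x_2) = F(x_1) - G(x_1, x_2)$, where $G$ is the joint d.f., and solving for $G$ gives $G(x_1, x_2) = [F(x_2) - c(h)F(x_1)] / [1 - c(h)]$ with $h = x_1 - x_2 \ne 0$. The sketch below, whose function names are hypothetical, verifies that this expression returns the product $F(x_1)F(x_2)$ when $F$ is logistic and $c(h) = \exp(-h)$, as it should for independent logistic shocks.

```python
import math

def F(x):
    """Logistic marginal distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def c(h):
    """Function c(.) of Condition S in the logistic case."""
    return math.exp(-h)

def joint_cdf_from_sufficiency(x1, x2):
    """Joint d.f. implied by Condition S and a common marginal F (x1 != x2)."""
    ch = c(x1 - x2)
    return (F(x2) - ch * F(x1)) / (1.0 - ch)

# Evaluation points away from the line x1 = x2, where the expression is 0/0.
points = [(0.5, -1.0), (-2.0, 1.5), (3.0, 2.0)]
errors = [abs(joint_cdf_from_sufficiency(x1, x2) - F(x1) * F(x2)) for x1, x2 in points]
```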

PROPOSITION 1: Let $\Phi(h)$ (resp. $\varphi(h)$) be the distribution (resp. density) function of $u_1 - u_2$. Under Assumptions R1-R2 and Condition S, we necessarily have two conditions:

(P1) $\int_0^{+\infty}\tau\varphi(\tau)\,d\tau = \int_0^{-\infty}\tau\varphi(\tau)\,d\tau < +\infty,$

(P2) $\exists \beta_0 > 0, \quad \lim_{h \to \pm\infty} \dfrac{|h\varphi(h)|}{\int_h^{\pm\infty}\tau\varphi(\tau)\,d\tau} > \beta_0.$

Function $c(\cdot)$ and distribution $\Phi(\cdot)$ are in the following one-to-one relationship:

(3.2) $c(h) = 1 - \dfrac{h}{h\Phi(h) + \int_h^{+\infty}\tau\varphi(\tau)\,d\tau}.$

The first condition tells us that the expectation of $(u_1 - u_2)$ when $u_1 - u_2 \ge 0$ is finite and makes use of the fact that $u_1$ and $u_2$ are identically distributed. Thus, $E(u_1 - u_2) = 0$ even when $Eu_1$ does not exist. These two regularity conditions P1 and P2 are verified if the distribution of $u_1 - u_2$ is thin tailed and not too hectic at infinity, for instance, if $\Phi(\cdot)$ is the normal d.f. They are not if the distribution is Cauchy for instance.

In contrast, the marginal distribution F(.) should have thick tails:

PROPOSITION 2: Under Assumptions R1-R2 and Condition S, we necessarily have:

(P3) $\exists \alpha_0 > 0$ such that $\forall x$, $\dfrac{f''(x)}{f(x)} < (\alpha_0)^2$,

where $\alpha_0$ depends on distribution function $\Phi(\cdot)$. The set of such distributions is not empty.

The marginal density function $f$ should not be "too" convex and the normal distribution, say, would not qualify for this condition. In contrast, mixtures of normal distributions through a gamma-distributed precision parameter verify this condition as presented in the next section.

We can now prove that the sufficiency property can be used for a much broader set of joint distributions than the logistic.

THEOREM 3: Assume Assumptions R1(i), R1(ii), and R1(iv). Let $D$ be the set of pairs $(\varphi, f)$ of strictly positive, continuous, and bounded density functions on $\mathbb{R}$, such that $f$ is twice continuously differentiable and $f''$ is bounded, and such that Assumption R2 and conditions P1, P2, and P3 hold. Parameter $\alpha_0$ in condition P3 is defined in the proof.

Then, for any $(\varphi, f) \in D$, (3.1) and (3.2) define a joint distribution function that verifies Condition S and Assumption R1(iii).

As said, the sufficiency property can also be interpreted as reducing the dimensionality from a set of bivariate density functions to a set of pairs of univariate density functions $(\varphi, f)$. This reduction explains why the $\sqrt{n}$-consistency result can hold. The theorem gives additional restrictions P1, P2, and P3 though these conditions do not affect dimensionality. An open question is how restrictive they are for empirical work on top of the dimensionality reduction.

Observe also that the reduction of dimensionality achieved by assuming independence between $u_1$ and $u_2$ is of the same magnitude since the joint density function is then given by two univariate marginal density functions. Using both dimensionality reductions at the same time is very restrictive, however, since it yields only one parametric family, the logistic distribution.

COROLLARY 1: Assume Assumptions R1-R2, Condition S, and that $u_1$ and $u_2$ are independent. Then, $u_1$ and $u_2$ are logistically distributed. Formally, there exists $k > 0$ such that:

4. SEMIPARAMETRIC ESTIMATION AND A PARAMETRIC EXAMPLE

Under Condition S, the conditional probability that can be used as the estimating equation can be written as (see equation (A.1) in Appendix A):

Observe first that the model is a cross-section binary model. The "transformed" dependent variable can only take two values, "Entry" ($y_1 = 0$, $y_2 = 1$, $\Delta y = +1$) or "Exit" ($y_1 = 1$, $y_2 = 0$, $\Delta y = -1$), in the sample of movers ($\sum_{t=1}^{2} y_t = 1$). As the index $(z_1 - z_2)\beta$ is linear in $\beta$, it is in this sense that the sufficiency property permits "quasi-differencing" the data. Semiparametric identification is achieved as in Manski (1988) or Horowitz (1998) using Assumption R1(i),(ii). Provided that Assumption R2(ii) and conditions P1 and P2 are verified,⁴ equation (4.1) describes a monotonic single-index model, for which estimation methods are described, for instance, in Horowitz (1998). Besides, we can extend this estimation principle to $T$ periods with $T > 2$, by borrowing the idea of Manski (1987): first-difference binary variables between two periods in sequence, then write a pseudo-likelihood function as the sum of likelihood functions given by equation (4.1) for the $(T - 1)$ pairs.
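The quasi-differencing idea can be sketched on simulated data in the logistic special case, where the cross-section model for movers is an ordinary logit in the differenced covariate. The simulation design (a scalar covariate correlated with the individual effect, sample size, seed) and all function names below are illustrative assumptions, and the Newton iteration is a plain parametric logit fit rather than one of the semiparametric estimators mentioned in the text.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic distribution function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def draw_logistic(rng):
    """Logistic draw by inversion; the uniform is clamped away from 0 and 1."""
    p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return math.log(p / (1.0 - p))

def simulate_movers(n, beta, seed=0):
    """Simulate the two-period model and keep observations with y1 + y2 = 1."""
    rng = random.Random(seed)
    movers = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)             # individual effect
        z1 = 0.5 * eps + rng.gauss(0.0, 1.0)  # covariates correlated with eps
        z2 = 0.5 * eps + rng.gauss(0.0, 1.0)
        y1 = int(z1 * beta + eps + draw_logistic(rng) > 0)
        y2 = int(z2 * beta + eps + draw_logistic(rng) > 0)
        if y1 + y2 == 1:
            movers.append((z2 - z1, y2))      # (differenced covariate, 1 if "Entry")
    return movers

def fit_differenced_logit(movers, iterations=25):
    """Newton iterations for a scalar logit of the Entry indicator on z2 - z1."""
    b = 0.0
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for dz, entry in movers:
            p = sigmoid(dz * b)
            score += dz * (entry - p)
            info += dz * dz * p * (1.0 - p)
        b += score / info
    return b

movers = simulate_movers(20000, beta=1.0, seed=12345)
beta_hat = fit_differenced_logit(movers)
```

Although the individual effect is correlated with the covariates and is never estimated, `beta_hat` should be close to the value of `beta` used in the simulation.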

Parametric methods are less attractive than semiparametric methods since assuming a parametric distribution different from the logistic implicitly assumes away independence. It is nevertheless interesting to investigate special cases to understand the consequences of imposing sufficiency on distribution functions. The simplest parametric example is the known case of logistic distributions when function $c(h) = \exp(-h)$ and the distribution function given by (4.1) is the logistic distribution. There are two routes to depart from this assumption. The first route is to use popular distributions in (4.1), for instance the normal d.f. It however seems to generate quite implausible distribution functions for the difference between $u_1$ and $u_2$.⁵ The second route is to specify the distribution function of the difference between $u_1$ and $u_2$. We now briefly look at that case, when $u_1 - u_2$ is normally distributed with zero-mean and variance equal to $\sigma_0$, say. The density function, $\varphi_0(\cdot)$, is symmetric around 0 and $\int_h^{+\infty}\tau\varphi_0(\tau)\,d\tau = \sigma_0\varphi_0(h)$ is finite when $h = 0$ (condition P1). Condition P2 is satisfied since $|h\varphi_0(h)| / (\sigma_0\varphi_0(h)) = |h|/\sigma_0$ tends to $+\infty$ when $h \to \pm\infty$.

⁴ Normalization of Assumption R2(ii) translates into $G(0) = 1/2$, condition P1 into $G'$ is positive and bounded, and condition P2 into conditions on the tails of $G(\cdot)$. These conditions are derived using equation (3.2).

Looking for compatible marginals, the convexity condition P3 shall be satisfied for any marginals such that $f''/f < (\alpha_0)^2 = 6/\sigma_0$, as can be proven from the proof of Theorem 3. Consider a mixture of zero-mean normal variates where the precision is gamma-distributed with parameters $\delta$ and $\lambda$. Then, the density function is⁶

and

$\max_x \dfrac{f''(x)}{f(x)} = \dfrac{\delta + 1/2}{\lambda(2\delta + 3)}(\delta + 1)^2.$

Choose $\lambda$ and $\delta$ such that this maximum is less than $(\alpha_0)^2$ and consider the following model (where $i$ now is an individual index):

where $(\xi_i, \zeta_i)$ are two independent zero-mean unit-variance normal variates and $1/\sigma_i^2$ is gamma-distributed with parameters $\delta$ and $\lambda$. Then, condition P3 is verified because the convolution of a distribution that verifies P3 (i.e. $\sigma_i\xi_i$) with any distribution verifies P3.⁷ Besides, $u_{1i}$ and $u_{2i}$ are identically distributed because $\zeta_i$ is symmetrically distributed. In empirical applications, model (4.2) can be used when there is "a lot of heterogeneity" in the levels of the shocks and less in the difference.
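The two ingredients of this parametric example can be checked numerically. The sketch below first verifies the normal tail identity $\int_h^{+\infty}\tau\varphi_0(\tau)\,d\tau = \sigma_0\varphi_0(h)$, where $\sigma_0$ is the variance, and then compares a grid maximum of $f''/f$ with the closed form $(\delta + 1/2)(\delta + 1)^2 / (\lambda(2\delta + 3))$. It assumes that the gamma distribution of the precision has shape $\delta$ and rate $\lambda$, so that the marginal density is proportional to $(\lambda + x^2/2)^{-(\delta + 1/2)}$; this parameterization, the function names, and the numerical values are assumptions made for illustration.

```python
import math

def normal_density(h, variance):
    """Density of a zero-mean normal with the given variance."""
    return math.exp(-h * h / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def tail_first_moment(h, variance, steps=100000):
    """Trapezoid approximation of the integral of t * phi0(t) over [h, h + 12 sd]."""
    upper = h + 12.0 * math.sqrt(variance)
    width = (upper - h) / steps
    total = 0.5 * (h * normal_density(h, variance) + upper * normal_density(upper, variance))
    for k in range(1, steps):
        t = h + k * width
        total += t * normal_density(t, variance)
    return total * width

def mixture_kernel(x, delta, lam):
    """Unnormalized marginal density, assuming a gamma(shape=delta, rate=lam) precision."""
    return (lam + 0.5 * x * x) ** (-(delta + 0.5))

def max_convexity_ratio(delta, lam, grid_max=20.0, grid_step=0.001, eps=1e-4):
    """Grid maximum of f''(x) / f(x) over x >= 0, using central differences."""
    best = float("-inf")
    for k in range(int(grid_max / grid_step) + 1):
        x = k * grid_step
        f0 = mixture_kernel(x, delta, lam)
        second = (mixture_kernel(x + eps, delta, lam) - 2.0 * f0
                  + mixture_kernel(x - eps, delta, lam)) / (eps * eps)
        best = max(best, second / f0)
    return best

# Hypothetical parameter values, chosen only for illustration.
sigma0, delta, lam = 1.0, 2.0, 1.0
tail_numeric = tail_first_moment(1.0, sigma0)
tail_closed = sigma0 * normal_density(1.0, sigma0)
ratio_numeric = max_convexity_ratio(delta, lam)
ratio_closed = (delta + 0.5) * (delta + 1.0) ** 2 / (lam * (2.0 * delta + 3.0))
p3_holds = ratio_closed < 6.0 / sigma0  # bound (alpha_0)^2 = 6 / sigma_0 from the text
```

For these values the maximum of $f''/f$ is below $6/\sigma_0$, so the pair would be compatible with condition P3 as stated in the text.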

⁵ This and other examples are studied in the working paper, Magnac (2002), where bounds on correlation coefficients between $u_1$ and $u_2$ are also derived. In these examples, bounds are not limiting.

⁶ These results are shown in an Appendix available upon request or on my web page.

⁷ Observe that the original model is not unique since any random variable can be added to $(u_1 + u_2)/2$ and subtracted from the individual effect $\varepsilon$. All results are invariant to these renormalizations.

5. DISCUSSION AND EXTENSIONS

In this paper, we used the principle of sufficiency and conditional inference to derive a generalization of conditional logit. We presented the conditions under which we can quasi-difference binary data as if they were continuous. Cross-section semiparametric procedures can be used to estimate these models with unrestricted individual effects and their results can be compared with those that are obtained using random effect specifications. By extending a result by Chamberlain, we also showed that it is under the property of sufficiency only that we can construct $\sqrt{n}$-consistent estimators in the panel binary choice model when regressors are exogenous.

There are some straightforward extensions. The first extension is that the linear index property, writing latent variables as $z_t\beta$, is far from necessary. Deterministic parts of latent variables in each period could be written as $f_t(z_t, \beta_t)$ and the conditional model would become a function of the difference between these nonlinear indices provided that the latent models remain additive in the individual effect. Functions $f_t$ could even be partially unknown if the conditions of Matzkin (1992) are fulfilled.

A more involved extension is to permit the distribution function of specific shocks $(u_1, u_2)$ to depend on $(z_1, z_2)$. As the present model is nested into the flexible setting of Manski (1987), we could get closer to it. It does not seem to be possible however to let the joint distribution function depend on covariates in a completely unspecified way. The reason is that in our proofs, we have to use the continuous variation with respect to the individual effect, $\varepsilon$, and with respect to the difference, $x_1 - x_2$ (see Assumption R1). The closest we can easily get is to make the joint distribution of $(u_1, u_2)$ depend on all covariates except the difference between the first continuous covariate $(z_1^{(1)} - z_2^{(1)})$. We can then repeat the present analysis, conditional on values of $w = (z_1^{(1)} + z_2^{(1)}, z_1^{(-1)}, z_2^{(-1)})$ where $z_1^{(-1)}$, $z_2^{(-1)}$ include all covariates except the first. All assumptions are written conditional on $w$ and all results apply, conditional on $w$. It might be reminiscent of the assumption used by Lewbel (2000) and Honoré and Lewbel (2002). Covariate $(z_1^{(1)} - z_2^{(1)})$ is the special regressor and equation (4.1) is not single-indexed any longer.

When data are observed over a longer time period ($T > 2$), periods can be chained two-by-two to construct pairs as already said. On the one hand, the question of which pseudo-likelihood for the $(T - 1)$ pairs to consider is worth investigating and this question is left for future research. On the other hand, using the property of sufficiency for triples of binary variables, foursomes, or more, instead of pairs as we did here, is not an interesting extension. Some tedious investigation revealed that the only possible distribution function that verifies the property of sufficiency in the case of triples, is the logistic d.f. The sketch of the proof is the following. For triples, the relative probabilities depicted by the analog of Condition S of exchangeable choices between any two latent elements should not depend on the level of the third latent element of the triple because this variable contains the individual effect. The Independence of Irrelevant Alternatives property comes in and drives us back to the logistic distribution.

Other lines of research seem more challenging. It remains to be seen how such an approach would be applied to other nonlinear models. It might be easier to extend this approach in models where we know that the principle of sufficiency can be applied (Weibull, Poisson, ...). It seems to be a lot more difficult in dynamic models (Honoré and Kyriazidou (2000)) and even more difficult in other models such as Tobit-like models.

INRA, Paris-Jourdan and CREST-INSEE, 48, Bld Jourdan, 75014 Paris, France; tmagnac@ens.fr; www.inra.fuiESR/UR/lea/magnac.htm.

Manuscript received June, 2002; final revision received October, 2003.

APPENDICES

Equations or lemmas starting with A to D refer to the section of the Appendix where they are stated. Conditions starting with R, S, and P and numerals refer to the text.

APPENDIX A: PROOFS IN SECTION 2

PROOF OF LEMMA 1: Denote $x_t = -z_t\beta - \varepsilon$ and use Assumption R1(iii) to write

By definition, sufficiency is satisfied if and only if, for $(y_1, y_2) \in \{(0, 1), (1, 0)\}$, equation (2.1) is verified. If it is verified for one of these pairs, say (1, 0), it is satisfied for the other pair since probabilities sum to one. Thus, sufficiency is equivalent to the property that

is independent of $\varepsilon$, because condition (2.1) says that

is independent of $\varepsilon$.

By Assumption R1(iii), $r(x_1, x_2)$ is a smooth function from $\mathbb{R}^2$ to $\mathbb{R}$ and $x_1$ and $x_2$ vary over the whole real line by Assumptions R1(i) and R1(iv). Thus, $r(x_1, x_2)$ is independent of $\varepsilon$ if and only if it depends on the only combination of $(x_1, x_2)$ that does not depend on $\varepsilon$, that is the difference $x_1 - x_2$. Thus, there exists a real function $c(\cdot)$ such that $r(x_1, x_2) = c(x_1 - x_2)$. Reciprocally, if such an expression holds, sufficiency holds and

PROOF OF THEOREM 1: We adapt the proof of Chamberlain (1992, Theorem 2, p. 7). Define first the vector of probabilities:

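LEMMA A.1: The semiparametric efficiency bound $I_\beta = 0$ for all $\beta$ in $\Theta$ unless the distribution function of random shocks is such that

(A.2) $\forall z_1, z_2$, $\exists \psi = (\psi_1, \ldots, \psi_4) \in \mathbb{R}^4$ such that $\psi \ne 0$ and $\forall \varepsilon \in \mathbb{R}$, $\psi' p(z, \varepsilon, \beta) = 0$.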

PROOF: See Chamberlain (1992), marginally adapting the proof to the case where $u_1$ and $u_2$ are not independent. Q.E.D.

Second, fix $z_1$ and $z_2$. By Assumption R1(iv), $\varepsilon$ continuously varies over $\mathbb{R}$. Observe that if $\varepsilon \to +\infty$ we must have $\psi_4 = 0$ and that if $\varepsilon \to -\infty$ we must have $\psi_1 = 0$ (Chamberlain (1992)). Therefore equation (A.2) in the lemma above is equivalent to

which is equivalent to

where this ratio is independent of $\varepsilon$. It is equation (2.1) and, by equivalence, Condition S stated in Lemma 1. Reciprocally, if Condition S holds, then the semiparametric efficiency bound $I_\beta \ne 0$ since the conditional likelihood estimator is $\sqrt{n}$-consistent. Q.E.D.

APPENDIX B: PROOF OF THEOREM 2

Claim 1 is proven by using monotonicity properties of probability functions, limit conditions, and Assumption R1(iii). We now prove claims 2 and 3. The proof proceeds by reparameterizing the problem as

$x_1 = \dfrac{\lambda + h}{2}, \quad x_2 = \dfrac{\lambda - h}{2},$

and by observing that by Assumptions R1(i) and R1(iv), pairs $(\lambda, h)$ span all $\mathbb{R}^2$. Let

and

Using these expressions and Condition S,

$K(h, \lambda) = c(h)G(h, \lambda),$

we can write

By normalization, $c(0) = 1$, and thus

and, using equation (B.1):

which is equation (3.1) in the text. Observe that equation (3.1) tends to 0/0 when $h = x_1 - x_2 \to 0$. As the numerator and denominator are continuously differentiable, we can do Taylor expansions around points $x_1 = x_2$:

where $f(x_1) > 0$ (Assumption R1(iii)), $c'(0) < 0$ (claim 1 of this theorem), and

$\lim_{x_1 - x_2 \to 0} o(x_2 - x_1)/(x_2 - x_1) = 0.$

By taking limits of the numerator and denominator of equation (3.1) divided by $(x_1 - x_2)$, we have

As the joint density function is continuous and bounded, we can continuously differentiate this last equation two times and the result is continuous and bounded. Therefore, $F(\cdot)$ necessarily is three-times continuously differentiable and $f''$ is bounded (claim 4).

APPENDIX C: PROOF OF PROPOSITION 1

The proof of this proposition proceeds in various steps. We first exhibit a condition of symmetry, which substantially simplifies proofs below. We then derive the joint density function of $(u_1, u_2)$ and the distribution of $u_1 - u_2$. We finally derive the expression of function $c(\cdot)$ and conditions P1 and P2 stated in the proposition.

A TECHNICAL SIMPLIFICATION: EXPLOITING THE SYMMETRY OF THE PROBLEM: There is a fundamental symmetry in the problem with respect to the disturbances $u_1$ and $u_2$. Symmetry is a direct consequence of Condition S. If we change $u_1$ into $u_2$ and $u_2$ into $u_1$, we change $c(h)$ into $1/c(h)$. By Theorem 2, we also know that the marginal distributions of $u_1$ and $u_2$ are identical and equal to $F(\cdot)$. We can therefore limit the proofs below to the case, $h > 0$, provided that we verify the conditions bearing on the straight representation $c(h)$ and on the reverse representation $1/c(h)$. This property is summarized quite informally by the following lemma.

LEMMA C.1: If Condition S holds and if conditions for $c(h)$ and $1/c(h)$ hold for any $h > 0$, they globally hold for $c(h)$.

THE JOINT DENSITY FUNCTION: The joint density function is derived by noting that Assumption R1(iii) allows for differentiating two times equation (3.1). The second cross-derivative, or density function, denoted $g(x_1, x_2)$ is strictly positive by Assumption R1(iii) and equal to

(C.1) $g(x_1, x_2) = s(h)\,[f(x_1) + f(x_2)] + s'(h)\,[F(x_1) - F(x_2)], \quad h = x_1 - x_2,$

where

(C.2) $s(h) = \dfrac{c'(h)}{(1 - c(h))^2}$

is a negative function since $c(\cdot)$ is decreasing. It has a singularity point at $h = 0$. By continuity, we nevertheless can obtain $g(x_1, x_1)$ (see Proof of Proposition 2).

THE DISTRIBUTION OF $u_1 - u_2$: Observe that symmetry (Lemma C.1) can be used. Interchanging $u_1$ and $u_2$ transforms the distribution of $u_1 - u_2$ into the distribution of the opposite, $u_2 - u_1$. Consider then $h > 0$ and use equation (C.1) to write $\Pr(u_1 - u_2 > h)$ as

Setting $r = x_1 - x_2$, we get

Observe that

because of equation (C.2), because $\lim_{h \to +\infty}(1/(1 - c(h))) = 1$ (Theorem 2), and because $\lim_{h \to +\infty} s(h) = 0$ (see definition (C.2), namely $c(\cdot)$ is decreasing and tending to 0 when $h$ tends to $+\infty$ so that $c'$ tends to 0).

Using also

we get

$\Pr(u_1 - u_2 > h) = 1 - \dfrac{1}{1 - c(h)} - s(h)\int_{-\infty}^{+\infty}[F(x + h) - F(x)]\,dx.$

Given that all functions in this expression are well defined,

so that differentiation with respect to $h$ and integration can be permuted:

As the integral takes value 0 when $h = 0$, we obtain

Replacing the integral in the expression of $\Pr(u_1 - u_2 > h)$ yields

Denoting $\Phi(h) = \Pr(u_1 - u_2 \le h)$,

using equation (C.2). Symmetry (Lemma C.1) can be checked and this formula applies to $h < 0$ and by continuity to $h = 0$.

CONDITIONS ON THE DISTRIBUTION FUNCTION $\Phi(\cdot)$ (Conditions P1 and P2): We now seek $c(\cdot)$ as a function of $\Phi(\cdot)$ and derive necessary conditions on $\Phi(\cdot)$. First, integrate equation (C.3) to get

where $A$ is a constant of integration to be found. As $h/(1 - c)$ is equal to $-(1/c'(0)) > 0$ when $h = 0$, $A$ is equal to $-(1/c'(0)) > 0$. The following lemma determines $A$.

where $\varphi(h) = \Phi'(h)$ is the density function of $u_1 - u_2$.

PROOF: Let $h > 0$ and consider the joint density function given by equation (C.1): $s(h)(f(x + h) + f(x)) + s'(h)(F(x + h) - F(x)) > 0$, where $s(h)$ is a negative function. Use equation (C.3) to write

and integrate by parts the integral in equation (C.4),

to get

As $s(h) < 0$, $\int_0^h \tau\varphi(\tau)\,d\tau$ is bounded by $A$. Thus

(C.6) $\lim_{h \to +\infty} h\varphi(h) = 0.$

Rewriting equation (C.1) using $s(h) < 0$ and $F(x + h) - F(x) > 0$ for any $h > 0$,

Replacing $s(h)$ by its expression (C.5) implies that

$\forall h > 0, \forall x, \quad \dfrac{f(x + h) + f(x)}{F(x + h) - F(x)} < \dfrac{2}{h} + \dfrac{h\varphi(h)}{A - \int_0^h \tau\varphi(\tau)\,d\tau}.$

Taking limits when $h \to +\infty$ yields

Therefore

(C.7) $\exists \beta_0 > 0, \quad \lim_{h \to +\infty} \dfrac{h\varphi(h)}{A - \int_0^h \tau\varphi(\tau)\,d\tau} > \beta_0.$

Because of equation (C.6),the numerator tends to zero and the limit of the denominator is thus necessarily equal to zero. Q.E.D.

Replacing A by its expression in equation (C.4) and solving for c, yields

To finish the proof and obtain conditions P1 and P2, we use symmetry (Lemma C.1). The reverse representation consists in changing $u_1$ into $u_2$ and vice versa. Observe that if $\Phi_r$ is the distribution of the opposite $(u_2 - u_1)$, we have $\Phi_r(h) = 1 - \Phi(-h)$ and therefore $\varphi_r(h) = \varphi(-h)$. Apply Lemma C.2 to that distribution to show that $A_r = \int_0^{+\infty}\tau\varphi_r(\tau)\,d\tau$ is finite. As $u_1$ and $u_2$ have the same distribution and $E(u_1 - u_2)$ is finite because $A$ and $A_r$ are, it is necessarily equal to zero. Thus, we get property P1:

Second, consider equation (C.7) and apply it to the reverse representation to get property P2:

(C.9) $\exists \beta_0 > 0, \quad \lim_{h \to -\infty} \dfrac{|h\varphi(h)|}{\int_h^{-\infty}\tau\varphi(\tau)\,d\tau} > \beta_0.$

We can also summarize the properties of s(h)proven in this section and needed below:

APPENDIX D: PROOFS OF PROPOSITION 2, THEOREM 3, AND COROLLARY 1

PROOF OF PROPOSITION 2: As $s(h)$ is strictly negative, equation (C.1) for $h > 0$ is equivalent to

and therefore, using equation (C.11),

When h tends to zero, we can expand the left-hand side to the third order as F(.)is continuously differentiable three times and f"is bounded (Theorem 2):

Therefore, when h +0 the left-hand side is equivalent to

Using first-order expansions, the right-hand side is equivalent when h +0 to

As $f(x) > 0$, equation (D.1) therefore implies property P3. There exists $\alpha_0 > 0$ such that

$\forall x,\quad \frac{f''(x)}{f(x)} < \alpha_0^2 \le \frac{6\varphi(0)}{\int_0^{+\infty} \tau\varphi(\tau)\,d\tau},$

because $\varphi(0) > 0$ and the denominator is bounded by condition P1. Other conditions on $\alpha_0$ are derived in the proof of Theorem 3.

In an appendix available upon request, it is proven that mixtures of zero mean normal variates verify this condition. Precision is the mixing parameter and is Gamma distributed with parameters $\delta$ and $\lambda$. Then

$\max_{x>0} \frac{f''(x)}{f(x)} = \frac{\delta + 1/2}{\lambda(2\delta+3)}\,(\delta+1)^2$

and for any $\delta > 0$, we can choose $\lambda > 0$ to satisfy property P3 for any $\alpha_0 > 0$. In conclusion to this proof, and using equations (C.10) and (C.1), the density function on the 45° line can be written by continuity as

PROOF OF THEOREM 3: To prove that equations (3.1) and (3.2) define a joint distribution function that verifies Assumption R1(ii) and Condition S, we shall prove that the joint density function that these equations define exists everywhere and is continuous, bounded, and positive. Using equation (C.1), it is easy to see that it is defined everywhere (including for $x_1 = x_2$, as proved in the previous subsection for an appropriately chosen $\alpha_0$) and is continuous and bounded since $f$ and $\varphi$ are.

To finish the proof, we shall prove that the joint density is positive. We consider the case h > 0 only and rely on symmetry (Lemma C.1) for h < 0 and on continuity for h =0. First the following lemma is proven below (in the next subsection).

LEMMA D.1: For any $\alpha > 0$ such that $\forall x$, $f''(x)/f(x) < \alpha^2$:

where $\mathrm{sh}(\cdot)$ and $\mathrm{ch}(\cdot)$ are the hyperbolic sine and cosine functions.

Second, if the following property holds:

then set parameter $\alpha_0$ in condition P3 at that value, $\alpha_0 = \alpha$. Apply Lemma D.1 and replace $\alpha\,\mathrm{sh}(\alpha h)/(\mathrm{ch}(\alpha h) - 1)$ by its bound (D.3) in equation (D.2) to obtain

which proves by equation (C.1) that the joint density is positive. To prove Theorem 3 and find the value $\alpha_0 = \alpha$, equation (D.3) shall thus be proven.

Observe first that the limit when $h \to \infty$ of

is equal to $\alpha$ and that

because $\mathrm{ch}(\alpha h) - 1 > (\alpha h)^2/2$ for $h > 0$. Then,

Use condition P2 to define $\beta_0$ and $M$ such that

Set $\alpha \le \beta_0$ and equation (D.3) is then verified for any $h \ge M$. Consider now $h \le M$. Equation (D.3) can be rewritten as

In a lemma available upon request, we prove that the expression between brackets on the right-hand side is positive and less than or equal to $1/6$. Set $\beta_1$ to

As $\varphi(h)$ is positive and continuous and as the minimum is taken over a compact set, $\beta_1 > 0$. Note that it is considerably stronger than the bound derived in the proof of Proposition 2. Choose $\alpha \le (6\beta_1)^{1/2}$ and equation (D.3) is satisfied for $h \le M$. In conclusion, provided that $\alpha \le \min(\beta_0, (6\beta_1)^{1/2})$, equation (D.3) is satisfied for any $h > 0$. Using the reverse representation (Lemma C.1), we can prove that it is satisfied for any $h$. It also proves that if equation (D.3) is verified for $\alpha$, then it is verified for any $\alpha' < \alpha$. Q.E.D.
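The displayed equations of this proof are not reproduced in this text, so the following is only a sketch under an explicit assumption. If the bracketed expression is $B(z) = \mathrm{sh}(z)/(z(\mathrm{ch}(z)-1)) - 2/z^2$ with $z = \alpha h$ (what one obtains by subtracting $2/h$ from $\alpha\,\mathrm{sh}(\alpha h)/(\mathrm{ch}(\alpha h)-1)$ and factoring out $\alpha^2 h$), then the claim that it is positive and at most $1/6$ can be checked numerically. The form of $B$, the grid, and the function names are assumptions made for illustration, not taken from the paper.

```python
import math


def bracket(z):
    """Assumed bracketed term: sh(z) / (z * (ch(z) - 1)) - 2 / z**2, for z > 0."""
    return math.sinh(z) / (z * (math.cosh(z) - 1.0)) - 2.0 / z**2


def within_bounds(zs):
    """Check 0 < bracket(z) <= 1/6 on a grid of positive z values."""
    return all(0.0 < bracket(z) <= 1.0 / 6.0 for z in zs)


# Grid kept away from z = 0, where the two terms nearly cancel in floating point.
grid = [0.05 * k for k in range(1, 1001)]  # z from 0.05 to 50

print(within_bounds(grid))
print(bracket(0.05), bracket(1.0), bracket(10.0))
```

Under this assumed form, the value approaches $1/6$ as $z \to 0$, since $B(z) = 1/6 - z^2/360 + \dots$, which is consistent with $1/6$ being the relevant upper bound and with the choice $\alpha \le (6\beta_1)^{1/2}$.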

PROOF OF LEMMA D.1: For any $\lambda \in [0, 1]$ let

$m(\lambda) = f(x + \lambda h) > 0$

and observe that $m(0) = f(x)$ and $m(1) = f(x+h)$. The condition $f''(x)/f(x) < \alpha^2$ implies that

Define also function $g(\lambda)$ such that

$g(\lambda) = \frac{f(x+h)\,\mathrm{sh}(\alpha h\lambda) + f(x)\,\mathrm{sh}(\alpha h(1-\lambda))}{\mathrm{sh}(\alpha h)}.$

Observe that

$g(0) = f(x), \quad g(1) = f(x+h),$

As the degree of convexity of $g(\cdot)$ is "larger" than the degree of convexity of $m(\cdot)$, we now show that

$m(\lambda) > g(\lambda)$ for any $\lambda \in\, ]0, 1[$.

Let $\Psi(\lambda) = m(\lambda) - g(\lambda)$. Observe that $\Psi(0) = 0$, $\Psi(1) = 0$, and that $\Psi(\lambda)$ is twice differentiable. Thus

Because $m(\lambda) > 0$ and because of the inequalities above,

$\Psi(\lambda) \le 0 \implies \Psi''(\lambda) < 0.$

Assume, by contradiction, $\exists \lambda_0$, $\Psi(\lambda_0) < 0$. As $\Psi(\cdot)$ is continuous, $\exists (\lambda_1, \lambda_2)$ such that $\lambda_1 < \lambda_0 < \lambda_2$ and such that $\Psi(\lambda_1) = \Psi(\lambda_2) = 0$. Then $\forall \lambda \in\, ]\lambda_1, \lambda_2[$, $\Psi(\lambda) < 0$, and $\Psi''(\lambda) < 0$. It is a contradiction since it is not possible to construct a twice differentiable concave function in an interval where it takes value 0 at the end points and is negative in between. Thus, $\Psi(\lambda) \ge 0$. By contradiction assume that $\exists \lambda_0 \in\, ]0, 1[$; $\Psi(\lambda_0) = 0$. It is impossible since $\Psi$ would be concave at that point. Therefore $\Psi(\lambda) > 0$ for any interior point. Returning to the main argument, we therefore have

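$F(x+h) - F(x) = \int_x^{x+h} f(u)\,du = \int_0^1 f(x + h\lambda)\,h\,d\lambda > h \int_0^1 \frac{f(x+h)\,\mathrm{sh}(\alpha h\lambda) + f(x)\,\mathrm{sh}(\alpha h(1-\lambda))}{\mathrm{sh}(\alpha h)}\,d\lambda$

using the definition of $g(\cdot)$. Thus, by symmetry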

and the proof finishes by integrating the right-hand side. Q.E.D.
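As a numerical illustration of Lemma D.1 (not part of the original proof), the sketch below uses the logistic distribution, for which $f''(x)/f(x) = 1 - 6F(x)(1-F(x)) < 1$, so that $\alpha = 1$ is admissible. It checks on a grid that $m(\lambda) \ge g(\lambda)$ and that the ratio $(f(x+h)+f(x))/(F(x+h)-F(x))$ stays below $\alpha\,\mathrm{sh}(\alpha h)/(\mathrm{ch}(\alpha h)-1)$, which is the bound obtained by integrating the sh-weighted interpolant $g$ over $[0, 1]$. The choice of distribution, the grids, the tolerance, and all function names are assumptions made for illustration.

```python
import math

ALPHA = 1.0  # assumption: for the logistic law, f''/f = 1 - 6F(1-F) < 1 = ALPHA**2


def F(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def f(x):
    """Logistic density, f = F(1 - F)."""
    return F(x) * (1.0 - F(x))


def m(lam, x, h):
    """m(lambda) = f(x + lambda * h)."""
    return f(x + lam * h)


def g(lam, x, h, alpha=ALPHA):
    """sh-weighted interpolant with g(0) = f(x) and g(1) = f(x + h)."""
    num = (f(x + h) * math.sinh(alpha * h * lam)
           + f(x) * math.sinh(alpha * h * (1.0 - lam)))
    return num / math.sinh(alpha * h)


def ratio(x, h):
    """(f(x+h) + f(x)) / (F(x+h) - F(x)) for h > 0."""
    return (f(x + h) + f(x)) / (F(x + h) - F(x))


def bound(h, alpha=ALPHA):
    """alpha * sh(alpha*h) / (ch(alpha*h) - 1) for h > 0."""
    return alpha * math.sinh(alpha * h) / (math.cosh(alpha * h) - 1.0)


def check(xs, hs, lams, tol=1e-12):
    """True if ratio < bound and m >= g (up to rounding) on the whole grid."""
    for x in xs:
        for h in hs:
            if not ratio(x, h) < bound(h):
                return False
            if any(m(lam, x, h) < g(lam, x, h) - tol for lam in lams):
                return False
    return True


xs = [-3.0 + 0.5 * i for i in range(13)]   # x from -3 to 3
hs = [0.1, 0.5, 1.0, 2.0, 5.0]             # positive increments only
lams = [i / 10.0 for i in range(11)]       # lambda from 0 to 1

print(check(xs, hs, lams))
print(ratio(0.0, 1.0), bound(1.0))
```

The tolerance only absorbs rounding at the end points $\lambda = 0$ and $\lambda = 1$, where $m$ and $g$ coincide by construction.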

PROOF OF COROLLARY 1: $u_1$ and $u_2$ are assumed to be independent. Then, equation (B.2) implies that

For any $x$, $0 < F(x) < 1$ by Assumption R1(iii). Denote

Equation (D.4) implies that

Integrating this equation and imposing $F(0) = 1/2$, we get the expression for $F(x)$. Using equation (3.1) we get the expression for $c(h)$. Q.E.D.

REFERENCES

ANDERSEN, E. B. (1973): Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiejnisk Forlag.

ARELLANO, M., AND B. E. HONORE (2001): "Panel Data Models: Some Recent Developments," in Handbook of Econometrics, Vol. 5, ed. by E. Leamer and J. J. Heckman. Amsterdam: North-Holland, 3229-3296.

BARNDORFF-NIELSEN, O. E. (1978): Information and Exponential Families in Statistical Theory. Chichester: Wiley.

CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, 1248-1313.

CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," Harvard University, Unpublished Manuscript.

HONORE, B. E., AND E. KYRIAZIDOU (2000): "Panel Data Discrete Choice Models with Lagged Dependent Variables," Econometrica, 68, 839-874.

HONORE, B. E., AND A. LEWBEL (2002): "Semiparametric Binary Choice Panel Data Models without Strict Exogeneity," Econometrica, 70, 2053-2063.

HOROWITZ, J. (1998): Semiparametric Methods in Econometrics. Berlin: Springer-Verlag.

LANCASTER, T. (2000): "The Incidental Parameter Problem since 1948," Journal of Econometrics, 95, 391-413.

LEWBEL, A. (2000): "Semiparametric Qualitative Response Model Estimation with Unknown Heteroskedasticity or Instrumental Variables," Journal of Econometrics, 97, 145-177.

MAGNAC, T. (2002): "Panel Binary Variables and Individual Effects: Generalizing Conditional Logit." WP CREST 2002-18, www.crest.fr/doctravail/document/2002-18.pdf.

MANSKI, C. F. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.

MANSKI, C. F. (1988): "Identification of Binary Response Models," Journal of the American Statistical Association, 83, 729-738.

MATZKIN, R. (1992): "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and The Binary Choice Models," Econometrica, 60, 239-270.

RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmark Paedagogiske Institut.
