Panel Binary Variables and Sufficiency: Generalizing Conditional Logit
by
Thierry Magnac
Citation
Title:
Panel Binary Variables and Sufficiency: Generalizing Conditional Logit
Author:
Thierry Magnac
Year:
2004
Publication:
Econometrica
Volume:
72
Issue:
6
Start Page:
1859
End Page:
1876
Language:
English
Updated: November 26th, 2012
Abstract:
PANEL BINARY VARIABLES AND SUFFICIENCY:
GENERALIZING CONDITIONAL LOGIT
This paper extends the conditional logit approach (Rasch, Andersen, Chamberlain) used in panel data models of binary variables with correlated fixed effects and strictly exogenous regressors. In a two-period two-state model, necessary and sufficient conditions on the joint distribution function of the individual-and-period specific shocks are given such that the sum of individual binary variables across time is a sufficient statistic for the individual effect. By extending a result of Chamberlain, it is shown that root-n consistent regular estimators can be constructed in panel binary models if and only if the property of sufficiency holds. In applied work, the estimation method amounts to quasi-differencing the binary variables as if they were continuous variables and transforming a panel data model into a cross-section model. Semiparametric approaches can then be readily applied.
KEYWORDS: Binary models, panel data, conditional logit, sufficiency, quasi-differencing, semiparametric methods.
1. INTRODUCTION
THE ELEMENTARY BRICK of panel binary models with correlated fixed effects (see Arellano and Honoré (2001) for a recent survey) is a two-period two-state model. The pair of individual binary variables is described by a pair of latent variables that are assumed to be the sum of a linear index of explanatory variables, of the individual effect, and of the individual-and-period "specific shocks." The parameter $\beta$ of the index is the parameter of interest. Estimating it by conditional logit is a well-known semiparametric technique since it avoids specifying the distribution of individual effects conditional on covariates (Rasch (1960), Andersen (1973), Chamberlain (1984)). Its properties stem from the existence of a sufficient statistic for the individual effect, which is the individual sum of the binary variables. By definition of sufficiency (or more precisely S-sufficiency; Barndorff-Nielsen (1978)), the conditional likelihood function depends on the parameter of interest only while the marginal likelihood function depends on the parameter of interest and the nuisance parameter, the individual effect (Lancaster (2000)). Conditional logit is nevertheless seen to be restrictive because of the distributional assumptions. Yet, an intriguing and important result related to conditional logit was shown by Chamberlain (1992). If individual-and-period specific shocks are independent over time and if covariates are unbounded, consistent estimation at a $\sqrt{n}$ rate of the parameter of interest is possible if and only if the distribution of individual-and-period specific shocks is logistic. The semiparametric efficiency bound is equal to zero in all other cases.
In this paper, I show that the sum of the binary variables is a sufficient statistic for the individual effect, under necessary and sufficient conditions that are much less restrictive than in the conditional logit approach. Moreover, the result of Chamberlain
¹I thank Andrew Chesher, Miguel Delgado, Jean-Pierre Florens, two anonymous referees, the co-editor, and participants at seminars at CREST, Toulouse, Tinbergen, UC London, Bonn, CIRANO, Brown, and Quebec, and conferences EC2 in Dublin, ESEM in Lausanne, Panel Data in Berlin, and CAM in Copenhagen, for helpful comments. The usual disclaimer applies.
(1992) is generalized. If covariates are unbounded, then consistent estimation at a $\sqrt{n}$ rate is possible if and only if the sum of the binary variables is a sufficient statistic. The property of sufficiency thus characterizes all models (when variables are strictly exogenous) for which it is possible to construct $\sqrt{n}$-consistent estimators. The strict exogeneity assumption can be partially relaxed. Conditional logit is a special case. The only joint distribution function such that (i) the individual-and-period specific shocks are independent, and (ii) the sum of the binary variables is a sufficient statistic, is the logistic distribution.
Furthermore, it is shown that imposing the property of sufficiency reduces the dimensionality of the model from the unknown bivariate density function of individual-and-period specific shocks into two univariate density functions satisfying additional tail conditions. The property of sufficiency implies that shocks are exchangeable and this setting is nested into the setting of Manski (1987). The reduction of dimensionality explains why $\sqrt{n}$-consistency holds. Furthermore, maximizing conditional likelihood is shown to be equivalent to using an estimating equation that relates the expectation of the difference between the pair of binary variables and the difference in the linear index of covariates, in the sample where binary variables differ across time. It thus amounts to quasi-differencing the binary variables and turning the panel estimation problem into a cross-section one. Semiparametric techniques can then be readily applied.
Section 2 presents the setup (Assumptions R), defines the property of sufficiency (Condition S), and proves the extension of Chamberlain (1992) about $\sqrt{n}$-consistent estimation. Section 3 is the main theoretical section where joint distribution functions that satisfy sufficiency are characterized as functions of two univariate distribution functions. Necessary and sufficient conditions on these functions (Properties P) are given. We study estimation and give a parametric illustration in Section 4. Section 5 proposes extensions and concludes. All proofs are in appendices.
2. REGULARITY CONDITIONS, SUFFICIENCY, AND ROOT-N CONSISTENCY
Consider $y_1$, $y_2$, two binary variables, and $z_1$, $z_2$, two $L$-dimensional covariates.² The model is:
For $t = 1, 2$: $y_t = 1$ if and only if $z_t\beta + \varepsilon + u_t > 0$.
We adopt some regularity assumptions:
ASSUMPTION R1: (i) The difference across time of the first covariate (say), $z_1^{(1)} - z_2^{(1)}$, continuously varies over the whole real line $\mathbb{R}$ (for almost all values of other covariates) and the coefficient of the first covariate, $\beta^{(1)}$, is equal to one.
(ii) The support of $z_1 - z_2$ is not contained in any proper linear subspace of $\mathbb{R}^L$.
(iii) Random shocks $(u_1, u_2)$ have a strictly positive, continuous, and bounded density function with respect to the Lebesgue measure and are independent of $z_1$, $z_2$, and $\varepsilon$.
(iv) The individual effect $\varepsilon$ continuously varies over the whole real line $\mathbb{R}$ (for almost all values of covariates).
²We only consider random samples and we do not subscript individual observations by $i$.
Assumptions R1(i) and R1(ii) are sufficient identification restrictions borrowed from Manski (1987, Assumption 2, p. 358). In contrast to Manski (1987), random shocks are supposed to be strictly exogenous and independent of all variables by Assumption R1(iii), though we discuss how to weaken this assumption in the last section. Assumption R1(iii) also limits the discussion to the class of sufficiently smooth distribution functions in order to use the usual tools of differential calculus. Assumption R1(iv) assumes complete variation of all probabilities in the "between-individual" dimension. It is similar to Assumption R1(i), which assumes complete variation in the "within-individual" dimension.
For conditional logit, specific shocks $u_1$ and $u_2$ are furthermore assumed to be independent and logistically distributed. It implies that the sum of binary variables is S-sufficient (or sufficient from now on) for the incidental parameter, $\varepsilon$ (Barndorff-Nielsen (1978)); that is, for $K = 0$, 1, and 2:
While it trivially holds when $K = 0$ or 2, this expression yields the conditional likelihood of an observation such that $(y_1, y_2) \in \{(0, 1), (1, 0)\}$ when $K = 1$. The conditional likelihood function does not depend on the incidental parameter and the maximum conditional likelihood estimator is $\sqrt{n}$-consistent.
This approach has been criticized on the ground that the independence and logistic assumptions are overly strong. Observe that these assumptions seem indeed to be overly strong to derive $\sqrt{n}$-consistency since sufficiency alone implies that the conditional likelihood function is independent of the incidental parameter. Our first motivation is thus to prove that using sufficiency substantially generalizes conditional logit. Rewrite property (2.1) as Condition S(ufficiency), prominent in the rest of the paper:
LEMMA 1: Under Assumption R1, the sum of binary variables is sufficient for the individual effect, $\varepsilon$, if and only if there exists a real function $c(\cdot)$ such that:
$$\text{(Condition S)}\qquad \forall (x_1, x_2) \in \mathbb{R}^2,\quad \frac{\Pr(u_1 > x_1,\ u_2 \le x_2)}{\Pr(u_1 \le x_1,\ u_2 > x_2)} = c(x_1 - x_2).$$
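To make Condition S concrete, the following minimal sketch evaluates the ratio of the two "mover" probabilities for independent shocks and shifts $x_1$ and $x_2$ by a common amount, which mimics a change in the individual effect. The function names and the chosen evaluation points are illustrative assumptions, not part of the paper. With logistic shocks the ratio stays at $\exp(-(x_1 - x_2))$; with normal shocks it moves with the shift, so Condition S fails.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def odds_ratio(x1, x2, cdf):
    """Pr(u1 > x1, u2 <= x2) / Pr(u1 <= x1, u2 > x2) for INDEPENDENT shocks
    sharing the marginal d.f. `cdf` (independence is an assumption of this sketch)."""
    p10 = (1.0 - cdf(x1)) * cdf(x2)
    p01 = cdf(x1) * (1.0 - cdf(x2))
    return p10 / p01


def ratios_along_shift(cdf, h=0.7, shifts=(-2.0, 0.0, 1.5)):
    # A common shift of (x1, x2) keeps h = x1 - x2 fixed, like a change in the individual effect.
    return [odds_ratio(s + h, s, cdf) for s in shifts]


logit_ratios = ratios_along_shift(logistic_cdf)
probit_ratios = ratios_along_shift(normal_cdf)
print("logistic:", logit_ratios)
print("normal:  ", probit_ratios)
```

The design choice here is to keep the difference $h$ fixed and vary only the level, since Condition S requires the ratio to be a function of $h$ alone.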
Another justification for using and characterizing sufficiency is that it is the weakest condition that one can find in the present setup to construct $\sqrt{n}$-consistent estimators. In an unpublished paper, Chamberlain (1992) proved that if $u_1$ and $u_2$ are independent, the semiparametric efficiency bound of parameter $\beta$ is equal to zero unless the distribution function of random shocks is logistic. It therefore tells us that $\sqrt{n}$-consistent estimators can only be constructed under the logistic assumption. By dropping the independence assumption, Chamberlain's result³ is extended into the following theorem:
³This theorem also leads to proof of the conjecture that the sum of binary variables is the only candidate for a sufficient statistic. Suppose there exists a statistic to which the principle of sufficiency, with respect to individual effects, could be applied. Then, using conditional likelihood methods, it would be possible to construct a $\sqrt{n}$-consistent estimator by conditioning on this statistic. By this theorem, this sufficient statistic is the sum of binary variables across time.
THEOREM 1: Under Assumption R1, the semiparametric efficiency bound for $\beta$ is equal to zero unless the sum of binary variables is sufficient for the individual effect.
3. CHARACTERIZATION OF SUFFICIENCY
Condition S is easily used in estimation (see Section 4). It is however by no means obvious that function $c(\cdot)$ is unconstrained and can be set as wanted. It is the purpose of this section to derive the conditions that it must satisfy. We first derive the general implications of the property of sufficiency on the joint distribution function of $(u_1, u_2)$. We then prove that function $c(\cdot)$ is in a one-to-one relationship with the distribution function of the difference $u_1 - u_2$. Necessary and sufficient conditions for sufficiency are then provided. We conclude by investigating the consequence of the additional assumption of independence between $u_1$ and $u_2$.
By including two time-dummies among the explanatory variables, we can always adopt the following normalizations:
ASSUMPTION R2: (i) $F(0) = 1/2$, where $F(\cdot)$ is the marginal distribution function of $u_1$.
(ii) $c(0) = 1$.
First, we derive some necessary conditions for Condition S to hold and characterize the expression of the joint distribution function of $(u_1, u_2)$.
THEOREM 2: Assume that Assumptions R1, R2, and Condition S hold. Then:
1. The function $c(h)$ is strictly decreasing from $+\infty$ to $0$ and is twice continuously differentiable.
2. The marginal d.f. of $u_2$ is equal to the marginal d.f. of $u_1$:
3. The joint d.f. of $(u_1, u_2)$ is given by
4. $F(\cdot)$ is three-times continuously differentiable and $f''$ is bounded, where $f$ is the density function of $u_1$.
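The expression of the joint d.f. in claim 3 can be recovered from Condition S once the marginals are known to be equal (claim 2). The following is a sketch of that algebra, not a transcription of the printed equation; $H$ is a symbol introduced here for the joint d.f., and $h = x_1 - x_2 \neq 0$.

```latex
\begin{aligned}
\Pr(u_1 > x_1,\ u_2 \le x_2) &= F(x_2) - H(x_1, x_2), \\
\Pr(u_1 \le x_1,\ u_2 > x_2) &= F(x_1) - H(x_1, x_2), \\
F(x_2) - H(x_1, x_2) &= c(h)\,\bigl[F(x_1) - H(x_1, x_2)\bigr] \\
\Longrightarrow\quad H(x_1, x_2) &= \frac{F(x_2) - c(h)\,F(x_1)}{1 - c(h)} .
\end{aligned}
```

Since $c(0) = 1$, both numerator and denominator vanish at $h = 0$, which is consistent with the remark in Appendix B that the expression tends to $0/0$ on the diagonal and has to be defined there by continuity.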
Claim 1 is directly derived from the limit conditions in Condition S. In claim 2, the identity of marginal distributions of $u_1$ and $u_2$ is reminiscent of the property of exchangeability at the heart of the score method developed by Manski (1987). This property is here shown to be the consequence of the sufficiency property, which is therefore stronger than exchangeability. Claim 3 of Theorem 2 gives a characterization of the joint probability function in terms of two functions $F(\cdot)$ and $c(\cdot)$ only. The sufficiency property thus reduces the dimensionality of the problem, at least, from a
In Chamberlain (1992) there is another result about identification when regressors are bounded. It uses a very similar technique of proof. It is a conjecture that a generalization of that result also holds.
function of two real arguments to two functions of one real argument. We show in the Appendix how to define by continuity the joint probability distribution when $x_2 = x_1$. It is easier to continue to work with the distribution function of the difference, $u_1 - u_2$, which is in a one-to-one mapping with function $c(\cdot)$.
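PROPOSITION 1: Let $\Phi(h)$ (resp. $\varphi(h)$) be the distribution (resp. density) function of $u_1 - u_2$. Under Assumptions R1-R2 and Condition S, we necessarily have two conditions:
$$\text{(P1)}\qquad \int_0^{+\infty} \tau\varphi(\tau)\,d\tau = -\int_{-\infty}^{0} \tau\varphi(\tau)\,d\tau < +\infty,$$
$$\text{(P2)}\qquad \exists \beta_0 > 0,\quad \lim_{h \to \pm\infty} \frac{|h\varphi(h)|}{\int_h^{\pm\infty} \tau\varphi(\tau)\,d\tau} > \beta_0.$$
Function $c(\cdot)$ and distribution $\Phi(\cdot)$ are in the following one-to-one relationship: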
The first condition tells us that the expectation of $(u_1 - u_2)$ when $u_1 - u_2 \ge 0$ is finite and makes use of the fact that $u_1$ and $u_2$ are identically distributed. Thus, $E(u_1 - u_2) = 0$ even when $Eu_1$ does not exist. These two regularity conditions P1 and P2 are verified if the distribution of $u_1 - u_2$ is thin tailed and not too hectic at infinity, for instance, if $\Phi(\cdot)$ is the normal d.f. They are not if the distribution is Cauchy, for instance.
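The link between $c(\cdot)$ and $\Phi(\cdot)$ can be sketched from the steps described in Appendix C. The derivation below is a reconstruction under stated assumptions, not a transcription of the printed equations: it assumes that the distribution of the difference satisfies $\Phi(h) = \frac{d}{dh}\bigl[h/(1 - c(h))\bigr]$ and that the constant of integration is $A = \int_0^{+\infty}\tau\varphi(\tau)\,d\tau$, as the text around equations (C.3), (C.4), and Lemma C.2 indicates.

```latex
\begin{aligned}
\frac{h}{1 - c(h)} &= A + \int_0^h \Phi(\tau)\,d\tau ,
\qquad A = \int_0^{+\infty} \tau\varphi(\tau)\,d\tau , \\
\int_0^h \Phi(\tau)\,d\tau &= h\,\Phi(h) - \int_0^h \tau\varphi(\tau)\,d\tau
\quad\text{(integration by parts)} , \\
\Longrightarrow\quad
c(h) &= 1 - \frac{h}{\,h\,\Phi(h) + \int_h^{+\infty} \tau\varphi(\tau)\,d\tau\,} .
\end{aligned}
```

Under these assumptions $c(0) = 1$, $c(h) \to 0$ when $h \to +\infty$, and condition P1 is what makes the denominator vanish when $h \to -\infty$, so that $c(h) \to +\infty$ there.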
In contrast, the marginal distribution F(.) should have thick tails:
PROPOSITION 2: Under Assumptions R1-R2 and Condition S, we necessarily have:
$$\text{(P3)}\qquad \exists \alpha_0 > 0 \text{ such that } \forall x,\quad \frac{f''(x)}{f(x)} < (\alpha_0)^2,$$
where $\alpha_0$ depends on distribution function $\Phi(\cdot)$. The set of such distributions is not empty.
The marginal density function $f$ should not be "too" convex and the normal distribution, say, would not qualify for this condition. In contrast, mixtures of normal distributions through a gamma-distributed precision parameter verify this condition as presented in the next section.
We can now prove that the sufficiency property can be used for a much broader set of joint distributions than the logistic.
THEOREM 3: Assume Assumptions R1(i), R1(ii), and R1(iv). Let $D$ be the set of pairs $(\varphi, f)$ of strictly positive, continuous, and bounded density functions on $\mathbb{R}$, such that $f$ is twice continuously differentiable and $f''$ is bounded, and such that Assumption R2 and conditions P1, P2, and P3 hold. Parameter $\alpha_0$ in condition P3 is defined in the proof.
Then, for any $(\varphi, f) \in D$, (3.1) and (3.2) define a joint distribution function that verifies Condition S and Assumption R1(iii).
As said, the sufficiency property can also be interpreted as reducing the dimensionality from a set of bivariate density functions to a set of pairs of univariate density functions $(\varphi, f)$. This reduction explains why the $\sqrt{n}$-consistency result can hold. The theorem gives additional restrictions P1, P2, and P3 though these conditions do not affect dimensionality. An open question is how restrictive they are for empirical work on top of the dimensionality reduction.
Observe also that the reduction of dimensionality achieved by assuming independence between $u_1$ and $u_2$ is of the same magnitude since the joint density function is then given by two univariate marginal density functions. Using both dimensionality reductions at the same time is very restrictive, however, since it yields only one parametric family, the logistic distribution.
COROLLARY 1: Assume Assumptions R1-R2, Condition S, and that $u_1$ and $u_2$ are independent. Then, $u_1$ and $u_2$ are logistically distributed. Formally, there exists $k > 0$ such that:
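The converse direction of the corollary is easy to check numerically. The sketch below assumes a logistic marginal $F(x) = 1/(1 + e^{-kx})$ with $c(h) = e^{-kh}$, and assumes that the joint d.f. implied by Condition S with equal marginals has the form $[F(x_2) - c(h)F(x_1)]/[1 - c(h)]$ for $h = x_1 - x_2 \neq 0$. Both forms are assumptions of this illustration, as are the function names. Under them the joint d.f. factorizes into the product of the marginals, which is independence.

```python
import math


def F(x, k):
    """Logistic marginal d.f. with scale parameter k (assumed form)."""
    return 1.0 / (1.0 + math.exp(-k * x))


def c(h, k):
    """Assumed form of c(.) in the logistic case."""
    return math.exp(-k * h)


def joint_cdf_from_sufficiency(x1, x2, k):
    """Joint d.f. implied by Condition S with common marginal F; requires x1 != x2."""
    h = x1 - x2
    return (F(x2, k) - c(h, k) * F(x1, k)) / (1.0 - c(h, k))


def max_factorization_gap(k, points):
    # Largest absolute gap between the implied joint d.f. and the product of marginals.
    return max(
        abs(joint_cdf_from_sufficiency(x1, x2, k) - F(x1, k) * F(x2, k))
        for x1, x2 in points
    )


points = [(-1.5, 0.3), (0.2, -0.8), (2.0, 1.0), (-0.4, 1.7)]
gap = max_factorization_gap(1.3, points)
print(gap)
```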
4. SEMIPARAMETRIC ESTIMATION AND A PARAMETRIC EXAMPLE
Under Condition S, the conditional probability that can be used as the estimating equation can be written as (see equation (A.1) in Appendix A):
Observe first that the model is a cross-section binary model. The "transformed" dependent variable can only take two values, "Entry" ($y_1 = 0$, $y_2 = 1$, $\Delta y = +1$) or "Exit" ($y_1 = 1$, $y_2 = 0$, $\Delta y = -1$), in the sample of movers ($\sum_{t=1}^{2} y_t = 1$). As the index $(z_1 - z_2)\beta$ is linear in $\beta$, it is in this sense that the sufficiency property permits "quasi-differencing" the data. Semiparametric identification is achieved as in Manski (1988) or Horowitz (1998) using Assumption R1(i),(ii). Provided that Assumption R2(ii) and conditions P1 and P2 are verified,⁴ equation (4.1) describes a monotonic single-index model, for which estimation methods are described, for instance, in Horowitz (1998). Besides, we can extend this estimation principle to $T$ periods with $T > 2$, by borrowing the idea of Manski (1987): first-difference binary variables between two periods in sequence, then write a pseudo-likelihood function as the sum of likelihood functions given by equation (4.1) for the $(T - 1)$ pairs.
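The quasi-differencing idea can be illustrated in the logistic special case, where the conditional probability of "Entry" among movers is the logistic d.f. evaluated at $(z_2 - z_1)\beta$. The sketch below simulates a two-period panel with an individual effect correlated with the covariates, keeps only movers, and fits a cross-section logit of the entry indicator on the differenced covariate by Newton iterations. The data-generating choices (scalar covariate, normal covariates, sample size, seed) and all function names are assumptions made for the illustration; a semiparametric single-index estimator would replace the logit step in the general case.

```python
import math
import random


def simulate_panel(n, beta, seed=0):
    """Two-period binary panel with logistic shocks and a correlated individual effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps = 0.5 * (z1 + z2) + rng.gauss(0.0, 1.0)  # effect correlated with covariates
        y = []
        for z in (z1, z2):
            p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            u = math.log(p / (1.0 - p))  # logistic draw by inverse d.f.
            y.append(1 if beta * z + eps + u > 0 else 0)
        data.append((y[0], y[1], z1, z2))
    return data


def fit_movers_logit(data, n_iter=25):
    """Cross-section logit of the entry indicator on z2 - z1, movers only."""
    movers = [(y2, z2 - z1) for y1, y2, z1, z2 in data if y1 != y2]
    b = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for entry, dz in movers:
            p = 1.0 / (1.0 + math.exp(-b * dz))
            score += (entry - p) * dz
            info += p * (1.0 - p) * dz * dz
        b += score / info  # Newton step on the concave conditional log-likelihood
    return b, len(movers)


data = simulate_panel(20000, beta=1.0, seed=0)
beta_hat, n_movers = fit_movers_logit(data)
print(beta_hat, n_movers)
```

The individual effect never enters the estimation step: it drops out because only the within-individual difference of the covariate is used in the sample of movers.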
Parametric methods are less attractive than semiparametric methods since assuming a parametric distribution different from the logistic implicitly assumes away independence. It is nevertheless interesting to investigate special cases to understand the consequences of imposing sufficiency on distribution functions. The simplest parametric example is the known case of logistic distributions when function $c(h) = \exp(-h)$ and
⁴Normalization of Assumption R2(ii) translates into $G(0) = 1/2$, condition P1 into $G'$ is positive and bounded, and condition P2 into conditions on the tails of $G(\cdot)$. These conditions are derived using equation (3.2).
the distribution function given by (4.1) is the logistic distribution. There are two routes to depart from this assumption. The first route is to use popular distributions in (4.1), for instance the normal d.f. It however seems to generate quite implausible distribution functions for the difference between $u_1$ and $u_2$.⁵ The second route is to specify the distribution function of the difference between $u_1$ and $u_2$. We now briefly look at that case, when $u_1 - u_2$ is normally distributed with zero mean and variance equal to $\sigma_0$, say. The density function, $\varphi_0(\cdot)$, is symmetric around 0 and $\int_h^{+\infty} \tau\varphi_0(\tau)\,d\tau = \sigma_0\varphi_0(h)$ is finite when $h = 0$ (condition P1). Condition P2 is satisfied since
Looking for compatible marginals, the convexity condition P3 shall be satisfied for any marginals such that $f''/f < (\alpha_0)^2 = 6/\sigma_0$, as can be proven from the proof of Theorem 3. Consider a mixture of zero-mean normal variates where the precision is gamma-distributed with parameters $\delta$ and $\lambda$. Then, the density function is⁶
and
$$\max_x \frac{f''(x)}{f(x)} = \frac{\delta + 1/2}{\lambda(2\delta + 3)}\,(\delta + 1)^2.$$
Choose $\lambda$ and $\delta$ such that this maximum is less than $(\alpha_0)^2$ and consider the following model (where $i$ now is an individual index):
where $(\xi_i, \zeta_i)$ are two independent zero-mean unit-variance normal variates and $1/\sigma_i^2$ is gamma-distributed with parameters $\delta$ and $\lambda$. Then, condition P3 is verified because the convolution of a distribution that verifies P3 (i.e., $\sigma_i\xi_i$) with any distribution verifies P3.⁷ Besides, $u_{1i}$ and $u_{2i}$ are identically distributed because $\zeta_i$ is symmetrically distributed. In empirical applications, model (4.2) can be used when there is "a lot of heterogeneity" in the levels of the shocks and less in the difference.
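The ingredients of this parametric example can be checked numerically. The sketch below rests on three assumptions that are not taken verbatim from the text: (a) the function $c(\cdot)$ is related to the normal difference by $c(h) = 1 - h/[h\Phi_0(h) + \sigma_0\varphi_0(h)]$, obtained by integrating the relationship between $c(\cdot)$ and the d.f. of $u_1 - u_2$; (b) the gamma distribution of the precision is parameterized by shape $\delta$ and rate $\lambda$, so that the mixture density is proportional to $(1 + x^2/(2\lambda))^{-(\delta + 1/2)}$; (c) the parameter values are arbitrary. All function names are hypothetical.

```python
import math


def phi0(h, var):
    """Density of u1 - u2 ~ N(0, var); `var` plays the role of sigma_0."""
    return math.exp(-h * h / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def Phi0(h, var):
    return 0.5 * (1.0 + math.erf(h / math.sqrt(2.0 * var)))


def upper_tail_mean(h, var, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of the integral of t * phi0(t) over [h, upper]."""
    width = (upper - h) / steps
    total = 0.0
    for i in range(steps):
        t = h + (i + 0.5) * width
        total += t * phi0(t, var)
    return total * width


def c_normal_difference(h, var):
    # ASSUMPTION (a): c(h) = 1 - h / (h * Phi0(h) + var * phi0(h)).
    return 1.0 - h / (h * Phi0(h, var) + var * phi0(h, var))


def mixture_kernel(x, delta, lam):
    # ASSUMPTION (b): unnormalized density of the normal mixture, gamma(shape, rate) precision.
    return (1.0 + x * x / (2.0 * lam)) ** (-(delta + 0.5))


def curvature_ratio(x, delta, lam, step=1e-3):
    """Central finite-difference approximation of f''(x) / f(x)."""
    f0 = mixture_kernel(x, delta, lam)
    f_plus = mixture_kernel(x + step, delta, lam)
    f_minus = mixture_kernel(x - step, delta, lam)
    return (f_plus - 2.0 * f0 + f_minus) / (step * step * f0)


def closed_form_max(delta, lam):
    return (delta + 0.5) * (delta + 1.0) ** 2 / (lam * (2.0 * delta + 3.0))


var = 2.0
tail_numeric = upper_tail_mean(0.5, var)
tail_closed = var * phi0(0.5, var)
c_values = [c_normal_difference(h, var) for h in (-3.0, -1.0, 0.0, 0.5, 2.0, 4.0)]

delta, lam = 2.0, 3.0
grid_max = max(curvature_ratio(0.01 * i, delta, lam) for i in range(2001))
bound = 6.0 / var  # (alpha_0)^2 = 6 / sigma_0 as stated in the text
print(tail_numeric, tail_closed, c_values, grid_max, closed_form_max(delta, lam), bound)
```

With these values the largest curvature ratio of the marginal stays below $6/\sigma_0$, so this particular pair of $(\delta, \lambda)$ would be compatible with a normal difference of variance 2 under the stated assumptions.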
⁵This and other examples are studied in the working paper, Magnac (2002), where bounds on correlation coefficients between $u_1$ and $u_2$ are also derived. In these examples, bounds are not limiting.
⁶These results are shown in an Appendix available upon request or on my web page.
⁷Observe that the original model is not unique since any random variable can be added to $(u_1 + u_2)/2$ and subtracted from the individual effect $\varepsilon$. All results are invariant to these renormalizations.
5. DISCUSSION AND EXTENSIONS
In this paper, we used the principle of sufficiency and conditional inference to derive a generalization of conditional logit. We presented the conditions under which we can quasi-difference binary data as if they were continuous. Cross-section semiparametric procedures can be used to estimate these models with unrestricted individual effects and their results can be compared with those that are obtained using random effect specifications. By extending a result by Chamberlain, we also showed that it is under the property of sufficiency only that we can construct $\sqrt{n}$-consistent estimators in the panel binary choice model when regressors are exogenous.
There are some straightforward extensions. The first extension is that the linear index property, writing latent variables as $z_t\beta$, is far from necessary. Deterministic parts of latent variables in each period could be written as $f_t(z_t, \beta_t)$ and the conditional model would become a function of the difference between these nonlinear indices provided that the latent models remain additive in the individual effect. Functions $f_t$ could even be partially unknown if the conditions of Matzkin (1992) are fulfilled.
A more involved extension is to permit the distribution function of specific shocks $(u_1, u_2)$ to depend on $(z_1, z_2)$. As the present model is nested into the flexible setting of Manski (1987), we could get closer to it. It does not seem to be possible, however, to let the joint distribution function depend on covariates in a completely unspecified way. The reason is that in our proofs, we have to use the continuous variation with respect to the individual effect, $\varepsilon$, and with respect to the difference, $x_1 - x_2$ (see Assumption R1). The closest we can easily get is to make the joint distribution of $(u_1, u_2)$ depend on all covariates except the difference in the first continuous covariate, $(z_1^{(1)} - z_2^{(1)})$. We can then repeat the present analysis, conditional on values of $w = (z_1^{(1)} + z_2^{(1)}, z_1^{(-1)}, z_2^{(-1)})$, where $z_1^{(-1)}$, $z_2^{(-1)}$ include all covariates except the first. All assumptions are written conditional on $w$ and all results apply, conditional on $w$. It might be reminiscent of the assumption used by Lewbel (2000) and Honoré and Lewbel (2002). Covariate $(z_1^{(1)} - z_2^{(1)})$ is the special regressor and equation (4.1) is not single-indexed any longer.
When data are observed over a longer time period ($T > 2$), periods can be chained two-by-two to construct pairs as already said. On the one hand, the question of which pseudo-likelihood for the $(T - 1)$ pairs to consider is worth investigating and this question is left for future research. On the other hand, using the property of sufficiency for triples of binary variables, foursomes, or more, instead of pairs as we did here, is not an interesting extension. Some tedious investigation revealed that the only possible distribution function that verifies the property of sufficiency in the case of triples is the logistic d.f. The sketch of the proof is the following. For triples, the relative probabilities depicted by the analog of Condition S of exchangeable choices between any two latent elements should not depend on the level of the third latent element of the triple because this variable contains the individual effect. The Independence of Irrelevant Alternatives property comes in and drives us back to the logistic distribution.
Other lines of research seem more challenging. It remains to be seen how such an approach would be applied to other nonlinear models. It might be easier to extend this approach in models where we know that the principle of sufficiency can be applied (Weibull, Poisson, ...). It seems to be a lot more difficult in dynamic models (Honoré and Kyriazidou (2000)) and even more difficult in other models such as Tobit-like models.
INRA, Paris-Jourdan and CREST-INSEE, 48, Bld Jourdan, 75014 Paris, France; tmagnac@ens.fr; www.inra.fuiESR/UR/lea/magnac.htm.
Manuscript received June, 2002; final revision received October, 2003.
APPENDICES
Equations or lemmas starting with A to D refer to the section of the Appendix where they are stated. Conditions starting with R, S, and P and numerals refer to the text.
APPENDIX A: PROOFS IN SECTION 2
PROOF OF LEMMA 1: Denote $x_t = -z_t\beta - \varepsilon$ and use Assumption R1(iii) to write
By definition, sufficiency is satisfied if and only if, for $(y_1, y_2) \in \{(0, 1), (1, 0)\}$, equation (2.1) is verified. If it is verified for one of these pairs, say $(1, 0)$, it is satisfied for the other pair since probabilities sum to one. Thus, sufficiency is equivalent to the property that
is independent of $\varepsilon$, because condition (2.1) says that
is independent of $\varepsilon$.
By Assumption R1(iii), $r(x_1, x_2)$ is a smooth function from $\mathbb{R}^2$ to $\mathbb{R}$ and $x_1$ and $x_2$ vary over the whole real line by Assumptions R1(i) and R1(iv). Thus, $r(x_1, x_2)$ is independent of $\varepsilon$ if and only if it depends on the only combination of $(x_1, x_2)$ that does not depend on $\varepsilon$, that is, the difference $x_1 - x_2$. Thus, there exists a real function $c(\cdot)$ such that $r(x_1, x_2) = c(x_1 - x_2)$. Reciprocally, if such an expression holds, sufficiency holds and
PROOF OF THEOREM 1: We adapt the proof of Chamberlain (1992, Theorem 2, p. 7). Define first the vector of probabilities:
LEMMA A.1: The semiparametric efficiency bound $I_\beta = 0$ for all $\beta$ in $\Theta$ unless the distribution function of random shocks is such that
$$\text{(A.2)}\qquad \forall z_1, z_2,\ \exists \psi = (\psi_1, \psi_2, \psi_3, \psi_4) \in \mathbb{R}^4,\ \psi \neq 0, \text{ such that } \forall \varepsilon \in \mathbb{R},\quad \psi'\pi(z, \varepsilon, \beta) = 0.$$
PROOF: See Chamberlain (1992), marginally adapting the proof to the case where $u_1$ and $u_2$ are not independent. Q.E.D.
Second, fix $z_1$ and $z_2$. By Assumption R1(iv), $\varepsilon$ continuously varies over $\mathbb{R}$. Observe that if $\varepsilon \to +\infty$ we must have $\psi_4 = 0$ and that if $\varepsilon \to -\infty$ we must have $\psi_1 = 0$ (Chamberlain (1992)). Therefore equation (A.2) in the lemma above is equivalent to
which is equivalent to
where this ratio is independent of $\varepsilon$. It is equation (2.1) and, by equivalence, Condition S stated in Lemma 1. Reciprocally, if Condition S holds, then the semiparametric efficiency bound $I_\beta \neq 0$ since the conditional likelihood estimator is $\sqrt{n}$-consistent. Q.E.D.
APPENDIX B: PROOF OF THEOREM 2
Claim 1 is proven by using monotonicity properties of probability functions, limit conditions, and Assumption R1(iii). We now prove claims 2 and 3. The proof proceeds by reparameterizing the problem as
$$x_1 = \frac{\lambda + h}{2}, \qquad x_2 = \frac{\lambda - h}{2},$$
and by observing that by Assumptions R1(i) and R1(iv), pairs $(\lambda, h)$ span all $\mathbb{R}^2$. Let
and
Using these expressions and Condition S,
$$K(h, \lambda) = c(h)\,G(h, \lambda),$$
we can write
By normalization, $c(0) = 1$, and thus
and, using equation (B.1):
which is equation (3.1) in the text. Observe that equation (3.1) tends to $0/0$ when $h = x_1 - x_2 \to 0$. As the numerator and denominator are continuously differentiable, we can do Taylor expansions around points $x_1 = x_2$:
where $f(x_1) > 0$ (Assumption R1(iii)), $c'(0) < 0$ (claim 1 of this theorem), and
$$\lim_{x_1 - x_2 \to 0} \frac{o(x_2 - x_1)}{x_2 - x_1} = 0.$$
By taking limits of the numerator and denominator of equation (3.1) divided by $(x_1 - x_2)$, we have
As the joint density function is continuous and bounded, we can continuously differentiate this last equation two times and the result is continuous and bounded. Therefore, $F(\cdot)$ necessarily is three-times continuously differentiable and $f''$ is bounded (claim 4).
APPENDIX C: PROOF OF PROPOSITION 1
The proof of this proposition proceeds in various steps. We first exhibit a condition of symmetry, which substantially simplifies proofs below. We then derive the joint density function of $(u_1, u_2)$ and the distribution of $u_1 - u_2$. We finally derive the expression of function $c(\cdot)$ and conditions P1 and P2 stated in the proposition.
A TECHNICAL SIMPLIFICATION: EXPLOITING THE SYMMETRY OF THE PROBLEM: There is a fundamental symmetry in the problem with respect to the disturbances $u_1$ and $u_2$. Symmetry is a direct consequence of Condition S. If we change $u_1$ into $u_2$ and $u_2$ into $u_1$, we change $c(h)$ into $1/c(-h)$. By Theorem 2, we also know that the marginal distributions of $u_1$ and $u_2$ are identical and equal to $F(\cdot)$. We can therefore limit the proofs below to the case $h > 0$, provided that we verify the conditions bearing on the straight representation $c(h)$ and on the reverse representation $1/c(-h)$. This property is summarized quite informally by the following lemma.
LEMMA C.1: If Condition S holds and if conditions for $c(h)$ and $1/c(-h)$ hold for any $h > 0$, they globally hold for $c(h)$.
THE JOINT DENSITY FUNCTION: The joint density function is derived by noting that Assumption R1(iii) allows for differentiating two times equation (3.1). The second cross-derivative, or density function, denoted $g(x_1, x_2)$, is strictly positive by Assumption R1(iii) and equal to
where
is a negative function since $c(\cdot)$ is decreasing. It has a singularity point at $h = 0$. By continuity, we nevertheless can obtain $g(x_1, x_1)$ (see Proof of Proposition 2).
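The cross-derivative can be sketched as follows. This is a reconstruction under the assumption that the joint d.f. has the form $H(x_1, x_2) = [F(x_2) - c(h)F(x_1)]/[1 - c(h)]$ with $h = x_1 - x_2$, and that the negative function referred to above is $s(h) = c'(h)/(1 - c(h))^2$. The symbols $H$ and $m$ are introduced here for illustration only.

```latex
\begin{aligned}
H(x_1, x_2) &= F(x_1) + \bigl[F(x_2) - F(x_1)\bigr]\, m(h),
\qquad m(h) = \frac{1}{1 - c(h)}, \quad s(h) = m'(h) = \frac{c'(h)}{\bigl(1 - c(h)\bigr)^2}, \\
\frac{\partial H}{\partial x_1} &= f(x_1)\,\bigl[1 - m(h)\bigr] + \bigl[F(x_2) - F(x_1)\bigr]\, s(h), \\
g(x_1, x_2) = \frac{\partial^2 H}{\partial x_1\,\partial x_2}
&= s(h)\,\bigl[f(x_1) + f(x_2)\bigr] + s'(h)\,\bigl[F(x_1) - F(x_2)\bigr] .
\end{aligned}
```

Setting $x_2 = x$ and $x_1 = x + h$ gives $s(h)(f(x+h) + f(x)) + s'(h)(F(x+h) - F(x))$, which is the expression whose positivity is used in the proof of Lemma C.2.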
THE DISTRIBUTION OF $u_1 - u_2$: Observe that symmetry (Lemma C.1) can be used. Interchanging $u_1$ and $u_2$ transforms the distribution of $u_1 - u_2$ into the distribution of the opposite, $u_2 - u_1$. Consider then $h > 0$ and use equation (C.1) to write $\Pr(u_1 - u_2 > h)$ as
Setting $\tau = x_1 - x_2$, we get
Observe that
because of equation (C.2), because $\lim_{h \to +\infty} (1/(1 - c(h))) = 1$ (Theorem 2), and because $\lim_{h \to +\infty} s(h) = 0$ (see definition (C.2), namely $c(\cdot)$ is decreasing and tending to 0 when $h$ tends to $+\infty$ so that $c'$ tends to 0).
Using also
we get
$$\Pr(u_1 - u_2 > h) = 1 - \frac{1}{1 - c(h)} - s(h) \int_{-\infty}^{+\infty} \bigl[F(x + h) - F(x)\bigr]\,dx.$$
Given that all functions in this expression are well defined,
so that differentiation with respect to h and integration can be permuted:
As the integral takes value 0 when h =0, we obtain
Replacing the integral in the expression of $\Pr(u_1 - u_2 > h)$ yields
Denoting $\Phi(h) = \Pr(u_1 - u_2 \le h)$,
using equation (C.2). Symmetry (Lemma C.1) can be checked and this formula applies to $h < 0$ and by continuity to $h = 0$.
CONDITIONS ON THE DISTRIBUTION FUNCTION $\Phi(\cdot)$ (Conditions P1 and P2): We now seek $c(\cdot)$ as a function of $\Phi(\cdot)$ and derive necessary conditions on $\Phi(\cdot)$. First, integrate equation (C.3) to get
where $A$ is a constant of integration to be found. As $h/(1 - c)$ is equal to $(-1/c'(0)) > 0$ when $h = 0$, $A$ is equal to $(-1/c'(0)) > 0$. The following lemma determines $A$.
where $\varphi(h) = \Phi'(h)$ is the density function of $u_1 - u_2$.
PROOF: Let $h > 0$ and consider the joint density function given by equation (C.1): $s(h)(f(x + h) + f(x)) + s'(h)(F(x + h) - F(x)) > 0$, where $s(h)$ is a negative function. Use equation (C.3) to write
and integrate by parts the integral in equation (C.4),
to get
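As $s(h) < 0$, $\int_0^h \tau\varphi(\tau)\,d\tau$ is bounded by $A$. Thus
$$\text{(C.6)}\qquad \lim_{h \to +\infty} h\varphi(h) = 0.$$
Rewriting equation (C.1) using $s(h) < 0$ and $F(x + h) - F(x) > 0$ for any $h > 0$,
Replacing $s(h)$ by its expression (C.5) implies that
$$\forall h > 0,\ \forall x,\quad \frac{f(x + h) + f(x)}{F(x + h) - F(x)} < \frac{2}{h} + \frac{h\varphi(h)}{A - \int_0^h \tau\varphi(\tau)\,d\tau}.$$
Taking limits when $h \to +\infty$ yields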
Therefore
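$$\text{(C.7)}\qquad \exists \beta_0 > 0,\quad \lim_{h \to +\infty} \frac{h\varphi(h)}{A - \int_0^h \tau\varphi(\tau)\,d\tau} > \beta_0.$$
Because of equation (C.6), the numerator tends to zero and the limit of the denominator is thus necessarily equal to zero. Q.E.D.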
Replacing A by its expression in equation (C.4) and solving for c, yields
To finish the proof and obtain conditions P1 and P2, we use symmetry (Lemma C.1). The reverse representation consists in changing $u_1$ into $u_2$ and vice versa. Observe that if $\Phi_r$ is the distribution of the opposite $(u_2 - u_1)$, we have $\Phi_r(h) = 1 - \Phi(-h)$ and therefore $\varphi_r(h) = \varphi(-h)$. Apply Lemma C.2 to that distribution to show that $A_r = \int_0^{+\infty} \tau\varphi_r(\tau)\,d\tau$ is finite. As $u_1$ and $u_2$ have the same distribution and $E(u_1 - u_2)$ is finite because $A$ and $A_r$ are, it is necessarily equal to zero. Thus, we get property P1:
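Second, consider equation (C.7) and apply it to the reverse representation to get property P2:
$$\text{(C.9)}\qquad \exists \beta_0 > 0,\quad \lim_{h \to \pm\infty} \frac{|h\varphi(h)|}{\int_h^{\pm\infty} \tau\varphi(\tau)\,d\tau} > \beta_0.$$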
We can also summarize the properties of $s(h)$ proven in this section and needed below:
APPENDIX D: PROOFS OF PROPOSITION 2, THEOREM 3, AND COROLLARY 1
PROOF OF PROPOSITION 2: As $s(h)$ is strictly negative, equation (C.1) for $h > 0$ is equivalent to
and therefore, using equation (C.11),
When $h$ tends to zero, we can expand the left-hand side to the third order as $F(\cdot)$ is continuously differentiable three times and $f''$ is bounded (Theorem 2):
Therefore, when $h \to 0$ the left-hand side is equivalent to
Using first-order expansions, the right-hand side is equivalent when $h \to 0$ to
As $f(x) > 0$, equation (D.1) therefore implies property P3. There exists $\alpha_0 > 0$ such that
$$\forall x,\quad \frac{f''(x)}{f(x)} < (\alpha_0)^2 \le \frac{6\varphi(0)}{\int_0^{+\infty} \tau\varphi(\tau)\,d\tau},$$
because $\varphi(0) > 0$ and the denominator is bounded by condition P1. Other conditions on $\alpha_0$ are derived in the proof of Theorem 3.
In an appendix available upon request, it is proven that mixtures of zero-mean normal variates verify this condition. Precision is the mixing parameter and is gamma-distributed with parameters $\delta$ and $\lambda$. Then
$$\max_{x > 0} \frac{f''(x)}{f(x)} = \frac{\delta + 1/2}{\lambda(2\delta + 3)}\,(\delta + 1)^2$$
and for any $\delta > 0$, we can choose $\lambda > 0$ to satisfy property P3 for any $\alpha_0 > 0$. In conclusion to this proof, and using equations (C.10) and (C.1), the density function on the 45° line can be written by continuity as
PROOF OF THEOREM 3: To prove that equations (3.1) and (3.2) define a joint distribution function that satisfies Assumption R1(ii) and Condition S, we shall prove that the joint density function that these equations define exists everywhere and is continuous, bounded, and positive. Using equation (C.1), it is easy to see that it is defined everywhere (including for $x_1 = x_2$, as proved in the previous subsection for an appropriately chosen $\alpha_0$) and is continuous and bounded since $f$ and $\varphi$ are.
To finish the proof, we shall prove that the joint density is positive. We consider the case h > 0 only and rely on symmetry (Lemma C.1) for h < 0 and on continuity for h =0. First the following lemma is proven below (in the next subsection).
LEMMA D.1: For any $\alpha > 0$ such that $\forall x$, $f''(x)/f(x) < \alpha^2$:
where $\mathrm{sh}(\cdot)$ and $\mathrm{ch}(\cdot)$ are the hyperbolic sine and cosine functions.
Second, if the following property holds:
then set parameter $\alpha_0$ in condition P3 at that value, $\alpha_0 = \alpha$. Apply Lemma D.1 and replace $\alpha\,\mathrm{sh}(\alpha h)/(\mathrm{ch}(\alpha h) - 1)$ by its bound (D.3) in equation (D.2) to obtain
which proves by equation (C.1) that the joint density is positive. To prove Theorem 3 and find the value $\alpha_0 = \alpha$, equation (D.3) shall thus be proven.
Observe first that the limit when $h \to \infty$ of
is equal to $\alpha$ and that
because $\mathrm{ch}(\alpha h) - 1 > (\alpha h)^2/2$ for $h > 0$. Then,
Use condition P2 to define $\beta_0$ and $M$ such that
Set $\alpha \le \beta_0$ and equation (D.3) is then verified for any $h \ge M$. Consider now $h \le M$. Equation (D.3) can be rewritten as
In a lemma available upon request, we prove that the expression between brackets on the right-hand side is positive and less than or equal to $1/6$. Set $\beta_1$ to
min
As $\varphi(h)$ is positive and continuous and as the minimum is taken over a compact set, $\beta_1 > 0$. Note that this is considerably stronger than the bound derived in the proof of Proposition 2. Choose $\alpha \le (6\beta_1)^{1/2}$ and equation (D.3) is satisfied for $h \le M$. In conclusion, provided that $\alpha \le \min(\beta_0, (6\beta_1)^{1/2})$, equation (D.3) is satisfied for any $h > 0$. Using the reverse representation (Lemma C.1), we can prove that it is satisfied for any $h$. It also proves that if equation (D.3) is verified for $\alpha$, then it is verified for any $\alpha' < \alpha$. Q.E.D.
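PROOF OF LEMMA D.1: For any $\lambda \in [0, 1]$ let
$$m(\lambda) = f(x + \lambda h) > 0$$
and observe that $m(0) = f(x)$ and $m(1) = f(x + h)$. The condition $f''(x)/f(x) < \alpha^2$ implies that
$$m''(\lambda) = h^2 f''(x + \lambda h) < \alpha^2 h^2\, m(\lambda).$$
Define also function $g(\lambda)$ such that
$$g(\lambda) = \frac{f(x+h)\,\mathrm{sh}(\alpha h \lambda) + f(x)\,\mathrm{sh}(\alpha h (1 - \lambda))}{\mathrm{sh}(\alpha h)}.$$
Observe that
$$g(0) = f(x), \quad g(1) = f(x+h), \quad g''(\lambda) = \alpha^2 h^2\, g(\lambda).$$
As the degree of convexity of $g(\cdot)$ is "larger" than the degree of convexity of $m(\cdot)$, we now show that
$$m(\lambda) > g(\lambda) \quad \text{for any } \lambda \in\, ]0, 1[.$$
Let $\Gamma(\lambda) = m(\lambda) - g(\lambda)$. Observe that $\Gamma(0) = 0$, $\Gamma(1) = 0$, and that $\Gamma(\lambda)$ is twice differentiable. Thus
$$\Gamma''(\lambda) = m''(\lambda) - \alpha^2 h^2\, g(\lambda).$$
Because $m(\lambda) > 0$ and because of the inequalities above,
$$\Gamma(\lambda) \le 0 \implies \Gamma''(\lambda) < 0.$$
Assume, by contradiction, $\exists \lambda_0$, $\Gamma(\lambda_0) < 0$. As $\Gamma(\cdot)$ is continuous, $\exists (\lambda_1, \lambda_2)$ such that $\lambda_1 < \lambda_0 < \lambda_2$ and such that $\Gamma(\lambda_1) = \Gamma(\lambda_2) = 0$. Then $\forall \lambda \in\, ]\lambda_1, \lambda_2[$, $\Gamma(\lambda) < 0$, and $\Gamma''(\lambda) < 0$. This is a contradiction since it is not possible to construct a twice differentiable concave function on an interval where it takes value 0 at the end points and is negative in between. Thus, $\Gamma(\lambda) \ge 0$. By contradiction assume that $\exists \lambda_0 \in\, ]0, 1[$; $\Gamma(\lambda_0) = 0$. It is impossible since $\Gamma$ would be strictly concave at that point, which is an interior minimum. Therefore $\Gamma(\lambda) > 0$ for any interior point. Returning to the main argument, we therefore have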
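$$F(x+h) - F(x) = \int_x^{x+h} f(u)\,du = \int_0^1 f(x + h\lambda)\,h\,d\lambda > h \int_0^1 \frac{f(x+h)\,\mathrm{sh}(\alpha h \lambda) + f(x)\,\mathrm{sh}(\alpha h (1 - \lambda))}{\mathrm{sh}(\alpha h)}\,d\lambda$$
using the definition of $g(\cdot)$. Thus, by symmetry
$$F(x+h) - F(x) > h\,\bigl(f(x) + f(x+h)\bigr) \int_0^1 \frac{\mathrm{sh}(\alpha h \lambda)}{\mathrm{sh}(\alpha h)}\,d\lambda$$
and the proof finishes by integrating the right-hand side. Q.E.D.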
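The two steps of this proof can be checked numerically. The sketch below assumes, purely for illustration, that $F$ is the standard logistic distribution, for which $f''(x)/f(x) = 1 - 6F(x)(1 - F(x)) < 1$, so that $\alpha = 1$ satisfies the hypothesis of the lemma. It tests the interior inequality $m(\lambda) > g(\lambda)$ on a grid and the integrated bound $F(x+h) - F(x) > (f(x) + f(x+h))(\mathrm{ch}(\alpha h) - 1)/(\alpha\,\mathrm{sh}(\alpha h))$, which is what integrating $g$ over $[0, 1]$ and multiplying by $h$ gives. The function names and the choice of distribution are assumptions, not part of the original argument.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_pdf(x):
    F = logistic_cdf(x)
    return F * (1.0 - F)


def interpolant(lam, x, h, alpha):
    """g(lambda): hyperbolic interpolation between f(x) and f(x + h)."""
    num = (logistic_pdf(x + h) * math.sinh(alpha * h * lam)
           + logistic_pdf(x) * math.sinh(alpha * h * (1.0 - lam)))
    return num / math.sinh(alpha * h)


def lower_bound(x, h, alpha):
    """h times the integral of g over [0, 1], in closed form."""
    return ((logistic_pdf(x) + logistic_pdf(x + h))
            * (math.cosh(alpha * h) - 1.0) / (alpha * math.sinh(alpha * h)))


def check(x, h, alpha=1.0, n=99):
    """Return (m > g at n interior grid points, integrated bound holds)."""
    interior = all(
        logistic_pdf(x + h * k / (n + 1)) > interpolant(k / (n + 1), x, h, alpha)
        for k in range(1, n + 1)
    )
    integrated = logistic_cdf(x + h) - logistic_cdf(x) > lower_bound(x, h, alpha)
    return interior, integrated
```

Calling `check(x, h)` for a few values of `x` and `h > 0` returns a pair of booleans. A grid check of this kind is only a sanity test of the inequalities, not a substitute for the concavity argument above.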
PROOF OF COROLLARY 1: $u_1$ and $u_2$ are assumed to be independent. Then, equation (B.2) implies that
For any $x$, $0 < F(x) < 1$ by Assumption R1(iii). Denote
Equation (D.4) implies that
Integrating this equation and imposing $F(0) = 1/2$, we get the expression for $F(x)$. Using equation (3.1) we get the expression for $c(h)$. Q.E.D.
REFERENCES
ANDERSEN, E. B. (1973): Conditional Inference and Models for Measuring. Copenhagen: Mentalhygiejnisk Forlag.
ARELLANO, M., AND B. E. HONORE (2001): "Panel Data Models: Some Recent Developments," in Handbook of Econometrics, Vol. 5, ed. by E. Leamer and J. J. Heckman. Amsterdam: North-Holland, 3229-3296.
BARNDORFF-NIELSEN, O. E. (1978): Information and Exponential Families in Statistical Theory. Chichester: Wiley.
CHAMBERLAIN, G. (1984): "Panel Data," in Handbook of Econometrics, Vol. 2, ed. by Z. Griliches and M. Intriligator. Amsterdam: North-Holland, 1248-1313.
CHAMBERLAIN, G. (1992): "Binary Response Models for Panel Data: Identification and Information," Harvard University, Unpublished Manuscript.
HONORE, B. E., AND E. KYRIAZIDOU (2000): "Panel Data Discrete Choice Models with Lagged Dependent Variables," Econometrica, 68, 839-874.
HONORE, B. E., AND A. LEWBEL (2002): "Semiparametric Binary Choice Panel Data Models without Strict Exogeneity," Econometrica, 70, 2053-2063.
HOROWITZ, J. (1998): Semiparametric Methods in Econometrics. Berlin: Springer-Verlag.
LANCASTER, T. (2000): "The Incidental Parameter Problem since 1948," Journal of Econometrics, 95, 391-413.
LEWBEL, A. (2000): "Semiparametric Qualitative Response Model Estimation with Unknown Heteroskedasticity or Instrumental Variables," Journal of Econometrics, 97, 145-177.
MAGNAC, T. (2002): "Panel Binary Variables and Individual Effects: Generalizing Conditional Logit." WP CREST 200218, www.crest.fr/doctravail/document/200218.pdf.
MANSKI, C. F. (1987): "Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data," Econometrica, 55, 357-362.
MANSKI, C. F. (1988): "Identification of Binary Response Models," Journal of the American Statistical Association, 83, 729-738.
MATZKIN, R. (1992): "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and the Binary Choice Models," Econometrica, 60, 239-270.
RASCH, G. (1960): Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Denmark Paedagogiske Institut.