by John Quiggin, Steve Dowrick
International Comparisons of Living Standards and Tastes: A Revealed-Preference Analysis

"League tables," or rankings of countries in terms of GDP per capita, have proliferated in recent years. They have been cited as evidence of the relative success or decline of nations and also as predictors of future economic growth. The most authoritative of these rankings come from the United Nations International Comparison Project (henceforth ICP), which uses detailed price data from some 60 countries to provide comparisons based on purchasing-power parity. The ICP data have been extrapolated both across countries and across time by Robert Summers and Alan Heston (1984, 1988, 1991) to form the Penn World Table, which provides the basis for a growth industry in the analysis of comparative economic performance.

Our main purpose here is to examine the extent to which the published ICP rankings can be relied upon to indicate whether the "average citizen" is actually better off in one country than in another. Our concern stems from the fact that relative prices vary across countries. For instance, the 1980 price of meat relative to medical care was 4.8 times higher in Japan than in the United States, and 5.4 times higher than in Argentina. Given wide price variation, any single price index measure such as the ICP's "average international prices" (see United Nations and Commission of the European Communities, 1986) is a dubious basis for welfare comparisons. Our criterion for judgment is the Samuelsonian revealed-preference approach. If A could have afforded B's consumption bundle (at A's prices) while B could not afford A's bundle (at B's prices), then we say that A is better off than B, basing this judgment on an assumption of common tastes and optimizing behavior. If neither could have afforded the other's bundle, the revealed-preference ranking is ambiguous.

*Centre for Economic Policy Research, Australian National University, Canberra ACT 0200, Australia. Communications for John Quiggin at (Internet) jxq302G We thank colleagues at ANU and an anonymous referee for helpful comments. Data used in this study and results of analyses mentioned here but not reported in full for space reasons are available from the authors or may be downloaded by anonymous FTP from

Using detailed ICP price data to rank 60 countries, we find that there are indeed many ambiguities. Typically, the ranking of any one country is indeterminate within a range of five or six places. Also, when we make allowance for variation in the quantity and price of leisure, 16 of the pairwise rankings are reversed. Our general conclusion is that rankings by constant price measures of GDP are reliable for the purpose of ranking one group of countries of similar level of development against other groups, but they are not reliable for the purpose of intragroup rankings.

A secondary purpose of the analysis presented here is to test the hypothesis that there are no systematic differences in tastes between countries. This is important because the revealed-preference analysis relies on the maintained hypothesis of common tastes. It is also of interest because it is widely assumed that international differences in consumption patterns reflect differences in tastes. A possible outcome of the test presented here, namely, that for some pair A, B each country's representative consumer could afford the other's bundle, amounts to a rejection of the hypothesis of common tastes. Except at the highest level of aggregation across commodities, the maintained hypothesis is not rejected for any of the 1,770 country pairs in our data set. This confirms the results of Irving Kravis et al. (1982, Ch. 9), who found no violations of the common-tastes hypothesis in an early version of the ICP data set, and of Marilyn Manser and Richard MacDonald (1988), who found that time-series data on U.S. consumption were consistent with a representative-consumer model.

I. The Revealed-Preference Approach

Our procedure is as follows. For each country $i$, we have a vector of relative prices $P^i$ (where the numeraire is the international average consumption bundle) and a vector of quantities $Q^i$ representing average per capita consumption of each group of goods. We assume that, for the representative individual, the budget set is given by

$$\{Q : P^i \cdot Q \le P^i \cdot Q^i\}$$

and that the consumption vector $Q^i$ is the most preferred element of this set. For any pair of countries A, B we test the following inequalities:

$$\text{(i)} \quad P^A \cdot Q^A \ge P^A \cdot Q^B$$

$$\text{(ii)} \quad P^B \cdot Q^B \ge P^B \cdot Q^A$$

We take as a maintained hypothesis that representative individuals in countries A and B have the same preferences. If (i) holds, but (ii) does not, we can infer from the weak axiom of revealed preference (WARP) that the consumption bundle for the average resident of country A is preferred to that of the average resident of country B. That is, the average resident of country A could afford to buy the average consumption bundle for country B, but not vice versa. Similarly, if (ii) holds but not (i), we may infer that the consumption bundle for the average resident of country B is preferred. If neither inequality holds, an unambiguous welfare ranking is not possible. Finally, if both inequalities hold, the data are inconsistent with the hypothesis of common tastes.
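The four outcomes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dot` and `classify_pair` are hypothetical, the price and quantity vectors are made-up numbers rather than ICP data, and the returned symbols simply mirror the labels used later for the pairwise results.

```python
def dot(p, q):
    """Value of bundle q at prices p."""
    return sum(pi * qi for pi, qi in zip(p, q))


def classify_pair(p_a, q_a, p_b, q_b):
    """Classify countries A and B by the two affordability inequalities.

    Returns "+" if A is revealed better off, "-" if B is,
    "NC" if the pair is not comparable, and "!!" if both
    inequalities hold (common tastes rejected, following the text).
    """
    a_affords_b = dot(p_a, q_b) <= dot(p_a, q_a)  # inequality (i)
    b_affords_a = dot(p_b, q_a) <= dot(p_b, q_b)  # inequality (ii)
    if a_affords_b and b_affords_a:
        return "!!"
    if a_affords_b:
        return "+"
    if b_affords_a:
        return "-"
    return "NC"


# Illustrative (made-up) two-good data.
print(classify_pair([1, 1], [4, 4], [1, 1], [2, 2]))  # A affords B's bundle only
print(classify_pair([1, 3], [4, 1], [3, 1], [1, 4]))  # neither affords the other
print(classify_pair([3, 1], [4, 1], [1, 3], [1, 4]))  # each affords the other
```

In the third call each country consumes more of the good that is dear at home, which is the pattern one would expect if tastes, rather than prices, drove the difference in bundles.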

The possibility of making a comparison depends on the similarity between the relative-price and relative-quantity structures in the two countries. If relative prices are the same in the two countries, the comparison procedure will yield an unambiguous ranking. However, this need not coincide with the ranking obtained using U.S. or world relative prices. Thus, the league-table procedure may yield an unambiguously incorrect welfare ranking. The case of identical relative-quantity structures is simpler. If two countries consume all goods in the same proportions, any index of output will yield the same ranking (indeed the same ratio of indexes) for the two countries.

As was first observed by Paul Samuelson (1947), the direct revealed-preference relationship is not transitive. A transitive relationship may be based either on the strong axiom of revealed preference (SARP) or, more satisfactorily, on Hal Varian's (1982, 1983) generalized axiom of revealed preference (GARP). This requires that, if there exists a sequence of pairwise comparisons such that $A_i$ is directly revealed preferred to $A_{i+1}$, $i = 1, \ldots, n-1$, it should not be true that $A_n$ is strictly directly revealed preferred to $A_1$. The GARP axiom is derived from the transitive closure of the direct revealed-preference relation on which the WARP is based.
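A minimal sketch of a GARP check built on the transitive closure may help fix ideas. It is not the authors' procedure: the name `garp_violations` is hypothetical, the closure is computed with the standard Warshall algorithm, and the data are illustrative.

```python
def dot(p, q):
    """Value of bundle q at prices p."""
    return sum(pi * qi for pi, qi in zip(p, q))


def garp_violations(prices, quantities):
    """Return pairs (i, j) where i is revealed preferred to j through some
    chain of direct comparisons, yet j is strictly directly revealed
    preferred to i. An empty list means GARP is satisfied."""
    n = len(prices)
    own = [dot(prices[i], quantities[i]) for i in range(n)]
    # direct[i][j]: bundle j was affordable at i's prices and expenditure.
    direct = [[dot(prices[i], quantities[j]) <= own[i] for j in range(n)]
              for i in range(n)]
    # strict[i][j]: bundle j cost strictly less than i's own bundle.
    strict = [[dot(prices[i], quantities[j]) < own[i] for j in range(n)]
              for i in range(n)]
    # Warshall's algorithm for the transitive closure of `direct`.
    closure = [row[:] for row in direct]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and closure[i][j] and strict[j][i]]


# Three observations facing the same prices: consistent with GARP.
print(garp_violations([[1, 1]] * 3, [[4, 4], [3, 3], [2, 2]]))
# Two observations that can each strictly afford the other's bundle.
print(garp_violations([[3, 1], [1, 3]], [[4, 1], [1, 4]]))
```

The cubic cost of the closure is negligible for 60 observations.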

The results of Varian (1982, 1983) and of Sydney Afriat (1967) and Erwin Diewert (1973) show that GARP is satisfied if and only if there exists a nonsatiated, continuous, concave, monotonic utility function that rationalizes the observed price and quantity data. It will be useful to extend the results of Afriat (1967) and Varian (1982, 1983) with the following lemma, which permits a straightforward test of GARP.

LEMMA 1: The following are equivalent:

(i) The GARP is satisfied.

(ii) There exists a nonsatiated, continuous, concave, monotonic utility function U that rationalizes the observed price and quantity data.

(iii) There exists an ordering of the price and quantity observations, denoted >, such that there exists no i and j with i > j and j strictly revealed preferred to i.


The equivalence of (i) and (ii) is proved by Varian (1983). If (ii) holds, the observations can be ranked in order of U(Q), with ties broken arbitrarily. This ordering must satisfy the conditions of (iii). If (iii) holds, no cycle of the kind required for a violation of GARP can occur, and hence (i) must hold.
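Condition (iii) suggests a simple check once a candidate ordering is in hand. The sketch below is one reading of that condition, taking "strictly revealed preferred" to mean the strict direct relation; the function name `ordering_is_consistent` and the data are hypothetical.

```python
def dot(p, q):
    """Value of bundle q at prices p."""
    return sum(pi * qi for pi, qi in zip(p, q))


def ordering_is_consistent(order, prices, quantities):
    """Check condition (iii) for a candidate ordering.

    `order` lists observation indices from highest to lowest rank.
    The ordering fails if some lower-ranked observation j could have
    bought a higher-ranked bundle i for strictly less than it spent.
    """
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if dot(prices[j], quantities[i]) < dot(prices[j], quantities[j]):
                return False
    return True


prices = [[1, 1], [1, 1], [1, 1]]
quantities = [[4, 4], [3, 3], [2, 2]]
print(ordering_is_consistent([0, 1, 2], prices, quantities))  # richest first
print(ordering_is_consistent([2, 1, 0], prices, quantities))  # poorest first
```

Only one pass over the pairs is needed, which is why a ranking supplied from outside (for instance an existing league table) makes the test cheap.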

The revealed-preference approach used here rests on the assumption that the observed consumption vector is the most preferred element of the budget set. Some comparisons can be made without invoking revealed-preference axioms. The data permit a comparison based only on the assumption of nonsatiation. If $Q_i^A \ge Q_i^B$ for all $i$, with at least one inequality strict, then nonsatiation implies that A is better off than B. We will say in this case that A strictly dominates B.
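The dominance comparison needs only the quantity vectors. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def strictly_dominates(q_a, q_b):
    """True if A consumes at least as much of every good as B,
    and strictly more of at least one good."""
    at_least_as_much = all(a >= b for a, b in zip(q_a, q_b))
    more_of_something = any(a > b for a, b in zip(q_a, q_b))
    return at_least_as_much and more_of_something


# Illustrative bundles over three categories.
rich = [10.0, 8.0, 2.0]
poor = [3.0, 1.0, 5.0]   # consumes more of the third category
print(strictly_dominates(rich, poor))
print(strictly_dominates([10.0, 8.0, 6.0], poor))
```

A single category in which the poorer country consumes more is enough to break dominance, so the criterion becomes harder to satisfy as the number of categories grows.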


This approach may be contrasted with that used to construct a league table. Here the analysis is undertaken using a single vector of relative prices, which may be denoted $P^C$. The main candidates have been U.S. relative prices and weighted averages of prices for a number of countries, as used by the ICP. Any such price vector will give a clear set of pairwise rankings, but if the "international" relative prices differ from those found in the two countries, the ranking may not have any interpretation in terms of preferences.
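To see how a league table depends on the choice of the common price vector, consider the following sketch. The function name `league_table`, the country labels, and all numbers are hypothetical; the point is only that valuing every bundle at one price vector always yields a complete ranking, and that the ranking can flip when the price vector changes.

```python
def dot(p, q):
    """Value of bundle q at prices p."""
    return sum(pi * qi for pi, qi in zip(p, q))


def league_table(common_prices, bundles):
    """Rank countries by the value of their bundles at one price vector."""
    values = {name: dot(common_prices, q) for name, q in bundles.items()}
    return sorted(values, key=values.get, reverse=True)


bundles = {"A": [4, 1], "B": [1, 4]}
print(league_table([2, 1], bundles))  # good 1 relatively dear: A ranks first
print(league_table([1, 2], bundles))  # good 2 relatively dear: B ranks first
```

Neither ranking need correspond to a revealed-preference ranking unless the common prices resemble those actually faced in the two countries.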

An alternative (and inferior) procedure has been to convert national GNP data into a common currency using exchange rates. Since the relative prices of traded and nontraded goods differ between countries, there is no common price vector that will rationalize this procedure for comparing bundles $Q^A$ and $Q^B$.

There are a number of difficulties involved in the revealed-preference procedure. Although it is important to qualify our results in the light of these difficulties, many of these problems apply equally to other approaches to international comparisons.

First, there are problems associated with the use of the quantity vector Q as the sole argument of the utility function. There are well-known problems in measuring quantities and qualities of some services, including public services such as education. The ICP has made much progress in deriving internationally comparable quantity indexes for these goods, but difficulties remain. No allowance is made for environmental differences, with the anomalous implication that Canadian expenditures on heating fuel, far outweighing, say, Israeli expenditures, are treated as an indicator of Canadian well-being. Robert Eisner (1988) observes similar problems in the treatment of expenditures on policing, prisons, and other such corrective or defensive goods. Furthermore, the national accounting data that are typically used in these comparisons take no account of domestic production or of production and consumption externalities. Problems of this kind are not specific to the revealed-preference approach. They arise in the same form in any analysis based on expenditure data, and in particular with the league-table approach.

Second, there are difficulties associated with the concept of a representative individual. There are two reasons why average consumption data might satisfy the restrictions imposed by revealed-preference theory. Individual preferences may satisfy appropriate aggregation conditions (see e.g., Angus Deaton and John Muellbauer, 1980). Alternatively, governments may redistribute income to maximize a quasi-concave social-welfare function. Samuelson (1956, 1964) and Varian (1984) give conditions under which the resulting aggregate consumption data would look as if they were generated by a representative consumer.¹ The league-table approach does not explicitly involve the representative-individual concept. However, no normative significance can be attached to comparisons of per capita GDP in the absence of such a concept.

¹One apparent difficulty is that the budget set $\{Q : P^i \cdot Q \le P^i \cdot Q^i\}$ is not available to society as a whole. This difficulty is irrelevant whichever of the justifications for the use of a representative-consumer model is adopted. Irrelevance in the case where aggregation conditions are satisfied follows from the fact that each consumer takes prices as parametric. In the social-choice setting, provided preferences are sufficiently quasi-concave, the maximum over the convex set that is available to society coincides with the maximum over $\{Q : P^i \cdot Q \le P^i \cdot Q^i\}$.

The third class of difficulties with the revealed-preference approach is associated with the assumption that $Q^i$ is the most preferred element of the budget set derived from $P^i$. This assumption permits meaningful comparisons if representative consumers in both countries share the same preferences. The power of the method to provide a test of the maintained hypothesis of common tastes may also be considered.

Partial-equilibrium reasoning suggests that a fairly powerful test is available. Suppose that two countries have the same income and technology, but different tastes, and that national supply curves slope upward. Suppose demand for motor vehicles, relative to restaurant meals, is higher in country 1 than in country 2. Then we would expect to see a higher consumption of motor vehicles and a higher price in country 1, and a correspondingly lower consumption and price for restaurant meals. Hence, if the aggregate GNP is similar in the two countries, when measured at international prices, we would expect that both inequalities (i) and (ii) should hold. Thus, if we in fact observe no, or very few, cases where both inequalities hold, it seems reasonable to conclude that differences in tastes are not the major determinants of international differences in quantity structures, at least between countries with similar GNP levels.
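A small numerical illustration of this argument follows. All prices and quantities are invented for the purpose; they are chosen so that each country consumes more of the good that is relatively expensive at home, as the partial-equilibrium story suggests.

```python
def cost(prices, bundle):
    """Cost of a bundle (dict good -> quantity) at given prices."""
    return sum(prices[good] * qty for good, qty in bundle.items())


# Country 1 favors motor vehicles, so vehicles are dear there.
p1 = {"vehicles": 3.0, "meals": 1.0}
q1 = {"vehicles": 4.0, "meals": 1.0}

# Country 2 favors restaurant meals, so meals are dear there.
p2 = {"vehicles": 1.0, "meals": 3.0}
q2 = {"vehicles": 1.0, "meals": 4.0}

holds_i = cost(p1, q2) <= cost(p1, q1)    # 7 <= 13: country 1 affords q2
holds_ii = cost(p2, q1) <= cost(p2, q2)   # 7 <= 13: country 2 affords q1

print(holds_i, holds_ii)
```

Both inequalities hold, so each country passes up a bundle it could have bought more cheaply than its own, which is what the test would read as a difference in tastes.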

A related difficulty arises if consumption is quantity-constrained. This problem is likely to prevail in relation to publicly supplied goods. Health and education are important examples. Labor-leisure choices may also be quantity-constrained. Quantity constraints may give rise to spurious findings of differences in tastes, as is illustrated in Figure 1. Suppose A and B have identical tastes. In country A, good 1 is supplied at a low price but is rationed to a quantity less than average consumption in country B. Then country B's consumption may appear to lie inside A's budget set if the quantity constraint is not taken into account. However, A's consumption actually lies inside B's budget set. In this case, the representative consumer in A is actually worse off, but the test procedure will yield a spurious rejection of the hypothesis of common tastes. However, on the plausible hypothesis that rationing is associated with a low relative price for the good in question, quantity constraints in only one country will not generate spurious findings that one country is better off than another.

II. The Data: Sources, Aggregation, and Definition

The level of aggregation of consumption bundles will inevitably affect comparisons. The use of constant international prices to rank per capita GDP levels is equivalent to a maximum level of aggregation which will in general yield unambiguous, though perhaps spurious, rankings. At the other extreme of full disaggregation, with goods and services differentiated by location of consumption and production, we are likely to find that very few rankings are possible. We conjecture that rankings will become less precise as we increase the level of disaggregation. Our approach is to use the several levels of disaggregation that are available in our data set to see whether this conjecture holds.

Rankings will also be sensitive to the range of goods and services over which the consumption bundle is defined. Our primary interest is in national-accounting definitions of GDP, since these are most commonly used in cross-country comparisons. Such definitions exclude home production and fail to identify international flows of property income. The inclusion of investment in the "consumption bundle" can be justified by interpreting it as a claim on future rather than present consumption.

A further consideration is the treatment of labor and leisure. Comparisons of per capita GDP may rank a country higher simply because the average consumer works longer hours in one country while consumers in another country enjoy more leisure. We address this problem by repeating the analysis using data on average hours of work and the price of labor. Our analysis is then, in effect, asking the following question: "Could A afford to buy the goods and services of B if A worked B's hours, while receiving A's wage for those labor hours?"
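One way to read the question posed above is to treat hours of work as a good with a negative quantity, priced at the home wage. The sketch below follows that reading, which is an assumption rather than a formula stated in the text; the function name `affords_with_hours` and the numbers are hypothetical.

```python
def dot(p, q):
    """Value of bundle q at prices p."""
    return sum(pi * qi for pi, qi in zip(p, q))


def affords_with_hours(p_a, q_a, wage_a, hours_a, q_b, hours_b):
    """Could A buy B's goods if A worked B's hours at A's wage?

    Assumed form: spending on goods net of labor income is compared
    for the two (goods, hours) combinations, both valued at A's
    prices and A's wage.
    """
    net_cost_of_b = dot(p_a, q_b) - wage_a * hours_b
    net_cost_of_a = dot(p_a, q_a) - wage_a * hours_a
    return net_cost_of_b <= net_cost_of_a


p_a, q_a, wage_a, hours_a = [1.0, 1.0], [5.0, 5.0], 1.0, 30.0
q_b, hours_b = [6.0, 6.0], 40.0

# Ignoring hours, B's goods bundle is out of A's reach (12 > 10).
print(dot(p_a, q_b) <= dot(p_a, q_a))
# Working B's longer hours would more than pay for the extra goods.
print(affords_with_hours(p_a, q_a, wage_a, hours_a, q_b, hours_b))
```

Under this reading, a country with long working hours can lose its apparent advantage once the extra labor is priced in.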

Our primary sources are the reports on Phase IV of the ICP published in two parts as United Nations and Commission of the European Communities (1986, 1987). While Part One presents the aggregated "league tables" of international-price GDP for 60 countries in 1980, Part Two contains the details of prices and quantities disaggregated into private consumption, government consumption, and capital formation. These are broken down into ten broad categories, which are further disaggregated into 38 detailed categories.

We also use data on the quantity and price of leisure (labor). The quantity of labor per head of population is the product of the employment-to-population ratio, derived from Summers and Heston (1991), and the average number of hours worked per week per worker, from International Labor Office (1990). The wage rate is the product of GDP per capita, from Summers and Heston (1991), and the ratio of compensation of employees to GDP from United Nations (1990).² There are missing data points for the wage-share series and the hours-of-work series. We are interested in the extent to which adjustments for quantities and prices will affect the international-dollar ($I) rankings, so we minimize the resulting bias by ordering the observations in terms of $I GDP and interpolating.

²Some international comparisons attempt to control for variations in labor and leisure by comparing levels of GDP per worker (or per work-hour). In effect, such comparisons are identifying the marginal product of labor with the average product. Our procedure is preferable in that it identifies the marginal product with the wage.
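The text does not say which interpolation rule was used, so the following is only a sketch under explicit assumptions: missing values are filled linearly in GDP between the nearest observed neighbors, and values beyond the observed range are set to the nearest observed value. The function name `interpolate_by_gdp` and the figures are hypothetical.

```python
def interpolate_by_gdp(records):
    """Fill missing values after ordering observations by GDP.

    `records` is a list of (gdp, value_or_None) pairs. Returns the
    pairs sorted by GDP with every None replaced (assumed linear rule).
    """
    ordered = sorted(records, key=lambda r: r[0])
    known = [(g, v) for g, v in ordered if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = []
    for g, v in ordered:
        if v is None:
            below = [(kg, kv) for kg, kv in known if kg <= g]
            above = [(kg, kv) for kg, kv in known if kg > g]
            if below and above:
                (g0, v0), (g1, v1) = below[-1], above[0]
                v = v0 + (v1 - v0) * (g - g0) / (g1 - g0)
            elif below:
                v = below[-1][1]   # no richer neighbor: carry value forward
            else:
                v = above[0][1]    # no poorer neighbor: carry value back
        filled.append((g, v))
    return filled


# Hypothetical wage shares with one gap.
print(interpolate_by_gdp([(3000, 0.6), (1000, 0.4), (2000, None)]))
```

Interpolating along the GDP ordering means a filled value never moves a country far from its neighbors in the league table, which is the sense in which the resulting bias is kept small.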

III. Results

Our results are summarized in Figure 2, which displays the pairwise rankings, based on 38 categories of expenditure, in matrix form. Countries are listed in order of their GNP at international prices, as estimated by the ICP, except where the revealed-preference criteria used here indicate an unambiguous reversal of the ICP ranking. Countries with a rank-order different from that of the ICP are indicated by an asterisk.

A "+" sign in row i, column j indicates that i dominates j by revealed-preference criteria; a "-" indicates the reverse; NC indicates noncomparability by these criteria; "!!" indicates that the assumption of common tastes is violated. The use of > in column i, row j indicates that consumption of all categories of goods is higher in country i than in country j, and vice versa for <.³ As is observed by Stephen G. Bronars (1987), strict dominance means that there is no possibility of observing a violation of taste and, hence, that the test of the common-tastes assumption has zero power. It would be feasible, although highly computer-intensive, to use the Monte Carlo techniques proposed by Bronars to estimate the power of the test in cases where dominance does not apply.

The notation G+ indicates that A dominates B using the GARP, but not the WARP (G- denotes the converse). This arises in only two cases. Yugoslavia is directly revealed preferred to Chile, which in turn is directly revealed preferred to Brazil. However, Yugoslavia is not directly comparable with Brazil. Similarly, when leisure is included, Indonesia dominates Zimbabwe via Nigeria.

³Several of the subcategories of investment, such as the change in foreign balance, create problems here, since they may take negative values. For this reason, in tabulations including both consumption and investment, aggregate investment has been used in testing for dominance.

Table 1

Group  Countries                  Rankings  $I GDP per capita  Ambiguity within group
1      United States-Netherlands  2-9       11,447-9,320       54 percent
2      Finland-Spain              10-17     8,639-6,352        50 percent
3      Venezuela-Panama           19-29     5,429-3,186        40 percent
4      Korea-Dominican Republic   32-38     2,585-1,980        57 percent
5      Philippines-El Salvador    39-42     1,740-1,416        83 percent
6      Ivory Coast-Zimbabwe       43-51     1,368-895          47 percent
7      Senegal-India              53-55     687-570            100 percent

As can be seen from Figure 2, there are three instances in which the ICP rankings may be reversed on the basis of revealed-preference criteria. All of the reversals involve Finland, which is ranked above Austria, the United Kingdom, and Italy on the ICP league table, but below these countries on revealed preference. Finland has relatively low levels of private consumption and relatively high levels of investment and government spending. In all of the countries in this group, the price of private consumption is high relative to world prices. Hence the ICP index gives a lower weight to private consumption than do comparisons using prices from Finland or the other three countries.

More generally, countries that are within three or four places of each other on the ICP "league table" can rarely be unambiguously ranked on the basis of revealed preference. Nevertheless, most of the 1,770 pairwise rankings are not altered by revealed-preference criteria. The $I rankings do effectively separate countries that are fairly far apart. Typically any pair more than 10 places apart will be ranked unambiguously on revealed preference. Figure 2 suggests that there are a number of fairly distinct groupings. The groups can be ranked relative to each other, but there is no clear ranking within each group. Table 1 lists seven such groupings. The final column shows the proportion of pairwise comparisons that are either ambiguous or are reversed.
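The within-group ambiguity share described above can be computed from the pairwise outcomes as follows. This is a sketch, not the authors' code: the function name `ambiguity_share`, the country labels, and the outcomes are hypothetical, and a reversal is taken to be a "-" recorded for a pair listed with the higher ICP rank first.

```python
from itertools import combinations


def ambiguity_share(group, outcome):
    """Share of within-group pairs that are noncomparable or reversed.

    `group` lists countries in ICP order, highest first. `outcome`
    maps each (higher, lower) pair to "+", "-", "NC" or "!!".
    """
    pairs = list(combinations(group, 2))
    flagged = sum(1 for pair in pairs if outcome[pair] in ("NC", "-"))
    return flagged / len(pairs)


group = ["A", "B", "C"]
outcome = {("A", "B"): "NC", ("A", "C"): "+", ("B", "C"): "-"}
print(ambiguity_share(group, outcome))  # two of the three pairs are flagged
```

As a plausibility check on the arithmetic, a group of four countries has six pairs, so a share of 83 percent would correspond to five of the six pairs being ambiguous or reversed.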

If the revealed-preference axioms are not employed, some comparisons may be made using only the assumption of nonsatiation. Strict dominance is fairly common when 10 categories of expenditure are used, but much less common when 38 categories are used. The main expenditure category in which poor countries consume more than rich ones is bread, a classic example of an inferior good. Poor countries with cheap bread tend to have high consumption and therefore tend not to be strictly dominated. The result is that the strict dominance criterion is insufficient to yield an unambiguous ranking even for pairs such as Canada (first on the ICP table) and Malawi (56th).

In addition to the analysis reported in Figure 2, the analysis was repeated both at a higher level of aggregation (10 categories of expenditure) and also with leisure included as a good. In Table 2 we summarize the results.

Ambiguities and noncomparabilities increase with the level of disaggregation. At the highest level of disaggregation, and taking leisure into account, only five countries can be ranked unambiguously with respect to all the others in the sample, and of these five, there are three for which rankings are reversed relative to the $I rankings, leaving only Canada at the top and Ethiopia at the bottom undisturbed in their original rankings.

More adjustments to the ICP rankings are required for the analysis incorporating leisure. Belgium, the Netherlands, and Italy



Table 2

Ranking category                            10 categories   38 categories   38 categories of
                                            of expenditure  of expenditure  expenditure + leisure
Noncomparable                               34              83              85
Reversal of international price             3               3               16
Unambiguous ranking relative to all others  16              8               5
Reject hypothesis of common tastes + WARP   2               0               0
Reject hypothesis of common tastes + GARP   2               0               0
Additional rankings derived from GARP       0               1               1

are all promoted, while Japan, Hong Kong, and Korea are all demoted. This reflects the fact that the ICP rankings take no account of leisure. The three East Asian countries all have high participation rates and high average hours per worker.

The results presented in Table 2 may also be used as a test of the joint hypotheses of common tastes and WARP. Using 10 expenditure categories, two pairs (Finland-Austria and Nigeria-Zimbabwe) are found to violate the WARP hypotheses. However, with 38 expenditure categories, there are no violations, whether or not leisure is included. Varian (1990) proposes a test to determine the seriousness of the observed WARP violations. He observes that, if two observations (A and B) exhibit a WARP violation, [equation missing in source] provides a measure of the amount of "wasted" expenditure, that is, the smallest amount by which expenditure must be changed for one of A, B to achieve consistency with WARP.

We find that the average Austrian could have purchased the Finnish bundle for 99.2 percent of average Austrian expenditure, implying a waste of 0.8 percent of expenditure. For the pair Nigeria-Zimbabwe, the corresponding estimate is 0.4 percent of expenditure. In both cases, the waste is below the level of 5 percent suggested by Varian (1990) as a test criterion. Also, the WARP violations disappear with disaggregation. These results suggest that the data are consistent with the hypothesis of common tastes.

An easy check of GARP is obtained using Lemma 1. The ICP rankings, adjusted for unambiguous reversals, provide the required ranking. It is simply necessary to check that there are no minus signs below the diagonal in Figure 2. The same test reveals no violation of GARP for the analysis with leisure or with 10 categories (with the exception of the previously observed violations of WARP).

The fact that the data on international consumption levels are consistent with a hypothesis of common tastes may seem surprising. It is a commonplace assertion that tastes differ widely between nations. It may be that differences in tastes are too subtle to be picked up at the level of aggregation employed here. Alternatively, it is possible that differences in consumption patterns between, say, Japan and Austria have been interpreted as evidence of different tastes when in fact they represent adjustments to differences in relative prices.

IV. Conclusions

Revealed-preference principles have been used to analyze ICP data for 1980 prices and quantities of up to 38 components of GDP across 60 countries. We find that constant-price rankings are typically not confirmed by revealed-preference criteria when pairs of countries are within three or four places of each other, or when the measured proportional difference in per capita GDP is less than 10 percent. Indeed the rankings are, in these cases, sometimes reversed. On the other hand, rankings of groups of countries corresponding to "development groups" are confirmed by revealed-preference criteria, and in many cases by the even stricter criteria of dominance. Our addition of data on price and quantity of leisure results in some additional within-groups re-ranking.

These results suggest that constant-price measures of "real GDP" should be used with circumspection when comparisons are being made of countries at a similar level of development. In particular, we call into question the common practice of citing small movements up or down the international league tables as an indicator of policy success or failure.⁴ This is not to call into question the usefulness of the ICP data and its application through the Penn World Tables. Like many other researchers, we find the results of the ICP indispensable in analyzing patterns of world economic development. Our point is merely to indicate the extent to which constant-price rankings may not be indicative of welfare rankings when relative prices vary.

There are various dimensions which spring immediately to mind for further research. A deepening of the analysis through further disaggregation of commodity and service groups is desirable. We conjecture that the regions of ambiguity or noncomparability would increase. It would be interesting to test this conjecture and to measure the rate at which noncomparability increases with disaggregation in order to estimate its limit.

We have managed to surprise at least ourselves by failing to reject the hypothesis of common tastes in any of our pairwise comparisons, except for two cases using the lowest level of disaggregation. We know that, for example, Japan's consumption patterns are very different from those of most European countries at a similar level of international-price GDP, and we expected our tests to reject common tastes for at least some comparisons. We argue that any Japanese preference for fish should be expected to push up fish prices while a German predilection for meat should raise German meat prices, in which case we would expect to observe that the Japanese could afford the German consumption bundle, and vice versa, which would lead us to reject common tastes. The data, however, are quite consistent with the hypothesis that the choice between sushi and schnitzel is driven by relative prices.

⁴In contrast, the results of Manser and MacDonald (1988) show that time-series changes in GDP measured in a single country usually correspond to unambiguous improvements in welfare.


