An early version of the
paper can be found at CEP/LSE
(and an even earlier version at
On this page you can find the data set used in
the paper, code to extend some of the results in the paper, and other useful
information on the implementation of the PPML estimator.
If you cannot find the
answer to your question about The Log of Gravity here, please do not hesitate to contact the
authors; we will be only too pleased to help.
The dataset (in xls and dta formats) is available
We are aware of two situations where you may find that (at
first) the PPML estimator has trouble converging. Fortunately, there are
simple solutions for both cases:
i) You may have problems using PPML when the dependent variable has many
zeros and the model includes many dummies. A typical example of this is when
panel data is used to estimate gravity equations including time-varying
importer and exporter fixed effects. A simple explanation for why this may
lead to convergence problems can be found here; our Economics
Letters paper provides the more technical details. To bypass the problem,
use the ppml command described below.
ii) Stata's poisson command is not very good at dealing with numerical
problems, and the algorithm may fail to converge if the dependent variable
has very large values or if the regressors are highly collinear or have
very different scales. In this case there are several possible solutions: a) you
may be able to solve the problem just by re-scaling or re-centring the
problematic variables; b) rather than using the poisson command, you can use
the ppml command that we have written (see below); c) of course, you can just use
other software; one of us is a great fan of TSP.
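As a rough illustration of case i), the Python sketch below flags regressors (typically dummies) that are zero for every observation with a positive dependent variable but nonzero for some zero observations. For a non-negative regressor of that kind, the pseudo-likelihood keeps increasing as its coefficient goes to minus infinity, so the PPML estimate does not exist. It also reports whether the regressors are perfectly collinear in the subsample with positive values, a warning sign of the same problem. The function names and data are hypothetical, and this is a simplified screen, not the check performed by the ppml command.

```python
import numpy as np

def zero_only_regressors(X, y):
    """Indices of columns that are 0 wherever y > 0 but nonzero for some y == 0."""
    X = np.asarray(X, dtype=float)
    pos = np.asarray(y, dtype=float) > 0
    zero_on_positive = np.all(X[pos] == 0, axis=0)
    nonzero_on_zeros = np.any(X[~pos] != 0, axis=0)
    return np.flatnonzero(zero_on_positive & nonzero_on_zeros)

def collinear_on_positive(X, y):
    """True if the regressors are perfectly collinear in the y > 0 subsample."""
    X = np.asarray(X, dtype=float)
    pos = np.asarray(y, dtype=float) > 0
    return bool(np.linalg.matrix_rank(X[pos]) < X.shape[1])

# Hypothetical data: the dummy marks an exporter whose trade is always zero.
y = np.array([3.0, 0.0, 5.0, 0.0, 0.0, 2.0])
x = np.array([0.1, 0.4, 0.3, 0.9, 0.2, 0.7])
dummy = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])   # 1 only on zero-trade rows
X = np.column_stack([np.ones(6), x, dummy])

print(zero_only_regressors(X, y))    # [2]: the dummy's estimate would not exist
print(collinear_on_positive(X, y))   # True
```

The ppml help file describes how the command itself handles such regressors.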
Further details on convergence issues can be found in this Stata
Journal paper (an earlier version is available here).
We have written a ppml command for Stata that bypasses
most of these problems. To install the
command, just type "findit ppml" (or "net sj 11-2") in Stata and follow the links; please read the
help file carefully before using the command.
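To show what the estimator computes, here is a minimal Python sketch of PPML fitted by iteratively reweighted least squares (Newton's method for the Poisson log link), using only NumPy. The function name ppml, its arguments and the simulated data are assumptions made for illustration; it is not the Stata command and performs none of its checks for the existence of the estimates.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """Fit E[y|X] = exp(X @ beta) by Poisson pseudo-maximum likelihood.

    X must already contain a column of ones if an intercept is wanted.
    Returns (beta, fitted_means).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = (y + y.mean()) / 2.0            # strictly positive starting means, even with zeros
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # rows of X weighted by the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta, mu

# Usage with simulated data: true intercept 0.5, true slope 0.8.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))
beta, mu = ppml(X, y)
print(beta)                 # should be close to [0.5, 0.8]
print(X.T @ (y - mu))       # first-order conditions, numerically zero
```

At convergence the first-order conditions X'(y - mu) = 0 hold; this is all PPML uses, so y need not be Poisson distributed.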
RESET test: Here is a
sample of the code to perform the test.
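The linked code is not reproduced here. As a hedged sketch of the usual RESET idea applied to PPML, the Python code below adds the square of the fitted linear index x'b as an extra regressor, re-estimates by PPML, and tests that coefficient with heteroskedasticity-robust (sandwich) standard errors. The helper names, the simulated data and the choice of a single squared term are assumptions, not a transcription of the authors' code.

```python
import math
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """PPML via iteratively reweighted least squares; returns (beta, mu)."""
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # weights equal to the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta, mu

def robust_se(X, y, mu):
    """Heteroskedasticity-robust (sandwich) standard errors for PPML."""
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])
    return np.sqrt(np.diag(bread @ meat @ bread))

def reset_test(X, y):
    """Add the squared fitted index as a regressor; return robust z and p-value."""
    beta, _ = ppml(X, y)
    X_aug = np.column_stack([X, (X @ beta) ** 2])
    beta_aug, mu_aug = ppml(X_aug, y)
    z = beta_aug[-1] / robust_se(X_aug, y, mu_aug)[-1]
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided standard normal p-value
    return z, p

rng = np.random.default_rng(5)
n = 5000
x = rng.uniform(-2.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
y_ok = rng.poisson(np.exp(0.5 + 0.8 * x))                  # correctly specified mean
y_bad = rng.poisson(np.exp(0.5 + 0.8 * x + 0.3 * x ** 2))   # x**2 omitted from the model
z_ok, p_ok = reset_test(X, y_ok)
z_bad, p_bad = reset_test(X, y_bad)
print(f"correct model: p = {p_ok:.3f}; misspecified model: p = {p_bad:.2g}")
```

A small p-value indicates that the exponential conditional mean is misspecified; the test says nothing about the distribution of y.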
If you want to compute the R-squared for a
model estimated by PPML, you can use the method implemented
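The linked method is not reproduced here. One common choice for a PPML fit, which we assume but cannot confirm is the one meant, is the squared correlation between the observed dependent variable and the fitted values. A minimal Python sketch, with a hypothetical function name and toy numbers:

```python
import numpy as np

def ppml_r2(y, mu):
    """Squared correlation between observed y and PPML fitted means mu."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.corrcoef(y, mu)[0, 1] ** 2

# Toy numbers, only to show the calculation.
y = np.array([1.0, 2.0, 3.0, 4.0])
mu = np.array([1.5, 1.5, 3.5, 3.5])
print(ppml_r2(y, mu))   # about 0.8
```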
PPML performance with many zeros:
Simulation evidence on the excellent performance of the PPML estimator
when the data have many zeros can be found in this paper.
Be advised that there are several papers purporting to introduce estimators that improve on the PPML. While it is
of course possible to find estimators that outperform PPML in very specific conditions, and under reasonably strong distributional
assumptions, to our knowledge, all the proposed alternatives to PPML are either simply invalid or valid only under implausibly
strong distributional assumptions. Therefore we stand by the claim that PPML has all the characteristics needed to be the workhorse
for the estimation of constant-elasticity models such as the gravity equation. If you believe you have evidence that another estimator
generally outperforms PPML in this context, please do let us know; we would be delighted to acknowledge that.
We have written a short
reply to "The
log of gravity revisited".
If you want to compute 'undertrading' and 'overtrading' after fixed-effects
regressions with panel data, you need to obtain a set of residuals with zero
mean. Here is how to do it.
Thibault Fally has recently provided an additional motivation for using PPML; his interesting paper
can be seen here.
Paulo Guimarães has recently written a Stata add-on that can be useful if you want to use PPML to estimate a model with importer
and exporter dummies. To install it, type: "ssc install poi2hdfe".
Our ppml command for Stata does not have an option to include country-pair fixed effects; if you really want to include them you may want to consider
Timothy Simcoe's xtpqml Stata command.
Eight FAQs & myths about The Log of Gravity
1 - Why can't we just use the log-linear model with robust/clustered standard errors?
- Using robust/clustered standard errors will only affect the estimated standard errors, but will
have no effect at all on the estimates of the parameters. Therefore, when the errors are heteroskedastic, the estimates
from the log-linear model will generally be inconsistent, with or without robust/clustered standard errors. PPML delivers estimates of the parameters
that are consistent under very general conditions; of course, robust/clustered standard errors should also
be used with PPML.
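A small Python simulation illustrates the point under assumptions chosen for illustration: a lognormal multiplicative error with mean 1 whose variance grows with x, and a hypothetical ppml helper fitted by iteratively reweighted least squares. Because E[log eta | x] = -Var(log eta | x)/2 then depends on x, OLS on log y converges to a slope of 0.75 instead of the true 1.0 in this design, and no choice of standard errors changes that point estimate; PPML targets the slope of the conditional mean.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """PPML via iteratively reweighted least squares; returns (beta, mu)."""
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # weights equal to the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta, mu

rng = np.random.default_rng(1)
n = 50000
x = rng.uniform(0.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])
s2 = 0.5 + 0.5 * x                               # error variance grows with x
log_err = rng.normal(-s2 / 2.0, np.sqrt(s2))     # chosen so that E[error | x] = 1
y = np.exp(1.0 * x) * np.exp(log_err)            # E[y | x] = exp(x): true slope 1

b_ols = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
b_ppml, _ = ppml(X, y)
print(b_ols[1], b_ppml[1])   # OLS slope near 0.75 (inconsistent), PPML near 1.0
```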
2 - Does PPML perform poorly when the
data has a large proportion of zeros?
- No, this is an unfortunate myth! In this paper, we have provided
ample evidence that the estimator works very well even when the proportion
of zeros is very large.
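A deliberately simple Python illustration, not one of the designs used in the paper: the multiplicative error below is 0 with probability 0.8 and 5 otherwise, so about 80% of the observations are zero while E[y | x] = exp(0.5 + 0.8x) still holds. The ppml helper is a hypothetical iteratively-reweighted-least-squares sketch.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """PPML via iteratively reweighted least squares; returns (beta, mu)."""
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # weights equal to the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta, mu

rng = np.random.default_rng(2)
n = 50000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
err = rng.choice([0.0, 5.0], size=n, p=[0.8, 0.2])   # mean 1, but mostly zeros
y = np.exp(0.5 + 0.8 * x) * err                      # continuous y with many zeros

share_zeros = np.mean(y == 0)
b_ppml, _ = ppml(X, y)
print(share_zeros, b_ppml)   # about 80% zeros; estimates near [0.5, 0.8]
```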
3 - Does PPML assume equi-dispersion?
- Not at all! This is another unfortunate myth. PPML is optimal when the conditional variance is
proportional (not necessarily equal) to the conditional mean. This allows
both for under- and over-dispersion. Even if the conditional variance is not
proportional to the conditional mean, PPML will still be consistent, as long as the conditional mean is correctly specified.
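This can be checked with a scaled Poisson variable, y = c·N with N ~ Poisson(mu/c), which has mean mu and variance c·mu: c < 1 gives under-dispersion and c > 1 over-dispersion. The design, the values of c and the ppml helper (an iteratively-reweighted-least-squares sketch) are assumptions for illustration.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """PPML via iteratively reweighted least squares; returns (beta, mu)."""
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # weights equal to the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta, mu

rng = np.random.default_rng(3)
n = 20000
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
mean = np.exp(0.5 + 0.8 * x)

results = {}
for c in (0.25, 4.0):                      # c < 1: under-, c > 1: over-dispersion
    y = c * rng.poisson(mean / c)          # E[y|x] = mean, Var[y|x] = c * mean
    beta, mu = ppml(X, y)
    ratio = np.mean((y - mu) ** 2 / mu)    # estimated variance-to-mean ratio
    results[c] = (beta[1], ratio)
    print(f"c = {c}: slope = {beta[1]:.3f}, variance/mean = {ratio:.2f}")
```

In both cases the slope estimate should be close to the true 0.8, while the variance-to-mean ratio is close to c rather than 1.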
4 - How can a continuous variable like
exports have a Poisson distribution?
- It cannot! However, PPML does not require the data to follow a
Poisson distribution (that is why it is a pseudo-maximum likelihood estimator
and not a maximum likelihood estimator). In fact, all that is needed for the
PPML estimator to be consistent is that the conditional mean of the variate
of interest is correctly specified.
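The consistency argument can be written compactly. PPML solves the sample moment condition on the left; if E(y_i | x_i) = exp(x_i'β), the corresponding population moment is zero at the true β by the law of iterated expectations, whatever the distribution of y_i (integer or continuous, with or without zeros). Standard regularity and identification conditions are assumed.

```latex
\sum_{i=1}^{n} \left[ y_i - \exp(x_i'\hat{\beta}) \right] x_i = 0,
\qquad
E\left\{ \left[ y_i - \exp(x_i'\beta) \right] x_i \right\}
  = E\left\{ \left[ E(y_i \mid x_i) - \exp(x_i'\beta) \right] x_i \right\} = 0 .
```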
5 - Why use a count data estimator when
the dependent variable is continuous?
- The estimator proposed in the Log of Gravity is simply a weighted
non-linear least squares estimator. It turns out that, with the proposed
weights, the first-order conditions for this estimator are identical to
those of Poisson pseudo-maximum likelihood regression. Therefore, the
fact that we recommend the use of a count data estimator for the gravity
equation is just a fortunate coincidence that allows the use of a well-known
regression method which is widely available in econometric and statistical software.
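The coincidence can be seen by writing both first-order conditions side by side. The term -log(y_i!) of the Poisson log-likelihood is omitted because it does not depend on b, and the NLS weights w_i are treated as fixed when differentiating.

```latex
\begin{align*}
\text{Poisson PML:}\quad
  & \max_b \sum_{i=1}^{n} \left[ y_i x_i' b - \exp(x_i' b) \right]
  \;\Rightarrow\; \sum_{i=1}^{n} \left[ y_i - \exp(x_i' b) \right] x_i = 0, \\
\text{Weighted NLS:}\quad
  & \min_b \sum_{i=1}^{n} w_i \left[ y_i - \exp(x_i' b) \right]^2
  \;\Rightarrow\; \sum_{i=1}^{n} w_i \left[ y_i - \exp(x_i' b) \right] \exp(x_i' b)\, x_i = 0 .
\end{align*}
```

Setting w_i = 1/exp(x_i'b), that is, weights inversely proportional to the conditional mean, turns the second condition into the first.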
6 - Why not use other count data models
like the negative-binomial or zero inflated models?
- As noted in 5 above, the fact that we recommended that gravity
equations should be estimated with a count data model is essentially a
coincidence and there is no guarantee that other count data models would be
adequate to estimate gravity equations. For example, both the
negative-binomial and the zero-inflated regression models have the important
drawback of not being invariant to the scale of the dependent variable. That
is, measuring trade in dollars or in thousands of dollars will lead to
different estimates of the elasticities of interest!
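PPML, by contrast, is invariant in this sense: multiplying y by a constant k changes only the intercept, by log k, because exp(x'b + log k) = k·exp(x'b) satisfies the rescaled first-order conditions. The Python sketch below checks this numerically with simulated "trade" measured in thousands of dollars and in dollars; the data and the ppml helper (an iteratively-reweighted-least-squares sketch) are assumptions for illustration, and the negative-binomial and zero-inflated models are not estimated here.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """PPML via iteratively reweighted least squares; returns (beta, mu)."""
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu           # working dependent variable
        Xw = X * mu[:, None]              # weights equal to the current means
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta, mu

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Gamma draws with mean exp(1 + 0.7 x): hypothetical trade in thousands of dollars.
trade_thousands = rng.gamma(shape=2.0, scale=np.exp(1.0 + 0.7 * x) / 2.0)
trade_dollars = 1000.0 * trade_thousands

b_thousands, _ = ppml(X, trade_thousands)
b_dollars, _ = ppml(X, trade_dollars)
print(b_dollars - b_thousands)   # intercept shifts by log(1000); slope unchanged
```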
7 - Can Vuong's tests for non-nested
hypotheses be used to choose between the PPML and the ZIP estimators?
- No! First of all, as noted in 6 above, it makes little sense to
estimate a gravity equation using a zero-inflated model. In any
case, Vuong's test is appropriate for comparing models estimated by maximum
likelihood, and it cannot be used when at least one of the competing models
is estimated by another method. Therefore, the test cannot be used
when one of the models is estimated by pseudo-maximum likelihood, as in the case of PPML.
8 - Can PPML be used when there is
over-dispersion in the data?
- Yes! As noted in 3 above, PPML is consistent, and can even be
optimal, when there is under- or over-dispersion. Actually, it does not even
make much sense to test for over-dispersion when estimating by PPML because
the estimator makes no assumption about it.