MODELS WITH
NON-NORMAL DISTURBANCES
The
purpose of this session is to show you how to estimate and test for models with
non-normal disturbances. In regression
applications where the dependent variable is strictly positive, the assumption
of normally distributed errors may be inappropriate. Other applications may
involve truncation or censoring at the high end, or on both ends. STATA enables
you to test normality assumptions, as well as to estimate models with
non-normal disturbances.
Of
course, the starting point in moving to an alternative estimator is theory. We
should always define the nature of the statistical experiment that produced the
sample. Ideally, there should be a close match between the data attributes and
the attributes of the probability distribution chosen for estimation. For
example, if the data are always positive but unbounded on the upper end, then
one of the distributions derived from the Generalized Gamma Function might be
appropriate. If the data are a logged form of a Cobb-Douglas type function,
then we might suspect a LogNormal
distribution of disturbances. The Beta distribution offers the widest range of
possible forms, but this might be a limitation if we want to justify the use of
a distribution based on substantive theory. Ideally, we would like to be able
to derive the appropriate distribution based on substantive theory. However,
this may not always be possible. In this exercise, we show how to implement an
empirical strategy for choosing an appropriate non-normal distribution.
/*
This file demonstrates how to test and estimate
regression models that have non-normal disturbances. We consider a range of
models including the Generalized Gamma, Gamma, Exponential, Weibull,
Beta, and LogNormal distributions. */
clear
set more off
/*
Below, let's create some simulated data that we know
has a normal distribution. Then compute descriptives
as above on the simulated data. Compare the skewness,
kurtosis, and quantile plots.*/
drawnorm x, n(5000)
sum x, detail
sktest x
qnorm x
/*
Note that the skew and kurtosis statistics, along with the quantile
plots provide evidence concerning the distribution of the data, relative to the
normal distribution. The skewness statistic measures
the symmetry of the distribution, while the kurtosis statistic measures the
relative peakedness of the distibution
compared to a normal distribution. A perfectly symmetrical distribution would
have skewness=0; a non-kurtotic
distribution
would have
kurtosis=3. The quantile
plot compares quantiles of the actual data to quantiles of the same range drawn from a normal
distribution. This provides yet another indicator of the normality of the data.
Normal data should follow the line on the quantile
plot. A sigmoid shape indicates kurtosis, while the slope of the plot relative
to the line implies a particular direction of skew. Take care, however, of
relying too much on these plots and statistics. They pertain to the sample, not
the population from which the sample is drawn. Theory is the best guide,
particularly with small N. Another helpful test in STATA is the “sktest” command.
In
this example, we are unable to reject the null hypothesis that the variable x
is normally distributed.*/
clear
/*
The next line will read in some real data. Change the
path to find data */
use "C:\users\Wood\Documents\My
Teaching\Maximum Likelihood\Data\prestige.dta", clear
/*
Now let's print the data and compute descriptive statistics. */
sum income education prestige,
detail
sktest income
education prestige
qnorm income
qnorm education
qnorm prestige
/*
Now, let's estimate the relationship between income, education, and prestige
using the standard regression model, save the residuals, and test for
normality.*/
reg income
education prestige
predict e, res
sum e, detail
sktest e
qnorm e
/*
The residuals from this regression appear skewed and kurtotic. However, N is fairly small. Nevertheless, based
on theory and the truncation of income at zero we will assume some of the other
distributions.*/
/*
First, let's assume a Generalized Gamma Distribution. This distribution
encompasses a Gamma, Exponential, and Weibull
distribution with some simple parameter restrictions, as shown below. It also encompasses a lognormal distribution
with some mathematical manipulation. In the Generalized Gamma there are two
distributional parameters that we are labeling c and k.
Note
that following estimation we save the log likelihood for later hypothesis
testing. */
set more off
program define gengam
version 7
args lnf theta1 k c
#
delimit ;
quietly replace `lnf'=
ln(`k')
-lngamma(`c')
+`c'*`k'*ln(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c')))
-ln($ML_y1)
+`c'*`k'*ln($ML_y1/exp(`theta1'))
-(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c'))
*($ML_y1/exp(`theta1'))) ^`k' ;
#delimit
cr
end
ml model lf gengam
(theta:income=education prestige) /k /c, technique(bfgs)
ml search
ml maximize
scalar llgg=e(ll)
display llgg
program drop gengam
/*
Now let's assume a Gamma Distribution. This distribution is true when k=1 in
the preceding Generalized Gamma Disribution.*/
set more off
program define gamma
version 7
args lnf theta1 c
#
delimit ;
quietly replace
`lnf'=-lngamma(`c')-ln($ML_y1)+`c'*ln($ML_y1/exp(`theta1'))-($ML_y1/exp(`theta1'))
;
#delimit
cr
end
ml model lf gamma (theta:income=education
prestige) /c
ml search
ml maximize
scalar llg=e(ll)
display llg
program drop gamma
/*
Now let's assume a Weibull Distribution. This
distribution is true when c=1 in the *preceding Generalized Gamma
Distribution.*/
set more off
program define weibull
version 7
args lnf theta1 k
#
delimit ;
quietly replace `lnf'=
ln(`k')
+`k'*ln(exp(lngamma(1+(1/`k'))))
-ln($ML_y1)
+`k'*ln($ML_y1/exp(`theta1'))
-((exp(lngamma(1+(1/`k'))))
*($ML_y1/exp(`theta1'))) ^`k' ;
#delimit
cr
end
ml model lf weibull (theta:income=education
prestige) /k
ml search
ml maximize
scalar llw=e(ll)
display llw
program drop weibull
*
Now let's assume an Exponential Distribution. This distribution is true when
c=k=1 in *the preceding Generalized Gamma Distribution.
set more off
program define expon
version 7
args lnf theta1
quietly replace `lnf'=-ln(exp(`theta1'))-($ML_y1/exp(`theta1'))
end
ml model lf expon
(theta:income=education prestige)
ml search
ml maximize
scalar lle=e(ll)
display lle
program drop expon
/*
Model Discrimination: Choosing the correct distribution is more of an art than
a science. Of course, as we noted above, it would be best if there were a basis
in substantive theory for the chosen model. Lacking this alternative, we can
construct hypothesis tests based on Likelihood Ratio, Wald, or LaGrange
Multiplier principles. For example, using a Likelihood Ratio test we could test
each of the submodels against the Generalized Gamma
model. The likelihood ratio statistic is chi squared with degrees of freedom
equal to the number of parameter restrictions (1 for Generalized Gamma to
Gamma and Weibull;
2 for Generalized Gamma to Exponential). */
*Generalized
Gamma to Gamma
scalar lrggg=2*(llgg-llg)
display lrggg
*Generalized
Gamma to Weibull
scalar lrggw=2*(llgg-llw)
display lrggw
*Generalized
Gamma to Exponential
scalar lrgge=2*(llgg-lle)
display lrgge
/*
We could also test the Exponential Model against the
Gamma and Weibull models. These likelihood ratio
tests are chi squared with one degree of freedom. */
*
Weibull to Exponential
scalar lrwe=2*(llw-lle)
display lrwe
*
Gamma to Exponential
scalar lrge=2*(llg-lle)
display lrge
/*
On this basis we choose the Weibull model above,
since it is not statistically different than the Generalized Gamma model and is
more parsimonious (1 less parameter). However, the Weibull
model is statistically different than the Exponential which has one less
parameter. This suggests that the parameter restriction for the Exponential
model is inappropriate. */
/*
Another distribution that is related to the
Generalized Gamma is the LogNormal. However, it is
not obtained by a simple change in parameterization as above. If a variable,y, is lognormal, then its
variance is proportional to the square of its mean. The LogNormal
procedure is given below. */
set more off
program define lognorm
version 7
args lnf theta1 sigma
#
delimit ;
quietly replace
`lnf'=-log(sqrt(2*_pi)*`sigma')-log($ML_y1)-(1/(2*`sigma'^2))*(log($ML_y1/(exp(`theta1')))
+(`sigma'^2)/2)^2 ;
#delimit
cr
end
ml model lf lognorm
(theta:income=education prestige) (sigma:)
ml search
ml maximize
program drop lognorm
/*
When
the dependent data are in proportions (i.e.,0<y<1
) then a Beta Distribution is often used. If the data are not in proportions
then it is possible to put them into proportions simply by dividing the
dependent variable by its maximum. Of course, if you transform the data just to
use the Beta, then the scaling and interpretation of the coefficients will be
different, so it is probably better to just use the Beta with variables that
are naturally proportions unless you have a good reason to want the Beta. The
Beta Distribution is perhaps the most flexible distribution. Look again at the
Excel spreadsheet for the Beta to get a sense of just how flexible it is. Of
course, this flexibility may be its major drawback since it is often difficult
to justify on the basis of substantive theory. (The above distributions may be
also.) Here is the Beta distribution applied to cross-national data on infant
mortality per 1000 as a function of national income and whether the nation is
oil exporting. */
clear
/*
The next line will read in some real data. Change the
path to find data */
use "C:\users\Wood\Documents\My
Teaching\Maximum Likelihood\Data\Leinhardt.dta", clear
/*
First, create the rate of infant mortality from the data. This can be thought
of as the probability of one infant death per 10000draws. This probability
naturally ranges from 0 to 1.*/
gen infantmort
= infant / 1000
/*
Next, get the odds ratio for infant mortality */
gen infantodds
= exp(infantmort)/(1+exp(infantmort))
/*
Now let's print the data and compute descriptive statistics. */
sum infant infantodds
income oilyes, detail
sktest infantodds
qnorm infantodds
/*
Run a regression to get some starting values */
reg infantodds income oilyes
/*Write
down the function to be optimized. */
set more off
program define beta
version 7
args lnf theta1 p
#
delimit ;
quietly replace `lnf'=lngamma(`p')
-lngamma(`p'*`theta1')
-lngamma((1-`theta1')*`p')
+(`theta1'*`p'-1)*ln($ML_y1)
+((1-`theta1')*`p'-1)*ln(1-$ML_y1) ;
#delimit
cr
end
ml model lf beta (theta:infantodds=income oilyes) (p:)
ml init -5e-06 .02 .53 3, copy
ml maximize
program drop beta
/*
Note that obtaining a good set of starting values is crucial to convergence or
even initial iteration. The OLS start values are recommended by Ferrari and Cribari-Neto (1994). They seem to work here.*/
/*Now
that I've put you through the pain of programming your own log likelihood
functions for these continuous non-normal distributions, I want to point out
that many of these distributions are available as canned procedures, and as
variants of the Generalized Linear Model features of
STATA. Generalized Linear Models are a framework
that encompasses many models that can be estimated with maximum
likelihood. It is a very flexible
framework presenting a unified way to view many families of distributions and
diverse link functions. However, GLM is
beyond the scope of this course
Here
are examples of STATA's canned non-normal procedures. */
/*
This first one is beta regression and requires the
installation of a package called betafit */
betafit infantmort, muvar(income oilyes)
clear
/*
The next line will read in the earlier data. Change
the path to find data */
use "C:\users\Wood\Documents\My
Teaching\Maximum Likelihood\Data\prestige.dta", clear
gamma income
education prestige
weibull income
education prestige
ereg income
education prestige
/*As an example of GLM, here is the syntax for estimating the
exponential regression model we did earlier. Note that standard errors are
radically different from above because of the different methods of estimating
the covariance matrix of
estimates. However, the log likelihood
and coefficient estimates are quite close.
*/
glm income
education prestige, fam(gamma) link(id)
exit