/* This file demonstrates how to test and estimate regression models that have non-normal disturbances

MODELS WITH NON-NORMAL DISTURBANCES

The purpose of this session is to show you how to estimate and test for models with non-normal disturbances. In regression applications where the dependent variable is strictly positive, the assumption of normally distributed errors may be inappropriate. Other applications may involve truncation or censoring at the high end, or on both ends. STATA enables you to test normality assumptions, as well as to estimate models with non-normal disturbances.

Of course, the starting point in moving to an alternative estimator is theory. We should always define the nature of the statistical experiment that produced the sample. Ideally, there should be a close match between the data attributes and the attributes of the probability distribution chosen for estimation. For example, if the data are always positive but unbounded on the upper end, then one of the distributions derived from the Generalized Gamma Function might be appropriate. If the data are a logged form of a Cobb-Douglas type function, then we might suspect a LogNormal distribution of disturbances. The Beta distribution offers the widest range of possible forms, but this might be a limitation if we want to justify the use of a distribution based on substantive theory. Ideally, we would like to be able to derive the appropriate distribution based on substantive theory. However, this may not always be possible. In this exercise, we show how to implement an empirical strategy for choosing an appropriate non-normal distribution.

/* This file demonstrates how to test and estimate regression models that have non-normal disturbances. We consider a range of models including the Generalized Gamma, Gamma, Exponential, Weibull, Beta, and LogNormal distributions. */

clear

set more off

/* Below, let's create some simulated data that we know has a normal distribution. Then compute descriptives as above on the simulated data. Compare the skewness, kurtosis, and quantile plots.*/

drawnorm x, n(5000)

sum x, detail

sktest x

qnorm x

/* Note that the skew and kurtosis statistics, along with the quantile plots provide evidence concerning the distribution of the data, relative to the normal distribution. The skewness statistic measures the symmetry of the distribution, while the kurtosis statistic measures the relative peakedness of the distibution compared to a normal distribution. A perfectly symmetrical distribution would have skewness=0; a non-kurtotic distribution

would have kurtosis=3. The quantile plot compares quantiles of the actual data to quantiles of the same range drawn from a normal distribution. This provides yet another indicator of the normality of the data. Normal data should follow the line on the quantile plot. A sigmoid shape indicates kurtosis, while the slope of the plot relative to the line implies a particular direction of skew. Take care, however, of relying too much on these plots and statistics. They pertain to the sample, not the population from which the sample is drawn. Theory is the best guide, particularly with small N. Another helpful test in STATA is the “sktest” command.

In this example, we are unable to reject the null hypothesis that the variable x is normally distributed.*/

clear

/* The next line will read in some real data. Change the path to find data */

use "C:\users\Wood\Documents\My Teaching\Maximum Likelihood\Data\prestige.dta", clear

/* Now let's print the data and compute descriptive statistics. */

sum income education prestige, detail

sktest income education prestige

qnorm income

qnorm education

qnorm prestige

/* Now, let's estimate the relationship between income, education, and prestige using the standard regression model, save the residuals, and test for normality.*/

reg income education prestige

predict e, res

sum e, detail

sktest e

qnorm e

/* The residuals from this regression appear skewed and kurtotic. However, N is fairly small. Nevertheless, based on theory and the truncation of income at zero we will assume some of the other distributions.*/

/* First, let's assume a Generalized Gamma Distribution. This distribution encompasses a Gamma, Exponential, and Weibull distribution with some simple parameter restrictions, as shown below. It also encompasses a lognormal distribution with some mathematical manipulation. In the Generalized Gamma there are two distributional parameters that we are labeling c and k.

Note that following estimation we save the log likelihood for later hypothesis testing. */

set more off

program define gengam

version 7

args lnf theta1 k c

# delimit ;

quietly replace `lnf'=

ln(`k')

-lngamma(`c')

+`c'*`k'*ln(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c')))

-ln($ML_y1)

+`c'*`k'*ln($ML_y1/exp(`theta1'))

-(exp(lngamma(`c'+(1/`k')))/exp(lngamma(`c'))

*($ML_y1/exp(`theta1'))) ^`k' ;

#delimit cr

end

ml model lf gengam (theta:income=education prestige) /k /c, technique(bfgs)

ml search

ml maximize

scalar llgg=e(ll)

display llgg

program drop gengam

/* Now let's assume a Gamma Distribution. This distribution is true when k=1 in the preceding Generalized Gamma Disribution.*/

set more off

program define gamma

version 7

args lnf theta1 c

# delimit ;

quietly replace `lnf'=-lngamma(`c')-ln($ML_y1)+`c'*ln($ML_y1/exp(`theta1'))-($ML_y1/exp(`theta1')) ;

#delimit cr

end

ml model lf gamma (theta:income=education prestige) /c

ml search

ml maximize

scalar llg=e(ll)

display llg

program drop gamma

/* Now let's assume a Weibull Distribution. This distribution is true when c=1 in the *preceding Generalized Gamma Distribution.*/

set more off

program define weibull

version 7

args lnf theta1 k

# delimit ;

quietly replace `lnf'=

ln(`k')

+`k'*ln(exp(lngamma(1+(1/`k'))))

-ln($ML_y1)

+`k'*ln($ML_y1/exp(`theta1'))

-((exp(lngamma(1+(1/`k'))))

*($ML_y1/exp(`theta1'))) ^`k' ;

#delimit cr

end

ml model lf weibull (theta:income=education prestige) /k

ml search

ml maximize

scalar llw=e(ll)

display llw

program drop weibull

* Now let's assume an Exponential Distribution. This distribution is true when c=k=1 in *the preceding Generalized Gamma Distribution.

set more off

program define expon

version 7

args lnf theta1

quietly replace `lnf'=-ln(exp(`theta1'))-($ML_y1/exp(`theta1'))

end

ml model lf expon (theta:income=education prestige)

ml search

ml maximize

scalar lle=e(ll)

display lle

program drop expon

/* Model Discrimination: Choosing the correct distribution is more of an art than a science. Of course, as we noted above, it would be best if there were a basis in substantive theory for the chosen model. Lacking this alternative, we can construct hypothesis tests based on Likelihood Ratio, Wald, or LaGrange Multiplier principles. For example, using a Likelihood Ratio test we could test each of the submodels against the Generalized Gamma model. The likelihood ratio statistic is chi squared with degrees of freedom equal to the number of parameter restrictions (1 for Generalized Gamma to

Gamma and Weibull; 2 for Generalized Gamma to Exponential). */

*Generalized Gamma to Gamma

scalar lrggg=2*(llgg-llg)

display lrggg

*Generalized Gamma to Weibull

scalar lrggw=2*(llgg-llw)

display lrggw

*Generalized Gamma to Exponential

scalar lrgge=2*(llgg-lle)

display lrgge

/* We could also test the Exponential Model against the Gamma and Weibull models. These likelihood ratio tests are chi squared with one degree of freedom. */

* Weibull to Exponential

scalar lrwe=2*(llw-lle)

display lrwe

* Gamma to Exponential

scalar lrge=2*(llg-lle)

display lrge

/* On this basis we choose the Weibull model above, since it is not statistically different than the Generalized Gamma model and is more parsimonious (1 less parameter). However, the Weibull model is statistically different than the Exponential which has one less parameter. This suggests that the parameter restriction for the Exponential model is inappropriate. */

/* Another distribution that is related to the Generalized Gamma is the LogNormal. However, it is not obtained by a simple change in parameterization as above. If a variable,y, is lognormal, then its variance is proportional to the square of its mean. The LogNormal procedure is given below. */

set more off

program define lognorm

version 7

args lnf theta1 sigma

# delimit ;

quietly replace `lnf'=-log(sqrt(2*_pi)*`sigma')-log($ML_y1)-(1/(2*`sigma'^2))*(log($ML_y1/(exp(`theta1')))

+(`sigma'^2)/2)^2 ;

#delimit cr

end

ml model lf lognorm (theta:income=education prestige) (sigma:)

ml search

ml maximize

program drop lognorm

When the dependent data are in proportions (i.e.,0<y<1 ) then a Beta Distribution is often used. If the data are not in proportions then it is possible to put them into proportions simply by dividing the dependent variable by its maximum. Of course, if you transform the data just to use the Beta, then the scaling and interpretation of the coefficients will be different, so it is probably better to just use the Beta with variables that are naturally proportions unless you have a good reason to want the Beta. The Beta Distribution is perhaps the most flexible distribution. Look again at the Excel spreadsheet for the Beta to get a sense of just how flexible it is. Of course, this flexibility may be its major drawback since it is often difficult to justify on the basis of substantive theory. (The above distributions may be also.) Here is the Beta distribution applied to cross-national data on infant mortality per 1000 as a function of national income and whether the nation is oil exporting. */

clear

/* The next line will read in some real data. Change the path to find data */

use "C:\users\Wood\Documents\My Teaching\Maximum Likelihood\Data\Leinhardt.dta", clear

/* First, create the rate of infant mortality from the data. This can be thought of as the probability of one infant death per 10000draws. This probability naturally ranges from 0 to 1.*/

gen infantmort = infant / 1000

/* Next, get the odds ratio for infant mortality */

gen infantodds = exp(infantmort)/(1+exp(infantmort))

/* Now let's print the data and compute descriptive statistics. */

sum infant infantodds income oilyes, detail

sktest infantodds

qnorm infantodds

/* Run a regression to get some starting values */

reg infantodds income oilyes

/*Write down the function to be optimized. */

set more off

program define beta

version 7

args lnf theta1 p

# delimit ;

quietly replace `lnf'=lngamma(`p')

-lngamma(`p'*`theta1')

-lngamma((1-`theta1')*`p')

+(`theta1'*`p'-1)*ln($ML_y1)

+((1-`theta1')*`p'-1)*ln(1-$ML_y1) ;

#delimit cr

end

ml model lf beta (theta:infantodds=income oilyes) (p:)

ml init -5e-06 .02 .53 3, copy

ml maximize

program drop beta

/* Note that obtaining a good set of starting values is crucial to convergence or even initial iteration. The OLS start values are recommended by Ferrari and Cribari-Neto (1994). They seem to work here.*/

/*Now that I've put you through the pain of programming your own log likelihood functions for these continuous non-normal distributions, I want to point out that many of these distributions are available as canned procedures, and as variants of the Generalized Linear Model features of

STATA. Generalized Linear Models are a framework that encompasses many models that can be estimated with maximum likelihood. It is a very flexible framework presenting a unified way to view many families of distributions and diverse link functions. However, GLM is beyond the scope of this course

Here are examples of STATA's canned non-normal procedures. */

/* This first one is beta regression and requires the installation of a package called betafit */

betafit infantmort, muvar(income oilyes)

clear

/* The next line will read in the earlier data. Change the path to find data */

use "C:\users\Wood\Documents\My Teaching\Maximum Likelihood\Data\prestige.dta", clear

gamma income education prestige

weibull income education prestige

ereg income education prestige

/*As an example of GLM, here is the syntax for estimating the exponential regression model we did earlier. Note that standard errors are radically different from above because of the different methods of estimating

the covariance matrix of estimates. However, the log likelihood and coefficient estimates are quite close. */

glm income education prestige, fam(gamma) link(id)

exit