MODELS WITH NON-NORMAL DISTURBANCES

The purpose of this session is to show you how to estimate and test for models with non-normal disturbances.  In regression applications where the dependent variable is strictly positive, the assumption of normally distributed errors may be inappropriate. Other applications may involve truncation or censoring at the high end, or on both ends. The “optim” procedure and various "canned" procedures in R provide the ability to estimate linear regression models with a range of distributional assumptions.

Of course, the starting point in moving to an alternative estimator is theory. We should always define the nature of the statistical experiment that produced the sample. Ideally, there should be a close match between the data attributes and the attributes of the probability distribution chosen for estimation. For example, if the data are always positive but unbounded on the upper end, then one of the distributions derived from the Generalized Gamma Function might be appropriate. If the data are a logged form of a Cobb-Douglas type function, then we might suspect a LogNormal distribution of disturbances. The Beta distribution offers the widest range of possible forms, but this might be a limitation if we want to justify the use of a distribution based on substantive theory. Ideally, we would like to be able to derive the appropriate distribution based on substantive theory. However, this may not always be possible. In this exercise, we show how to implement an empirical strategy for choosing an appropriate non-normal distribution.

# This file demonstrates how to test and estimate regression models that have non-normal disturbances. We

# consider a range of models including the Generalized Gamma, Gamma, Exponential, Weibull, Beta, and

# LogNormal distributions.

# Below, let's create some simulated data that we know has a normal distribution. Then compute

# descriptives on the simulated data, including skewness, kurtosis, and quantile plots.

normdata <- rnorm(1000)

mean(normdata)

var(normdata)

sd(normdata)

median(normdata)

min(normdata)

max(normdata)

library(e1071)

skewness(normdata)

kurtosis(normdata)

# Note that the skew and kurtosis statisticsprovide evidence concerning the distribution

# of the data, relative to the normal distribution. The skewness statistic measures the

# symmetry of the distribution, while the kurtosis statistic measures the relative peakedness

# of the distibution compared to a normal distribution. A perfectly symmetrical distribution

# would have skewness=0; a non-kurtotic distribution would have kurtosis=3 (However, this library

# in R centers the kurtosis statistic on 0 by subtracting out the 3.

# Now let's do a quantile plot of the simulated data.

library(car)

qqPlot(normdata, dist= "norm", labels=FALSE)

# Quantile plots compare quantiles of the actual data to quantiles of the same range drawn from

# a normal distribution. This provides yet another indicator of the normality of the data. Normal

# data should follow the line on the quantile plot. A sigmoid shape indicates kurtosis, while the

# slope of the plot relative to the line implies a particular direction of skew. Take care, however,

# of relying too much on these plots and statistics. They pertain to the sample,

# not the population from which the sample is drawn. Theory is the best guide,

# particularly with small N where such plots may not be reliable.

# The next set of commands will read in some real data. Change the path to find the data.

data(Prestige, package="car")

attach(Prestige)

summary(Prestige)

# Now let's list the data, compute descriptive statistics, and do a qq plot for each variable.

list(Prestige)

mean(income)

var(income)

sd(income)

median(income)

min(income)

max(income)

skewness(income)

kurtosis(income)

qqPlot(income, dist= "norm", labels=FALSE)

mean(education)

var(education)

sd(education)

median(education)

min(education)

max(education)

skewness(education)

kurtosis(education)

qqPlot(education, dist= "norm", labels=FALSE)

mean(prestige)

var(prestige)

sd(prestige)

median(prestige)

min(prestige)

max(prestige)

skewness(prestige)

kurtosis(prestige)

qqPlot(prestige, dist= "norm", labels=FALSE)

# Now, let's estimate the relationship between income, education, and prestige using the standard

# regression model and test for normality

ols.model <- lm(income ~ education + prestige)

summary(ols.model)

skewness(residuals(ols.model))

kurtosis(residuals(ols.model))

qqPlot(residuals(ols.model), dist= "norm", labels=FALSE)

# histogram and density plots

hist(residuals(ols.model), probability=T, ylab='Density')

lines(density(residuals(ols.model)), lwd=2)

points(residuals(ols.model), rep(0, length(residuals(ols.model))), pch="|")

box()

# The residuals from this regression appear somewhat skewed and kurtotic. Based on theory and the

# truncation of income at zero we will assume some of the other distributions.

# First, let's assume a Generalized Gamma Distribution. This distribution encompasses a Gamma,

# Exponential, and Weibull distribution with some simple parameter restrictions, as shown below.

# It also encompasses a lognormal distribution with some mathematical manipulation,

# In this version of the Generalized Gamma there are two distributional parameters that we are

# labeling c and k.  Note that this is a slightly different parameterization relative to that

# given in the lecture.  relative to the lecture discussion of a Generalized Gamma with 4 parameters,

# we are restricting the location parameter to zero.  Note also that following

# estimation we save the log likelihood for later hypothesis testing.  */

# Because of the complexity of the function, I have used R's ability to use subfunctions.

# First, put the data into matrices for the MLE procedure

x <- cbind(1,education,prestige)

y <- as.matrix(income)

ones <- as.vector(x[,1])

# Calculate K (columns in x), define K1 and K2 for use below, and n for later use

K <- ncol(x)

K1 <- K+1

K2 <- K+2

n <- nrow(x)

# Define the function to be optimized

llik.ggamma <- function(par,X,Y) {

X <- as.matrix(x)

Y <- as.vector(y)

xbeta <- X%*%par[1:K]

c <- par[K1:K1]

k <- par[K2:K2]

cbx <- y/exp(xbeta)

cpk <- c+(1/k)

sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) -  lgamma(c)  )

- log(Y) +c*k*log(cbx)

-( exp(lgamma(cpk)-lgamma(c))*(cbx)

)^k )

}

llik.ggamma

# Now let's use the above function to estimate the model. We'll start by setting initial values to 0 for the mean and 1

# for the distribution. Plug these in again, and reiterate. Also, compare these starting values to what was obtained in STATA.

mod.ggamma <- optim(c(7.71,-0.048,0.03,1,1),llik.ggamma, method = "BFGS", control = list(maxit=1000,fnscale = -1),

hessian = TRUE)

mod.ggamma

# Save the log likelihood for later use below

llggamma <- mod.ggamma\$value

llggamma

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.ggamma\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.ggamma\$par

zstat <-b/se

# Calculate p values from the z statistics

pz <- 2*pnorm( -abs(zstat))

# Put the results together in a table.

table <- cbind(b,se,zstat,pz)

table

# Now let's assume a Gamma Distribution. This distribution is true when k=1 in the preceding

# Generalized Gamma Disribution. We plug this into the preceding function for the Gamma

# Define the function to be optimized

llik.gamma <- function(par,X,Y) {

X <- as.matrix(x)

Y <- as.vector(y)

xbeta <- X%*%par[1:K]

c <- par[K1:K1]

k <- 1

cbx <- y/exp(xbeta)

cpk <- c+(1/k)

sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) -  lgamma(c)  )

- log(Y) +c*k*log(cbx)

-( exp(lgamma(cpk)-lgamma(c))*(cbx)

)^k )

}

llik.gamma

# Now let's use the above function to estimate the model.

# This time let's get starting values from the prior generalized gamma estimation

parvalues <- c(mod.ggamma\$par)

startvalues<- c(parvalues[1:3],1)

mod.gamma <- optim(startvalues,llik.gamma, method = "BFGS", control = list(maxit=1000,fnscale = -1),

hessian = TRUE)

mod.gamma

# Save the log likelihood for later use below

llgamma <- mod.gamma\$value

llgamma

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.gamma\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.gamma\$par

zstat <-b/se

# Calculate p values from the z statistics

pz <- 2*pnorm( -abs(zstat))

# Put the results together in a table.

table <- cbind(b,se,zstat,pz)

table

# Now let's assume a Weibull Distribution. This distribution is true when c=1 in the preceding

# Generalized Gamma Distribution. Again,plug this into the function for the generalized gamma.

# Define the function to be optimized

llik.weibull <- function(par,X,Y) {

X <- as.matrix(x)

Y <- as.vector(y)

xbeta <- X%*%par[1:K]

c <- 1

k <- par[K1:K1]

cbx <- y/exp(xbeta)

cpk <- c+(1/k)

sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) -  lgamma(c)  )

- log(Y) +c*k*log(cbx)

-( exp(lgamma(cpk)-lgamma(c))*(cbx)

)^k )

}

llik.weibull

# Now let's use the above function to estimate the model.

parvalues <- c(mod.ggamma\$par)

startvalues<- c(parvalues[1:3],1)

mod.weibull <- optim(c(startvalues),llik.weibull, method = "BFGS", control = list(maxit=1000,fnscale = -1),

hessian = TRUE)

mod.weibull

# Save the log likelihood for later use below

llweibull <- mod.weibull\$value

llweibull

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.weibull\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.weibull\$par

zstat <-b/se

# Calculate p values from the z statistics

pz <- 2*pnorm( -abs(zstat))

# Put the results together in a table.

table <- cbind(b,se,zstat,pz)

table

# Now let's assume an Exponential Distribution. This distribution is true when c=k=1

# in the preceding Generalized Gamma Distribution.

# Define the function to be optimized

llik.exponential <- function(par,X,Y) {

X <- as.matrix(x)

Y <- as.vector(y)

xbeta <- X%*%par[1:K]

c <- 1

k <- 1

cbx <- y/exp(xbeta)

cpk <- c+(1/k)

sum( log(k) - lgamma(c) + c*k*(lgamma(cpk) -  lgamma(c)  )

- log(Y) +c*k*log(cbx)

-( exp(lgamma(cpk)-lgamma(c))*(cbx)

)^k )

}

llik.exponential

# Now let's use the above function to estimate the model.

parvalues <- c(mod.ggamma\$par)

startvalues<- c(parvalues[1:3])

mod.exponential <- optim(c(startvalues),llik.exponential, method = "BFGS", control = list(maxit=1000,fnscale = -1),

hessian = TRUE)

mod.exponential

# Save the log likelihood for later use below

llexponential <- mod.exponential\$value

llexponential

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.exponential\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.exponential\$par

zstat <-b/se

# Calculate p values from the z statistics

pz <- 2*pnorm( -abs(zstat))

# Put the results together in a table.

table <- cbind(b,se,zstat,pz)

table

# Model Discrimination: Choosing the correct distribution is more of an

# art than a science. Of course, as we noted above, it would be

# best if there were a basis in substantive theory for the chosen model.

# Lacking this alternative, we can construct hypothesis tests based

# on Likelihood Ratio, Wald, or LaGrange Multiplier principles.

# For example, using a Likelihood Ratio test we could test each of the

# submodels against the Generalized Gamma model. The likelihood ratio

# statistic is chi squared with degrees of freedom equal to the number

# of parameter restrictions (1 for Generalized Gamma to Gamma and Weibull;

# 2 for Generalized Gamma to Exponential)

# Likelihood Ratio, Generalized Gamma to Gamma

llggamma

llgamma

llweibull

lrggg <- 2*(llggamma-llgamma)

lrggg

lrgggpvalue <- 1-pchisq(lrggg, df=1)

lrgggpvalue

# Likelihood Ratio, Generalized Gamma to Weibull

lrggw <- 2*(llggamma-llweibull)

lrggw

lrggwpvalue <- 1-pchisq(lrggw, df=1)

lrggwpvalue

# Likelihood Ratio, Generalized Gamma to Exponential

lrgge <- 2*(llggamma-llexponential)

lrgge

lrggepvalue <- 1-pchisq(lrgge, df=2)

lrggepvalue

# We can also test the Exponential Model against the Gamma and Weibull models.

# These likelihood ratio tests are chi squared with one degree of freedom.

# Likelihood Ratio, Weibull to Exponential

lrwe <- 2*(llweibull-llexponential)

lrwe

lrwepvalue <- 1-pchisq(lrwe, df=1)

lrwepvalue

# Likelihood Ratio, Gamma to Exponential

lrge <- 2*(llgamma-llexponential)

lrge

lrgepvalue <- 1-pchisq(lrge, df=1)

lrgepvalue

# On this basis we choose the Weibull model, since it is not statistically different

# than the Generalized Gamma model and more parsimonious (1 less parameter). The Gamma model

# is different, which suggests the need for the additional parameter in generalized gamma.

# Another distribution that is related to the Generalized Gamma is the LogNormal.

# However, it is not obtained by a simple change in parameterization as above. If a

# variable,y, is lognormal, then its variance is proportional to the square of its mean.

# Here is the setup for estimating the lognormal distribution from the preceding data.

llik.lognormal <- function(par,X,Y) {

Y <- as.vector(y)

X <- as.matrix(x)

xbeta <- X%*%par[1:K]

Sig <- par[K1:K1]

sum(-log(sqrt(2*pi)*Sig)-log(Y)-(1/(2*Sig^2))*(log(Y/exp(xbeta))+(Sig^2)/2)^2)

}

llik.lognormal

# Now let's use the above function to estimate the model. Get the start values from the prior generalized gamma.

startvalues <- c(mod.ggamma\$par[1:3],1)

mod.lognormal <- optim(c(startvalues),llik.lognormal, method = "BFGS", control = list(maxit=100,fnscale = -1),

hessian = TRUE)

mod.lognormal

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.lognormal\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.lognormal\$par

zstat <-b/se

# Calculate p-values for the z statistics

pzstat <- 2* (1 - pnorm(abs(zstat)))

# Put the results together in a table.

table <- cbind(b,se,zstat,pzstat)

table

# When the dependent data are in proportions (i.e.,0<y<1 ) then a

# Beta Distribution is often used. If the data are not in proportions

# then it is possible to put them into proportions simply by dividing

# the dependent variable by its maximum. Of course, if you transform

# the data just to use the Beta, then the scaling and interpretation of

# the coefficients will be different, so it is probably better to just

# use the Beta with variables that are naturally proportions unless you have

# a good reason to want the Beta. The Beta Distribution is perhaps the most

# flexible distribution. Look again at the Excel spreadsheet for the Beta to

# get a sense of just how flexible it is. Of course, this flexibility may be

# its major drawback since it is often difficult to justify on the basis of

# substantive theory. (The above distributions may be also.) Here is the Beta

# distribution applied to cross-national data on infant mortality per 1000 as

# a function of national income and whether the nation is oil exporting.

attach(Leinhardt)

summary(Leinhardt)

# First, create the rate of infant mortality from the data. This can be thought of

# as the probability of one infant death per 1000 draws. This probability naturally

# ranges from 0 to 1.

infantmort <- infant/1000

# Next, compute the odds ratio for infant mortality. The link function is

# usually specified as an odds

infantodds <- exp(infantmort)/(1+exp(infantmort))

# Create a vector of ones for inclusion on the ml procedure

ones <- matrix(1:1,101,1)

# Now lets just do an ols to get start values and for comparison.

ols.model <- lm(infantodds ~ income + oilyes)

summary(ols.model)

library(e1071)

skewness(residuals(ols.model))

kurtosis(residuals(ols.model))

qqPlot(residuals(ols.model), dist= "norm", labels=FALSE)

hist(residuals(ols.model), probability=T, ylab='Density')

lines(density(residuals(ols.model)), lwd=2)

points(residuals(ols.model), rep(0, length(residuals(ols.model))), pch="|")

box()

# Define the function to be optimized

llik.beta <- function(par,ONES,X1,X2,YP) {

ONES <- as.vector(ones)

X1 <- as.vector(income)

X2 <- as.vector(oilyes)

YP <- as.vector(infantmort)

a0 <- par[1:1]

a1 <- par[2:2]

a2 <- par[3:3]

xbeta <- a0*ONES + a1*X1 + a2*X2

p <- par[4:4]

mu <- exp(xbeta)/(1+exp(xbeta))

sum(lgamma(p)-lgamma(mu*p)-lgamma((1-mu)*p)+(mu*p-1)*log(YP)+((1-mu)*p-1)*log(1-YP))

}

llik.beta

# Now let's use the above function to estimate the model. Use coefficients from the original

# regression as starting values, augmented by a 3. Why 3?

startvalues <- c(coefficients(ols.model),3)

startvalues

/* Now let's estimate the model */

mod.beta <- optim(c(startvalues),llik.beta, method = "BFGS", control = list(maxit=1000,fnscale = -1),

hessian = TRUE)

mod.beta

# Calculate the variance matrix from the Hessian matrix.

v <- -solve(mod.beta\$hessian)

# Calculate the standard errors from the variance matrix.

se <- sqrt( diag(v))

# Calculate the z statistics from the coefficients and standard errors

b <- mod.beta\$par

zstat <-b/se

# Calculate p values from the z statistics

pz <- 2*pnorm( -abs(zstat))

# Put the results together in a table.

table <- cbind(b,se,zstat,pz)

table

# Note that obtaining a good set of starting values is crucial to convergence or even

# initial iteration. B above in the vector of starting values is the vector of coefficients

# from the prior regression. Ferrari and Cribari-Neto (2004) recommend this as a good way

# to specify start values in beta regression. Note that the coefficients from this regression

# are interpreted as the log of the odds ratio between the new level and old level associated

# with the change in x.

#You might also want to try the user defined canned procedure called betareg.

library(betareg)

betareg.out <- betareg(infantmort ~ income + oilyes, link = "logit")

summary(betareg.out)