MODELS WITH NON-NORMAL DISTURBANCES
The purpose of this session is to show you how to estimate and test models with
non-normal disturbances. In regression applications where the dependent variable is
strictly positive, the assumption of normally distributed errors may be inappropriate.
Other applications may involve truncation or censoring at the high end, or at both ends.
The "optim" procedure and various "canned" procedures in R provide the ability to
estimate linear regression models under a range of distributional assumptions.
Of course, the starting point in moving to an alternative estimator is theory. We
should always define the nature of the statistical experiment that produced the
sample. Ideally, there should be a close match between the attributes of the data and
the attributes of the probability distribution chosen for estimation. For example, if
the data are always positive but unbounded at the upper end, then one of the
distributions derived from the Generalized Gamma might be appropriate. If the data are
generated by a multiplicative (Cobb-Douglas type) process, then we might suspect a
LogNormal distribution for the disturbances. The Beta distribution offers the widest
range of possible forms, but this flexibility can be a limitation if we want to justify
the choice of distribution on substantive grounds. Ideally, we would like to derive the
appropriate distribution from substantive theory; however, this may not always be
possible. In this exercise, we show how to implement an empirical strategy for
choosing an appropriate non-normal distribution.
# This file demonstrates how to test and estimate regression models that have non-normal
# disturbances. We consider a range of models including the Generalized Gamma, Gamma,
# Exponential, Weibull, Beta, and LogNormal distributions.
# Below, let's create some simulated data that we know has a normal distribution. Then compute
# descriptives on the simulated data, including skewness, kurtosis, and quantile plots.
normdata <- rnorm(1000)
mean(normdata)
var(normdata)
sd(normdata)
median(normdata)
min(normdata)
max(normdata)
library(e1071)
skewness(normdata)
kurtosis(normdata)
# Note that the skewness and kurtosis statistics provide evidence concerning the distribution
# of the data, relative to the normal distribution. The skewness statistic measures the
# symmetry of the distribution, while the kurtosis statistic measures the relative peakedness
# of the distribution compared to a normal distribution. A perfectly symmetrical distribution
# would have skewness=0; a non-kurtotic distribution would have kurtosis=3. (However, the e1071
# library in R centers the kurtosis statistic on 0 by subtracting out the 3, so a normal
# distribution yields a value near 0.)
# Now let's do a quantile plot of the simulated data.
library(car)
qqPlot(normdata, dist= "norm", labels=FALSE)
# Quantile plots compare quantiles of the actual data to quantiles of the same range drawn from
# a normal distribution. This provides yet another indicator of the normality of the data.
# Normal data should follow the line on the quantile plot. A sigmoid shape indicates kurtosis,
# while the slope of the plot relative to the line implies a particular direction of skew.
# Take care, however, not to rely too much on these plots and statistics. They pertain to the
# sample, not the population from which the sample is drawn. Theory is the best guide,
# particularly with small N, where such plots may not be reliable.
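# For contrast (an added illustration, not part of the original exercise), a clearly skewed
# sample makes the departure from the reference line easy to see:
skewdata <- rexp(1000)
skewness(skewdata)
kurtosis(skewdata)
qqPlot(skewdata, dist= "norm", labels=FALSE)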
# The next set of commands will load some real data: the Prestige data from the car package.
data(Prestige, package="car")
attach(Prestige)
summary(Prestige)
# Now let's list the data, compute descriptive statistics, and do a qq plot for each variable.
Prestige
mean(income)
var(income)
sd(income)
median(income)
min(income)
max(income)
skewness(income)
kurtosis(income)
qqPlot(income, dist= "norm", labels=FALSE)
mean(education)
var(education)
sd(education)
median(education)
min(education)
max(education)
skewness(education)
kurtosis(education)
qqPlot(education, dist= "norm", labels=FALSE)
mean(prestige)
var(prestige)
sd(prestige)
median(prestige)
min(prestige)
max(prestige)
skewness(prestige)
kurtosis(prestige)
qqPlot(prestige, dist= "norm", labels=FALSE)
# Now, let's estimate the relationship between income, education, and prestige using the
# standard regression model and test for normality.
ols.model <- lm(income ~ education + prestige)
summary(ols.model)
skewness(residuals(ols.model))
kurtosis(residuals(ols.model))
qqPlot(residuals(ols.model), dist= "norm", labels=FALSE)
# histogram and density plots
hist(residuals(ols.model), probability=T, ylab='Density')
lines(density(residuals(ols.model)), lwd=2)
points(residuals(ols.model), rep(0, length(residuals(ols.model))), pch="|")
box()
lines(density(residuals(ols.model), adjust=.5), lwd=1)
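# A formal normality test can supplement these diagnostics (an added step, not part of the
# original script); the Shapiro-Wilk test is available in base R:
shapiro.test(residuals(ols.model))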
# The residuals from this regression appear somewhat skewed and kurtotic. Based on theory and
# the truncation of income at zero, we will consider some of the other distributions.
# First, let's assume a Generalized Gamma Distribution. This distribution encompasses the
# Gamma, Exponential, and Weibull distributions under some simple parameter restrictions, as
# shown below. It also encompasses the LogNormal distribution after some mathematical
# manipulation. In this version of the Generalized Gamma there are two distributional
# parameters, which we label c and k. Note that this is a slightly different parameterization
# from the one given in the lecture: relative to the lecture's discussion of a Generalized
# Gamma with 4 parameters, we restrict the location parameter to zero. Note also that following
# estimation we save the log likelihood for later hypothesis testing.
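# As a quick check on this parameterization (an added sketch, not part of the original
# exercise): with k restricted to 1, the per-observation log density coded below should match
# a Gamma with shape c and mean exp(xbeta). The function gg.logdens is hypothetical and used
# only for this check.
gg.logdens <- function(y, mu, c, k) {
  cpk <- c + (1/k)
  log(k) - lgamma(c) + c*k*(lgamma(cpk) - lgamma(c)) - log(y) +
    c*k*log(y/mu) - (exp(lgamma(cpk) - lgamma(c))*(y/mu))^k
}
gg.logdens(2, 3, c=1.5, k=1)                  # Generalized Gamma with k = 1
dgamma(2, shape=1.5, scale=3/1.5, log=TRUE)   # Gamma with shape c and mean 3; should agree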
# Because of the complexity of the function, I have used R's ability to use subfunctions.
# First, put the data into matrices for the MLE procedure
x <- cbind(1,education,prestige)
y <- as.matrix(income)
ones <- as.vector(x[,1])
# Calculate K (columns in x), define K1 and K2 for use below, and n for later use
K <- ncol(x)
K1 <- K+1
K2 <- K+2
n <- nrow(x)
# Define the function to be optimized
llik.ggamma <- function(par,X,Y) {
  X <- as.matrix(x)          # design matrix (taken from the global environment)
  Y <- as.vector(y)          # dependent variable (taken from the global environment)
  xbeta <- X%*%par[1:K]      # linear predictor; exp(xbeta) is the mean of Y
  c <- par[K1]               # first distributional parameter
  k <- par[K2]               # second distributional parameter
  cbx <- Y/exp(xbeta)
  cpk <- c+(1/k)
  # Sum of the per-observation log densities of the Generalized Gamma
  sum( log(k) - lgamma(c) +
       c*k*(lgamma(cpk) - lgamma(c)) -
       log(Y) +
       c*k*log(cbx) -
       ( exp(lgamma(cpk)-lgamma(c))*(cbx) )^k )
}
llik.ggamma
# Now let's use the above function to estimate the model. We supply rough starting values for
# the regression coefficients and set the two distributional parameters to 1. If convergence is
# poor, plug the resulting estimates back in as starting values and reiterate. Also, compare
# these starting values to what was obtained in STATA.
mod.ggamma <- optim(c(7.71,-0.048,0.03,1,1),llik.ggamma, method =
"BFGS", control = list(maxit=1000,fnscale = -1),
hessian = TRUE)
mod.ggamma
# Save the log likelihood for later use below
llggamma <- mod.ggamma$value
llggamma
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.ggamma$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.ggamma$par
zstat <-b/se
# Calculate p values from the z statistics
pz <- 2*pnorm( -abs(zstat))
# Put the results together in a table.
table <- cbind(b,se,zstat,pz)
table
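# The variance/standard error/z/p steps above are repeated for every model below. As an
# optional convenience (a sketch added here, not part of the original script), the same table
# can be produced from any optim fit that returns a Hessian:
mle.table <- function(fit) {
  v <- -solve(fit$hessian)      # variance matrix from the Hessian
  se <- sqrt(diag(v))           # standard errors
  b <- fit$par                  # coefficient estimates
  zstat <- b/se                 # z statistics
  pz <- 2*pnorm(-abs(zstat))    # two-sided p values
  cbind(b, se, zstat, pz)
}
mle.table(mod.ggamma)           # should reproduce the table above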
# Now let's assume a Gamma Distribution. This distribution results when k=1 in the preceding
# Generalized Gamma Distribution. We impose this restriction in the preceding function to
# obtain the Gamma.
# Define the function to be optimized
llik.gamma <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta <- X%*%par[1:K]
c <- par[K1:K1]
k <- 1
cbx <- y/exp(xbeta)
cpk <- c+(1/k)
sum( log(k) - lgamma(c) +
c*k*(lgamma(cpk) - lgamma(c) )
- log(Y)
+c*k*log(cbx)
-(
exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.gamma
# Now let's use the above function to estimate the model.
# This time let's get starting values from the prior generalized gamma estimation.
parvalues <- c(mod.ggamma$par)
startvalues<- c(parvalues[1:3],1)
mod.gamma <- optim(startvalues,llik.gamma, method =
"BFGS", control = list(maxit=1000,fnscale = -1),
hessian = TRUE)
mod.gamma
# Save the log likelihood for later use below
llgamma <- mod.gamma$value
llgamma
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.gamma$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.gamma$par
zstat <-b/se
# Calculate p values from the z statistics
pz <- 2*pnorm( -abs(zstat))
# Put the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Now let's assume a Weibull Distribution. This distribution results when c=1 in the preceding
# Generalized Gamma Distribution. Again, plug this restriction into the function for the
# generalized gamma.
# Define the function to be optimized
llik.weibull <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta <- X%*%par[1:K]
c <- 1
k <- par[K1:K1]
cbx <- y/exp(xbeta)
cpk <- c+(1/k)
sum( log(k) - lgamma(c) +
c*k*(lgamma(cpk) - lgamma(c) )
- log(Y)
+c*k*log(cbx)
-(
exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.weibull
# Now let's use the above function to estimate the model.
parvalues <- c(mod.ggamma$par)
startvalues<- c(parvalues[1:3],1)
mod.weibull <- optim(c(startvalues),llik.weibull, method =
"BFGS", control = list(maxit=1000,fnscale = -1),
hessian = TRUE)
mod.weibull
# Save the log likelihood for later use below
llweibull <- mod.weibull$value
llweibull
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.weibull$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.weibull$par
zstat <-b/se
# Calculate p values from the z statistics
pz <- 2*pnorm( -abs(zstat))
# Put the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Now let's assume an Exponential Distribution. This distribution results when c=k=1
# in the preceding Generalized Gamma Distribution.
# Define the function to be optimized
llik.exponential <- function(par,X,Y) {
X <- as.matrix(x)
Y <- as.vector(y)
xbeta <- X%*%par[1:K]
c <- 1
k <- 1
cbx <- y/exp(xbeta)
cpk <- c+(1/k)
sum( log(k) - lgamma(c) +
c*k*(lgamma(cpk) - lgamma(c) )
- log(Y)
+c*k*log(cbx)
-(
exp(lgamma(cpk)-lgamma(c))*(cbx)
)^k )
}
llik.exponential
# Now let's use the above function to estimate the model.
parvalues <- c(mod.ggamma$par)
startvalues<- c(parvalues[1:3])
mod.exponential <- optim(c(startvalues),llik.exponential, method =
"BFGS", control = list(maxit=1000,fnscale = -1),
hessian = TRUE)
mod.exponential
# Save the log likelihood for later use below
llexponential <- mod.exponential$value
llexponential
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.exponential$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.exponential$par
zstat <-b/se
# Calculate p values from the z statistics
pz <- 2*pnorm( -abs(zstat))
# Put the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Model Discrimination: Choosing the correct distribution is more of an
# art than a science. Of course, as we noted above, it would be
# best if there were a basis in substantive theory for the chosen model.
# Lacking this, we can construct hypothesis tests based
# on Likelihood Ratio, Wald, or Lagrange Multiplier principles.
# For example, using a Likelihood Ratio test we could test each of the
# submodels against the Generalized Gamma model. The likelihood ratio
# statistic is chi squared with degrees of freedom equal to the number
# of parameter restrictions (1 for Generalized Gamma to Gamma and Weibull;
# 2 for Generalized Gamma to Exponential).
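# A small helper for these likelihood ratio tests (an added sketch; the same tests are also
# computed step by step below):
lr.test <- function(ll.unrestricted, ll.restricted, df) {
  lr <- 2*(ll.unrestricted - ll.restricted)       # LR statistic
  c(statistic=lr, pvalue=1 - pchisq(lr, df=df))   # chi-squared p value
}
lr.test(llggamma, llgamma, df=1)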
# Likelihood Ratio, Generalized Gamma to Gamma
llggamma
llgamma
llweibull
lrggg <- 2*(llggamma-llgamma)
lrggg
lrgggpvalue <- 1-pchisq(lrggg, df=1)
lrgggpvalue
# Likelihood Ratio, Generalized Gamma to Weibull
lrggw <- 2*(llggamma-llweibull)
lrggw
lrggwpvalue <- 1-pchisq(lrggw, df=1)
lrggwpvalue
# Likelihood Ratio, Generalized Gamma to Exponential
lrgge <- 2*(llggamma-llexponential)
lrgge
lrggepvalue <- 1-pchisq(lrgge, df=2)
lrggepvalue
# We can also test the Exponential Model against the Gamma and Weibull models.
# These likelihood ratio tests are chi squared with one degree of freedom.
# Likelihood Ratio, Weibull to Exponential
lrwe <- 2*(llweibull-llexponential)
lrwe
lrwepvalue <- 1-pchisq(lrwe, df=1)
lrwepvalue
# Likelihood Ratio, Gamma to Exponential
lrge <- 2*(llgamma-llexponential)
lrge
lrgepvalue <- 1-pchisq(lrge, df=1)
lrgepvalue
# On this basis we choose the Weibull model, since it is not statistically different from the
# Generalized Gamma model and is more parsimonious (one fewer parameter). The Gamma model is
# different, which suggests the need for the additional parameter in the Generalized Gamma.
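# Model fit can also be compared with the AIC (an added sketch, not part of the original
# script), computed from the saved log likelihoods and the number of estimated parameters:
aic <- function(ll, npar) -2*ll + 2*npar
c(ggamma      = aic(llggamma, 5),        # 3 betas + c + k
  gamma       = aic(llgamma, 4),         # 3 betas + c
  weibull     = aic(llweibull, 4),       # 3 betas + k
  exponential = aic(llexponential, 3))   # 3 betas only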
# Another distribution that is related to the Generalized Gamma is the LogNormal.
# However, it is not obtained by a simple change in parameterization as above. If a
# variable, y, is lognormal, then its variance is proportional to the square of its mean.
# Here is the setup for estimating the lognormal distribution from the preceding data.
llik.lognormal <- function(par,X,Y) {
  Y <- as.vector(y)            # dependent variable (taken from the global environment)
  X <- as.matrix(x)            # design matrix (taken from the global environment)
  xbeta <- X%*%par[1:K]        # linear predictor; exp(xbeta) is the mean of Y
  Sig <- par[K1]               # standard deviation of log(Y)
  # Sum of lognormal log densities; the Sig^2/2 term centers log(Y) so that E(Y)=exp(xbeta)
  sum(-log(sqrt(2*pi)*Sig) - log(Y) - (1/(2*Sig^2))*(log(Y/exp(xbeta)) + (Sig^2)/2)^2)
}
llik.lognormal
# Now let's use the above function to estimate the model. Get the start values from the prior
# generalized gamma.
startvalues <- c(mod.ggamma$par[1:3],1)
mod.lognormal <- optim(c(startvalues),llik.lognormal, method =
"BFGS", control = list(maxit=100,fnscale = -1),
hessian = TRUE)
mod.lognormal
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.lognormal$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.lognormal$par
zstat <-b/se
# Calculate p-values for the z statistics
pzstat <- 2* (1 - pnorm(abs(zstat)))
# Put the results together in a table.
table <- cbind(b,se,zstat,pzstat)
table
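# As a rough cross-check (an added step, not in the original script), OLS on logged income
# should give slope estimates close to the lognormal MLE slopes, since log(income) is normal
# under this model (with the intercept shifted by -Sig^2/2):
summary(lm(log(income) ~ education + prestige))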
# When the dependent data are proportions (i.e., 0<y<1), then a
# Beta Distribution is often used. If the data are not proportions,
# then it is possible to put them into proportions simply by dividing
# the dependent variable by its maximum. Of course, if you transform
# the data just to use the Beta, then the scaling and interpretation of
# the coefficients will be different, so it is probably better to
# use the Beta with variables that are naturally proportions unless you have
# a good reason to want the Beta. The Beta Distribution is perhaps the most
# flexible distribution. Look again at the Excel spreadsheet for the Beta to
# get a sense of just how flexible it is. Of course, this flexibility may be
# its major drawback, since it is often difficult to justify on the basis of
# substantive theory. (The above distributions may be also.) Here is the Beta
# distribution applied to cross-national data on infant mortality per 1000 as
# a function of national income and whether the nation is oil exporting.
Leinhardt <- read.table("c:/users/Wood/Documents/My Teaching/Maximum Likelihood/Data/Leinhardt.txt", header=TRUE)
attach(Leinhardt)
summary(Leinhardt)
# First, create the rate of infant mortality from the data. This can be thought of
# as the probability of one infant death per 1000 draws. This probability naturally
# ranges from 0 to 1.
infantmort <- infant/1000
# Next, transform the infant mortality rate. The line below applies the logistic function
# exp(p)/(1+exp(p)) and stores the result as infantodds; this transformed variable is used
# for the OLS comparison and starting values below.
infantodds <- exp(infantmort)/(1+exp(infantmort))
# Create a vector of ones for inclusion in the ML procedure
ones <- matrix(1,101,1)
# Now let's just do an OLS to get start values and for comparison.
ols.model <- lm(infantodds ~ income + oilyes)
summary(ols.model)
library(e1071)
skewness(residuals(ols.model))
kurtosis(residuals(ols.model))
qqPlot(residuals(ols.model), dist= "norm", labels=FALSE)
hist(residuals(ols.model), probability=T, ylab='Density')
lines(density(residuals(ols.model)), lwd=2)
points(residuals(ols.model), rep(0, length(residuals(ols.model))), pch="|")
box()
lines(density(residuals(ols.model), adjust=.5), lwd=1)
# Define the function to be optimized
llik.beta <- function(par,ONES,X1,X2,YP) {
  ONES <- as.vector(ones)           # intercept column (taken from the global environment)
  X1 <- as.vector(income)           # regressors (taken from the global environment)
  X2 <- as.vector(oilyes)
  YP <- as.vector(infantmort)       # dependent proportion
  a0 <- par[1]                      # regression coefficients
  a1 <- par[2]
  a2 <- par[3]
  xbeta <- a0*ONES + a1*X1 + a2*X2  # linear predictor
  p <- par[4]                       # precision parameter
  mu <- exp(xbeta)/(1+exp(xbeta))   # logit link: mu is the mean proportion
  # Sum of Beta log densities in the mean (mu) / precision (p) parameterization
  sum(lgamma(p)-lgamma(mu*p)-lgamma((1-mu)*p)+(mu*p-1)*log(YP)+((1-mu)*p-1)*log(1-YP))
}
llik.beta
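# As a quick check on the parameterization (an added sketch, not part of the original
# exercise): in the mean/precision form, the Beta shape parameters are mu*p and (1-mu)*p, so
# the log density above should match base R's dbeta for a single observation:
dbeta(0.1, shape1=0.3*5, shape2=(1-0.3)*5, log=TRUE)                            # base R
lgamma(5)-lgamma(0.3*5)-lgamma(0.7*5)+(0.3*5-1)*log(0.1)+(0.7*5-1)*log(1-0.1)   # by hand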
# Now let's use the above function to estimate the model. Use coefficients from the original
# regression as starting values, augmented by a 3 for the precision parameter. Why 3?
startvalues <- c(coefficients(ols.model),3)
startvalues
# Now let's estimate the model
mod.beta <- optim(c(startvalues),llik.beta, method =
"BFGS", control = list(maxit=1000,fnscale = -1),
hessian = TRUE)
mod.beta
# Calculate the variance matrix from the Hessian matrix.
v <- -solve(mod.beta$hessian)
# Calculate the standard errors from the variance matrix.
se <- sqrt( diag(v))
# Calculate the z statistics from the coefficients and standard errors
b <- mod.beta$par
zstat <-b/se
# Calculate p values from the z statistics
pz <- 2*pnorm( -abs(zstat))
# Put the results together in a table.
table <- cbind(b,se,zstat,pz)
table
# Note that obtaining a good set of starting values is crucial to convergence, or even to
# getting the initial iteration to run. The first three starting values above are the
# coefficients from the prior OLS regression. Ferrari and Cribari-Neto (2004) recommend this
# as a good way to specify start values in beta regression. Note that the coefficients from
# this regression are interpreted as the change in the log odds of the mean proportion
# associated with a change in x.
# You might also want to try the canned procedure betareg from the betareg package.
library(betareg)
betareg.out <- betareg(infantmort ~ income + oilyes, link = "logit")
summary(betareg.out)