Poisson and
Negative Binomial Regression
The
purpose of this session is to show you how to use STATA's procedures for count
models including Poisson, Negative Binomial zero inflated Poisson, and zero inflated
Negative Binomial Regression. We also show how to do various tests for overdispersion and for discriminating between models.
/*This program estimates Poisson and Negative Binomial
Regression models using the McCullagh and Nelder data on ship accidents. The variables are:
Type
= Ship type
TA,
TB, TC, TD, TE = Ship Type indicators
T6064,
T6569, T7074, T7579 = Year constructed indicators
O6064,
O7579 = Years operated indicators
Months
= Measure of service amount
Acc
= Accidents. */
/*Now
let's read in the data */
use "C:\Documents and
Settings\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\shipaccidents.dta", clear
/*
First, let's do some descriptive statistics on the data just to confirm what is
obvious from looking at the data. */
summarize
summarize acc, detail
drop if acc==.
/*
Note that the data are non-normal. They are both skewed and kurtotic.
This implies the need for some model other than a normal distribution. Which
model? is the next question. We'll attempt to answer
this question later. */
/*
Let's begin by doing the estimation using the Poisson
model. */
gen logmth=log(months)
poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth
fitstat /*Obtain
various fit statistics on the poisson regression*/
listcoef, help /*List
the coefficients and standardized coefficients*/
prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange, help /*Compute
various first differences */
mfx compute,
at(mean) /*Compute
marginal effects at the means */
poisgof /*Goodness
of fit for poisson model */
/*
McCullagh and Nelder assert
that the coefficient on Log(Months) is known to be 1.
As with all of the models we have estimated it is possible to restrict
coefficients. Do this for the Poisson model as
follows. We also save the predicted values and residuals for later testing.*/
constraint define 1 logmth=1
poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth, constraints (1)
mfx compute,
at(mean)
/*
Generate lambda and errors for later use */
predict lambda, ir
gen err=acc-lambda
/*
Now let's test the hypothesis that the period of manufacture has no effect on
accidents. First with a Wald statistic. We'll also
save the likelihood from this for later use. */
poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth,constraints(1)
lrtest, saving(0)
/*A
Wald test that the period of manufacture has no effect on accidents.*/
test t6569 t7074
t7579
/*
Now test the same hypothesis with a Likelihood Ratio test. */
poisson acc tb tc td te
o7579 logmth,constraints(1)
lrtest
/*
McCullagh and Nelder also
assert that the Poisson model exhibits overdispersion.
Let's test for overdispersion using a regression
based approach as suggested by Cameron and Trivedi*/
gen z=((acc-lambda)^2-acc)/(sqrt(2)*lambda)
regress z
regress z lambda, noconstant
/*
Since neither t-statistic is significant, there is no
evidence of overdispersion in the data using a
regression based approach.
We
can also test for overdispersion using the more general
approach suggested on pp. 743-44 of Greene.*/
gen i=1
gen ei2 = err*err
mkmat ei2
gen vi = ei2 - lambda
mkmat vi
gen vi2 = vi*vi
mkmat vi2
gen eivi =
err*vi
mkmat eivi
mkmat tb tc td te
t6569 t7074 t7579 o7579 logmth,matrix(zi)
mkmat i tb tc
td te t6569 t7074 t7579 o7579 logmth,matrix(x)
matrix mm = zi'*diag(vi2)*zi
matrix dd = x'*diag(ei2)*x
matrix md=zi'*diag(eivi)*x
matrix q = mm - md*inv(dd)*md'
matrix r = zi'*vi
matrix cm =
r'*inv(q)*r
matrix list cm
/*
CM is a chi square statistic with k degrees of freedom, where k is the number
of regressors. Here we reject the null that the
variance is unrelated to the the regressors
in a way that is not completely accounted for by the expected value of y. In
other words, we have evidence of heterogeneity due to the regressors.
We also reject the null that the mean equals the variance.
Now
let's test for overdispersion using a LaGrange
Multiplier test as given by Green on p. 744.This is a test for poisson versus negative binomial.*/
mkmat i
egen ybar=mean(acc)
mkmat ybar
matrix nybar=i'*ybar
mkmat err,
matrix(err)
mkmat lambda,matrix(lambda)
matrix lmtest=((err'*err-nybar)*(err'*err-nybar))*inv(2*lambda'*lambda)
matrix list lmtest
/*
The LaGrange Multiplier statistic is Chi Squared with 1 degree of freedom. The test is non-significant, so no evidence
of overdispersion. */
/*
We can also test for overdispersion
in the context of the Negative Binomial model. If the heterogeneity parameter
is significant then this is evidence for overdispersion.*/
nbreg acc tb tc td te
t6569 t7074 t7579 o7579 logmth,constraints(1)
/*The parameter alpha is not significant, so no evidence for overdispersion. We would not want to use the negative
binomial for these data.*/
/*
A way to deal with heterogeneity and the overdispersion
problem is to estimate a so-called zero inflated model. However, this should
only be used when there is reason to suspect multiple regimes in the data. We
can estimate a zero inflated Poisson model, obtain the Vuong
test, and obtain measures of fit, and effects. The default binary distribution
is the logit. Specify probit
option to get probit. */
zip acc tb
tc td te t6569 t7074 t7579
o7579 logmth, offset(logmth)
inflate(o7579) vuong
fitstat /*Obtain
various fit statistics on the zip regression*/
listcoef, help /*List
the coefficients and standardized coefficients*/
prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange, help /*Compute
various first differences */
mfx compute,
at(mean) /*Compute
marginal effects at the means */
/*
We can also estimate a zero inflated Negative Binomial
model, obtain the Vuong test, and obtain measures of
fit, and effects. The default binary distribution is the logit.
Specify probit to get probit.
The zip option for the zinb procedure gives a test
for zip vs. zinb. */
zinb acc tb tc td te
t6569 t7074 t7579 o7579 logmth, offset(logmth) inflate(o7579) vuong zip
fitstat /*Obtain
various fit statistics on the zip regression*/
listcoef, help /*List
the coefficients and standardized coefficients*/
prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */
prchange, help /*Compute
various first differences */
mfx compute,
at(mean) /*Compute
marginal effects at the means */
/*
Note that there is a problem with the zero inflated Negative Binomial model
probabilities, confirming what we found above concerning the inappropriateness
of the negative binomial. */