Poisson and
Negative Binomial Regression

The
purpose of this session is to show you how to use STATA's procedures for count
models including Poisson, Negative Binomial zero inflated Poisson, and zero inflated
Negative Binomial Regression. We also show how to do various tests for overdispersion and for discriminating between models.

/*This program estimates Poisson and Negative Binomial
Regression models using the McCullagh and Nelder data on ship accidents. The variables are:

Type
= Ship type

TA,
TB, TC, TD, TE = Ship Type indicators

T6064,
T6569, T7074, T7579 = Year constructed indicators

O6064,
O7579 = Years operated indicators

Months
= Measure of service amount

Acc
= Accidents. */

/*Now
let's read in the data */

use "C:\Documents and
Settings\B. Dan Wood\My Documents\My Teaching\Maximum
Likelihood\Data\shipaccidents.dta", clear

/*
First, let's do some descriptive statistics on the data just to confirm what is
obvious from looking at the data. */

summarize

summarize acc, detail

drop if acc==.

/*
Note that the data are non-normal. They are both skewed and kurtotic.
This implies the need for some model other than a normal distribution. Which
model? is the next question. We'll attempt to answer
this question later. */

/*
Let's begin by doing the estimation using the Poisson
model. */

gen logmth=log(months)

poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth

fitstat /*Obtain
various fit statistics on the poisson regression*/

listcoef, help /*List
the coefficients and standardized coefficients*/

prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */

prchange, help /*Compute
various first differences */

mfx compute,
at(mean) /*Compute
marginal effects at the means */

poisgof /*Goodness
of fit for poisson model */

/*
McCullagh and Nelder assert
that the coefficient on Log(Months) is known to be 1.
As with all of the models we have estimated it is possible to restrict
coefficients. Do this for the Poisson model as
follows. We also save the predicted values and residuals for later testing.*/

constraint define 1 logmth=1

poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth, constraints (1)

mfx compute,
at(mean)

/*
Generate lambda and errors for later use */

predict lambda, ir

gen err=acc-lambda

/*
Now let's test the hypothesis that the period of manufacture has no effect on
accidents. First with a Wald statistic. We'll also
save the likelihood from this for later use. */

poisson acc tb tc td te
t6569 t7074 t7579 o7579 logmth,constraints(1)

lrtest, saving(0)

/*A
Wald test that the period of manufacture has no effect on accidents.*/

test t6569 t7074
t7579

/*
Now test the same hypothesis with a Likelihood Ratio test. */

poisson acc tb tc td te
o7579 logmth,constraints(1)

lrtest

/*
McCullagh and Nelder also
assert that the Poisson model exhibits overdispersion.
Let's test for overdispersion using a regression
based approach as suggested by Cameron and Trivedi*/

gen z=((acc-lambda)^2-acc)/(sqrt(2)*lambda)

regress z

regress z lambda, noconstant

/*
Since neither t-statistic is significant, there is no
evidence of overdispersion in the data using a
regression based approach.

We
can also test for overdispersion using the more general
approach suggested on pp. 743-44 of Greene.*/

gen i=1

gen ei2 = err*err

mkmat ei2

gen vi = ei2 - lambda

mkmat vi

gen vi2 = vi*vi

mkmat vi2

gen eivi =
err*vi

mkmat eivi

mkmat tb tc td te
t6569 t7074 t7579 o7579 logmth,matrix(zi)

mkmat i tb tc
td te t6569 t7074 t7579 o7579 logmth,matrix(x)

matrix mm = zi'*diag(vi2)*zi

matrix dd = x'*diag(ei2)*x

matrix md=zi'*diag(eivi)*x

matrix q = mm - md*inv(dd)*md'

matrix r = zi'*vi

matrix cm =
r'*inv(q)*r

matrix list cm

/*
CM is a chi square statistic with k degrees of freedom, where k is the number
of regressors. Here we reject the null that the
variance is unrelated to the the regressors
in a way that is not completely accounted for by the expected value of y. In
other words, we have evidence of heterogeneity due to the regressors.
We also reject the null that the mean equals the variance.

Now
let's test for overdispersion using a LaGrange
Multiplier test as given by Green on p. 744.This is a test for poisson versus negative binomial.*/

mkmat i

egen ybar=mean(acc)

mkmat ybar

matrix nybar=i'*ybar

mkmat err,
matrix(err)

mkmat lambda,matrix(lambda)

matrix lmtest=((err'*err-nybar)*(err'*err-nybar))*inv(2*lambda'*lambda)

matrix list lmtest

/*
The LaGrange Multiplier statistic is Chi Squared with 1 degree of freedom. The test is non-significant, so no evidence
of overdispersion. */

/*
We can also test for overdispersion
in the context of the Negative Binomial model. If the heterogeneity parameter
is significant then this is evidence for overdispersion.*/

nbreg acc tb tc td te
t6569 t7074 t7579 o7579 logmth,constraints(1)

/*The parameter alpha is not significant, so no evidence for overdispersion. We would not want to use the negative
binomial for these data.*/

/*
A way to deal with heterogeneity and the overdispersion
problem is to estimate a so-called zero inflated model. However, this should
only be used when there is reason to suspect multiple regimes in the data. We
can estimate a zero inflated Poisson model, obtain the Vuong
test, and obtain measures of fit, and effects. The default binary distribution
is the logit. Specify probit
option to get probit. */

zip acc tb
tc td te t6569 t7074 t7579
o7579 logmth, offset(logmth)
inflate(o7579) vuong

fitstat /*Obtain
various fit statistics on the zip regression*/

listcoef, help /*List
the coefficients and standardized coefficients*/

prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */

prchange, help /*Compute
various first differences */

mfx compute,
at(mean) /*Compute
marginal effects at the means */

/*
We can also estimate a zero inflated Negative Binomial
model, obtain the Vuong test, and obtain measures of
fit, and effects. The default binary distribution is the logit.
Specify probit to get probit.
The zip option for the zinb procedure gives a test
for zip vs. zinb. */

zinb acc tb tc td te
t6569 t7074 t7579 o7579 logmth, offset(logmth) inflate(o7579) vuong zip

fitstat /*Obtain
various fit statistics on the zip regression*/

listcoef, help /*List
the coefficients and standardized coefficients*/

prvalue, x(o7579=0)
rest(mean) /*Compute
probabilities when o7579=0 and rest at mean */

prchange, help /*Compute
various first differences */

mfx compute,
at(mean) /*Compute
marginal effects at the means */

/*
Note that there is a problem with the zero inflated Negative Binomial model
probabilities, confirming what we found above concerning the inappropriateness
of the negative binomial. */