Multinomial
Models for Discrete Outcomes
The
purpose of this session is to show you how to use R's procedures for doing
Multinomial Logit (MNL). Additionally, we look at
Ordered Logit and Probit. Note
that both STATA and R also have “canned” procedures for conditional and nested logit. STATA and R also have “canned” procedures for
multinomial Probit. There is an R package (MNP) which
will estimate the multinomial Probit
model in a Bayesian framework using Markov Chain Monte Carlo methods. There is
also an MNP procedure in the mlogit package.
#
This file uses R to compare procedures for multinomial logit
and ordered probit and logit
for
#
data that are naturally ordered. A common mistake is
to estimate naturally ordered data with MNL
#
or MNP.
#First,
we estimate a multinomial logit (MNL) for the Spector
and Mazzeo data with a new
#
dependent variable, LETTERS. LETTERS is coded 0=C,
1=B, 2=A. Then we reestimate the model as an
#
ordered logit to compare the
results.
#
Call some needed libraries
library(car)
library(stats)
library(foreign)
library(nnet)
#
First, let's load the data.
Brand
<- read.dta("/users/bdanwood/Documents/My
Teaching/Maximum Likelihood/Data/brand.dta")
attach(Brand)
summary(Brand)
#
We will first estimate a multinomial logit using multinom from the nnet library.
#
First, put the data in the individual multinomial logit
data framework setting Ford as the reference category.
Brand$brand <- relevel(Brand$brand,
ref = "Ford")
#
Now estimate the model.
mlogit.out <- multinom(brand ~ female + age, data=Brand)
summary(mlogit.out)
# Note that the coefficients are reported for each of the
two equations, and then the standard errors.
#
Following are some interesting values that can be extracted from the model for
later use.
#
The value of the log-likelihood at the maximum can be used in a likelihood
ratio test for
#
testin the model
specification.
loglik <-
-1/2*deviance(mlogit.out)
loglik
#
Model Coefficients
beta <-coefficients(mlogit.out)
beta
#
Fitted Probabilities
fitted.probs <- fitted.values(mlogit.out)
fitted.probs
#
Let's get the fitted probabilities the hard way now to demonstrate how it is
done.
#
First create the necessary matrices.
x <- cbind(1,female,age)
betamat<-
matrix(coefficients(mlogit.out),3,2, byrow=T)
betamat
beta0 <- betamat[,1]
beta1 <- betamat[,2]
beta0
beta1
#
Now calculate the probabilities
p2 <- 1/(1+exp(x %*% beta0)+exp(x
%*% beta1))
p2
p1 <- exp(x %*% beta1)*p2
p1
p0 <- exp(x %*% beta0)*p2
p0
#
We can also calculate the probabilities with all variables at their means the
hard way.
#
First, calculate the means.
xbar <- colMeans(cbind(1,Brand[2:3]))
xbar
#
Now calculate the probabilities at the means.
p2bar <- 1/(1+exp(t(xbar) %*%
beta0)+exp(t(xbar) %*%
beta1))
p2bar
p1bar <- exp(t(xbar)
%*% beta1)*p2bar
p1bar
p0bar <- exp(t(xbar)
%*% beta0)*p2bar
p0bar
#
Note that the preceding can be used to calculate first differences simply by
replacing
#
elements on xbar with
interesting units. For example, we might consider moving FEMALE to MALE
#
to see the effect on the three probabilities.
xbar[2]=0
p2bar <- 1/(1+exp(t(xbar) %*%
beta0)+exp(t(xbar) %*%
beta1))
p2bar
p1bar <- exp(t(xbar)
%*% beta1)*p2bar
p1bar
p0bar <- exp(t(xbar)
%*% beta0)*p2bar
p0bar
xbar[2]=1
p2bar <- 1/(1+exp(t(xbar) %*%
beta0)+exp(t(xbar) %*%
beta1))
p2bar
p1bar <- exp(t(xbar)
%*% beta1)*p2bar
p1bar
p0bar <- exp(t(xbar)
%*% beta0)*p2bar
p0bar
#
We can also allow Zelig to calculate first
differences.
library(Zelig)
library(ZeligChoice)
#
Load the data again
Brand
<- read.dta("/users/bdanwood/Documents/My
Teaching/Maximum Likelihood/Data/brand.dta")
attach(Brand)
summary(Brand)
mlogit.out <- zelig(as.factor(brand) ~ female +
age, model = "mlogit",
data = Brand)
summary(mlogit.out)
#Simulate
the predicted probabilities of choosing a brand, female=0, age at 30
x.out <- setx(mlogit.out, age=30, female =
0)
s.out <- sim(mlogit.out, x = x.out, num=10000)
summary(s.out)
#Simulate
the predicted probability of choosing a brand, female=1, age at 30
x.out <- setx(mlogit.out, age=30,
female=1)
s.out <- sim(mlogit.out, x = x.out)
summary(s.out)
#
We could just take the difference between these to get a first difference.
However, Zelig
#
can do this in one step.
#
First Difference when shifting female from 0 to 1 and age at 30
female.low <- setx(mlogit.out, age=30,
female=0)
female.high <- setx(mlogit.out, age = 30,
female=1)
s.out <- sim(mlogit.out, x = female.low, x1 = female.high)
summary(s.out)
plot(s.out)
#
There is a "canned" library that will estimate a Multinomial Probit in R using the Bayesian MCMC approach.
#
Let's use that library to estimate an individual-specific data choice model.
# Note that the covariance restrictions are fixed with this
model.
The variance for for Ford is fixed to 1,
#
and the covariances Ford to
Toyota and Volkswagen are fixed to 0.
library(MNP)
mnp.out <- mnp(brand ~ female + age, data = brand,
verbose =
TRUE)
##
summarize the results
summary(mnp.out)
#
There is also a library that will do regular, conditional logit,
random parameter logit, multinomial probit,
#
and nested logit.
# Note that the clogit data used
in the STATA assignment are available in the package AER. We will use
those
#
data again.
library(mlogit)
rm(list=ls(all=TRUE))
data("TravelMode", package =
"AER")
##
a pure "conditional" model with only choice
specific variables
travel.mode <-mlogit(choice~wait+gcost, TravelMode, shape = "long",
chid.var
= "individual", alt.var="mode",
choice = "choice")
summary(travel.mode)
##
a "mixed" model with both choice specific
and individual specific
travel.mode <-mlogit(choice~wait+gcost|income+size,
TravelMode, shape = "long",
chid.var
= "individual", alt.var="mode",
choice = "choice")
summary(travel.mode)
##
a "mixed" model with both choice specific
and individual specific using Multinomial Probit
travel.mode <-mlogit(choice~wait+gcost|income+size,
TravelMode, shape = "long",
chid.var
= "individual", alt.var="mode",
choice = "choice", probit=TRUE)
summary(travel.mode)
####
A nested logit model
TravelMode$avincome <- with(TravelMode, income * (mode == "air"))
TravelMode$time <- with(TravelMode, travel + wait)/60
TravelMode$timeair <- with(TravelMode, time * I(mode == "air"))
TravelMode$income <- with(TravelMode, income / 10)
#
Hensher and Greene (2002), table 1 p.8-9 model 5
TravelMode$incomeother <- with(TravelMode, ifelse(mode %in%
c('air', 'car'), income, 0))
nl <- mlogit(choice~gcost+wait+incomeother,
TravelMode,
shape='long',
alt.var='mode',
nests=list(public=c('train',
'bus'), other=c('car','air')))
summary(nl)
#
test for iia
data("TravelMode",package="AER")
TravelMode <- mlogit.data(TravelMode,choice="choice",shape="long",
alt.var="mode",chid.var="individual")
##
Create a variable of income only for the air mode
TravelMode$avinc <- with(TravelMode,(mode=='air')*income)
##
Estimate the model on all alternatives, with car as the base level
##
like in Greene's book.
#x <- mlogit(choice~wait+gcost+avinc,TravelMode,reflevel="car")
x <- mlogit(choice~wait+gcost+avinc,TravelMode)
##
Estimate the same model for ground modes only (the variable avinc
##
must be dropped because it is 0 for every observation
g <- mlogit(choice~wait+gcost,TravelMode,reflevel="car",
alt.subset=c("car","bus","train"))
##
Compute the test
hmftest(x,g)
#
The preceding models are all inefficient when the dependent variable is
naturally ordered.
#
So, let's get some new data and estimate a relationship with ordered probit.
#
We will use the data on extra-marital affairs and marital satisfaction
collected by Ray Fair
library(car)
library(foreign)
library(Zelig)
library(ZeligChoice)
rm(list=ls(all=TRUE))
Affairs
<- read.dta("/users/bdanwood/Documents/My
Teaching/Maximum Likelihood/Data/affairs.dta")
attach(Affairs)
summary(Affairs)
oprobit.out <- zelig(as.factor(ratingmarriage) ~ age + yrsmarried
+ religiousness + affairs, model =
"oprobit", data = Affairs)
summary(oprobit.out)
#
Simulate ordered probit first differences using
Affairs=0 to Affairs=1
x.low <- setx(oprobit.out, affairs=0)
x.high <- setx(oprobit.out, affairs= 1)
s.out <- sim(oprobit.out, x = x.low, x1 = x.high)
summary(s.out)
plot(s.out)
ologit.out <- zelig(as.factor(ratingmarriage) ~ age + yrsmarried
+ religiousness + affairs, model =
"oprobit", data = Affairs)
summary(ologit.out)
#
Simulate ordered probit first differences using
Affairs=0 to Affairs=1
x.low <- setx(oprobit.out, affairs=0)
x.high <- setx(oprobit.out, affairs= 1)
s.out <- sim(oprobit.out, x = x.low, x1 = x.high)
summary(s.out)
plot(s.out)