Multinomial Models for Discrete Outcomes
/* This file estimates models based on the multinomial
distribution. A common mistake is to estimate
naturally ordered data with MNL or MNP. First,
we estimate a multinomial logit (MNL)
for data on brand of a product chosen by
consumers. Brand 1 is coded 1, brand 2 is coded 2, and brand 3 is coded 3.*/
/*Using this file requires installation of dmlogit2 and the Hausman and Small-Hsiao IIA modules. It also uses SPOST which
you should have already installed. Run a
net search from within STATA on the following commands and install the proper
files:
dmlogit2
iia
smhsiao
*/
/*Let's load the data*/
clear
set more off
use "c:\users\wood\documents\My
Teaching\Maximum Likelihood\Data\brand.dta", clear
summarize
/*Multinomial Logit */
mlogit brand female age, base(1)
/* The output from the above has two parts, labeled with the
categories of the outcome variable brand. They correspond to
two equations:
log(P(brand=2)/P(brand=1)) = b_10 +
b_11*female + b_12*age
log(P(brand=3)/P(brand=1)) = b_20 +
b_21*female + b_22*age,
with the b's being the raw regression
coefficients from the output.
For example, we can say that for one unit change in the
variable age, the log of the ratio of the two probabilities,
P(brand=2)/P(brand=1), will be increased
by 0.368, and the log of the ratio of the two probabilities
P(brand=3)/P(brand=1)
will be increased by 0.686. Therefore, we
can say that, in general, the older a person is, the more he/she will prefer
brand 2 or 3 over brand 1. */
/*The option "base(1)"
sets brand 1 as the comparison group */
/* Note that MNL produces J-1 sets of
coefficient estimates for J choices. Typically, these are then used to compute
probabilities and marginal effects. */
/*Stata does not automatically
report the predicted probabilities.
However, we can retrieve them in the following manner*/
predict p1 p2 p3, p /*Calculate p(y=1) for each y */
predict z1, xb
outcome(1) /*Calculate z
for each y */
predict z2, xb
outcome(2) /*Calculate z
for each y */
predict z3, xb
outcome(3) /*Calculate z
for each y */
list brand z1 p1 z2 p2 z3 p3 /*List y, predicted probabilities, and z
*/
/* We can interpret the regression result graphically. For
example, we can use the three variables p1, p2 and p3 for the
predicted probabilities and plot them against a
predictor variable. In the example below, we plot p1 against age separated
by the variable female. Be patient. The
graph may take some time the first time you run it.*/
sort age
line p1 age if female ==0 || line p1 age if
female==1, legend(order(1 "male" 2 "female")) ytitle(P(Ford))
/*The ratio of the probability of choosing one outcome
category over the probability of choosing the reference category
is often referred as relative risk (and
it is also sometimes referred as odds).
So another way of interpreting the
regression results is in terms of relative risk.
We can say that for one unit change in the variable age,
we expect the relative risk of choosing
brand 2 over 1 to increase by exp(.3682) = 1.45. So we can say
that the relative risk is higher for older
people. For a dichotomous predictor variable such as female, we can say
that the ratio of the relative risks of
choosing brand 2 over 1 for female and male is exp(.5238) = 1.69.
We can use the rrr option for mlogit command to display the regression results in the
language of risk.*/
mlogit, rrr
/* Using margins with multinomial logit.
*/
mlogit brand i.female
age, base(1)
/* Predictions */
margins /* Probabilities */
margins female /* Probabilities for males and
females */
margins female, at(age=(24(1)38)) /*
Probabilities for males and females conditional on age */
marginsplot /* As men and women get older they are
less likely to buy a Ford and more likely to buy
a Toyota or Volkswagen. As men and women
get very old, they are likely to buy a Volkswagen */
/* Derivatives*/
margins, dydx(*) /*
average marginal effect*/
margins, dydx(*) atmeans /* marginal effect with all at means */
/*We can also obtain other helpful statistics with SPOST*/
fitstat /*Obtain
various fit statistics on the logit */
listcoef, help /*List
the coefficients and risk ratios/factor changes.*/
prtab female /*
Calculate the probabilities of females choosing a particular brand.*/
prvalue, x(female=0) rest(mean) /*Compute p(brand) when female=0, and rest
at mean. Should be same as above. */
prvalue, x(female=1) rest(mean) /*Compute p(brand) when female=1, and rest
at mean. Should be same as above. */
prchange, help /*Compute
various first differences */
dmlogit2 brand female age, base(1) /*The model in derivatives with all
variables at means*/
/* Reestimate the model for an
alternative procedure to calculate derivatives */
mlogit brand female age, base(1)
mfx compute, predict(outcome(2)) /*The model in derivatives switching
female from zero to one*/
mfx compute, predict(outcome(3))
/*Test for Independence of Irrelevant Alternatives in MNL
model with SPOST and mlogtest. By the way, this procedure does a variety
tests, including whether categories can be
combined or not. You can ask for the
tests individually, but the option “all” gives you
everything.
Use "search mlogtest" to get a
listing of the tests.*/
mlogit brand female age, base(1)
mlogtest, all
/* Note that omitting the category, Toyota, finds that this
model violates the IIA assumption for MNL using the Hausman
test,
but not for the Small-Hsiao test. What to
do? */
/*Alternatively, there is a user defined
procedure for doing the Small-Hsiao test.
*/
mlogit brand female age, base(1)
iia
/* Notice that this user defined
procedure breaks down. */
smhsiao brand female age, elim(2)
/* This user defined procedure again finds no violation when
omitting Toyota.*/
/*Now let's drop the predicted probabilities*/
drop z1 p1 z2 p2 z3 p3
/* STATA also offers two versions of multinomial probit. The first one below assumes no correlation between
errors
and homoskedasticity.
*/
mprobit brand female age, base(1)
/* STATA also offers an ability to parameterize the sigma
matrix using asmprobit. However, the data need to be
arranged as for conditional logit
below. */
/* MNL is inefficient when the dependent variable is
naturally ordered. Grade assignments is a classic
example of ranking data.
So, let's get some new data and estimate the relationship
with ordered probit. Returning to the Spector and Mazzeo data
with a new dependent variable, LETTERS.
LETTERS is coded 0=C, 1=B, 2=A. Then we reestimate
the earlier model as an
ordered probit to
compare the results.*/
use "c:\users\wood\documents\My
Teaching\Maximum Likelihood\Data\letters.dta", clear
summarize
/* Now estimate the ordered probit
*/
use "/users/bdanwood/documents/my
teaching/maximum likelihood//Data/affairs.dta",
clear
replace affairs = affairs>0
summ
oprobit ratingmarriage
age yrsmarried religiousness affairs
predict p1 p2 p3 p4 p5, p /*Calculate
p(y=1) for each y */
predict z, xb /*Calculate
z for each y */
list ratingmarriage z p1 p2 p3 p4 p5 /*List y, predicted
probabilities, and z */
fitstat /*Obtain various fit
statistics on the logit */
listcoef, help /*List the
coefficients and standardized coefficients*/
prvalue, x(affairs=0) rest(mean) /*Compute p(ratingmarriage|j)
when affairs=0, and rest at mean */
prvalue, x(affairs=1) rest(mean) /*Compute p(ratingmarriage|j)
when affairs=1, and rest at mean */
prchange /*Compute first differences for all
variables
mfx compute,
predict(outcome(1)) /*The model in
derivatives for each outcome*/
mfx compute,
predict(outcome(2))
mfx compute, predict(outcome(3))
mfx compute,
predict(outcome(4))
mfx compute,
predict(outcome(5))
/* Test
the Parallel Regressions Assumption */
omodel probit
ratingmarriage age yrsmarried
religiousness affairs
/* Test
the parallel regressions assumption of ordered probit.
*/
omodel probit
ratingmarriage age yrsmarried
religiousness affairs
/*Again,
let's drop the predicted probabilities*/
drop z p1 p2 p3 p4 p5
/* It is
also possible to obtain ordered logit estimates. These
are rarely seen in the literature. However, the following illustrates the
ordered logit procedure. */
ologit ratingmarriage
age yrsmarried religiousness affairs
/* Note that with ordered probit there
is only one set of coefficient estimates. This is more appealing, but CAUTION is still
required in interpreting the coefficients. Ordered probit
coefficients can have opposite signs from the
marginal effects. This is because
increasing X, while holding the coefficient and threshold estimates constant
actually shifts
the distribution to the right. This may decrease the probability
associated with particular outcomes. See Greene, pp. 736-40 for
an example.
*/
/*We can
also obtain predicted probabilities similar to the MNL above*/
predict p1 p2 p3 p4 p5, p /*Calculate
p(y=1) for each y */
predict z, xb /*Calculate
z for each y */
list ratingmarriage z p1 p2 p3 p4 p5 /*List y, predicted
probabilities, and z */
fitstat /*Obtain various fit statistics on
the logit */
listcoef, help /*List the coefficients and standardized coefficients*/
prvalue, x(affairs=0) rest(mean) /*Compute p(ratingmarriage|j)
when affairs=0, and rest at mean */
prvalue, x(affairs=1) rest(mean) /*Compute p(ratingmarriage|j)
when affairs=1, and rest at mean */
prchange /*Compute first differences for all
variables
mfx compute,
predict(outcome(1)) /*The model in
derivatives for each outcome*/
mfx compute,
predict(outcome(2))
mfx compute,
predict(outcome(3))
mfx compute,
predict(outcome(4))
mfx compute,
predict(outcome(5))
/* Test
the parallel regressions assumption of ordered probit.
*/
omodel logit
ratingmarriage age yrsmarried
religiousness affairs
brant, detail
/*Again,
let's drop the predicted probabilities*/
drop z p1 p2 p3 p4 p5
/*Now let's take a look at conditional logit. To do this, we will need to use a new data
set. Note the arrangement of the
data where there are j choices for each
individual reflected by a different line of data.*/
use "c:\users\wood\documents\My
Teaching\Maximum Likelihood\Data\clogit.dta", clear
set more off
/* This file illustrates an application of STATA's DISCRETE
CHOICE procedure, conditional logit. The model is one
of choice of
mode for transportation, for a sample of
individuals who travel between Sydney and Melbourne, Australia. The four
choices are
Air, Train, Bus, and Car.
Note the special set-up for the data for DISCRETE CHOICE
models. The data consist of observations on choices, rather than
on individuals. For each individual there
are j possible choices, so that there are j*n observations in the data set.
Basically, the
data setup is similar to that for panel
data. See the STATA help system or manual for more on setting up the data for
DISCRETE
CHOICE models.
Conditional Logit is commonly used
when the decisionmaker chooses primarily on the basis
of attributes of the choices, rather
than attributes of the individual. This
approach is illustrated in the program below.*/
/* Below we estimate a conditional logit
model for choice of mode of transport using choice specific constants and a
generalized
measure of perceived cost of the method of
transportation, household income, and terminal waiting time. Note that unlike
LIMDEP
we must specify the group ourselves, in
this case "group(id)"*/
clogit mode aasc tasc basc gc
ttme hinca, group(id)
predict prob, p /*Calculate p(y=1) for each
y */
predict zi, xb /*Calculate
z for each y */
list mode prob zi /*List
y, predicted probabilities, and z */
fitstat /*Obtain
various fit statistics on the logit */
listcoef, help /*List
the coefficients and standardized coefficients*/
/* With conditional logit as
opposed to MNL, we get a single set of coefficient estimates. This is an appealing
attribute of the model.
The preceding estimates say simply that people choose their
transportation mode based on perceived cost and a set of unmeasured
attributes associated with the choices. Note that
perceived cost is an attribute of the choice, rather than an attribute of the
individual.
However, it might be reasonable to assert that the decisionmaker's income is also relevant to the choice. The decisionmaker's income
does not vary across the alternatives. As a
result, the discrete choice probabilities are homogeneous of degree zero in the
parameters.
That bit of jargon means that if there are any attributes
that are the same for all outcomes for every individual, they drop out of the
probability model.
Thus, for example, any individual characteristic such as
age, or income, will cause this problem. The only way such variables can be
brought into the conditional logit model is by interacting them with the choice specific
constants. This is similar to analysis of
covariance in regression.*/
/*Test for IIA using conditional logit
setup */
clogit mode aasc tasc basc gc
ttme hinca, group(id)
est store all
clogit mode aasc tasc basc gc
ttme hinca if choice ~=1,
group(id)
est store partial
hausman partial all, alleqs
constant
/* If there is a violation of IIA, as shown here, then you
can use STATA's asmprobit procedure to do multinomial
probit. This
procedure allows you to take into account
different correlation structures and variances for the random utility
equations.
For example, the following sets all of the correlations
equal and scales two of the variances to one.*/
asmprobit mode gc ttme, casevars(hinc) case(id) alternatives(choice)
correlation(exchangeable)
/*Now let’s do a Nested Logit
which can avoid the problem of IIA, as well as allows modeling a “tree-like”
probability
model where choices are conditioned by prior
choices.*/
/* First, an example of Nested Logit
in which our primary interest is in estimating the determinants of the
branches. */
clear
webuse restaurant
nlogitgen type = restaurant(fast: Freebirds | MamasPizza,
family: CafeEccell
| LosNortenos | WingsNmore,
fancy: Christophers | MadCows)
nlogittree restaurant type, choice(chosen)
nlogit chosen cost distance rating || type: income
kids, base(family) || restaurant:, noconst case(family_id)
/* Now an example using the earlier
data where we are interested in both the branches and twigs. */
use "c:\users\wood\documents\My
Teaching\Maximum Likelihood\Data\clogit.dta", clear
set more off
sort id
by id: gen travel=_n-1
nlogitgen type=choice(fly:Air,
land: Car|Train|Bus)
nlogittree choice type, choice(mode)
nlogit mode gc ttme || type: hinc invt, base(1) || choice: psize ,
base(0) noconst case(id)
clear