Multinomial Models for Discrete Outcomes

/* This file estimates models based on the multinomial distribution. A common mistake is to estimate

naturally ordered data with MNL or MNP. First, we estimate a multinomial logit (MNL)

for data on brand of a product chosen by consumers. Brand 1 is coded 1, brand 2 is coded 2, and brand 3 is coded 3.*/

/*Using this file requires installation of dmlogit2 and the Hausman and Small-Hsiao IIA modules.  It also uses SPOST which you should have already installed.  Run a net search from within STATA on the following commands and install the proper files:

dmlogit2

iia

smhsiao

*/

clear

set more off

use "c:\users\wood\documents\My Teaching\Maximum Likelihood\Data\brand.dta", clear

summarize

/*Multinomial Logit */

mlogit brand female age, base(1)

/* The output from the above has two parts, labeled with the categories of the outcome variable brand. They correspond to

two equations:

log(P(brand=2)/P(brand=1)) = b_10 + b_11*female + b_12*age

log(P(brand=3)/P(brand=1)) = b_20 + b_21*female + b_22*age,

with the b's being the raw regression coefficients from the output.

For example, we can say that for one unit change in the variable age, the log of the ratio of the two probabilities,

P(brand=2)/P(brand=1), will be increased by 0.368, and the log of the ratio of the two probabilities P(brand=3)/P(brand=1)

will be increased by 0.686. Therefore, we can say that, in general, the older a person is, the more he/she will prefer

brand 2 or 3 over brand 1. */

/*The option "base(1)" sets brand 1 as the comparison group */

/* Note that MNL produces J-1 sets of coefficient estimates for J choices. Typically, these are then used to compute

probabilities and marginal effects. */

/*Stata does not automatically report the predicted probabilities.  However, we can retrieve them in the following manner*/

predict p1 p2 p3, p                             /*Calculate p(y=1) for each y */

predict z1, xb outcome(1)                 /*Calculate z for each y */

predict z2, xb outcome(2)                 /*Calculate z for each y */

predict z3, xb outcome(3)                 /*Calculate z for each y */

list brand z1 p1 z2 p2 z3 p3        /*List y, predicted probabilities, and z */

/* We can interpret the regression result graphically. For example, we can use the three variables p1, p2 and p3 for the

predicted probabilities and plot them against a predictor variable. In the example below, we plot p1 against age separated

by the variable female. Be patient. The graph may take some time the first time you run it.*/

sort age

line p1 age if female ==0 || line p1 age if female==1, legend(order(1 "male" 2 "female")) ytitle(P(Ford))

/*The ratio of the probability of choosing one outcome category over the probability of choosing the reference category

is often referred as relative risk (and it is also sometimes referred as odds).  So another way of interpreting the

regression results is in terms of relative risk. We can say that for one unit change in the variable age,

we expect the relative risk of choosing brand 2 over 1 to increase by exp(.3682) = 1.45. So we can say

that the relative risk is higher for older people. For a dichotomous predictor variable such as female, we can say

that the ratio of the relative risks of choosing brand 2 over 1 for female and male is exp(.5238) = 1.69.

We can use the rrr option for mlogit command to display the regression results in the language of risk.*/

mlogit, rrr

/* Using margins with multinomial logit. */

mlogit brand i.female age, base(1)

/* Predictions */

margins /* Probabilities */

margins female /* Probabilities for males and females */

margins female, at(age=(24(1)38)) /* Probabilities for males and females conditional on age */

marginsplot /* As men and women get older they are less likely to buy a Ford and more likely to buy

a Toyota or Volkswagen. As men and women get very old, they are likely to buy a Volkswagen */

/* Derivatives*/

margins, dydx(*) /* average marginal effect*/

margins, dydx(*) atmeans /* marginal effect with all at means */

/*We can also obtain other helpful statistics with SPOST*/

fitstat                                         /*Obtain various fit statistics on the logit */

listcoef, help                                  /*List the coefficients and risk ratios/factor changes.*/

prtab female                                    /* Calculate the probabilities of females choosing a particular brand.*/

prvalue, x(female=0) rest(mean)     /*Compute p(brand) when female=0, and rest at mean. Should be same as above. */

prvalue, x(female=1) rest(mean)     /*Compute p(brand) when female=1, and rest at mean. Should be same as above. */

prchange, help                                  /*Compute various first differences */

dmlogit2 brand female age, base(1)        /*The model in derivatives with all variables at means*/

/* Reestimate the model for an alternative procedure to calculate derivatives */

mlogit brand female age, base(1)

mfx compute, predict(outcome(2))          /*The model in derivatives switching female from zero to one*/

mfx compute, predict(outcome(3))

/*Test for Independence of Irrelevant Alternatives in MNL model with SPOST and mlogtest.   By the way, this procedure does a variety

tests, including whether categories can be combined or not.  You can ask for the tests individually, but the option “all” gives you

everything.  Use "search mlogtest" to get a listing of the tests.*/

mlogit brand female age, base(1)

mlogtest, all

/* Note that omitting the category, Toyota, finds that this model violates the IIA assumption for MNL using the Hausman test,

but not for the Small-Hsiao test. What to do? */

/*Alternatively, there is a user defined procedure for doing the Small-Hsiao test.  */

mlogit brand female age, base(1)

iia

/* Notice that this user defined procedure breaks down. */

smhsiao brand female age, elim(2)

/* This user defined procedure again finds no violation when omitting Toyota.*/

/*Now let's drop the predicted probabilities*/

drop z1 p1 z2 p2 z3 p3

/* STATA also offers two versions of multinomial probit. The first one below assumes no correlation between errors

and homoskedasticity. */

mprobit brand female age, base(1)

/* STATA also offers an ability to parameterize the sigma matrix using asmprobit. However, the data need to be

arranged as for conditional logit below. */

/* MNL is inefficient when the dependent variable is naturally ordered. Grade assignments is a classic example of ranking data.

So, let's get some new data and estimate the relationship with ordered probit. Returning to the Spector and Mazzeo data

with a new dependent variable, LETTERS. LETTERS is coded 0=C, 1=B, 2=A. Then we reestimate the earlier model as an

ordered probit to compare the results.*/

use "c:\users\wood\documents\My Teaching\Maximum Likelihood\Data\letters.dta", clear

summarize

/* Now estimate the ordered probit */

use "/users/bdanwood/documents/my teaching/maximum likelihood//Data/affairs.dta", clear

replace affairs = affairs>0

summ

oprobit ratingmarriage age yrsmarried religiousness affairs

predict p1 p2 p3 p4 p5, p                /*Calculate p(y=1) for each y */

predict z, xb                            /*Calculate z for each y */

list ratingmarriage z p1 p2 p3 p4 p5               /*List y, predicted probabilities, and z */

fitstat                                       /*Obtain various fit statistics on the logit */

listcoef, help                           /*List the coefficients and standardized coefficients*/

prvalue, x(affairs=0) rest(mean)    /*Compute p(ratingmarriage|j) when affairs=0, and rest at mean */

prvalue, x(affairs=1) rest(mean)    /*Compute p(ratingmarriage|j) when affairs=1, and rest at mean */

prchange                                 /*Compute first differences for all variables

mfx compute, predict(outcome(1))    /*The model in derivatives for each outcome*/

mfx compute, predict(outcome(2))

mfx compute, predict(outcome(3))

mfx compute, predict(outcome(4))

mfx compute, predict(outcome(5))

/* Test the Parallel Regressions Assumption */

omodel probit ratingmarriage age yrsmarried religiousness affairs

/* Test the parallel regressions assumption of ordered probit. */

omodel probit ratingmarriage age yrsmarried religiousness affairs

/*Again, let's drop the predicted probabilities*/

drop z p1 p2 p3 p4 p5

/* It is also possible to obtain ordered logit estimates. These are rarely seen in the literature. However, the following illustrates the

ordered logit procedure. */

ologit ratingmarriage age yrsmarried religiousness affairs

/* Note that with ordered probit there is only one set of coefficient estimates. This is more appealing, but CAUTION is still

required in interpreting the coefficients. Ordered probit coefficients can have opposite signs from the

marginal effects.  This is because increasing X, while holding the coefficient and threshold estimates constant actually shifts

the distribution to the right. This may decrease the probability associated with particular outcomes. See Greene, pp. 736-40 for

an example. */

/*We can also obtain predicted probabilities similar to the MNL above*/

predict p1 p2 p3 p4 p5, p                /*Calculate p(y=1) for each y */

predict z, xb                            /*Calculate z for each y */

list ratingmarriage z p1 p2 p3 p4 p5               /*List y, predicted probabilities, and z */

fitstat                                  /*Obtain various fit statistics on the logit */

listcoef, help                           /*List the coefficients and standardized coefficients*/

prvalue, x(affairs=0) rest(mean)    /*Compute p(ratingmarriage|j) when affairs=0, and rest at mean */

prvalue, x(affairs=1) rest(mean)    /*Compute p(ratingmarriage|j) when affairs=1, and rest at mean */

prchange                                 /*Compute first differences for all variables

mfx compute, predict(outcome(1))    /*The model in derivatives for each outcome*/

mfx compute, predict(outcome(2))

mfx compute, predict(outcome(3))

mfx compute, predict(outcome(4))

mfx compute, predict(outcome(5))

/* Test the parallel regressions assumption of ordered probit. */

omodel logit ratingmarriage age yrsmarried religiousness affairs

brant, detail

/*Again, let's drop the predicted probabilities*/

drop z p1 p2 p3 p4 p5

/*Now let's take a look at conditional logit.  To do this, we will need to use a new data set. Note the arrangement of the

data where there are j choices for each individual reflected by a different line of data.*/

use "c:\users\wood\documents\My Teaching\Maximum Likelihood\Data\clogit.dta", clear

set more off

/* This file illustrates an application of STATA's DISCRETE CHOICE procedure, conditional logit. The model is one of choice of

mode for transportation, for a sample of individuals who travel between Sydney and Melbourne, Australia. The four choices are

Air, Train, Bus, and Car.

Note the special set-up for the data for DISCRETE CHOICE models. The data consist of observations on choices, rather than

on individuals. For each individual there are j possible choices, so that there are j*n observations in the data set. Basically, the

data setup is similar to that for panel data. See the STATA help system or manual for more on setting up the data for DISCRETE

CHOICE models.

Conditional Logit is commonly used when the decisionmaker chooses primarily on the basis of attributes of the choices, rather

than attributes of the individual. This approach is illustrated in the program below.*/

/* Below we estimate a conditional logit model for choice of mode of transport using choice specific constants and a generalized

measure of perceived cost of the method of transportation, household income, and terminal waiting time. Note that unlike LIMDEP

we must specify the group ourselves, in this case "group(id)"*/

clogit mode aasc tasc basc gc ttme hinca, group(id)

predict prob, p                     /*Calculate p(y=1) for each y */

predict zi, xb                            /*Calculate z for each y */

list mode prob zi                   /*List y, predicted probabilities, and z */

fitstat                                         /*Obtain various fit statistics on the logit */

listcoef, help                            /*List the coefficients and standardized coefficients*/

/* With conditional logit as opposed to MNL, we get a single set of coefficient estimates. This is an appealing attribute of the model.

The preceding estimates say simply that people choose their transportation mode based on perceived cost and a set of unmeasured

attributes associated with the choices. Note that perceived cost is an attribute of the choice, rather than an attribute of the individual.

However, it might be reasonable to assert that the decisionmaker's income is also relevant to the choice. The decisionmaker's income

does not vary across the alternatives. As a result, the discrete choice probabilities are homogeneous of degree zero in the parameters.

That bit of jargon means that if there are any attributes that are the same for all outcomes for every individual, they drop out of the

probability model.

Thus, for example, any individual characteristic such as age, or income, will cause this problem. The only way such variables can be

brought into the conditional logit model is by interacting them with the choice specific constants. This is similar to analysis of

covariance in regression.*/

/*Test for IIA using conditional logit setup */

clogit mode aasc tasc basc gc ttme hinca, group(id)

est store all

clogit mode aasc tasc basc gc ttme hinca if choice ~=1, group(id)

est store partial

hausman partial all, alleqs constant

/* If there is a violation of IIA, as shown here, then you can use STATA's asmprobit procedure to do multinomial probit. This

procedure allows you to take into account different correlation structures and variances for the random utility equations.

For example, the following sets all of the correlations equal and scales two of the variances to one.*/

asmprobit mode gc ttme, casevars(hinc) case(id) alternatives(choice) correlation(exchangeable)

/*Now let’s do a Nested Logit which can avoid the problem of IIA, as well as allows modeling a “tree-like” probability

model where choices are conditioned by prior choices.*/

/* First, an example of Nested Logit in which our primary interest is in estimating the determinants of the branches. */

clear

webuse restaurant

nlogitgen type = restaurant(fast: Freebirds | MamasPizza, family:  CafeEccell | LosNortenos | WingsNmore, fancy: Christophers | MadCows)

nlogittree restaurant type, choice(chosen)

nlogit chosen cost distance rating || type: income kids, base(family) || restaurant:, noconst case(family_id)

/* Now an example using the earlier data where we are interested in both the branches and twigs. */

use "c:\users\wood\documents\My Teaching\Maximum Likelihood\Data\clogit.dta", clear

set more off

sort id

by id: gen travel=_n-1

nlogitgen type=choice(fly:Air, land: Car|Train|Bus)

nlogittree choice type, choice(mode)

nlogit mode gc ttme || type: hinc invt, base(1) || choice: psize , base(0) noconst case(id)

clear