Introduction to
Parametric Duration Models
The
purpose of this session is to show you how to use STATA's procedures for
estimating parametric duration models. Note that we do not cover non-parametric
or semi-parametric duration models which are an important part of this
literature.
/* This file demonstrates the STATA procedures for evaluating duration data. First, let's read in the data. These
are data on the duration of labor strikes taken from Greene, p. 800. The variable T is the duration of the strike.
The variable PROD is the residual from a linear regression of the log of industrial production on time, time
squared, and a set of monthly dummy variables. */
use "C:\users\wood\documents\my teaching\maximum likelihood\data\strikes.dta", clear
/* In contrast to Limdep, STATA does not expect the dependent variable to be logged. You will want a censoring
variable to indicate whether or not a process is still ongoing. Below we create the censoring variable, STATUS,
for all durations above 80. Note that the censoring variable can be omitted from the specifications below if
there is no censoring. */
gen status=t<81
/*We need to stset the data set to alert STATA that we have duration data and want to estimate survival models*/
stset t, failure(status)
/* Let's first summarize the data */
summarize
/* Now let's just estimate some duration models with no covariates. This would be useful for exploring the
duration dependence of a process. First the Exponential model. Note that we are estimating this model in
the Accelerated Failure Time form using the option "time" */
streg , dist(exp) time
/*Now lets get the median duration of strikes predicted by this distribution.*/
predict medsurvexp, med
display medsurvexp
/* Let's now get the survival, hazard, and cumulative hazard functions for the exponential function.*/
stcurve, surv
more
stcurve , haz
more
stcurve, cumhaz
/* Let's now calculate AIC for later use in model testing and comparison. */
scalar aicexp = -2*e(ll)+2*(e(df_m)+1)
display aicexp
/*Note the distinctive hazard and integrated hazard functions for the Exponential model. This reflects the
memory-less property of this distribution. Also, note that this distribution results in hazard functions that
are strictly monotonic. */
/*Now let's estimate the same relationship with the Weibull model. Weibull is probably the most popular.
The reason is it's flexibility and the fact that it encompasses the exponential. You have a built-in test for
exponential, in the significance and size of p. Also, the Weibull's hazard function is more flexible in that
it posits exponential change in the hazard function, which is an advantage if you think that duration
dependence is not linear as in the exponential function. Note that both the exponential and weibull
disributions posit only monotonic change in the hazard function. */
streg , dist(weibull) time
/*Now lets get the median duration of strikes predicted by this distribution.*/
predict medsurvweib, med
display medsurvweib
/* Let's now get the survival, hazard, and cumulative hazard functions for the weibull model.*/
stcurve, surv
more
stcurve , haz
more
stcurve, cumhaz
/* Now let's calculate AIC for later use in model testing and comparison. */
scalar aicweib = -2*e(ll)+2*(e(df_m)+1)
display aicweib
/*STATA will also estimate the Gamma model. Note that the GAMMA function allows non-monotonicity
in the hazard function.*/
streg , dist(gamma) time
/*Now lets get the median duration of strikes predicted by this distribution.*/
predict medsurvgam, med
display medsurvgam
/* Let's now get the survival, hazard, and cumulative hazard functions for the gamma model.*/
stcurve, surv
more
stcurve , haz
more
stcurve, cumhaz
/* Now let's calculate AIC for later use in model testing and comparison. */
scalar aicgamma = -2*e(ll)+2*(e(df_m)+1)
display aicgamma
/*Generally, note the distinctively different shape of the hazard functions for the Gamma model versus
the Weibull or exponential. This demonstrates that it does make a difference which model is selected.
If you think the baseline hazard should be non-monotonic, then you need to specify one of the functions
that allow non-monotonicity. STATA will also estimate a lognormal and loglogistic model, both of which
also allow a non-monotonic hazard function. */
streg , dist(lognormal) time
/*Now lets get the median duration of strikes predicted by this distribution.*/
predict medsurvlnorm, med
display medsurvlnorm
stcurve, surv
more
stcurve , haz
more
stcurve, cumhaz
more
/* Calculate AIC for the lognormal model. */
scalar aiclognorm = -2*e(ll)+2*(e(df_m)+1)
display aiclognorm
/* Calculate AIC for the loglogistic model. */
streg , dist(loglogistic) time
/*Now lets get the median duration of strikes predicted by this distribution.*/
predict medsurvloglog, med
display medsurvloglog
stcurve, surv
more
stcurve , haz
more
stcurve, cumhaz
more
scalar aicloglog = -2*e(ll)+2*(e(df_m)+1)
display aicloglog
/*We can select the appropriate model using AIC. We want the model with minimum AIC */
display aicexp
display aicweib
display aicgamma
display aiclognorm
display aicloglog
/*Based on this, we find that the gamma model provides the best fit for modeling duration dependence.
However, we might want to make this determination after including covariates. Note also that this differs from
the finding from LIMDEP using the F distribution encompassing test that the Weibull is the correct model. */
/*Another diagnostic for model specification utilizes the cumulative hazard function. A well specified model
will have a cumulative hazard function that begins at zero and increases in linear fashion. For example,
compare the cumulative hazard functions for the gamma versus the weibull models below.*/
streg , dist(gamma) time
stcurve, cumhaz
more
streg , dist(weibull) time
stcurve, cumhaz
/*This diagnostic suggests that the Weibull may be the best model. */
/*STATA allows you to include covariates in the duration model. First, let's estimate the Cox model, which makes
no assumptions about the form of duration dependence, but does assume proportional hazards. We obtain estimates
in both the coefficient form and the hazard ratio form.*/
stcox prod, nohr
stcox prod
/* Note that the hazard ratio form reveals a very high number. Why? The hazard ratio calculation computes the change in
hazard associated with a one unit change in the variable. However, the variable ranges from a minimum of -0.104 to a
maximum of 0.074. A one unit change is not consistent with the data. We could, however, transform the variable prod
to standard deviation units, which would provide a more interesting quantity.*/
egen prodstd = std(prod)
stcox prodstd
/* A one standard deviation change in lost production produces a 52 percent increase in the probability of the strike
ending in the very next time period.*/
/*We can test the proportional hazards assumption by interacting the right side variables with time and then testing
the significance of the interaction coefficients. STATA simplifies this as shown below. The tvc option specifies that
you want an interaction, and the texp option specifies the form of the interaction which in this case is just time. */
stcox prod, tvc(prod) texp(_t) nohr
test [tvc]prod
/*We can also test the proportional hazards assumption using the Schoenfeld residuals. STATA has a built in command
to perform this test, as well as plot the scaled residuals as a function of time. */
stcox prod, schoenfeld(sch*) scaledsch(sca*)
estat phtest
stphtest,detail
stphtest,plot(prod)
/*There is no evidence that we have violated the proportional hazard assumption from the preceding tests. However, we still
might be interested in the baseline hazard function for theoretical reasons, so a parametric model might be appropriate.
Now let's estimate some of the parametric survival models with a covariate. As observed in class, for these models,
the coefficients on covariates must be interpreted with some caution. The interpetation
is relatively straightforward with an exponential, or some values of the P parameter for other distribution.
Otherwise, exercise caution. */
/*If you are using one of the accelerated failure time distributions (e.g.,, exponential or weibull), then you can
get the estimates in either the coefficient, hazard ratio, or time ratio metric. Again, we have to think about the
scaling of the prod variable in interpreting the hazard or time ratios. */
/* First, the two models in coefficient form.*/
streg prodstd, dist(weibull) nohr
streg prodstd, dist(weibull) time
/* Now the Weibull in hazard ratio form.*/
streg prodstd, dist(weibull)
/* A one standard deviation increase in lost production produces an increase of 68.5 percent of the hazard of a
strike ending in the next period.*/
/* Now in accelerated failure time and time ratio form. */
streg prodstd, dist(weibull) tr
/* A one standard deviation increase in lost production produces strikes that are 47 percent shorter*/
/*If you are using the gamma, log-logistic, or gamma models, you can have the estimates reported in either the
coefficient metric or the time ratio metric, but not in a hazard ratio metric. */
streg prod, dist(gamma) time
streg prod, dist(gamma) tr
/*The preceding models are all log-linear. However, an alternative model, the Gompertz, is not log-linear.
Note that with the Gompertz there is no intercept. Below we report estimates in both hazard ratio and
coefficients form.*/
streg prod, dist(gompertz)
streg prod, dist(gompertz) nohr
/*STATA will also estimate models with heterogeneity. For example, here is a model that assumes
heterogeneity in the coefficients based on whether the data is censored. Be patient. This one takes a
while to converge.*/
set more off
streg prod, dist(weibull) strata(status)
/*Here is a model that assumes an unknown source of heterogeneity in the durations, modeled as a
gamma distribution. These are the so-called frailty models. */
streg prod, dist(weibull) frailty(gamma)
/*This lesson has provided a summary of parametric models of survival. As you can see, there is considerable
expertise required to use and interpret these models. A future course in survival analysis might be required to learn
everything about this class of models.*/