Introduction to Parametric Duration Models
The purpose of this session is to show
you how to use some of R's procedures for estimating parametric duration
models. Note that we do not cover non-parametric or semi-parametric duration
models which are an important part of this literature. Our purpose is simply to
introduce
another application of maximum
likelihood.
#
The purpose of this session is to show you how to use R's procedures for
estimating
#
parametric duration models. Note that we do not cover non-parametric or
semi-parametric
#
duration models which are an important part of this literature.
#
This file demonstrates the R procedures for evaluating duration data. First,
let's
#
read in the data. These are data on the duration of labor strikes taken from
Greene,
#
p. 800. The variable T is the duration of the strike. The variable PROD is the
residual
#
from a linear regression of the log of industrial production on time, time
squared,
#
and a set of monthly dummy variables. It measures aggregate industrial
production less
#
trend and seasonal components.
#
The purpose of this session is to show you how to use R's procedures for
estimating
#
parametric duration models. Note that we do not cover non-parametric or
semi-parametric
#
duration models which are an important part of this literature.
#
This file demonstrates the R procedures for evaluating duration data. First,
let's
#
read in the data. These are data on the duration of labor strikes taken from
Greene,
#
p. 800. The variable T is the duration of the strike. The variable PROD is the
residual
#
from a linear regression of the log of industrial production on time, time
squared,
#
and a set of monthly dummy variables. It measures aggregate industrial
production less
#
trend and seasonal components.
library(moments)
library(car)
strikes
<- read.table("/users/bdanwood/dropbox/public/My Teaching/Maximum
Likelihood/Data/strikes.txt",
header=TRUE)
attach(strikes)
summary(strikes)
#
In contrast to Limdep, R does not expect the dependent variable to be logged.
You will
#
also need a censoring variable to indicate whether or not a process is still
ongoing. Below
#
we create the censoring variable, STATUS, for all durations below 80. Note that
unlike
#
STATA and LIMDEP, R expects the censoring variable to indicate observations
that are NOT
#
censored.
strikes$status
<- strikes$T < 80
detach(strikes)
attach(strikes)
#
Now let's just estimate some duration models with no covariates. This would be
useful for
#
exploring the duration dependence of a process. First the Exponential model.
library(rms)
#
Estimate the exponential model
exp.model
<- psm(Surv(T, status) ~ 1, strikes, dist="exponential")
exp.model
#
Calculate AIC for later use in comparing models
ll.exp
<- exp.model$loglik[2]
df.exp
<- nrow(strikes)-exp.model$df.residual-1
aic.exp
<- -2*ll.exp+2*df.exp+1
aic.exp
#
Plot the exponential Survival curve. Note that the hazard function is always a
straight line
#
for this model and trivial to plot.
S
<- Survival(exp.model)
times
<- seq(0,max(T),by=1)
plot(times,
S(times,exp.model$coef),
xlab="Days",ylab="S(t)",main="Plot of Exponential
Survival function",type='l') #
plots survival curve at X*Beta hat=?
#
Estimate the Weibull model
weibull.model
<- psm(Surv(T, status) ~ 1, strikes, dist="weibull")
weibull.model
#
Calculate AIC for later use in comparing models
ll.weibull
<- weibull.model$loglik[2]
df.weibull
<- nrow(strikes)-weibull.model$df.residual-1
aic.weibull
<- -2*ll.weibull+2*df.weibull+1
aic.weibull
#
Plot the survival and hazard functions for the Weibull model
S
<- Survival(weibull.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
S(times,weibull.model$coef),
xlab="Days",ylab="S(t)",main="Plot of Weibull Survival
function",type='l') # plots
survival curve at X*Beta hat=?
H
<- Hazard(weibull.model)
plot(times,
H(times,weibull.model$coef),
xlab="Days",ylab="H(t)",main="Plot of Weibull Hazard
function",type='l') # plots
survival curve at X*Beta hat=?
#
Estimate the lognormal model
lognormal.model
<- psm(Surv(T, status) ~ 1, strikes, dist="lognormal")
lognormal.model
#
Calculate AIC for later use in comparing models
ll.lognormal
<- lognormal.model$loglik[2]
df.lognormal
<- nrow(strikes)-lognormal.model$df.residual-1
aic.lognormal
<- -2*ll.lognormal+2*df.lognormal+1
aic.lognormal
#
Plot the survival and hazard functions for the lognormal model
S
<- Survival(lognormal.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
S(times,lognormal.model$coef),
xlab="Days",ylab="S(t)",main="Plot of lognormal
Survival function",type='l') #
plots survival curve at X*Beta hat=?
H
<- Hazard(lognormal.model)
plot(times,
H(times,lognormal.model$coef),
xlab="Days",ylab="H(t)",main="Plot of lognormal Hazard
function",type='l') # plots
survival curve at X*Beta hat=?
#
Estimate the logistic model
logistic.model
<- psm(Surv(T, status) ~ 1, strikes, dist="logistic")
logistic.model
#
Calculate AIC for later use in comparing models
ll.logistic
<- logistic.model$loglik[2]
df.logistic
<- nrow(strikes)-logistic.model$df.residual-1
aic.logistic
<- -2*ll.logistic+2*df.logistic+1
aic.logistic
#
Plot the survival and hazard functions for the logistic model
S
<- Survival(logistic.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
S(times,logistic.model$coef), xlab="Days",ylab="S(t)",main="Plot
of logistic Survival function",type='l')
# plots survival curve at X*Beta hat=?
H
<- Hazard(logistic.model)
plot(times,
H(times,logistic.model$coef),
xlab="Days",ylab="H(t)",main="Plot of logistic Hazard
function",type='l') # plots
survival curve at X*Beta hat=?
#
Estimate the loglogistic model
loglogistic.model
<- psm(Surv(T, status) ~ 1, strikes, dist="loglogistic")
loglogistic.model
#
Calculate AIC for later use in comparing models
ll.loglogistic
<- loglogistic.model$loglik[2]
df.loglogistic
<- nrow(strikes)-loglogistic.model$df.residual-1
aic.loglogistic
<- -2*ll.loglogistic+2*df.loglogistic+1
aic.loglogistic
#
Plot the survival and hazard functions for the loglogistic model
S
<- Survival(loglogistic.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
S(times,loglogistic.model$coef),
xlab="Days",ylab="S(t)",main="Plot of loglogistic
Survival function",type='l') #
plots survival curve at X*Beta hat=?
H
<- Hazard(loglogistic.model)
plot(times,
H(times,loglogistic.model$coef),
xlab="Days",ylab="H(t)",main="Plot of loglogistic
Hazard function",type='l') # plots
survival curve at X*Beta hat=?
#
We can use the AIC statistics as a model selection tool. The model with the
lowest AIC is best.
AIC.compare
<- rbind(aic.exp,aic.weibull,aic.lognormal,aic.logistic,aic.loglogistic)
AIC.compare
#
Based on this comparison the lognormal model is the best model.
#
We can also include covariates in each of these models. Reestimate the Weibull
model with PROD
weibull.model
<- psm(Surv(T, status) ~ 1 + PROD, strikes, dist="weibull")
weibull.model
#
Plot the survival and hazard functions for the Weibull model at xbeta mean
xbar
<- c(1,mean(PROD))
xbarbeta
<- xbar %*% weibull.model$coefficients
S
<- Survival(weibull.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
S(times,xbarbeta), xlab="Days",ylab="S(t)",main="Plot
of Weibull Survival function",type='l')
# plots survival curve at X*Beta hat=?
H
<- Hazard(weibull.model)
plot(times,
H(times,xbarbeta), xlab="Days",ylab="H(t)",main="Plot
of Weibull Hazard function",type='l')
# plots survival curve at X*Beta hat=?
#
Another diagnostic for model specification utilizes the cumulative hazard
function. A
#
well specified model will have a cumulative hazard function that begins at zero
and
#
increases in linear fashion. For example, here we compare the cumulative hazard
#
functions for the weibull versus lognormal models
#
Reestimate the Weibull model above
weibull.model
<- psm(Surv(T, status) ~ 1, strikes, dist="weibull")
weibull.model
#
Reestimate the Lognormal model above
lognormal.model
<- psm(Surv(T, status) ~ 1, strikes, dist="lognormal")
lognormal.model
#
Plot the integrated hazard function for the Weibull model
S
<- Survival(weibull.model)
times
<- seq(0,max(T),by=1)
par(mfrow=c(2,1))
plot(times,
-log(S(times,weibull.model$coef)), xlab="Days",ylab="Integrated
Hazard Function",main="Plot of Weibull Integrated Hazard
Function",type='l') # plots survival
curve at X*Beta hat=?
#
Plot the integrated hazard function for the lognormal model
S
<- Survival(lognormal.model)
times
<- seq(0,max(T),by=1)
plot(times,
-log(S(times,lognormal.model$coef)),
xlab="Days",ylab="Integrated Hazard
Function",main="Plot of lognormal Integrated Hazard
Function",type='l') # plots
survival curve at X*Beta hat=?
#
Based on this comparison, the Weibull model is the best since it comes closest
to
#
forming a straight line.
#
This lesson has provided a summary of parametric models of survival in R. As
you can see,
#
there is considerable expertise required to use and interpret these models. A
future
#
course in survival analysis might be required to learn everything about this
class of
#
models.