Censoring and Truncation

The purpose of this session is to show you how to use R's procedures for doing censored and truncated regression. We also estimate Heckman's two-stage procedure for samples with selection bias, which is a form of incidental truncation.

# This file demonstrates some of R's procedures for doing censored and truncated regression.
# In particular, we estimate a lower limit censored regression (i.e., Tobit), Cragg's model
# that assumes a heterogeneous censoring process, Heckman's incidental truncation model for
# dealing with sample selection bias, and truncated regression.

# We will use some of the Mroz data on female labor force participation and income for these
# examples. The first 428 observations of the Mroz data contain women who worked in 1975.
# The remaining 345 observations contain women who did not work. We will use only the first
# 50 observations from each of these subsets of the data.

LFP <- read.table("Documents/My
Teaching/Maximum Likelihood/Data/tobit.txt",header=TRUE)

attach(LFP)

summary(LFP)

# Now let's estimate a Tobit model and also save the log likelihood for later testing. The
# dependent variable (WHRS) is the wife's hours worked in 1975. The independent variables
# are a constant, the number of children less than 6 years old (KL6), the number of children
# between 6 and 18 (K618), the wife's age (WA), and the wife's education (WE). The data also
# include CIT, a dummy for whether the wife lives in a large city, which we use later.

# Currently Zelig does not support the tobit model. However, it is easily obtained using the
# survival regression package. We will discuss why in the next lecture period. Note that the
# expression WHRS > 0 gives the non-censored observations. The "left" option implies left
# censoring.

library(survival)

tobit.out <- survreg(Surv(WHRS, WHRS>0, type='left') ~ KL6 + K618 + WA + WE,
                     data=LFP, dist='gaussian')

summary(tobit.out)

# You can obtain predicted values from the Tobit regression as follows.

WHRShat <- predict(tobit.out,type="response")

WHRShat

# We will also save the log likelihood for the model for later use.

lltobit <- tobit.out$loglik[2]

lltobit

# However, this prediction assumes that we are interested in predictions on y*. However, y*
# is latent and partially determined by a probit. Thus, following Greene (p. 764), the
# expected value must take into account the probability of censoring. First, get the
# standard deviation of the latent residuals.

SD <- tobit.out$scale

SD

# Now calculate the inverse Mills ratio, called lambda.

lambda <- dnorm(WHRShat/SD)/pnorm(WHRShat/SD)

lambda

# Now calculate the expected value given censoring and X.

EYgivenX <- pnorm(WHRShat/SD)*(WHRShat + SD*lambda)

EYgivenX
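# As an illustrative addition, a quick base R plot contrasts the latent predictions with the
# censoring-adjusted ones; the adjustment pulls fitted values toward zero.

plot(WHRShat, EYgivenX, xlab="Latent prediction", ylab="Expected value given censoring")

abline(0, 1, lty=2)  # 45-degree reference line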

# McDonald and Moffitt suggest a useful decomposition of the marginal effects associated
# with the censored regression model. They show that a change in the conditional mean due
# to right-side variables derives from two sources:
# 1) It affects the conditional mean in the uncensored part of the distribution.
# 2) It affects the conditional mean by also affecting the probability that an observation
# will lie in the uncensored part of the distribution.

xbar <- as.matrix(colMeans(cbind(1,LFP[3:6])))

beta <- coefficients(tobit.out)

BXoverS <- t(beta) %*% xbar/SD

Mu <- dnorm(BXoverS)/pnorm(BXoverS)

P <- pnorm(BXoverS)

P

P1 <- P*(1-BXoverS*Mu-Mu^2)

P1

P2 <- dnorm(BXoverS)*BXoverS+dnorm(BXoverS)*Mu

P2
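# A sanity check on the decomposition: since lambda = f(z)/F(z), the two pieces satisfy
# F(z)*(1 - z*lambda - lambda^2) + f(z)*z + f(z)*lambda = F(z), so P1 + P2 should equal P.

P1 + P2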

# We can also calculate marginal effects on the expected value of the unconditional
# distribution, the expected value of the uncensored distribution, as well as probabilities
# of being in the uncensored part of the distribution as follows.

N <- nrow(LFP)

censored <- sum(WHRS<=0)

Fz <- (N-censored)/N

z <- qnorm(Fz)

fz <- 1/sqrt(2*pi)*exp(-.5*z^2)

cond <- (1-(z*fz/Fz))-fz^2/Fz^2

uncond <- Fz

prob <- fz/SD

rawcoeff <- beta[2:5]

mfxuncond <- rawcoeff*uncond

mfxuncensored <- rawcoeff*cond

puncensored <- rawcoeff*prob

table <- cbind(rawcoeff,mfxuncond,mfxuncensored,puncensored)

table

# The censored regression model assumes that the uncensored part of the distribution is
# normal. If it is not, then estimates are inconsistent. We can test this assumption: drop
# the censored observations and test for normality.

LFP2 <- subset(LFP, subset=WHRS>0)

summary(LFP2)

library(e1071)

skewness(LFP2$WHRS)

kurtosis(LFP2$WHRS)
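# In addition to the skewness and kurtosis measures above, a formal Shapiro-Wilk test from
# base R offers one more check.

shapiro.test(LFP2$WHRS)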

library(car)

qqPlot(LFP2$WHRS, dist= "norm", labels=FALSE)

hist(LFP2$WHRS, probability=T, ylab='Density')

lines(density(LFP2$WHRS), lwd=2)

points(LFP2$WHRS, rep(0, nrow(LFP2)), pch="|")

box()

lines(density(LFP2$WHRS, adjust=.5), lwd=1)

# N is small here, so we cannot be fully confident of these tests. However, there is some
# evidence of non-normality.

# Cragg has suggested that assuming a censoring limit that depends on the same distribution
# as the uncensored observations is often incorrect. He suggests a two-equation system in
# which the first equation estimates the probability of being above the censoring limit and
# the second is a truncated regression on the uncensored observations. Below we estimate
# Cragg's model using a probit and a truncated regression procedure. Also, we do a
# likelihood ratio test of whether Cragg's model is significantly different from the Tobit
# model.

library(Zelig)

probit.out <- zelig(LFP ~ KL6 + K618 + WA + WE, model = "probit", data = LFP)

summary(probit.out)

probitsummary.out <- summary(probit.out)

llprobit <- -.5 * probitsummary.out$deviance

llprobit

# We will write our own procedure for the truncated normal regression on the uncensored
# observations. First, create a new dataset using the uncensored observations.

LFPsubset <- subset(LFP, subset=WHRS>0)

x <- cbind(1,LFPsubset[3:6])

y <- as.vector(LFPsubset$WHRS)

K <- ncol(x)

K1 <- K+1

n <- nrow(x)

# Assign the lower truncation limit.

L <- 0

# Define the function to be optimized: the log likelihood of a lower-truncated normal
# regression, in which each observation's density is rescaled by the probability of lying
# above the truncation point L.

llik.ltruncnormal <- function(par, X, Y) {

X <- as.matrix(X)

Y <- as.vector(Y)

xbeta <- X %*% par[1:K]

Sig <- par[K1]

sum(log((1/Sig)*dnorm((Y-xbeta)/Sig)) - log(1-pnorm((L-xbeta)/Sig)))

}

llik.ltruncnormal

# Now let's use the above function to estimate the model. Get start values from a regular
# regression.

regress.out <- lm(WHRS ~ KL6 + K618 + WA + WE, data = LFPsubset)

startvalues <- c(coefficients(regress.out),120)
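# As a quick sanity check, evaluate the log likelihood at the start values before
# optimizing; it should return a finite value.

llik.ltruncnormal(startvalues, X=x, Y=y)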

# Estimate the lower truncated regression, passing the data to the likelihood through optim.

ltrunc.out <- optim(startvalues, llik.ltruncnormal, X=x, Y=y, method = "BFGS",
                    control = list(trace=1, maxit=100, fnscale = -1), hessian = TRUE)

ltrunc.out

# Calculate standard errors, z statistics, and p-values and report in a table.

v <- -solve(ltrunc.out$hessian)

se <- sqrt(diag(v))

b <- ltrunc.out$par

zstat <- b/se

pzstat <- 2*(1 - pnorm(abs(zstat)))

table <- cbind(b,se,zstat,pzstat)

table

# Get the log likelihood from the truncated regression.

lltruncate <- ltrunc.out$value

lltruncate

# Calculate Cragg's test statistic.

lrtest <- 2*((llprobit+lltruncate)-lltobit)

lrtest

# The restricted model is Tobit. The unrestricted model is the two models estimated
# separately. The test statistic is 14.34, which is chi-squared with 5 degrees of freedom
# for the number of additional parameters being estimated. The critical value is 11.07, so
# we reject the null that the restricted model is true. The two-equation approach is
# therefore more appropriate than Tobit.
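# We can confirm the 5 percent critical value and compute a p-value for the test directly.

qchisq(.95, 5)

1 - pchisq(lrtest, 5)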

# Now let's turn to estimating a model with sample selection bias. In these cases the
# truncation is incidental, due to sample selection on another variable that is correlated
# with the truncation in the dependent variable. As discussed in class, the standard model
# is Heckman's two-stage procedure. Once again, there is no "canned" procedure in R to do
# the Heckman model. The easy thing to do in this case would be to use STATA or LIMDEP.
# However, for illustrative purposes we develop our own procedure in R.
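# (Aside: newer R setups may have the sampleSelection package, whose heckit() function
# provides a canned two-step estimator; a minimal sketch, assuming the package is installed,
# using the same selection and outcome equations as below:)

# library(sampleSelection)

# heckit.out <- heckit(LFP ~ CIT + KL6, WHRS ~ KL6 + K618 + WA + WE, data = LFP)

# summary(heckit.out)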

# First, estimate the first-stage probit.

heckprob.out <- zelig(LFP ~ CIT + KL6, model="probit", data=LFP)

summary(heckprob.out)

summaryheckprob.out <- summary(heckprob.out)

# Now retrieve the index function from the probit.

gamma <- as.matrix(summaryheckprob.out$coefficients[,1])  # keep only the estimates column

w <- cbind(1,CIT,KL6)

z <- w %*% gamma

# Now calculate lambda and delta.

LFP$lambda <- dnorm(z)/pnorm(z)

LFP$delta <- LFP$lambda * (LFP$lambda + z)

detach(LFP)

attach(LFP)

# Now get the subset of the data that suffers from selection bias.

LFPheckman <- subset(LFP, subset=WHRS>0)

# Now run the second-stage regression that includes lambda (the Mills ratio) from the
# probit.

heckman.out <- lm( WHRS ~ KL6 + K618 + WA + WE + lambda, data=LFPheckman)

summary(heckman.out)

# The coefficients from this OLS regression are now unbiased due to the inclusion of the
# source of the bias in the equation. However, the standard errors and inferential
# statistics will be incorrect because they do not take into account the sampling
# variability contained in lambda. Also, the residuals are heteroskedastic by definition.
# We need to compute a corrected covariance matrix of estimates for both problems.
# First, get the standard error of the regression corrected for selection.

blambda <- heckman.out$coefficients[6]  # coefficient on lambda

deltabar <- mean(LFPheckman$delta)

sigma <- sqrt(((t(heckman.out$residuals) %*% heckman.out$residuals)/nrow(LFPheckman))+ deltabar*blambda^2)

# Next obtain rho in the correction rho*sigma*lambda*wgamma, Greene p. 784.

rho2 <- blambda^2/sigma^2

rho <- sqrt(rho2)

# Next obtain an n x n identity matrix, the Delta matrix, and I - rho^2*Delta. These are
# from Greene, p. 785.

n <- nrow(LFPheckman)

I <- diag(n)

Delta <- as.vector(LFPheckman$delta) * I

IminusRho2Delta <- (I - as.numeric(rho2)*Delta)

# Now calculate the correction factor Q.

xstar <- cbind(1, LFPheckman$KL6, LFPheckman$K618, LFPheckman$WA, LFPheckman$WE, LFPheckman$lambda)

w <- cbind(1,LFPheckman$CIT,LFPheckman$KL6)

Q <- (t(xstar) %*% Delta %*% w) %*% vcov(heckprob.out) %*% (t(w) %*% Delta %*% xstar)

# Now, obtain the corrected asymptotic covariance matrix of coefficients.

vcovheckman <- as.numeric(sigma^2) * (solve(t(xstar) %*% xstar) %*%
  (t(xstar) %*% IminusRho2Delta %*% xstar + Q) %*% solve(t(xstar) %*% xstar))

# Now produce corrected standard errors, z statistics, and p-values along with the
# coefficients.

se <- sqrt(diag(vcovheckman))

b <- heckman.out$coefficients[1:6]

zstat <- b/se

pzstat <- 2*(1 - pnorm(abs(zstat)))

table <- cbind(b,se,zstat,pzstat)

table
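# As an illustrative comparison, the uncorrected OLS standard errors can be placed next to
# the corrected ones to see how much the adjustment matters.

cbind(uncorrected = sqrt(diag(vcov(heckman.out))), corrected = se)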

# Again, this section was illustrative only. You will probably want to use STATA or LIMDEP
# for most Heckman estimations. However, the preceding demonstrates how to program your own
# estimator. If you have an econometrics text with the matrices, then R can be used to
# implement the estimator. As a final note, the preceding would have been incorrect if taken
# directly from the textbook. Greene's formulation for delta_i on p. 784 is incorrect. So it
# pays to have the original reference and to check the math against multiple references.