PROBABILITY DISTRIBUTIONS

AND

ESTIMATING A MEAN AND VARIANCE USING MLE

The purpose of this session is to familiarize you with some
of the more important probability distributions and to take an initial step in
understanding how probability distributions relate to likelihood and
log-likelihood functions.

PROBABILITY DISTRIBUTIONS: Recall that a probability
distribution is a mapping from the values of a random variable (call it Y) to
the probability of observing those values (P[Y]). Maximum Likelihood Estimation
(MLE) involves specifying a probability distribution that describes the social
science experiment that generated the random variable under investigation. Of
course, the data generating process may have been consistent with more than one
probability distribution, since some distributions can be derived from others
(for example, the exponential distribution is a gamma distribution with its
shape parameter fixed at 1).
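
To make the mapping concrete, here is a small R sketch; the Poisson and the
parameter values are arbitrary choices for illustration. For a discrete random
variable the d-function returns P[Y]; for a continuous one it returns a density.

# A probability distribution maps values of Y to probabilities.
# Discrete case: a Poisson random variable with mean 3.
y <- 0:10
p <- dpois(y, lambda = 3)      # P[Y = y]
cbind(y, p)
sum(dpois(0:100, lambda = 3))  # the probabilities sum to (nearly) 1

# Continuous case: dnorm() returns a density, not a probability.
dnorm(0, mean = 0, sd = 1)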

The first part of this assignment is to go to Microsoft
Excel and call up various spreadsheets that plot some of the more interesting univariate probability distributions. You should have the
following spreadsheets for this exercise.

Normal Distribution

Chi-Square Distribution

Bernoulli Distribution

Binomial Distribution

Negative Binomial Distribution

Poisson Distribution

Weibull Distribution

Exponential Distribution

Gamma Distribution

LogNormal Distribution

Beta Distribution

Note that there are two spreadsheets for the Gamma and Beta distributions to
illustrate the different ways in which these may be parameterized. In general,
there are many possible parameterizations of most probability distributions.
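
A quick way to see the same point in R: dgamma() accepts either a rate or a
scale parameter (scale = 1/rate), and the two calls below (with arbitrary
values) describe the identical distribution.

# The same gamma density under two common parameterizations:
# shape/rate versus shape/scale, where scale = 1/rate.
dgamma(2, shape = 3, rate = 0.5)
dgamma(2, shape = 3, scale = 2)   # identical value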

For each of these spreadsheets, the cells marked green are
areas where you may change the parameters and observe the behavior of the
resulting distributions. (Note: you may need to resize or reposition the
graphs, since settings differ across versions of Excel.) In any case, play
with each of these spreadsheets, noting the
essential features of each probability distribution, including the range of the
random variable, the range of the parameters, and the effect of the parameters
on means, modes, skew, kurtosis, and shape. The Evans, Hastings, and Peacock
text contains many more details on each of these distributions, so you might use
this reference in combination with the spreadsheets to learn more about the
probability functions. As an exercise, you might also try on your own to graph
one or more of the other probability distributions in the Evans, Hastings, and
Peacock book.
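
If you would rather work in R than Excel, the sketch below plots one of the
distributions in the list; the gamma and the parameter values are arbitrary
choices, and any of the other d-functions (dnorm, dpois, dbeta, and so on) can
be substituted.

# Overlay gamma densities with different shape parameters to see
# how the parameter changes the mode, skew, and overall shape.
curve(dgamma(x, shape = 1, rate = 1), from = 0, to = 10,
      ylab = "density", main = "Gamma densities")
curve(dgamma(x, shape = 2, rate = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 5, rate = 1), add = TRUE, lty = 3)
legend("topright", c("shape = 1", "shape = 2", "shape = 5"), lty = 1:3)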

The probability distributions in these spreadsheets are univariate
distributions, meaning that there is a single random variable in the domain of
the probability function. However, social scientists are also sometimes
interested in the joint probability associated with multiple random variables.
For example, we sometimes assume a bivariate normal
distribution when there are two dependent variables, both normal, to be modeled
simultaneously. As another example, the multinomial distribution is a discrete
joint distribution with dimensions equal to the number of categories in the
multinomial variable. Excel doesn't do so well in plotting multivariate
distributions, particularly when they are continuous. However, Maple can do the
job. Use Maple to look at the bivariate normal
distribution contained in the file called Bivariate
Normal.mws. Change r (the correlation between the two random variables), the
standard deviations s1 and s2, and the means m1 and m2 to observe the effect on the
distribution.
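
If you do not have Maple available, a rough R equivalent is sketched below; the
grid, the parameter values, and the function name binorm are my own choices,
and the density is the standard bivariate normal formula.

# Bivariate normal density for correlation r, standard deviations
# s1 and s2, and means m1 and m2.
r <- 0.5; s1 <- 1; s2 <- 1; m1 <- 0; m2 <- 0
binorm <- function(y1, y2) {
  z1 <- (y1 - m1)/s1
  z2 <- (y2 - m2)/s2
  exp(-(z1^2 - 2*r*z1*z2 + z2^2)/(2*(1 - r^2))) /
    (2*pi*s1*s2*sqrt(1 - r^2))
}
y1 <- seq(-3, 3, length.out = 40)
y2 <- seq(-3, 3, length.out = 40)
persp(y1, y2, outer(y1, y2, binorm), theta = 30, phi = 30, zlab = "density")
# Re-run with different r, s1, s2, m1, and m2 to see the surface change.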

ESTIMATING A MEAN AND VARIANCE OF A DISTRIBUTION USING MLE:
Maximum likelihood is purely and simply an estimation technique. In practice,
we specify a probability distribution that could have generated the data, put
that probability distribution into a likelihood and log-likelihood function,
and then estimate the parameters of that distribution using MLE. One approach
to implementing MLE would be through trial and error. As an example of how this
might work, go to the Excel spreadsheet entitled MLNormal
Mu.xls. In this spreadsheet I have entered the data from page 9 of Eliason in the second column. In the first column I have a
vector of initial guesses for the mean of the distribution. In the third column
is the log-likelihood associated with each initial guess, assuming independent
draws from a normal distribution. (Note: we could have assumed any of the
distributions above in computing the log-likelihoods; the normal is simply the
choice here.) The ML estimate of the mean is just the guess that produces the
largest number in column 3. The graph plots the values of the log-likelihood
function in column 3 against the vector of guesses in column 1, so we can also
read the maximum off the graph. If we want to increase the precision of the
estimate, we can change the vector of guesses to range from, say, 1.81 through
1.90. Do this to get a sense of what happens to the log-likelihoods and the graph.
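
The same trial-and-error search is easy to reproduce in R. The sketch below
uses made-up data rather than the Eliason values, and it holds the standard
deviation fixed at 1 so that only the mean is searched over.

# Trial-and-error MLE of a normal mean: compute the log-likelihood
# for a vector of guesses and keep the guess with the largest value.
y <- c(1.2, 2.1, 1.8, 2.4, 1.6)      # made-up data
guesses <- seq(1.0, 3.0, by = 0.01)  # the column-1 vector of guesses
loglik <- sapply(guesses,
                 function(m) sum(dnorm(y, mean = m, sd = 1, log = TRUE)))
guesses[which.max(loglik)]           # the ML estimate of the mean
plot(guesses, loglik, type = "l")    # the log-likelihood "hill"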

Of course, trial-and-error methods
are very inefficient and may also be quite cumbersome when there are multiple
parameters. A better way to estimate is therefore to use the methods of
calculus or iterative numerical techniques. In class we show how to optimize a
function using both analytical and numerical methods; computers are very
useful for implementing the latter. Below is a short R program for finding the
mean and variance of a normally distributed variable using MLE.

# This file calculates a mean and variance using maximum likelihood.

# Read in the Ostrom data
Ostrom <- read.table("C:/Data/ostrom.dat", header=TRUE)
attach(Ostrom)

# List the data
Ostrom

# Compute summary statistics
summary(Ostrom)

# Put the data into matrices for use in the MLE procedure
x <- cbind(1, as.matrix(USSR))
y <- as.matrix(US)
ones <- x[,1]

# Calculate n for later use
n <- sum(ones)
n

# Define the log-likelihood function to be optimized.
# par[1] is the mean (mu); par[2] is the standard deviation (Sig).
llik.mean <- function(par) {
  Y <- as.vector(y)
  X <- as.matrix(ones)   # a column of ones, so mu is the same for every case
  mu <- X %*% par[1]
  Sig <- par[2]
  # Sum the log of the normal density over the n observations
  sum(-(1/2)*log(2*pi) - (1/2)*log(Sig^2) - (1/(2*Sig^2))*(Y - mu)^2)
}
llik.mean

# Optimize the function. fnscale = -1 makes optim maximize rather
# than minimize; c(140, 75) are starting values for mu and Sig.
model <- optim(c(140,75), llik.mean, method = "CG",
               control = list(fnscale = -1), hessian = TRUE)
model

# Pick off the separate coefficients from the model vector
# of coefficients for later use
mu <- model$par[1]
mu

Sig <- model$par[2]
Sig

# Calculate the variance-covariance matrix of coefficients from
# the Hessian matrix (the negative inverse of the Hessian of the
# log-likelihood at its maximum).
v <- -solve(model$hessian)
v

# Calculate the standard errors from the variance-covariance
# matrix of coefficients.
se <- sqrt(diag(v))
se

# Calculate the z statistics from the coefficients and
# standard errors.
b <- model$par
b

z <- b/se
z

# Note that the ML estimate of the variance divides the sum of
# squares by n rather than n - 1 and so is biased; we can correct
# the preceding estimate of the standard deviation as follows.
Sig

Sigunb <- sqrt(n/(n-1)*Sig^2)
Sigunb
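
As a quick check on the program, recall that these ML estimates have
closed-form solutions, so the numerical results from optim should agree with
them (this assumes the objects y and n from the program above are still in
memory).

# Closed-form checks for the normal mean and standard deviation.
mean(y)                        # should match mu from optim
sqrt(sum((y - mean(y))^2)/n)   # should match Sig (divides by n)
sd(y)                          # should match Sigunb (divides by n - 1)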