PROBABILITY DISTRIBUTIONS
AND
ESTIMATING A MEAN AND VARIANCE USING MLE

 

The purpose of this session is to familiarize you with some of the more important probability distributions and to take an initial step in understanding how probability distributions relate to likelihood and log likelihood functions.

PROBABILITY DISTRIBUTIONS: Recall that a probability distribution is a mapping from the values of a random variable (call it Y) to the probability of observing each value (P[Y]); for a continuous random variable, the mapping is to a probability density rather than a probability. Maximum Likelihood Estimation (MLE) involves specifying a probability distribution that describes the social science experiment that generated the random variable under investigation. Of course, more than one probability distribution may be consistent with a given data generation process, in part because some distributions can be derived from others.
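As one concrete case of deriving one distribution from another, the binomial distribution is the distribution of the sum of n independent Bernoulli trials that share a success probability p. The short Python sketch below is not part of the course materials, and its function names are hypothetical. It builds the pmf of that sum by repeated convolution and checks it against the closed-form binomial pmf.

```python
from math import comb, isclose


def bernoulli_pmf(p):
    """Bernoulli(p) pmf as a list indexed by outcome (0 or 1)."""
    return [1.0 - p, p]


def convolve(a, b):
    """pmf of X + Y for independent discrete X, Y with supports 0..len-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def binomial_pmf(n, p):
    """Closed-form Binomial(n, p) pmf for k = 0..n."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def sum_of_bernoullis_pmf(n, p):
    """pmf of the sum of n independent Bernoulli(p) trials, by repeated convolution."""
    pmf = [1.0]  # the sum of zero trials is 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, bernoulli_pmf(p))
    return pmf


n, p = 10, 0.3
derived = sum_of_bernoullis_pmf(n, p)
closed = binomial_pmf(n, p)
print(all(isclose(a, b, abs_tol=1e-12) for a, b in zip(derived, closed)))  # True
```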

The first part of this assignment is to open, in Microsoft Excel, a set of spreadsheets that plot some of the more interesting univariate probability distributions. You should have the following spreadsheets for this exercise:


Normal Distribution
Chi-Square Distribution
Bernoulli Distribution
Binomial Distribution
Negative Binomial Distribution
Poisson Distribution
Weibull Distribution
Exponential Distribution
Gamma Distribution
LogNormal Distribution
Beta Distribution

Note that there are two spreadsheets each for the Gamma and Beta distributions, illustrating different ways in which these distributions may be parameterized. In general, most probability distributions can be parameterized in several ways.
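The handout does not say which parameterizations the two Gamma spreadsheets use. Two common ones are shape-scale (shape k, scale θ) and shape-rate (shape α, rate β = 1/θ). As a minimal Python sketch with hypothetical function names, the two densities below describe the same distribution once β is set to 1/θ:

```python
from math import exp, gamma


def gamma_pdf_shape_scale(y, k, theta):
    """Gamma density for y > 0 with shape k > 0 and scale theta > 0 (mean k * theta)."""
    return y ** (k - 1) * exp(-y / theta) / (gamma(k) * theta ** k)


def gamma_pdf_shape_rate(y, alpha, beta):
    """Gamma density for y > 0 with shape alpha > 0 and rate beta > 0 (mean alpha / beta)."""
    return beta ** alpha * y ** (alpha - 1) * exp(-beta * y) / gamma(alpha)


# The same distribution written two ways: rate = 1 / scale.
k, theta = 2.5, 2.0
for y in (0.5, 1.0, 3.0, 7.5):
    a = gamma_pdf_shape_scale(y, k, theta)
    b = gamma_pdf_shape_rate(y, alpha=k, beta=1.0 / theta)
    print(f"y={y}: shape-scale {a:.6f}  shape-rate {b:.6f}")
```

Both columns print the same values; the parameterization changes how the parameters are read (a larger θ stretches the distribution, a larger β compresses it), not the distribution itself.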

For each of these spreadsheets, the green cells are where you may change the parameters and observe the behavior of the resulting distribution. (Note: You may need to resize or reposition the graphs because of differences in settings and in the versions of Excel in use.) Experiment with each spreadsheet, noting the essential features of each probability distribution, including the range of the random variable, the allowable range of the parameters, and the effect of the parameters on the mean, mode, skew, kurtosis, and shape. The Evans, Hastings, and Peacock text contains many more details on each of these distributions, so you might use it alongside the spreadsheets to learn more about each probability function. As an exercise, you might also try on your own to graph one or more of the other probability distributions in the Evans, Hastings, and Peacock book.
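One way to check what you see in the spreadsheets is to compute the moments directly from the probability function. The Python sketch below does this for the Poisson distribution; the choice of distribution, the truncation point, and the function names are assumptions of the sketch. For the Poisson, the mean and variance both equal λ, the skewness is 1/√λ, and the excess kurtosis is 1/λ, so the output should show the distribution becoming more symmetric as λ grows.

```python
from math import exp, lgamma, log, sqrt


def poisson_pmf(y, lam):
    """Poisson(lam) probability of y events, computed on the log scale for stability."""
    return exp(y * log(lam) - lam - lgamma(y + 1))


def moments(pmf, support):
    """Mean, variance, skewness, and excess kurtosis of a discrete distribution."""
    probs = [pmf(y) for y in support]
    mean = sum(p * y for p, y in zip(probs, support))
    var = sum(p * (y - mean) ** 2 for p, y in zip(probs, support))
    m3 = sum(p * (y - mean) ** 3 for p, y in zip(probs, support))
    m4 = sum(p * (y - mean) ** 4 for p, y in zip(probs, support))
    return mean, var, m3 / var ** 1.5, m4 / var ** 2 - 3.0


for lam in (0.5, 2.0, 10.0, 50.0):
    # Truncate the infinite support far out in the right tail.
    support = range(0, int(lam + 20 * sqrt(lam) + 50))
    mean, var, skew, exkurt = moments(lambda y: poisson_pmf(y, lam), support)
    print(f"lambda={lam:5}: mean={mean:.3f} var={var:.3f} "
          f"skew={skew:.3f} excess kurtosis={exkurt:.3f}")
```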

The probability distributions in these spreadsheets are univariate distributions, meaning that there is a single random variable in the domain of the probability function. However, social scientists are sometimes interested in the joint probability of multiple random variables. For example, we sometimes assume a bivariate normal distribution when two normally distributed dependent variables are to be modeled simultaneously. As another example, the multinomial distribution is a discrete joint distribution with one dimension for each category of the multinomial variable. Excel is poorly suited to plotting multivariate distributions, particularly continuous ones, but Maple can do the job. Use Maple to look at the bivariate normal distribution contained in the file called Bivariate Normal.mws. Change ρ (the correlation between the two random variables), σ1 and σ2 (their standard deviations), and μ1 and μ2 (their means) to observe the effect on the distribution.
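For reference while working in Maple, the bivariate normal density with means μ1, μ2, standard deviations σ1, σ2, and correlation ρ can also be evaluated directly. This Python sketch is not the contents of Bivariate Normal.mws, and its function name is hypothetical. It compares the density at (1, 1) and (1, -1) for standardized variables as ρ increases.

```python
from math import exp, pi, sqrt


def bivariate_normal_pdf(x, y, m1, m2, s1, s2, r):
    """Bivariate normal density with means m1, m2, standard deviations s1, s2,
    and correlation r (requires |r| < 1)."""
    zx = (x - m1) / s1
    zy = (y - m2) / s2
    q = (zx * zx - 2.0 * r * zx * zy + zy * zy) / (1.0 - r * r)
    return exp(-0.5 * q) / (2.0 * pi * s1 * s2 * sqrt(1.0 - r * r))


# Positive correlation shifts density toward the line y = x and away from y = -x.
for r in (0.0, 0.5, 0.9):
    along = bivariate_normal_pdf(1.0, 1.0, 0, 0, 1, 1, r)
    across = bivariate_normal_pdf(1.0, -1.0, 0, 0, 1, 1, r)
    print(f"r={r}: f(1,1)={along:.4f}  f(1,-1)={across:.4f}")
```

When r = 0 the density factors into the product of two univariate normal densities, which is a quick check on any plot.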

ESTIMATING A MEAN AND VARIANCE OF A DISTRIBUTION USING MLE: Maximum likelihood is purely and simply an estimation technique. In practice, we specify a probability distribution that could have generated the data, use that distribution to form a likelihood and log-likelihood function, and then estimate the parameters of the distribution by maximizing that function. One way to implement MLE is by trial and error. To see how this might work, open the Excel spreadsheet entitled MLNormal Mu.xls. In this spreadsheet I have entered the data on page 9 of Eliason in the second column. The first column contains a vector of candidate guesses for the mean of the distribution, and the third column contains the log-likelihood associated with each guess, assuming independent draws from a normal distribution. (Any of the distributions above could have been used to compute the log-likelihoods; the normal is the assumption made here.) The ML estimate of the mean is simply the guess that produces the largest value in column 3. The graph plots the log-likelihood values in column 3 against the guesses in column 1, so we can also read the maximum off the graph. To locate the maximum more precisely, we can change the vector of guesses to range from, say, 1.81 through 1.9. Do this to get a sense of what happens to the log-likelihoods and the graph.
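The trial-and-error procedure in MLNormal Mu.xls can be sketched in Python as a grid search. The data values and the fixed standard deviation below are hypothetical placeholders, not the Eliason data, and the handout does not say how the spreadsheet treats σ; here it is simply held fixed so that the log-likelihood depends only on the guessed mean.

```python
from math import log, pi

# Hypothetical placeholder data, NOT the Eliason page 9 values.
y = [1.62, 1.95, 1.78, 2.10, 1.70, 1.88, 1.91, 1.76]
SIGMA = 0.15  # assumed fixed standard deviation (an assumption of this sketch)


def normal_loglik(mu, data, sigma):
    """Log-likelihood of independent normal draws with mean mu and std dev sigma."""
    n = len(data)
    ss = sum((yi - mu) ** 2 for yi in data)
    return -0.5 * n * log(2 * pi * sigma ** 2) - ss / (2 * sigma ** 2)


def grid_search(guesses, data, sigma):
    """Score every candidate mean (like column 1) by its log-likelihood
    (like column 3) and return the best (guess, log-likelihood) pair."""
    best_ll, best_mu = max((normal_loglik(mu, data, sigma), mu) for mu in guesses)
    return best_mu, best_ll


coarse = [1.0 + 0.1 * i for i in range(21)]   # 1.0, 1.1, ..., 3.0
fine = [1.81 + 0.01 * i for i in range(10)]   # 1.81, 1.82, ..., 1.90
print(grid_search(coarse, y, SIGMA))
print(grid_search(fine, y, SIGMA))
```

With σ held fixed, the normal log-likelihood is a downward-opening parabola in μ centered at the sample mean, so the search returns the grid point nearest the sample mean (1.8375 for these placeholder data): 1.8 on the coarse grid and 1.84 on the fine one.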

Of course, trial and error methods are very inefficient, and they become quite cumbersome when there are multiple parameters. A better way to do estimation is to use the methods of calculus (analytical solutions) or iterative numerical techniques. In class we show how to optimize a function using both analytical and numerical methods; computers are very useful in implementing the latter. Below is a short STATA program for finding the mean and variance of a normally distributed variable using MLE.
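For the normal model, the analytical route can be carried through in closed form. Summing the per-observation log density used in the STATA evaluator over n independent observations, and setting the derivatives with respect to μ and σ² to zero, gives:

```latex
\begin{aligned}
\ln L(\mu, \sigma^2)
  &= \sum_{i=1}^{n}\left[-\ln\sqrt{2\pi\sigma^2} - \frac{(y_i - \mu)^2}{2\sigma^2}\right]
   = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2 \\
\frac{\partial \ln L}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0
  \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y} \\
\frac{\partial \ln L}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0
  \quad\Longrightarrow\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2
\end{aligned}
```

Because σ̂² divides by n rather than n − 1, it is the biased estimator that the STATA program rescales at the end.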


Run this program to become familiar with STATA's ML features. STATA has several different ML evaluator methods; the one below uses the lf (linear form) method. We shall demonstrate others in future lessons.

/* This file demonstrates maximum likelihood estimation of normal models. We compute the mean and variance using maximum likelihood methods */
clear

/* The next line will read a data file. Change the path to find data */
use "ostrom.dta", clear

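/* The next lines restrict the sample to observations 1 through 22.
   Note: writing (1<=id<=22) does not work, because Stata evaluates it as
   (1<=id)<=22, which is true for every observation */
gen id = _n
keep if inrange(id, 1, 22)

/* Now let's print the data and compute descriptive statistics */
list year us ussr
summarize year us ussr, detail
correlate year us ussr
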
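/* Now let's compute the mean and standard deviation of US using
   maximum likelihood. First define an lf evaluator: theta1 is the mean,
   theta2 is the standard deviation, and lnf receives each observation's
   normal log-likelihood. capture drops any copy left from an earlier run */
capture program drop meanest
program define meanest
    version 7
    args lnf theta1 theta2
    quietly replace `lnf' = -ln(sqrt(2*_pi*`theta2'^2)) - 1/(2*`theta2'^2)*($ML_y1-`theta1')^2
end
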
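/* Now call the program using ml for estimation.
   ml model:    declare the lf evaluator, with us as the dependent variable
                and two constant-only equations (mean and sigma)
   ml search:   look for reasonable starting values
   ml init:     replace them with starting values of 140 (mean) and 79 (sigma)
   ml report:   display the current parameter values
   ml plot:     graph the log likelihood against the mean's constant
   ml maximize: maximize the log likelihood and display the results
   ml graph:    graph the log likelihood against the iteration number */
ml model lf meanest (mean:us=) (sigma:)
ml search
ml init 140 79, copy
ml report
ml plot _cons
ml maximize
ml graph
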
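/* The ML estimate of the variance is biased because it divides by n
   rather than n-k, where k is the number of estimated mean parameters
   (here k = 1). Multiplying the ML variance by n/(n-k) gives the unbiased
   variance; its square root is the usual sample standard deviation.
   e(N) is the number of observations ml actually used */
matrix list e(b)
scalar mean = [mean]_cons
display mean
scalar sig = [sigma]_cons
display sig
scalar sigunb = sqrt([sigma]_cons^2*(e(N)/(e(N)-1)))
display sigunb
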
/* Delete the program, clear the data, and exit */
program drop meanest
clear
exit
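For readers without STATA, the following Python sketch carries out the same estimation iteratively. It is not a reproduction of what ml does internally: it swaps in Fisher scoring (the method of scoring), which updates each parameter by the score divided by the expected information. For the normal model the expected information is diagonal, so each update is simple. The data are hypothetical placeholders, not the Ostrom US series, and the function names are illustrative; the starting values 140 and 79 (as σ) echo the ml init line.

```python
from math import log, pi, sqrt


def normal_loglik(mu, v, y):
    """Normal log-likelihood of the data y for mean mu and variance v."""
    n = len(y)
    return -0.5 * n * log(2 * pi * v) - sum((yi - mu) ** 2 for yi in y) / (2 * v)


def fisher_scoring(y, mu, v, tol=1e-10, max_iter=100):
    """Maximize the normal log-likelihood over (mu, v) by Fisher scoring.

    Each step adds score / expected information for each parameter; for the
    normal model the expected information is diagonal, with entries n / v
    (for mu) and n / (2 v^2) (for v).
    """
    n = len(y)
    for it in range(1, max_iter + 1):
        ybar = sum(y) / n
        ss = sum((yi - mu) ** 2 for yi in y)
        score_mu = n * (ybar - mu) / v
        score_v = -n / (2 * v) + ss / (2 * v ** 2)
        step_mu = score_mu / (n / v)
        step_v = score_v / (n / (2 * v ** 2))
        mu, v = mu + step_mu, v + step_v
        # Stop when both steps are negligible relative to the parameter size.
        if abs(step_mu) <= tol * max(1.0, abs(mu)) and abs(step_v) <= tol * max(1.0, abs(v)):
            return mu, v, it
    raise RuntimeError("Fisher scoring did not converge")


# Hypothetical data for illustration only (not the Ostrom US series).
y = [120.0, 135.5, 150.2, 128.7, 160.3, 142.1, 138.9, 155.0]
n = len(y)

# Starting values echo "ml init 140 79": mean 140, sigma 79 (so variance 79^2).
mu_hat, v_hat, iters = fisher_scoring(y, mu=140.0, v=79.0 ** 2)

sigma_ml = sqrt(v_hat)                  # ML estimate of sigma (variance divides by n)
sigma_unb = sqrt(v_hat * n / (n - 1))   # rescaled by n/(n-1), as in the STATA program
print(mu_hat, sigma_ml, sigma_unb, iters, normal_loglik(mu_hat, v_hat, y))
```

For the normal model, Fisher scoring lands on the closed-form estimates (the sample mean and the divide-by-n variance) within a few iterations, which makes it a convenient cross-check on the STATA output.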