FUNDAMENTALS OF STATA
The purpose of this session
is to introduce STATA and to gain some familiarity with a few STATA commands
that will be used in future sessions.
SOME BASICS
Most STATA programs require
only two lines, though other lines may be added
to transform data, calculate results, etc. These are:
use "dataset.name", clear
model commands
The first line reads in the
data. The second line tells STATA which model to estimate and gives some
options for the particular model.
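As a minimal sketch of this two-line pattern, the following uses the "auto" example data set that ships with STATA. Loading it with "sysuse" instead of "use" is an assumption made here so that the example runs without any external file; the variables price, mpg, and weight belong to that data set, not to the course data.

```stata
* Line 1: read in the data (sysuse loads a data set installed with STATA)
sysuse auto, clear
* Line 2: the model command, here a least squares regression
regress price mpg weight
```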
You can enter STATA commands
either from the STATA COMMAND window or
from a batch file called a DO-FILE. In this lesson we will use a DO-FILE, because
it allows you to run blocks of commands in batch mode and to save your
commands in a file that can be used again.
As an example, enter STATA by
clicking the STATA icon from the Windows desktop or program menu. Then click on
the DO-FILE Editor Icon on the STATA menu. This opens a work area for editing a
set of STATA commands. Now type in the following.
use "example.dta", clear
sum y x1 x2 x3
cor y x1 x2 x3
cor y x1 x2 x3, cov
Make sure that the path on the first line is
correct for finding the file called “example.dta”. Now, execute this program by clicking on the
Tools menu and selecting “do to bottom”.
This brief program reads in a data set contained in a file called
“example.dta” and computes descriptive statistics on the variables in “example.dta.”
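If STATA cannot find the file, it helps to check which folder it is currently working in. A short sketch follows; the folder name is purely hypothetical and should be replaced by the location of your own copy of the data.

```stata
* Show the current working directory
pwd
* Change to the folder holding the data (hypothetical path)
cd "C:\mydata"
* List the STATA data files found in that folder
dir *.dta
* A file name given without a path is now looked up in that folder
use "example.dta", clear
```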
An alternative way to run
this program would have been to mark the text and press the second icon from
the left (do current file).
A fast way to mark all of the
text in a command file is by pressing CTRL-A. To run the entire set of commands
in the do-file editor window, press CTRL-A and then execute in
the above fashion.
Another way to run the entire
program is to click on the last icon in the do-file editor (run current file). This runs the entire do-file, but you will not see the output.
If you wish to run a subset
of the do-file, highlight the lines you wish to run and select “do” in the
Tools menu or click on the second icon from the left (do current file). This is
often useful for debugging programs.
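The same two ways of executing a do-file are also available as typed commands. The sketch below assumes that the commands above were saved under the hypothetical name "example.do" in the current working directory.

```stata
* Execute the do-file and show each command together with its output
do "example.do"
* Execute the do-file without showing the commands or their output
run "example.do"
```

"do" is the natural choice while debugging, because the output shows where a problem occurs; "run" is convenient when only the end result matters, such as a data set left in memory.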
Now let's consider what was
in the command file we just executed.
The "use" command
tells STATA to use the data set that follows, “example.dta.” The option “clear” tells STATA that it may
delete any data set that exists in memory.
The “sum” command is short for “summarize” and
tells STATA to do descriptive statistics on variables y, x1, x2, and x3.
The command “cor” tells STATA to report the
correlations that exist between the variables.
Adding the option “, cov”
tells STATA to report the covariances instead.
Note that the typical STATA
command begins with a keyword
(e.g., use, sum, reg, gen, …) that
alerts the program that a command has begun.
Keywords can usually be abbreviated to the first few letters of a model or other
command (often three, as in “reg” for “regress”). It is fine to use the entire
word, but it is more parsimonious to abbreviate. One important point is that STATA is case
sensitive with regard to variables and commands.
All commands in STATA are lower-case.
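Abbreviation and case sensitivity can be tried out directly. This is only a sketch, again assuming the "auto" data set installed with STATA; "capture noisily" is used so that the deliberate errors are displayed without stopping the do-file.

```stata
sysuse auto, clear
* The full command name and its abbreviation give the same result
summarize price mpg
sum price mpg
* Commands are lower-case, so "Summarize" is not recognized
capture noisily Summarize price
display _rc
* Variable names are case sensitive too, so "Price" is not "price"
capture noisily summarize Price
display _rc
```

A nonzero return code in _rc signals that the preceding command failed.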
You may save a set of
commands from the command file by selecting FILE SAVE or SAVE-AS from the STATA
do-file menu. You will be prompted for a file name for the saved file that can
then be used for later work. If you wish
to save changes to the data, go to the main STATA menu and select FILE SAVE or
SAVE-AS.
You may also want to print
results or save them for viewing. To print, select FILE PRINT RESULTS from the
main STATA menu. To save output from a
STATA session you may type “log using "filename", replace” at the beginning of
the do-file. NOTE: The “replace” option will overwrite
any previously created file by the same name.
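A log is opened before the commands whose output you want to keep and closed afterwards. In this sketch the file name "session1.log" is hypothetical and the "auto" data set installed with STATA stands in for the course data.

```stata
* Close any log left open by an earlier run (no error if none is open)
capture log close
* Open a plain-text log; replace overwrites an existing file of that name
log using "session1.log", replace text
sysuse auto, clear
summarize price mpg
* Close the log once the output of interest has been produced
log close
```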
MODEL Commands
STATA has many
"canned" statistical procedures that can be executed using a single
MODEL statement with options. Many other statistical procedures can be
performed by creating a program for function optimization using the maximum
likelihood features of STATA. We will use this latter approach frequently in
this course. Below is a listing of models that can be called in the MODEL line.
Most of the procedures have numerous options, so this is merely an overview.
The quick reference guide distributed with the STATA manual lists the full set
of options. For a full description of each model go to the STATA HELP menu,
enter the model command, and choose from the listed options. You should pay
particular attention for now to the “summarize”, “regress”, and “program”
commands that will be used over the next few sessions. Here is a list of
"canned" STATA procedures.
arima             Box-Jenkins ARIMA models.
biprobit          Maximum-likelihood two-equation probit models: either a bivariate probit or a seemingly unrelated probit (limited to two equations). For partial observability or sample selection.
boxcox            MLE or nonlinear least squares for Box-Cox model.
tabulate          Cross-tabulation. Frequency counts and contingency tables.
nlogit            Random utility models; nested logit and conditional logit models.
sum               Descriptive statistics.
graph             A number of graphing options are available in STATA.
truncreg          Truncated regression (completely censored data).
hist              Histogram.
robust            Option following a regression equation; estimates the Huber/White sandwich estimator of variance.
corrgram          Plot autocorrelations and partial autocorrelations.
logit             Binomial logit model (mlogit for the multinomial logit model).
lnormal           Maximum-likelihood log-normal distribution (survival time) models.
nbreg             Negative binomial regression models.
suest             Seemingly unrelated regressions (web install).
oprobit           Ordered probit models (ologit for ordered logit models).
probit            Univariate probit model.
tobit             Censored regression.
dotplot           Scatter diagrams.
poisson           Poisson regression models (nbreg for negative binomial models).
regress (or reg)  Classical regression. Least squares regression.
streg             Analysis of duration data.
switchr           Switching regressions (web install).
xtreg             Time series/cross section regressions.
testnl            Test restrictions or obtain variances for nonlinear functions.
ivreg             Two stage least squares.
reg3              Three stage least squares.
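To show how these canned procedures are called, here is a brief sketch using the "auto" data set installed with STATA. The variable names belong to that data set and are not part of the course data.

```stata
sysuse auto, clear
* Least squares regression
regress price mpg weight
* Same regression with Huber/White sandwich standard errors
regress price mpg weight, robust
* Probit model for a binary dependent variable
probit foreign mpg weight
* Open the help file for a command to see its full set of options
help regress
```

Each model command follows the same pattern: the keyword, the dependent variable, the explanatory variables, and then any options after a comma.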
Transformations and Calculations
The basic command for
creating new variables is “generate”. The format for the “generate” command is:
generate newvar=function
where “newvar” is the new
variable name and function is a mathematical function. For a list of available
functions see the STATA help guide.
If you want to see a listing
of the new variable, use:
list newvar
An extended set of variable
transformations is available through STATA’s “egen”
command. The format is the same as above. See “help egen” for more information.
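A short sketch of "generate", "egen", and "list" working together, assuming the "auto" data set installed with STATA; the new variable names are made up for illustration.

```stata
sysuse auto, clear
* New variables from mathematical functions of existing ones
generate mpgsq = mpg^2
generate lnprice = ln(price)
* egen computes functions across observations, here the sample mean
egen meanmpg = mean(mpg)
* List the first five observations of the old and new variables
list mpg mpgsq lnprice meanmpg in 1/5
```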
The basic command for working
with scalars is “scalar”. The format for this command is:
scalar newscalar=number
where “newscalar” is the new
scalar created. If you want to see a listing of the scalar use:
display newscalar
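Scalars are handy for holding a single number, including results left behind by other commands. A minimal sketch follows; the scalar names are arbitrary and the "auto" data set installed with STATA is assumed.

```stata
* Store a number and use it in a later calculation
scalar a = 2.5
scalar b = a^2
display b
* Many commands leave results behind that can be saved as scalars
sysuse auto, clear
summarize price
scalar meanprice = r(mean)
display meanprice
```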
ASSIGNMENT: As an exercise
for this first session, read the data from the data file called “example.*.”
Use the “generate” command to create some new variables, the product x1*x2 and x3
squared (see STATA help file). Compute descriptive statistics on these
variables. Use the “regress” command to do a regression of y on a constant,
x1*x2, and x3 squared (see STATA help file for regression). After your regression,
list and save the residuals and predicted values to new variables, plot the
residuals, and obtain CUSUM plots for parameter stability. Also, use the “fitstat” command to save and view the log likelihood.
Try this first independently.
However, if you get into trouble, here is what the command file would look like
(“fitstat” and “cusum6” must be downloaded and
installed through STATA help).
/* How to get data into STATA */
/* From ascii text with no variable names */
infile Y X1 X2 X3 using "C:\Users\wood\Documents\My Teaching\Maximum Likelihood\Data\EXAMPLE NONames.txt", clear
/* From an Excel spreadsheet converted to .txt or .csv */
insheet using "C:\Users\wood\Documents\My Teaching\Maximum Likelihood\data\example.txt", clear
insheet using "C:\Users\wood\Documents\My Teaching\Maximum Likelihood\data\example.csv", clear
/* From an SPSS .sav file you need the user-written procedure "usespss" */
/* Package must be installed to use it. */
usespss using "C:\Users\wood\Documents\My Teaching\Maximum Likelihood\data\example.sav", clear
/* To convert any of these to a STATA data file, just select File, Save As from
the STATA menu. For more information on reading and creating data files, go to
the STATA HELP menu, select data, and read chapter 24. */
/* From a STATA data file */
use "C:\Users\wood\Documents\My Teaching\Maximum Likelihood\data\example.dta", clear
/* Save your output to a file. */
log using "examplelog", replace
/* Compute Various Summary Statistics */
sum y x1 x2 x3
cor y x1 x2 x3
cor y x1 x2 x3, cov
/* Transform Variables */
generate x1x2=x1*x2
generate x3sqr=x3^2
sum x1x2 x3sqr
/* Run a Linear Regression and Get Some Output */
regress y x1x2 x3sqr
predict yhat
predict e, resid
fitstat, saving(logl)
list yhat e
/* Graph the Residuals against Y */
graph twoway (scatter e y)
/* Test Coefficient Stability */
tsset time
cusum6 y x1x2 x3sqr
/* Close the Log File */
log close
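If the user-written packages are not yet installed, part of the exercise can still be sketched with built-in commands. The example below is a stand-in under stated assumptions, not the course solution: it uses the "auto" data set installed with STATA, reads the log likelihood that "regress" stores in e(ll), and draws a residual-versus-fitted plot with "rvfplot". The variable names mpgwt and lensq are made up for illustration.

```stata
sysuse auto, clear
* Transformed regressors, analogous to x1x2 and x3sqr above
generate mpgwt = mpg*weight
generate lensq = length^2
regress price mpgwt lensq
* Log likelihood saved by regress
display e(ll)
scalar logl = e(ll)
* Predicted values and residuals as new variables
predict yhat
predict e, resid
* Residuals plotted against fitted values
rvfplot, yline(0)
```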