FUNDAMENTALS OF STATA

The purpose of this session is to introduce STATA and to gain some familiarity with a few STATA commands that will be used in future sessions.

SOME BASICS

Most STATA programs require only two lines, though there may be other lines added to transform data, calculate results, etc. These are:

use "dataset.name", clear

model commands

The first line reads in the data. The second line tells STATA which model to estimate and gives some options for the particular model.
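As a minimal sketch of this two-step pattern, the lines below load a data set and estimate one model. The sketch assumes the "auto" data set that ships with STATA rather than the course file, and the variables price, mpg, and weight belong to that data set, not to “example.dta.”

```stata
* Minimal two-step program: load data, then estimate a model.
* Assumption: uses the auto data set shipped with STATA, not example.dta.
sysuse auto, clear          // read the data, discarding anything in memory
regress price mpg weight    // model command: OLS of price on mpg and weight
```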

You can enter STATA commands either directly in the STATA COMMAND window or in a batch file called a DO-FILE. In this lesson we will use a DO-FILE, because it lets you run blocks of commands in batch mode and save your commands in a file that can be used again.

As an example, enter STATA by clicking the STATA icon from the Windows desktop or program menu. Then click on the DO-FILE Editor Icon on the STATA menu. This opens a work area for editing a set of STATA commands. Now type in the following.

use "example.dta", clear

sum y x1 x2 x3

cor y x1 x2 x3

cor y x1 x2 x3, cov

Make sure that the path is correct for finding the file called “example.dta” on the first line. Now execute this program by clicking on the Tools menu and selecting “do to bottom”. This brief program reads in the data set contained in “example.dta” and computes descriptive statistics, correlations, and covariances for its variables.

An alternative way to run this program would have been to mark the text and press the second icon from the left (do current file).

A fast way to mark all of the text in a do-file is to press CTRL-A. To run the entire set of commands in the do-file editor, press CTRL-A and then execute as described above.

Another way to run the entire program is to click on the last icon in the do-file editor (run current file). This will run the entire do-file. However, you will not see the output.

If you wish to run a subset of the do-file, highlight the lines you wish to run and select “do” in the tools menu or click on the second icon from the left (do current file). This is often useful for de-bugging programs.
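The same two modes of execution are also available as typed commands, which may be convenient from the COMMAND window. In this sketch the file name “session1.do” is hypothetical; substitute the name under which you saved your own do-file.

```stata
* "session1.do" is a hypothetical file name; substitute your own do-file.
do "session1.do"     // executes the file and echoes commands and output
run "session1.do"    // executes the file silently, with output suppressed
```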

Now let's consider what was in the command file we just executed.

The “use” command tells STATA to load the data set that follows, “example.dta.” The option “clear” tells STATA that it may discard any data set currently in memory.

The “sum” command is short for “summarize” and tells STATA to compute descriptive statistics for the variables y, x1, x2, and x3. The command “cor” (short for “correlate”) tells STATA to report the correlations among the variables. Adding the option “, cov” tells STATA to report the covariances instead.
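For readers without “example.dta” at hand, the same commands can be tried on the "auto" data set that ships with STATA. This is only an illustrative sketch; price, mpg, and weight are variables of that data set and stand in for y, x1, and x2.

```stata
* Assumption: the shipped auto data set stands in for example.dta.
sysuse auto, clear
summarize price mpg weight               // means, std. dev., min, max
summarize price, detail                  // adds percentiles, skewness, kurtosis
correlate price mpg weight               // correlation matrix
correlate price mpg weight, covariance   // covariance matrix instead
```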

Note that the typical STATA command begins with a keyword (e.g., use, sum, reg, gen, …) that alerts the program that a command has begun. Keywords are often abbreviated to the first three letters of a model or other command. It is fine to use the entire word, but abbreviating is more concise. One important point is that STATA is case sensitive with regard to both variables and commands. All commands in STATA are lower-case.
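The short sketch below illustrates abbreviation and case sensitivity, again assuming the shipped "auto" data set. The variable name Price is hypothetical and is created only to show that it is distinct from price.

```stata
* Assumption: auto data set; "Price" is a hypothetical new variable.
sysuse auto, clear
summarize price               // full command name
sum price                     // abbreviation, same result
generate Price = price/1000   // Price and price are different variables
describe price Price
capture noisily SUM price     // upper-case command name is not recognized
display _rc                   // nonzero return code signals the error
```

Here “capture noisily” lets the do-file continue after the deliberate error while still showing the error message.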

You may save a set of commands from the command file by selecting FILE SAVE or SAVE-AS from the STATA do-file menu. You will be prompted for a file name for the saved file that can then be used for later work. If you wish to save changes to the data, go to the main STATA menu and select FILE SAVE or SAVE-AS.
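Changes to the data can also be saved from a do-file with the “save” command. The sketch below assumes the shipped "auto" data set; the variable price_k and the file name “auto_modified.dta” are hypothetical.

```stata
* Assumption: auto data set; file and variable names are hypothetical.
sysuse auto, clear
generate price_k = price/1000       // change the data in memory
save "auto_modified.dta", replace   // write to disk, overwriting if present
use "auto_modified.dta", clear      // reload to confirm the change was kept
describe price_k
```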

You may also want to print results or save them for viewing. To print, select FILE PRINT RESULTS from the main STATA menu. To save output from a STATA session, type “log using "filename", replace” at the beginning of the do-file. NOTE: This will overwrite any previously created file with the same name.
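A typical logging pattern might look like the sketch below. The log name “session1_log” is hypothetical, and the "auto" data set is assumed only so that there is some output to record.

```stata
* Assumption: log name is hypothetical; auto data set supplies sample output.
capture log close                        // close any open log; ignore error if none
log using "session1_log", replace text   // "text" writes a plain-text log file
sysuse auto, clear
summarize price mpg
log close                                // stop recording and close the file
```

Everything displayed between “log using” and “log close” is written to the log file.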

Now let's discuss how to get data into STATA using the “insheet” command.

1. External ASCII file: The data can be in an external ASCII file created by a text processor, with only one observation per line. To use this format, type “insheet using "filename", clear” into the do-file.
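To try this without a prepared text file, the sketch below first writes a small comma-separated file and then reads it back. It assumes the shipped "auto" data set, and the file name “auto_subset.csv” is hypothetical. In newer STATA releases (13 and later) “insheet” has been superseded by “import delimited”, shown as a comment.

```stata
* Assumption: auto data set; "auto_subset.csv" is a hypothetical file name.
sysuse auto, clear
outsheet price mpg weight using "auto_subset.csv", comma replace  // write ASCII file
insheet using "auto_subset.csv", comma clear                      // read it back
describe
* Equivalent in STATA 13 and later:
* import delimited using "auto_subset.csv", clear
```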

For more information on reading and creating data files, go to the STATA HELP menu, select data, and read chapter 24.

MODEL Commands

STATA has many "canned" statistical procedures that can be executed using a single MODEL statement with options. Many other statistical procedures can be performed by creating a program for function optimization using the maximum likelihood features of STATA. We will use this latter approach frequently in this course. Below is a listing of models that can be called in the MODEL line. Most of the procedures have numerous options, so this is merely an overview. The quick reference guide distributed with the STATA manual lists the full set of options. For a full description of each model go to the STATA HELP menu, enter the model command, and choose from the listed options. You should pay particular attention for now to the “summarize”, “regress”, and “program” commands that will be used over the next few sessions. Here is a list of "canned" STATA procedures.

**arima** Box-Jenkins ARIMA models.

**biprobit**
Bivariate probit models.

**boxcox**
MLE or nonlinear least squares for Box-Cox model.

**tabulate**
Cross-tabulation. Frequency counts and contingency tables.

**nlogit**
Random utility models and nested logit models.

**sum**
Descriptive statistics.

**graph** A number of graphing options are available in STATA.

**truncreg**
Completely censored data.

**hist**
Histogram

**robust** Added as an option to a regression command, reports the Huber/White sandwich estimator of variance.

**corrgram**
Plot autocorrelations and partial autocorrelations.

**logit** Binomial logit model (multinomial logit is estimated with **mlogit**).

**lnormal**
estimates maximum-likelihood log-normal distribution (survival time) models

**biprobit** Estimates maximum-likelihood two-equation probit models: either a bivariate probit or a seemingly unrelated probit (limited to two equations). For partial observability or sample selection.

**nbreg** Negative binomial regression models.

**nlogit** Nested logit and conditional logit models.

**suest**
seemingly unrelated regressions (web install).

**oprobit** Ordered probit or logit models.

**probit**
Univariate probit model.

**tobit**
Censored regression.

**truncreg** Truncated regression.

**dotplot**
Scatter diagrams.

**poisson** Poisson and negative binomial regression models.

**regress** (or **reg**) Classical least squares regression.

**streg**
Analysis of duration data.

**switchr**
switching regressions (web install)

**xtreg** Time series/cross section regressions.

**testnl**
Test restrictions or obtain variances for nonlinear functions.

**ivreg**
Two stage least squares.

**reg3** Three stage least squares.
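As a rough illustration of how a few of the procedures listed above are called, the sketch below runs them on the shipped "auto" data set. The choice of variables (price, mpg, weight, foreign, rep78) is an assumption made purely for demonstration and has no connection to the course data.

```stata
* Assumption: auto data set and its variables, chosen for illustration only.
sysuse auto, clear
tabulate foreign rep78             // cross-tabulation of two categorical variables
regress price mpg weight           // least squares regression
regress price mpg weight, robust   // same model, Huber/White standard errors
probit foreign mpg weight          // probit for a 0/1 dependent variable
logit foreign mpg weight           // logit for the same dependent variable
```

Each model command follows the same pattern: the keyword, the dependent variable, the explanatory variables, and then any options after a comma.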

Transformations and Calculations

The basic command for creating new variables is “generate”. The format for the “generate” command is:

generate newvar=function

where “newvar” is the new variable name and “function” is a mathematical expression, which may involve existing variables. For a list of available functions see the STATA help guide.

If you want to see a listing of the new variable, use:

list newvar
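A few typical uses of “generate” are sketched below, assuming the shipped "auto" data set. The new variable names (weight2, lprice, wt_len) are hypothetical.

```stata
* Assumption: auto data set; new variable names are hypothetical.
sysuse auto, clear
generate weight2 = weight^2        // square of an existing variable
generate lprice  = ln(price)       // natural logarithm
generate wt_len  = weight*length   // product of two variables
list price lprice weight2 wt_len in 1/5   // show the first five observations
```

Restricting “list” with “in 1/5” keeps the output short when the data set is large.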

The basic command for working with scalars is “scalar”. The format for this command is:

scalar newscalar=number

where “newscalar” is the new scalar created. If you want to see a listing of the scalar use:

display newscalar
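The sketch below shows scalars holding both typed-in numbers and results saved by an earlier command. The scalar names (a, b, mprice) are hypothetical, and the "auto" data set is assumed for the second part.

```stata
* Assumption: scalar names are hypothetical; auto data set used for r(mean).
scalar a = 2
scalar b = a^2 + 1
display b                        // shows 5
sysuse auto, clear
quietly summarize price          // run summarize without printing output
scalar mprice = r(mean)          // store the saved mean in a scalar
display "Mean price: " mprice
```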


ASSIGNMENT: As an exercise for this first session, read the data from the data file called “example.dta.” Use the “generate” command to create two new variables from x1, x2, and x3: the product x1*x2 and x3 squared (see STATA help file). Compute descriptive statistics on these variables. Use the “regress” command to regress y on a constant, x1*x2, and x3 squared (see STATA help file). After the regression, save the residuals and predicted values to new variables and list them, plot the residuals, and obtain CUSUM plots for parameter stability. Finally, use the “fitstat” command to save and view the log likelihood.

Try this first independently. However, if you get into trouble, here is what the command file would look like (“fitstat” and “cusum6” must be downloaded and installed through STATA help).

use "example.dta", clear

log using "examplelog", replace

sum y x1 x2 x3

cor y x1 x2 x3

cor y x1 x2 x3, cov

generate x1x2=x1*x2

generate x3sqr=x3^2

sum x1x2 x3sqr

regress y x1x2 x3sqr

predict yhat

predict e, resid

fitstat, saving(logl)

graph twoway (scatter y e)

list yhat e

tsset time

cusum6 y x1x2 x3sqr

log close
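If “fitstat” and “cusum6” are not installed, much of the same workflow can be sketched with built-in commands only. The version below is an assumption-laden variant, not the assignment solution: it uses the shipped "auto" data set, hypothetical variable names (wl, mpgsqr, yhat, e), and reads the log likelihood from the results that “regress” stores. It does not produce CUSUM plots.

```stata
* Assumption: auto data set and hypothetical variable names; built-ins only.
sysuse auto, clear
generate wl     = weight*length   // product term, analogous to x1*x2
generate mpgsqr = mpg^2           // squared term, analogous to x3 squared
summarize wl mpgsqr
regress price wl mpgsqr           // constant is included by default
predict yhat                      // predicted values (linear prediction)
predict e, residuals              // residuals
list yhat e in 1/10
scatter e yhat                    // residuals against fitted values
display e(ll)                     // log likelihood stored by regress
```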