Assignment One:
Part One: Simple Stata data management and analysis commands.
Load the dataset: Assign1
Start a log file.
Make a new variable that is the product of x1 and x2 (call it x12).
Recode x3: make values 1-3 into 1, value 4 into 2, values 5-7 into 3, and value 9 into missing.
Compute the descriptive statistics on all of the variables (both old and new; for x3, use only the recoded values).
Do a regression where y is the dependent variable and x1, x2, x3, and x12 are the independent variables.
Save the predicted value of y as yhat and the residual from the regression as resid.
Compute the descriptive statistics of yhat and resid.
Give me the log and the do files.
Part Two: Stata and matrices.
This part of the assignment is intended to introduce you to Stata’s matrix language and get a little bit of experience with matrix algebra.
/* the first thing you want to do is clear Stata’s memory. The command (easily enough) is clear*/
clear
/* Next, when working with Stata's matrices, you need to make sure Stata has enough room in its memory. This is done by setting the maximum matrix size with the matsize command, which gives the largest number of rows or columns a matrix is allowed to have. Usually this is not a big deal (and it won't be for this assignment), but some Stata commands and programs use the matrix commands internally, and you need to make sure matsize is big enough for them. Note that matsize affects matrix manipulation but not data storage: your dataset can be bigger than this, but you cannot include more variables in a statistical model than your matsize allows. The default is 400, which is generally big enough, but we will change it for the sake of changing it. */
set matsize 1000
/* Let's start by creating and manipulating matrices. Reminder: a matrix is simply a rectangular array of data with dimension r x c, where r is the number of rows (usually observations) and c is the number of columns (usually variables). Stata's basic command is
matrix name = (elements)
where name is the name of the matrix and elements are the values of the individual entries in the matrix. Elements within a row are separated by commas, and rows are separated by a backslash (\).
Let's start with a 3 x 3 matrix. */
matrix a = (1, 2, 3\ 4, 5, 6\ 7, 8, 9)
/* note that it doesn’t look like Stata has actually done anything. To display the content of the matrix you need the matrix list command*/
matrix list a
/* Identity matrices are awfully useful, and Stata has a special function for creating them: I(n), where n is the number of rows and columns (the two are the same because an identity matrix is square). */
matrix ident = I(3)
matrix list ident
/* OK, let's do some manipulating. The easiest manipulation is the transpose, which simply swaps the rows and columns. The transpose operator is a straight apostrophe ('). Let's transpose a. */
matrix at = a'
matrix list a
matrix list at
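/* A small check you could add here (a sketch, not part of the assignment; the names r23, r23t, and r23tt are illustrative). Transposing a 2 x 3 matrix gives a 3 x 2 matrix, and transposing twice returns the original. rowsof() and colsof() report the dimensions, and mreldif() returns the largest relative difference between two matrices, so 0 means they are identical. */
matrix r23 = (1, 2, 3 \ 4, 5, 6)
matrix r23t = r23'
matrix list r23t
display rowsof(r23t) " x " colsof(r23t)
matrix r23tt = r23t'
display mreldif(r23tt, r23)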
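/* Matrix addition and subtraction are also easy: they use the + and - operators. Let's first make a new matrix and then add it to a. capture noisily shows Stata's error message without stopping the do file. */
matrix b = (2, 2, 2, 2 \ 3, 3, 3, 3 \ 4, 4, 4, 4)
capture noisily matrix aplusb = a+b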
/* That didn't work: Stata gave the error message "conformability error." Matrix addition and subtraction work by adding (or subtracting) the corresponding elements of each matrix. To do that, the matrices need to have the same dimensions, but a is a 3 x 3 matrix and b is 3 x 4. Let's try again. */
matrix c = (2, 2, 2 \ 3, 3, 3 \ 4, 4, 4)
matrix aplusc = a+c
matrix list aplusc
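/* Subtraction works the same way, element by element. As a quick check (a sketch; the name aminusc is illustrative), subtracting c from aplusc should give back a exactly. mreldif() returns the largest relative difference between two matrices, so 0 means they are identical. */
matrix aminusc = aplusc - c
matrix list aminusc
display mreldif(aminusc, a)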
/* Matrix multiplication. Matrix multiplication involves taking the inner products of rows and columns. It can only occur if the "inner" dimensions are the same: to multiply (r1 x c1) * (r2 x c2), c1 and r2 have to be the same number. So a 2 x 3 matrix can be multiplied by any matrix that has 3 rows, regardless of how many columns it has, but not by a matrix that does not have 3 rows. The resulting matrix has as many rows as the first matrix and as many columns as the second. */
matrix g = (1, 5, 7 \ 2, 3, 2)
matrix h = (2, 3, 1 \ 3, 2, 8 \ 4, 5, 1)
matrix i = g*h
matrix list i
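/* To see what Stata did, work out one element by hand (a sketch). Element [1,1] of i is the inner product of row 1 of g with column 1 of h: 1*2 + 5*3 + 7*4 = 45. display can show a single element using matrix subscripts. Since g is 2 x 3 and h is 3 x 3, i is 2 x 3. */
display 1*2 + 5*3 + 7*4
display i[1,1]
display rowsof(i) " x " colsof(i)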
/* Two other things about matrix multiplication. 1) Order matters: premultiplication and postmultiplication are different things, so in general g*h does not equal h*g. In fact, given the way g and h are set up, h*g is not even conformable, because the inner dimensions differ (h is 3 x 3 and g is 2 x 3). h*g', however, does work. */
matrix j = h*g'
matrix list i
matrix list j
/*note that i & j are not only different matrices, but they have different dimensions. */
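/* You can confirm both claims (a sketch; the names hg and rc_hg are illustrative). capture runs h*g without stopping the do file, and _rc holds the return code, which is nonzero because the command failed. rowsof() and colsof() then show that i is 2 x 3 while j is 3 x 2. */
capture matrix hg = h*g
scalar rc_hg = _rc
display rc_hg
display "i is " rowsof(i) " x " colsof(i) ", j is " rowsof(j) " x " colsof(j)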
/* The identity matrix plays the role of 1 in matrix multiplication: any matrix multiplied by a conformable identity matrix returns the original matrix. For a square matrix, the order does not matter either: aI = Ia = a. */
matrix list a
matrix ai= a*ident
matrix list ai
matrix ia = ident*a /* this could also be written matrix ia = I(3)*a */
matrix list ia
/* While the order of the factors matters, the grouping does not: when more than two matrices are multiplied, it does not matter which adjacent pair you multiply first. For instance, for three conformable matrices A, B, and C, A*B*C = (A*B)*C = A*(B*C). */
matrix A = (1, 2 \ 3, 4)
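/* A quick numerical check (a sketch): mreldif() returns the largest relative difference between two matrices, so 0 means they are identical. Both products should reproduce a exactly. */
display mreldif(ai, a)
display mreldif(ia, a)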
matrix B = (2, 3 \ 4, 5)
matrix C = (3, 4 \ 5, 6)
matrix D= A*B*C
matrix E = (A*B)*C
matrix F = A*(B*C)
matrix list D
matrix list E
matrix list F
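/* A check on both points (a sketch; AB and BA are illustrative names). mreldif() is 0 when two matrices are identical, so D, E, and F all agree. A*B and B*A, by contrast, differ even though both products are defined for these 2 x 2 matrices: grouping does not matter, but order does. */
display mreldif(D, E)
display mreldif(E, F)
matrix AB = A*B
matrix BA = B*A
matrix list AB
matrix list BA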
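/* One last tidbit: (AB)' = B'A' and (ABC)' = C'B'A'. This generalizes to products of any number of conformable matrices: the transpose of a product is the product of the transposes in reverse order. */
matrix abt = (A*B)'
matrix btat = B'*A'
matrix list abt
matrix list btat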
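/* The same check for three matrices (a sketch; abct and ctbtat are illustrative names): the transpose of A*B*C should equal C'*B'*A'. mreldif() returns 0 when the two matrices are identical. */
matrix abct = (A*B*C)'
matrix ctbtat = C'*B'*A'
matrix list abct
matrix list ctbtat
display mreldif(abct, ctbtat)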
/* There isn't really a single operation that is equivalent to matrix division. The closest is multiplying by a matrix's inverse. The inverse of a square matrix A is the matrix that produces the identity matrix when it is multiplied by A: Ainverse*A = A*Ainverse = I. These matrices are difficult to compute by hand, but easy for Stata, provided the matrix is invertible (nonsingular). The functions are inv() and invsym(). Use invsym() when the matrix is square and symmetric, which is often, but not always, the case for the things we will be doing; invsym() is faster and more accurate. */
matrix k = (7, 3, 2, 1 \ 2, 9, 4, 1 \ 2, 2, 10, 3 \ 4, 1, 1, 11)
matrix kinv = inv(k)
matrix l = k*kinv
matrix list l
matrix l2 = kinv*k
matrix list l2
/* Note that the off-diagonal elements are essentially zero. They aren't exactly zero because of floating-point rounding error. */
/* There are a few other handy functions worth knowing. The first is the trace: the sum of the elements along the main diagonal of a square matrix. For instance, the trace of an n x n identity matrix is n, since each element on its diagonal is 1. */
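/* k itself is not symmetric (for example, k[1,2] is 3 but k[2,1] is 2), so inv() is the right function above. To try invsym(), we need a symmetric matrix, and k'*k is one (a sketch; ktk, ktkinv, l3, and ident4 are illustrative names). mreldif() compares the product with a 4 x 4 identity matrix; the result should be essentially zero. */
matrix ktk = k'*k
matrix ktkinv = invsym(ktk)
matrix l3 = ktkinv*ktk
matrix list l3
matrix ident4 = I(4)
display mreldif(l3, ident4)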
matrix list k
/* Note that the trace is a single number, otherwise known as a scalar. Stata uses similar language for dealing with scalars. Instead of leading the command with “matrix” you lead with “scalar”. Thus:*/
scalar trk = trace(k)
scalar list trk
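/* As a check (a sketch; the scalar names trk_hand and tri3 are illustrative), add up the diagonal of k by hand: 7 + 9 + 10 + 11 = 37, which should match trk. The trace of the 3 x 3 identity matrix ident should be 3. */
scalar trk_hand = k[1,1] + k[2,2] + k[3,3] + k[4,4]
scalar list trk_hand
scalar tri3 = trace(ident)
scalar list tri3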
/* There are a number of other scalar functions that are also handy: rowsof() and colsof() report the number of rows or columns of the matrix in the parentheses, and det() returns the determinant of a square matrix. What is the determinant? Geometrically, its absolute value is the volume of the parallelepiped spanned by the columns of the p x p matrix in p-dimensional space; a matrix has an inverse only if its determinant is nonzero. You do not need to know what this means. You may need to know how to get it. */
/* Before we move on, there are two features of Stata's matrix commands that might come in handy. First, you can name rows and columns. Naming is easy. The commands are
matrix rownames matname = names
or
matrix colnames matname = names */
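/* A sketch of these functions in use (the scalar names rk, ck, detA, and det_a are illustrative). For the 2 x 2 matrix A defined above, the determinant is 1*4 - 2*3 = -2. The determinant of a is zero in exact arithmetic, because its third row equals twice its second row minus its first, so a has no inverse; Stata may report a tiny nonzero number because of rounding. */
scalar rk = rowsof(k)
scalar ck = colsof(k)
scalar detA = det(A)
scalar det_a = det(a)
scalar list rk ck detA det_a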
matrix rownames k = one two three four
matrix colnames k = a b c d
matrix list k
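/* The names are stored with the matrix, and you can retrieve them with the extended macro functions rownames and colnames (a sketch; the local macro names rn and cn are illustrative). */
local rn : rownames k
local cn : colnames k
display "`rn'"
display "`cn'"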
/* the second handy thing is that you can make new matrices that are subsets of old matrices. For instance, say you want everything but the first column of k. */
matrix m = k[1..., 2...]
matrix list k
matrix list m
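/* A few more subscripting patterns (a sketch; the names k23 and krow1 are illustrative). Two dots give a range of rows or columns, three dots mean "through the last one," and a single number picks one row or column. m has 4 rows and 3 columns because only the first column was dropped. */
matrix k23 = k[2..3, 2..3]
matrix krow1 = k[1, 1...]
matrix list k23
matrix list krow1
display rowsof(m) " x " colsof(m)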
/* You can also use row or column names in place of the numbers. The names need to be in double quotes, and a range of names is written with two dots (..). */
matrix n = k["two".."four", "a".."d"] /* Stata needs straight double quotes ("). If you copy and paste this from a word processor, you may need to retype the quotes in your do file. */
matrix list n
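/* Since the row range runs from "two" through "four" and the column range covers all four columns, n should be 3 x 4 and equal to rows 2 through 4 of k selected by number (a sketch; n_num is an illustrative name). mreldif() returns 0 when the two matrices are identical. */
matrix n_num = k[2..4, 1...]
display rowsof(n) " x " colsof(n)
display mreldif(n, n_num)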
/* It is also very handy to convert data to a matrix. Stata keeps the two separate: the dataset in memory is stored in one place, and matrices are stored somewhere else. The command is mkmat. For the assignment that follows, you will need to take the data from the file matrix.dta and do some matrix manipulations with it, so let's figure out how to do that. The format of the command is
mkmat varnames, matrix(matname)
where varnames tells Stata which variables to include in the matrix and matname is the name of the new matrix. One other useful option is nomissing: it tells Stata to drop a row (observation) from the matrix if any element in that row is missing. The data here have no missing values, but this is still a handy thing to know. */
use matrix.dta, clear
/* Note that you will need to adjust the path so Stata can find the file. Alternatively, you can simply use the Open data button for this step, but I encourage you to use a do file and the command, because then it is clear what data were used to create what results. */
mkmat y x1 x2 x3 x4, matrix(o)
matrix list o
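/* A quick sanity check (a sketch; the matrix name o_nm is illustrative): o should have one row per observation and one column per variable, and mkmat names the columns after the variables. Adding nomissing should not change the number of rows here, because these data have no missing values. */
display rowsof(o) " x " colsof(o) "   (_N = " _N ")"
mkmat y x1 x2 x3 x4, matrix(o_nm) nomissing
display rowsof(o_nm)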
/* The last command that I think you need at this point (solely because it will make your life easier in a little while) gets Stata to form cross-products for you. Say our data are set up for regression: we have a dependent variable y and a matrix of independent variables X, and we want to regress y on the x's. We need both X'X and X'y to calculate the OLS estimates of β. The command matrix accum forms the cross-products of the variables you list from the dataset in memory (the same variables we put into matrix o above). Listing y first gives a single matrix that holds both X'X and X'y, which can save us time and programming. */
matrix accum p = y x1 x2 x3 x4
matrix list p
/* matrix accum does one other nice thing: it adds a constant. Remember from class that the OLS model we use includes a constant term; if it is left out, bad things happen. By default, this command adds the constant (_cons) as the last variable when it forms the cross-products, so we don't have to.
So what is this matrix p? Element [1,1] is y'y. Everything below it in the first column is the vector X'y. The rest of the first row is y'X. The rest of the matrix is X'X, where X includes _cons as its last column. Given what you know, you should be able to take this matrix, partition it into the needed pieces, perform the needed manipulations and multiplication, and get the OLS results from this regression. */
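/* You can check this description of p directly (a sketch; ysq and ysq_sum are illustrative names). Element [1,1] should equal the sum of the squared values of y. With y, x1 through x4, and _cons, p is 6 x 6, and its bottom-right element, the cross-product of _cons with itself, should equal the number of observations. */
generate double ysq = y^2
quietly summarize ysq
scalar ysq_sum = r(sum)
drop ysq
display p[1,1] "  vs  " ysq_sum
display p[6,6] "  vs  " _N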
/* Your assignment: get the OLS estimates of β with and without using the matrix accum command. Check the estimates against the results from the regress command in Stata (that is, doing it the canned/easy way). Submit both your do and log files to me. */
/* The last thing you need to do is figure out how to get Stata to convert data into a matrix (see mkmat above). This is not strictly necessary, but it will make the assignment easier. */