FUNDAMENTALS OF R

 

The purpose of this session is to introduce you to R and to give you some familiarity with a few R commands that will be used in future sessions.

 

SOME BASICS

 

Invoke R by double-clicking the R icon on your desktop.

 

To quit R, type q() at the R prompt (>) and press the Enter key. Alternatively, select Exit from the File menu.

 

R is a high level programming language based on objects.

 

> objects() # list the names of all objects

> rm(data1)    #remove the object named data1 from the current environment

 

In R, everything you work with is an object. For example, to get data into R you define a dataset as an object and then, optionally, attach it. To define an object “Example” from an external dataset EXAMPLE.txt, type the following.

 

Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)

attach(Example)

 

This two-command sequence assumes you are reading a text file that contains variable names in its first line. R reads text files and files created in R, as well as data files written in other formats such as Stata and SPSS by way of the “foreign” package. Make sure that the path in the first command correctly points to the file called “EXAMPLE.txt”. Note that R is case sensitive. Most R command names are lower case, but some are not, so type each command exactly as shown.

 

attach() allows you to reference variables in the data frame Example by name, without the cumbersome Example$variable prefix.
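
The read-and-reference steps above can be sketched in a self-contained way as follows. This is only an illustration: the file is a temporary one written by the example itself, and the name Demo and the data values are made up rather than taken from EXAMPLE.txt. It also shows with(), which reaches the variables in a data frame without attaching it.

```r
# Write a tiny illustrative file (made-up values) so the example is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(c("Y X1 X2 X3",
             "10 1 4 7",
             "12 2 5 8",
             "15 3 6 9"), tmp)

Demo <- read.table(tmp, header = TRUE)   # the first line supplies the variable names
names(Demo)                              # "Y" "X1" "X2" "X3"
mean(Demo$X1)                            # 2, using the explicit Demo$ prefix
with(Demo, mean(X1))                     # 2, with() evaluates the expression inside Demo
unlink(tmp)                              # delete the temporary file
```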

 

Once you have your data in R, you can work with it in a number of ways, such as the following. To view an object, type its name at the command prompt. For example, type the following.

 

Example

 

To obtain summary statistics on the object “Example” type:

 

summary(Example)

 

The object “Example” that we have defined contains variables Y, X1, X2, and X3. To get the mean, variance, and standard deviation of X1 type the following.

 

mean(X1)

var(X1)

sd(X1)
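
As a small check on how these three commands relate, the sketch below uses a made-up vector x (not the course data). var() and sd() compute the sample versions, which divide by n - 1, and sd() is the square root of var().

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)    # made-up values for illustration
mean(x)                           # 5
var(x)                            # sample variance, divides by n - 1
sd(x)                             # sample standard deviation
all.equal(sd(x), sqrt(var(x)))    # TRUE
```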

 

It is also easy to create new variables (which are also objects) in R. Assuming you have attached the dataset “Example”, type in the following short list of commands.

 

Example$NEWVAR <- Y+X1+X2+X3

detach(Example)

attach(Example)

 

This creates a new variable called NEWVAR, which is the sum of the other variables. attach() works from a copy of the dataset as it stood when it was attached, so the subsequent detach and attach lines ensure that the new variable can also be referenced by name from the dataset Example.
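
If you prefer not to detach and re-attach, one possible alternative is to compute the new variable with with(), which looks up the variable names inside the data frame directly. The sketch below assumes a made-up data frame named Demo standing in for Example.

```r
Demo <- data.frame(Y = c(10, 12, 15), X1 = 1:3, X2 = 4:6, X3 = 7:9)  # made-up data

# Compute the sum inside the data frame; no attach() is involved
Demo$NEWVAR <- with(Demo, Y + X1 + X2 + X3)
Demo$NEWVAR                       # 22 27 33
```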

 

The assignment operator (<-) stores the value of the expression on its right side in the object named on its left side. Once assigned, the object can be used like any other component of a computation. To see what the object contains, simply type its name. Note that R is case sensitive, e.g., the object names abc, ABC, and Abc are all different.

 

> x<- log(2.843432) *pi

> x

[1] 3.283001
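
To see the case-sensitivity point in action, here is a short sketch using the three illustrative names mentioned above. Each name refers to a separate object.

```r
abc <- 1              # assign 1 to abc
ABC <- 2              # a different object, despite the similar name
Abc <- abc + ABC      # a third object built from the first two
Abc                   # 3
```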

 

To obtain a correlation matrix on the variables in “Example”, type the following.

 

cor(Example[,c("X1","X2","X3","Y")], use="complete.obs")
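
The use="complete.obs" argument matters only when the data contain missing values (NA). The sketch below, which assumes a small made-up data frame named Demo, shows the difference: without the argument a missing value spreads NA through the correlations that involve that variable, while with it any row containing a missing value is dropped before the correlations are computed.

```r
Demo <- data.frame(Y  = c(1, 2, 3, 4, 5),
                   X1 = c(2, 4, 6, 8, 10),
                   X2 = c(5, 3, NA, 1, 0))   # made-up data with one missing value

cor(Demo)                          # correlations between X2 and the others are NA
cor(Demo, use = "complete.obs")    # row 3 is dropped, so no NA appears
```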

 

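To obtain a scatterplot of two variables with a least-squares regression line, axis labels, marginal boxplots, and a nonparametric (loess) smooth, type the following. The scatterplot command comes from the ‘car’ package, which must be loaded first. The argument names shown follow older releases of car; if your version reports unknown arguments, check help(scatterplot) for the current names.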

 

scatterplot(Y~X1, reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)

 

R can also be used as an ordinary calculator. Try these examples:

 

> 2 + 3 * 5      # Note the order of operations.

> log (10)       # Natural logarithm with base e=2.718282

> 4^2            # 4 raised to the second power

> 3/2            # Division

> sqrt (16)      # Square root

> abs (3-7)      # Absolute value of 3-7

> pi             # The mysterious number

> exp(2)         # exponential function

> 15 %/% 4       # This is the integer divide operation

> # This is a comment line

 

You can enter R commands directly in the R command window by typing the command and pressing the Enter key. Or you can type commands in any text editor and then paste the lines into the R command window. The advantage of using a text editor is that you can save the commands in a file to be used later.

 

I like Tinn-R as a text editor for R because it has an interface specifically designed for R. Rcmdr is another commonly used tool; it is an R package that provides a point-and-click interface for many R commands (similar to Stata). You can also use the internal R text editor, the Windows text editor, Microsoft Word, or WordPerfect in text mode.

 

All of the preceding can be executed in a batch file format by typing the lines into a plain text file and then copying and pasting the selected text into the R command window.

 

If you wish to run a subset of the file, highlight the lines you wish to run and copy and paste those lines into the R command window.
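
Besides copying and pasting, base R can also run a saved command file in one step with source(). The following is a minimal sketch of that idea; the two-line script is made up and written to a temporary file only so that the example is self-contained. In practice you would pass the path of your own command file.

```r
script <- tempfile(fileext = ".R")               # temporary stand-in for a command file
writeLines(c("z <- 2 + 3", "print(z)"), script)  # made-up commands
source(script)                                   # runs every line in the file; prints [1] 5
unlink(script)                                   # delete the temporary file
```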

 

Note that many R commands contain the assignment arrow (<-), which defines an object.

 

To print results from your output, select FILE, PRINT from the main R menu. Make sure that your cursor is in the window you want to print. To save output from an R session, select FILE, SAVE TO FILE.

 

R can save the objects you create in a session. This can be good or bad. If you do not wish to save objects from a “messy” session, then you might want to start each file with the following command to remove all objects.

 

rm(list=ls(all=TRUE))
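
To see what this command does without clearing your own workspace, the sketch below applies the same idea to a scratch environment. The environment scratch and the object names data1 and data2 are made up for illustration. Note that all.names=TRUE is the full spelling of the all=TRUE argument used above; it includes objects whose names begin with a period.

```r
scratch <- new.env()                         # a scratch environment, separate from your workspace
assign("data1", 1:3, envir = scratch)
assign("data2", "hello", envir = scratch)
ls(scratch)                                  # "data1" "data2"

# Same pattern as rm(list=ls(all=TRUE)), but aimed at the scratch environment
rm(list = ls(scratch, all.names = TRUE), envir = scratch)
length(ls(scratch))                          # 0
```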

 

Note that R has an extensive library of procedures that can be installed. To see many of the available packages for R go to the following web link.

 

http://cran.sixsigmaonline.org/

 

Click on Package (on the left) and then the package to get a description.

 

To install the package in R go to the PACKAGES menu within R, click ‘install package’, and select the package to install.

 

In order to use the installed package it must be loaded. Load a package by typing the following.

 

library(packagename)
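
The install-then-load sequence can also be typed at the R prompt instead of using the menu. The sketch below is one way to do it; is_installed is a made-up helper name, and the install.packages line is left commented out because it needs an internet connection.

```r
# TRUE if the named package is installed on this machine (hypothetical helper)
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")              # TRUE, since stats ships with R
# install.packages("car")          # console equivalent of the PACKAGES menu
if (is_installed("car")) library(car)   # load car only if it is available
```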

 

After installation, to see what is in an installed and loaded package, type the following.

 

help(package=packagename)

 

To get help with a command within a package that is already installed and loaded, type the following.

 

help(commandname)

 

or

 

example(commandname)

 

All of the assignments on the syllabus will require the ‘car’ package to be installed and loaded. You can do this by putting the following command at the top of your command file.

 

library(car)

 

Alternatively, you might want to put this command in the ‘Rprofile.site’ file so that the package is loaded automatically each time you start R. The profile is located in the etc subdirectory of the directory where R is installed.
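
As a sketch of what such a profile entry might look like, the line below adds car to the list of packages that R attaches at startup. This follows the pattern shown in R's own startup documentation; it assumes car is already installed, otherwise R will warn at startup.

```r
# Possible line for etc/Rprofile.site: attach car along with the default packages
options(defaultPackages = c(getOption("defaultPackages"), "car"))
```

The setting takes effect the next time R starts, not in the session where it is typed.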

 

ASSIGNMENT: As an exercise for this first session, read the data from the data file called “EXAMPLE.txt”. Run all of the preceding commands. Then, create two new variables, X1*X2 and X3 squared. Compute descriptive statistics on these new variables. Use the “lm” command to regress Y on a constant, X1, X2, X1*X2, and X3 squared. Following your regression, save the residuals and predicted values to new variables on the dataset, list and plot the residuals, and obtain CUSUM plots for parameter stability of the model. Finally, use the “anova” command to view the analysis of variance table for the variables in the model.

 

Try this first independently. However, if you get into trouble, here is what the command file would look like.

 

# This file is intended to get you started with R.

# First read in the data and examine it.

Example <- read.table("C:/Rstuff/EXAMPLE.txt", header=TRUE)

attach(Example)

Example

 

# Now summarize the data for all variables.

 

summary(Example)

 

# Now compute the mean, variance, standard deviation, etc. for a single variable.

 

mean(X1)

var(X1)

sd(X1)

median(X1)

max(X1)

min(X1)

 

# Now compute the skewness and kurtosis using the e1071 library

 

library(e1071)

skewness(X1)

kurtosis(X1)

 

# Now create a new variable and add it to the active data file.

 

Example$NEWVAR <- Y+X1+X2+X3

detach(Example)

attach(Example)

 

# Now do a correlation matrix among the original variables.

 

cor(Example[,c("X1","X2","X3","Y")], use="complete.obs")

 

# Now do a scatterplot between two variables with a superimposed regression line,

# a nonparametric smooth, and box plots for each variable. This is in the car library.

 

library(car)

scatterplot(Y~X1, reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)

 

# Now create two variables X1*X2 and X3 squared and add to the active data.

 

Example$X1X2 <- X1 * X2

Example$X3sqr <- X3^2

detach(Example)

attach(Example)

 

# Get descriptive statistics on the new variables

 

summary(X1X2)

var(X1X2)

sd(X1X2)

summary(X3sqr)

var(X3sqr)

sd(X3sqr)

 

# Look at the entire dataset.

 

Example

 

# Now regress Y on X1, X2, X1X2, and X3sqr and look at the output object.

 

regress.model <- lm(Y ~ X1 + X2 + X1X2 + X3sqr)

summary(regress.model)

 

# Now add the residuals and predicted values to the dataset.

 

Example$residuals <- residuals(regress.model)

Example$fitted <- fitted(regress.model)

detach(Example)

attach(Example)

 

# Now list and plot the residuals and predicted values

 

residuals

plot(residuals)

fitted

plot(fitted)

 

# Now get the analysis of variance table for the regression.

 

anova(regress.model)

 

# Now construct a CUSUM plot for model stability.

 

library(strucchange)

plot(efp(Y ~ X1 + X2 + X1X2 + X3sqr, type = "Rec-CUSUM"))
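
The regression steps above rely on attach(). As an alternative sketch, the same kind of model can be fitted by passing the dataset through the data= argument, which avoids the repeated detach and attach lines. The data frame Sim below is simulated purely for illustration and is not the course data; the names Sim and fit are made up.

```r
set.seed(1)                                    # reproducible made-up data
Sim <- data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
Sim$Y <- 1 + 2 * Sim$X1 - Sim$X2 + 0.5 * Sim$X1 * Sim$X2 + Sim$X3^2 + rnorm(50)

# X1:X2 is the product term; I(X3^2) keeps ^ as arithmetic inside the formula
fit <- lm(Y ~ X1 + X2 + X1:X2 + I(X3^2), data = Sim)
summary(fit)

Sim$resid  <- residuals(fit)                   # residuals, one per row
Sim$fitted <- fitted(fit)                      # predicted values
all.equal(Sim$Y, unname(Sim$fitted + Sim$resid))   # TRUE, since Y = fitted + residual
anova(fit)                                     # sequential analysis of variance table
```

Writing the product and the square inside the formula means the extra variables X1X2 and X3sqr do not have to be created first, although creating them, as in the command file above, works equally well.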