FUNDAMENTALS OF R
The purpose of this session is to introduce you to R and to help you gain some
familiarity with a few R commands that will be used in future sessions.
SOME BASICS
Invoke R by double clicking the R icon on your desktop.
To quit R, type q() at the R prompt (>) and press the Enter key.
Alternatively, select Exit from the File menu.
R is a high level programming
language based on objects.
> objects() # list the names of all objects
> rm(data1) # remove the object named data1 from the current environment
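As a minimal sketch of these two commands, the following creates a throwaway object, confirms that it appears in the listing, and then removes it. The name data1 matches the example above; the values stored in it are made up for illustration.

```r
# Create a small object (the values are arbitrary, for illustration only)
data1 <- c(1.5, 2.5, 4.0)

# objects() and ls() return the same listing of the current environment
listed_before <- "data1" %in% objects()
listed_before            # TRUE while data1 exists

# Remove the object, then check that it is gone
rm(data1)
exists_after <- exists("data1", inherits = FALSE)
exists_after             # FALSE
```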
In R, each thing you work with must be defined as an object. To get data into
R, for example, you define a dataset as an object and then attach it. To define
an object “Example” from an external dataset EXAMPLE.txt, type the following.
Example <- read.table("C:/Rstuff/EXAMPLE.txt",
header=TRUE)
attach(Example)
This two-command sequence assumes you are reading a text file containing
variable names in the first line. R will read text files, files created in R,
and data files written in other formats such as STATA and SPSS using the
“foreign” library. Make sure that the path on the first line is correct for
finding the file called “EXAMPLE.txt”. Note that R is case sensitive. Most
built-in R commands are written in lower case, but not all (for example, TRUE
and Sys.time), so type each name exactly as shown.
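To try read.table without the course data file, the sketch below writes a small text file to a temporary location and reads it back. The file contents and the temporary path are invented for illustration; only the header=TRUE usage is taken from the text above.

```r
# Write a tiny whitespace-delimited file with variable names on the first line
tmp <- tempfile(fileext = ".txt")
writeLines(c("Y X1 X2 X3",
             "10 1 4 7",
             "12 2 5 8",
             "15 3 6 9"), tmp)

# header=TRUE tells read.table that the first line holds the variable names
Demo <- read.table(tmp, header = TRUE)

names(Demo)   # "Y" "X1" "X2" "X3"
nrow(Demo)    # 3

# R is case sensitive: the column is X1, and there is no column named x1
"x1" %in% names(Demo)   # FALSE

unlink(tmp)   # delete the temporary file
```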
attach() allows you to reference variables in the data frame Example by name,
without the more cumbersome Example$variable.
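For comparison, here is a small sketch of three ways to reach the same variable: the full Example$variable form, attach(), and with(), which evaluates one expression inside the data frame without changing the search path. The data values are made up, and with() is offered as an alternative rather than something this session requires.

```r
# A stand-in for the course data (values invented for illustration)
Example <- data.frame(Y = c(10, 12, 15), X1 = c(1, 2, 3))

m_dollar <- mean(Example$X1)        # explicit reference

attach(Example)                     # make the columns visible by name
m_attach <- mean(X1)
detach(Example)                     # undo the attach when finished

m_with <- with(Example, mean(X1))   # no attach needed

c(m_dollar, m_attach, m_with)       # all three are 2
```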
Once you have your data in R, you can work with it in ways such as the
following. To view an object, type its name at the command prompt. For example,
type the following.
Example
To obtain summary statistics
on the object “Example” type:
summary(Example)
The object “Example” that we
have defined contains variables Y, X1, X2, and X3. To get the mean, variance,
and standard deviation of X1 type the following.
mean(X1)
var(X1)
sd(X1)
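As a quick check of how these three functions relate, the sketch below uses a short made-up vector in place of X1. Note that var() and sd() use the sample (n - 1) denominator.

```r
x <- c(2, 4, 4, 6)        # illustrative values, not the course data

mean(x)                   # 4
var(x)                    # sum((x - mean(x))^2) / (n - 1) = 8/3
sd(x)                     # square root of the variance

# Verify the definitions by hand
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(var(x), manual_var)        # TRUE
all.equal(sd(x), sqrt(var(x)))       # TRUE
```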
It is also easy to create new variables (which are also objects) in R. Assuming
you have attached the dataset “Example”, type in the following short list of
commands.
Example$NEWVAR <- Y+X1+X2+X3
detach(Example)
attach(Example)
This creates a new variable called NEWVAR, which is the sum of the other
variables. Because attach() works from a copy of the data frame as it was when
attached, the subsequent detach and attach lines ensure that the new variable
can also be referenced by name on the same dataset Example.
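The sketch below illustrates why the detach and attach steps matter: attach() works from a copy of the data frame, so a column added afterwards is not visible by name until the data frame is re-attached. The data values are invented for illustration.

```r
Example <- data.frame(Y = c(10, 12), X1 = c(1, 2), X2 = c(3, 4), X3 = c(5, 6))
attach(Example)

# Add a column to the data frame itself
Example$NEWVAR <- Y + X1 + X2 + X3

# The attached copy predates NEWVAR, so the bare name is not found yet
before <- exists("NEWVAR")     # FALSE

detach(Example)
attach(Example)
after <- exists("NEWVAR")      # TRUE
NEWVAR                         # 19 24

detach(Example)                # clean up the search path
```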
The assignment operator (<-) stores the value of the expression on its right
side in the object named on its left side. Once assigned, the object can be
used like any other component of a computation. To find out what the object
looks like, simply type its name. Note that R is case sensitive, e.g., object
names abc, ABC, Abc are all different.
> x <- log(2.843432) * pi
> x
[1] 3.283001
To
obtain a correlation matrix on the variables in “Example”, type the following.
cor(Example[,c("X1","X2","X3","Y")],
use="complete.obs")
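To see what use="complete.obs" does, the following sketch builds a small data frame with one missing value. The numbers are made up; the point is that rows containing NA are dropped before the correlations are computed, whereas the default returns NA.

```r
Demo <- data.frame(X1 = c(1, 2, 3, 4, 5),
                   X2 = c(2, 4, 6, 8, NA),    # one missing value
                   Y  = c(1, 3, 2, 5, 4))

# Default behaviour: any pair involving X2 comes back as NA
r_default <- cor(Demo)
r_default["X1", "X2"]

# complete.obs drops the fifth row first, so X1 and X2 (exactly linear
# in the remaining rows) have correlation 1
r <- cor(Demo[, c("X1", "X2", "Y")], use = "complete.obs")
r["X1", "X2"]

# Same result as removing incomplete rows by hand
all.equal(r, cor(na.omit(Demo)))
```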
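To obtain a scatterplot of two variables with a fitted regression line, a
nonparametric smooth line, and marginal boxplots, type the following. The
scatterplot function is in the car package, which must be loaded first with
library(car). The argument names shown are those of older car versions; car 3.0
and later use regLine and a list-valued smooth argument instead.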
scatterplot(Y~X1,
reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)
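If the car package is not yet installed, a rough base-graphics approximation of the main plot can be sketched as follows. This is not the car scatterplot function: it draws only the points, the least-squares line, and a lowess smooth (no marginal boxplots), and it uses simulated data with an arbitrary seed in place of the course file.

```r
set.seed(1)                              # arbitrary seed, for reproducibility
X1 <- runif(50, 0, 10)                   # simulated predictor
Y  <- 2 + 0.5 * X1 + rnorm(50)           # simulated response

fit <- lm(Y ~ X1)                        # least-squares regression

plot(X1, Y, main = "Y versus X1")        # scatterplot of the raw data
abline(fit, lty = 1)                     # fitted regression line
lines(lowess(X1, Y, f = 0.5), lty = 2)   # nonparametric smooth, span 0.5
legend("topleft", legend = c("lm fit", "lowess"), lty = c(1, 2))
```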
R can be also used as an
ordinary calculator. Try these examples:
> 2 + 3 * 5 # Note the order of operations.
> log(10) # Natural logarithm with base e=2.718282
> 4^2 # 4 raised to the second power
> 3/2 # Division
> sqrt(16) # Square root
> abs(3-7) # Absolute value of 3-7
> pi # The mysterious number
> exp(2) # Exponential function
> 15 %/% 4 # This is the integer divide operation
> # This is a comment line
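Integer division is often paired with the remainder operator %%, which the list above does not show. A brief sketch:

```r
quotient  <- 15 %/% 4    # 3, the whole number of times 4 goes into 15
remainder <- 15 %% 4     # 3, the remainder left over

# The two results together reconstruct the original number
quotient * 4 + remainder # 15

# Operator precedence: multiplication before addition
2 + 3 * 5                # 17
(2 + 3) * 5              # 25
```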
You can enter R commands directly in the R command window by typing the command
and pressing the Enter key. Or you can enter commands through any text
processor and then paste the lines into the R command window. The advantage of
using a text processor is that you can save the commands in a file to be used
later.
I like Tinn-R as a text processor with R because it has an interface
specifically designed for R. Rcmdr is another commonly used option; it is a
graphical front end with a point-and-click interface for many R commands
(similar to STATA). You can also use the internal R text processor, a Windows
text editor, Microsoft Word, or WordPerfect in text mode.
All of the preceding can be
executed in a batch file format by typing the lines into a plain text file and
then copying and pasting the selected text into the R command window.
If you wish to run a subset
of the file, highlight the lines you wish to run and copy and paste those lines
into the R command window.
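As an alternative to copying and pasting, a saved command file can also be run with source(). The sketch below writes two commands to a temporary file and runs them; the file name and its contents are invented for illustration.

```r
# Save a couple of commands to a plain text file
cmdfile <- tempfile(fileext = ".R")
writeLines(c("# commands saved for later use",
             "a <- 2 + 3 * 5",
             "b <- sqrt(16)"), cmdfile)

# Run every line of the file; echo = TRUE prints each command as it runs
source(cmdfile, echo = TRUE)

a   # 17
b   # 4

unlink(cmdfile)   # delete the temporary file
```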
Note that the typical R command contains the arrow (<-), which assigns a value
to an object.
To print results from your output, select FILE, PRINT from the main R menu.
Make sure that your cursor is in the window you want to print. To save
output from an R session, select FILE, SAVE TO FILE.
R can save the objects you
create in a session. This can be good or bad. If you do not wish to save
objects from a “messy” session, then you might want to start each file with the
following command to remove all objects.
rm(list=ls(all=TRUE))
Note that R has an extensive
library of procedures that can be installed. To see many of the available
packages for R go to the following web link.
http://cran.sixsigmaonline.org/
Click on Package (on the
left) and then the package to get a description.
To install the package in R
go to the PACKAGES menu within R, click ‘install package’, and select the
package to install.
In order to use the installed
package it must be loaded. Load a package by typing the following.
library(packagename)
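The menu steps above also have command-line equivalents. The sketch below checks whether a package is installed before loading it; it uses the car package that this handout relies on, and install.packages() is only mentioned in the message because it needs an internet connection.

```r
pkg <- "car"   # package name used in this course

if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  loaded <- TRUE
} else {
  message("Package '", pkg, "' is not installed. ",
          "Install it with install.packages(\"", pkg, "\").")
  loaded <- FALSE
}

loaded   # TRUE if car was found and loaded
```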
After installation, to see
what is in an installed and loaded package, type the following.
help(package=packagename)
To get help with a command
within a package that is already installed and loaded, type the following.
help(commandname)
or
example(commandname)
All of the assignments on the
syllabus will require the ‘car’ package to be
installed and loaded. You can do this by putting the following command at the top
of your command file.
library(car)
Alternatively, you might want to put this command in the ‘Rprofile.site’ file
so that the package is loaded automatically each time you start R. The profile
is located in the etc subdirectory of the directory where R is installed.
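To find that directory on your own machine, R.home() can report it. A minimal sketch; the file itself may not exist until you create it, and editing it may require administrator rights.

```r
# Directory holding site-wide configuration files for this R installation
etc_dir <- R.home("etc")

# Full path where Rprofile.site is expected
profile_path <- file.path(etc_dir, "Rprofile.site")
profile_path

# Check whether the file is already there
file.exists(profile_path)
```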
ASSIGNMENT: As an exercise for this first session, read the data from the data
file called “example.txt.” Run all of the preceding commands. Then, create some
new variables, X1*X2 and X3 squared. Compute descriptive statistics on these
new variables. Use the “lm” command to do a regression of Y on a constant, X1,
X2, X1*X2, and X3 squared. (Remember that R is case sensitive, so the variable
names must be typed in upper case as they appear in the dataset.) Following
your regression, save the residuals and predicted values to new variables on
the dataset, list and plot the residuals, and obtain CUSUM plots for parameter
stability on the model. Finally, use the “anova” command to view the analysis
of variance table for the variables in the model.
Try this first independently.
However, if you get into trouble, here is what the command file would look
like.
# This file is intended to get you started with R.
# First read in the data and examine it.
Example <- read.table("C:/Rstuff/EXAMPLE.txt",
header=TRUE)
attach(Example)
Example
# Now summarize the data for all variables.
summary(Example)
# Now compute the mean, variance, standard deviation, etc. for a single variable.
mean(X1)
var(X1)
sd(X1)
median(X1)
max(X1)
min(X1)
# Now compute the skewness and kurtosis using the e1071 library.
library(e1071)
skewness(X1)
kurtosis(X1)
# Now create a new variable and add it to the active data file.
Example$NEWVAR <- Y+X1+X2+X3
detach(Example)
attach(Example)
# Now do a correlation matrix among the original variables.
cor(Example[,c("X1","X2","X3","Y")],
use="complete.obs")
# Now do a scatterplot between two variables with a superimposed regression line,
# a nonparametric smooth line, and box plots for each variable. This is in the car
# library.
library(car)
scatterplot(Y~X1,
reg.line=lm, boxplots='xy', smooth=TRUE, span =0.5, data=Example)
# Now create two variables X1*X2 and X3 squared and add to the active data.
Example$X1X2 <- X1 * X2
Example$X3sqr <- X3^2
detach(Example)
attach(Example)
# Get descriptive statistics on the new variables.
summary(X1X2)
var(X1X2)
sd(X1X2)
summary(X3sqr)
var(X3sqr)
sd(X3sqr)
# Look at the entire dataset.
Example
# Now regress Y on X1, X2, X1X2, and X3sqr and look at the output object.
regress.model <- lm(Y ~ X1 + X2 + X1X2 + X3sqr)
summary(regress.model)
# Now add the residuals and predicted values to the dataset.
Example$residuals <- residuals(regress.model)
Example$fitted <- fitted(regress.model)
detach(Example)
attach(Example)
# Now list and plot the residuals and predicted values.
residuals
plot(residuals)
fitted
plot(fitted)
# Now get the analysis of variance table for the regression.
anova(regress.model)
# Now construct a CUSUM plot for model stability.
library(strucchange)
plot(efp(Y ~ X1 + X2 + X1X2 + X3sqr, type = "Rec-CUSUM"))
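As an aside, the interaction and squared terms can also be written directly in the model formula, which avoids creating X1X2 and X3sqr by hand. The sketch below uses simulated data (the seed, sample size, and coefficients are all invented) and the data= argument instead of attach(); it is an alternative formulation, not the approach the assignment asks for.

```r
set.seed(42)                                  # arbitrary seed
n <- 40
Sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Sim$Y <- 1 + 2 * Sim$X1 - Sim$X2 + 0.5 * Sim$X1 * Sim$X2 +
         0.3 * Sim$X3^2 + rnorm(n)

# X1:X2 is the interaction term; I() protects ^ so it means "squared"
fit <- lm(Y ~ X1 + X2 + X1:X2 + I(X3^2), data = Sim)
summary(fit)

# Save residuals and predicted values back to the data frame
Sim$residuals <- residuals(fit)
Sim$fitted    <- fitted(fit)

# Fitted values plus residuals reproduce the response
all.equal(unname(Sim$fitted + Sim$residuals), Sim$Y)

anova(fit)                                    # analysis of variance table
```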