R commands for Intermediary Statistics

Control-R
   Take a command from the script and run it (for windows)

Command-Enter
   Take a command from the script and run it (for MAC)

?plot
   Read the help file on plot

??plot
   Search for all commands that have plot in the description

#Remember to fix this
   A comment that won't be run if you do control-r

x <- 2
   Assign the value of 2 to the box named x

x <- c(2,4,3,5,7)
   Assign the vector of numbers 2 4 3 5 and 7 to the box named x

x <- "hello"
   Assign the characters hello to the box named x ls()
   See all the variables (boxes) that you've created

rm(x)
   Remove the box named x from your list of variables

Control-L
   If you press control and l you'll clear the console (commands you've run)

Data manipulation

x<-read.table("http://www.uwyo.edu/crawford/Datasets/printers.txt",header=TRUE)
   Read in the data set from the url, and save in a box called x with the header names

x<-read.table("http://www.uwyo.edu/crawford/Datasets/algea.txt",header=TRUE,sep="\t")
   Read in the data when it's tab delimited (verses comma delimited)

x<-read.csv("http://www.uwyo.edu/crawford/Datasets/brain.txt",header=TRUE,skip=3)
   Read in the data but skip the first 3 lines (of text)

x[2,]
   In the data set x, only use row 2

x[,3]
   In the data set x, only use column 3

x[2,3]
   In the data set x grab the value in row 2 column 3

x[,c(2,3,4,5)]
   In the data set x only use columns 2, 3, 4, and 5

head(x)
   See just first six rows of the dataset x

nrow(x)
   The number of rows in dataset x

round(x,5)
   round the number x to 5 decimal places

as.numeric(as.character(dataset$nums))
   turn factors (through characters) into numbers

x[x$color=="red",]
   In data set x, find only the rows where the color is red

x[x$height>0,]
   In data set x, find only the rows where the height is above 0

na.omit(x)
   The dataset x except remove any rows that have an "NA" value

x<-rnorm(100,3,2)
   Create 100 random numbers that are normal with a mean of 3 and sd 2. Store it in x

Descriptive Statistics

min(x)
   Find the minimum value in x

max(x)
   Find the maximum value in x

sum(x)
   the sum of x

mean(x)
   the mean of x

sd(x)
   the standard deviation of x

Prerequisite Skills

pnorm(1.96)
   Find the area to the left of a z-score of 1.96

qnorm(0.025)
   Get the left tailed z-score for an area of 0.025

rnorm(10)
   Get 10 random draws from the normal distribution

dnorm(1.96)
   The y value of the bellcurve at 1.96 (for plotting) dpois(5,2)
   The probability for the Poisson distribution where theta=2

pt(1.96,2)
pchisq(1.96,2)
pf(1.96,2,5)
   Areas for other distributions with their degrees of freedom

t.test(x)
   A one sample test of mu=0, also confidence interval for the mean

t.test(x,y)
   Two sample test of mu1=mu2, also confidence interval for the difference

t.test(x,y,paired=TRUE)
   Two sample matched pairs t-test (with confidence interval)

Plots

boxplot(x)
   Make a boxplot of x

hist(x)
   Draw a histogram of x

plot(x)
   Plot the values of x in order (not actually that useful in this class)

plot(y~x)
   Draw a scatterplot of y based on x

plot(y~x,xlim=c(0,100))
   Plot y on x, but make the x axis go from 0 to 100

plot(y~x,ylim=c(0,100))
   Plot y on x, but make the y axis go from 0 to 100

plot(y~x,col="red")
   Plot y on x with red dots

plot(y~x,xlab="Time")
   Plot y on x and label the x axis Time

plot(y~x,ylab="Height")
   Plot y on x and label the y axis Height

plot(y~x,main="Height based on Time")
   Plot y on x and write Height based on Time at the top

lines(y~x)
   Add the line for y on x on top of whatever plot is already there

points(y~x)
   Add the dots for y on x on top of whatever plot is already there

legend("topright",col=c("red","yellow","blue"),legend=c("high","medium","low"),lty=1)
   Put a legend in the top right corner. Have the red line say high, etc.

par(mfrow=c(2,2))
   Start putting 4 plots (2 rows, 2 columns) on one picture

x<-seq(0,10,length=1000)
y<-5+2*x
plot(y~x,type="l")
   Plot the line y=5+2*x

Simple Regression

fit<-lm(y~x,data=flowers)
fit<-lm(flowers$y~flowers$x)
   Predict y based on x, and save the results in a variable (box) called fit

plot(fit)
   plot the residuals (4 different plots)

summary(fit)
   Get slopes, p-values, R^2, and the standard error

confint(fit)
   Computes condidence intervals for one or more parameters in an lm model called fit

predict.lm(fit,newdata=data.frame(x=10),interval="prediction",level=.99)
   Give a 99% prediction interval for possible y values where x=10

predict.lm(fit,newdata=data.frame(x=10),interval="confidence",level=.99)
   Give a 99% confidence interval for the average/prediction where x=10

AIC(fit)
BIC(fit)
   Get the AIC or BIC values for a model

fit<-aov(y~x,data=flowers)
fit<-aov(flowers$y~flowers$x)
   Predict y based on x, and save the results in a variable (box) called fit anova(fit)
   Get the sums of squares and F tests for the ANOVA

TukeyHSD(fit)
   Perform pairwise confidence intervals with Tukey's EWER correction

Transformed Regression

lm(y~I(x^2),data=flowers)
   Predict y based on x squared

lm(y~x+I(x^2)+q+w+q*w,data=flowers)
   predict y based on x, x^2, q, w, and their interaction

log(2)
   log uses base e
log10(2)
   log10 uses base 10
exp(2)
   exponent on e

lm(y~log(x),data=flowers)
   log does not use the I() notation

Residual Plots

plot(fit,which=1)
   Plot the first residual plot (residuals vs fitted values)

plot(fit$residuals~x)
   Plot the residuals against x

plot(fit$residuals~fit$fitted.values)
   Plot the residuals against the predicted values

qqnorm(fit$residuals)
   Plot a QQ plot of the residuals

qqline(fit$residuals)
   Put the line on a QQplot

plot(density(fit$residuals))
   Draw the density plot of the residuals

plot(cooks.distance(fit))
   See Cook's Distance for each data point

Advanced Methods

install.packages("leaps")
   Downloading a package of new commands from another server

library(leaps)
   Installing the package of new commands into the R you're using

bestfit<-regsubsets(y~x,nbest=1,nvmax=16)
   Perform best subset regression on the data

summary(bestfit)$bic
summary(bestfit)$which[2,]
   Pick out specific pieces from the best subset regression

plot(TukeyHSD(fit))
   See pairwise plots of the Tukey's HSD comparisons

chisq.test(data)
   Take a matrix and does an independence chi-squared test

chisq.test(data,probs)
   Perform Goodness of Fit on the data

kruskal.wallis(data)
   Perfom a Kruskal Wallis test

kruskal.wallis(data,paired=TRUE)
   Perfom a Signed Rank test

R commands for Intermediary Statistics

Basic Stuff

Data Manipulation

Descriptive Statistics

Prerequisite

Plots

SimpleRegression

Transformed Regression

Residual Plots

Advanced Methods

Basic Stuff

Data manipulation

Descriptive Statistics

Prerequisite Skills

Plots

Simple Regression

Transformed Regression

Residual Plots

Advanced Methods