R commands for Intermediary Statistics
Basic Stuff
Control-R
Take a command from the script and run it (for windows)
Command-Enter
Take a command from the script and run it (for MAC)
?plot
Read the help file on plot
??plot
Search for all commands that have plot in the description
#Remember to fix this
A comment that won't be run if you do control-r
x <- 2
Assign the value of 2 to the box named x
x <- c(2,4,3,5,7)
Assign the vector of numbers 2 4 3 5 and 7 to the box named x
x <- "hello"
Assign the characters hello to the box named x
ls()
See all the variables (boxes) that you've created
rm(x)
Remove the box named x from your list of variables
Control-L
If you press control and l you'll clear the console (commands you've run)
Data manipulation
x<-read.table("http://www.uwyo.edu/crawford/Datasets/printers.txt",header=TRUE)
Read in the data set from the url, and save in a box called x with the header names
x<-read.table("http://www.uwyo.edu/crawford/Datasets/algea.txt",header=TRUE,sep="\t")
Read in the data when it's tab delimited (verses comma delimited)
x<-read.csv("http://www.uwyo.edu/crawford/Datasets/brain.txt",header=TRUE,skip=3)
Read in the data but skip the first 3 lines (of text)
x[2,]
In the data set x, only use row 2
x[,3]
In the data set x, only use column 3
x[2,3]
In the data set x grab the value in row 2 column 3
x[,c(2,3,4,5)]
In the data set x only use columns 2, 3, 4, and 5
head(x)
See just first six rows of the dataset x
nrow(x)
The number of rows in dataset x
round(x,5)
round the number x to 5 decimal places
as.numeric(as.character(dataset$nums))
turn factors (through characters) into numbers
x[x$color=="red",]
In data set x, find only the rows where the color is red
x[x$height>0,]
In data set x, find only the rows where the height is above 0
na.omit(x)
The dataset x except remove any rows that have an "NA" value
x<-rnorm(100,3,2)
Create 100 random numbers that are normal with a mean of 3 and sd 2. Store it in x
Descriptive Statistics
min(x)
Find the minimum value in x
max(x)
Find the maximum value in x
sum(x)
the sum of x
mean(x)
the mean of x
sd(x)
the standard deviation of x
Prerequisite Skills
pnorm(1.96)
Find the area to the left of a z-score of 1.96
qnorm(0.025)
Get the left tailed z-score for an area of 0.025
rnorm(10)
Get 10 random draws from the normal distribution
dnorm(1.96)
The y value of the bellcurve at 1.96 (for plotting)
dpois(5,2)
The probability for the Poisson distribution where theta=2
pt(1.96,2)
pchisq(1.96,2)
pf(1.96,2,5)
Areas for other distributions with their degrees of freedom
t.test(x)
A one sample test of mu=0, also confidence interval for the mean
t.test(x,y)
Two sample test of mu1=mu2, also confidence interval for the difference
t.test(x,y,paired=TRUE)
Two sample matched pairs t-test (with confidence interval)
Plots
boxplot(x)
Make a boxplot of x
hist(x)
Draw a histogram of x
plot(x)
Plot the values of x in order (not actually that useful in this class)
plot(y~x)
Draw a scatterplot of y based on x
plot(y~x,xlim=c(0,100))
Plot y on x, but make the x axis go from 0 to 100
plot(y~x,ylim=c(0,100))
Plot y on x, but make the y axis go from 0 to 100
plot(y~x,col="red")
Plot y on x with red dots
plot(y~x,xlab="Time")
Plot y on x and label the x axis Time
plot(y~x,ylab="Height")
Plot y on x and label the y axis Height
plot(y~x,main="Height based on Time")
Plot y on x and write Height based on Time at the top
lines(y~x)
Add the line for y on x on top of whatever plot is already there
points(y~x)
Add the dots for y on x on top of whatever plot is already there
legend("topright",col=c("red","yellow","blue"),legend=c("high","medium","low"),lty=1)
Put a legend in the top right corner. Have the red line say high, etc.
par(mfrow=c(2,2))
Start putting 4 plots (2 rows, 2 columns) on one picture
x<-seq(0,10,length=1000)
y<-5+2*x
plot(y~x,type="l")
Plot the line y=5+2*x
Simple Regression
fit<-lm(y~x,data=flowers)
fit<-lm(flowers$y~flowers$x)
Predict y based on x, and save the results in a variable (box) called fit
plot(fit)
plot the residuals (4 different plots)
summary(fit)
Get slopes, p-values, R^2, and the standard error
confint(fit)
Computes condidence intervals for one or more parameters in an lm model called fit
predict.lm(fit,newdata=data.frame(x=10),interval="prediction",level=.99)
Give a 99% prediction interval for possible y values where x=10
predict.lm(fit,newdata=data.frame(x=10),interval="confidence",level=.99)
Give a 99% confidence interval for the average/prediction where x=10
AIC(fit)
BIC(fit)
Get the AIC or BIC values for a model
fit<-aov(y~x,data=flowers)
fit<-aov(flowers$y~flowers$x)
Predict y based on x, and save the results in a variable (box) called fit
anova(fit)
Get the sums of squares and F tests for the ANOVA
TukeyHSD(fit)
Perform pairwise confidence intervals with Tukey's EWER correction
Transformed Regression
lm(y~I(x^2),data=flowers)
Predict y based on x squared
lm(y~x+I(x^2)+q+w+q*w,data=flowers)
predict y based on x, x^2, q, w, and their interaction
log(2)
log uses base e
log10(2)
log10 uses base 10
exp(2)
exponent on e
lm(y~log(x),data=flowers)
log does not use the I() notation
Residual Plots
plot(fit,which=1)
Plot the first residual plot (residuals vs fitted values)
plot(fit$residuals~x)
Plot the residuals against x
plot(fit$residuals~fit$fitted.values)
Plot the residuals against the predicted values
qqnorm(fit$residuals)
Plot a QQ plot of the residuals
qqline(fit$residuals)
Put the line on a QQplot
plot(density(fit$residuals))
Draw the density plot of the residuals
plot(cooks.distance(fit))
See Cook's Distance for each data point
Advanced Methods
install.packages("leaps")
Downloading a package of new commands from another server
library(leaps)
Installing the package of new commands into the R you're using
bestfit<-regsubsets(y~x,nbest=1,nvmax=16)
Perform best subset regression on the data
summary(bestfit)$bic
summary(bestfit)$which[2,]
Pick out specific pieces from the best subset regression
plot(TukeyHSD(fit))
See pairwise plots of the Tukey's HSD comparisons
chisq.test(data)
Take a matrix and does an independence chi-squared test
chisq.test(data,probs)
Perform Goodness of Fit on the data
kruskal.wallis(data)
Perfom a Kruskal Wallis test
kruskal.wallis(data,paired=TRUE)
Perfom a Signed Rank test