R commands for Intermediate Statistics
Basic Stuff
Control-R
Take a command from the script and run it (on Windows)
Command-Enter
Take a command from the script and run it (on a Mac)
?plot
Read the help file on plot
??plot
Search for all commands that have plot in the description
#Remember to fix this
A comment; anything after # on a line is not run when you press Control-R
x <- 2
Assign the value of 2 to the box named x
x <- c(2,4,3,5,7)
Assign the vector of numbers 2 4 3 5 and 7 to the box named x
x <- "hello"
Assign the characters hello to the box named x
ls()
See all the variables (boxes) that you've created
rm(x)
Remove the box named x from your list of variables
Control-L
Press Control and L to clear the console (the commands you've run and their output)
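The basics above can be tried together in a short sketch. The names v and s are just illustrative boxes, not part of the original list.

```r
# Assign values to variables ("boxes"), inspect them, then clean up.
x <- 2                    # a single number
v <- c(2, 4, 3, 5, 7)     # a vector of five numbers
s <- "hello"              # a character string

length(v)                 # how many values are in v
class(s)                  # "character"
ls()                      # lists the variables created so far

rm(x)                     # remove the box named x
"x" %in% ls()             # FALSE once x has been removed
```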
Data manipulation
x<-read.table("http://www.uwyo.edu/crawford/Datasets/printers.txt",header=TRUE)
Read in the data set from the URL and save it in a box called x; header=TRUE uses the first line as the column names
x<-read.table("http://www.uwyo.edu/crawford/Datasets/algea.txt",header=TRUE,sep="\t")
Read in the data when it's tab delimited (versus the read.table default, which splits on any whitespace)
x<-read.csv("http://www.uwyo.edu/crawford/Datasets/brain.txt",header=TRUE,skip=3)
Read in comma-delimited data, but skip the first 3 lines of the file (usually descriptive text)
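To try header, sep, and skip without a file or URL, read.table can also read from a string through its text argument. This is a minimal sketch; the string raw and its values are made up for illustration.

```r
# A small tab-delimited table stored as text (made-up values),
# with three lines of notes before the header line.
raw <- "notes line 1\nnotes line 2\nnotes line 3\ncolor\theight\nred\t1.5\nblue\t2.0\n"

# skip=3 drops the three note lines; header=TRUE uses the next line as column names.
d <- read.table(text = raw, header = TRUE, sep = "\t", skip = 3)
d           # two rows, columns color and height
names(d)    # the column names taken from the header line
```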
x[2,]
In the data set x, only use row 2
x[,3]
In the data set x, only use column 3
x[2,3]
In the data set x grab the value in row 2 column 3
x[,c(2,3,4,5)]
In the data set x only use columns 2, 3, 4, and 5
head(x)
See just the first six rows of the dataset x
nrow(x)
The number of rows in dataset x
round(x,5)
Round the number x to 5 decimal places
as.numeric(as.character(dataset$nums))
Turn a factor into numbers by going through characters (as.numeric alone would return the level codes)
x[x$color=="red",]
In data set x, find only the rows where the color is red
x[x$height>0,]
In data set x, find only the rows where the height is above 0
na.omit(x)
The dataset x except remove any rows that have an "NA" value
x<-rnorm(100,3,2)
Create 100 random numbers from a normal distribution with mean 3 and sd 2. Store them in x
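The indexing and filtering commands above can be checked on a small made-up data set. The data frame d and its values are assumptions for illustration only. Note that a comparison on a column containing NA returns NA for that row, so this sketch wraps the condition in which() to skip it.

```r
# A small made-up data set with one missing value.
d <- data.frame(color  = c("red", "blue", "red", "green"),
                height = c(1.2, -0.5, NA, 3.4),
                nums   = factor(c("10", "20", "30", "40")))

d[2, ]                      # row 2, all columns
d[, 2]                      # column 2 as a vector
d[2, 2]                     # the value in row 2, column 2
d[d$color == "red", ]       # rows where color is red
d[which(d$height > 0), ]    # rows where height is above 0, skipping the NA
na.omit(d)                  # drop rows that contain an NA

# Convert a factor of digits to numbers by going through characters.
as.numeric(as.character(d$nums))
# Without as.character() you get the internal level codes instead.
as.numeric(d$nums)
```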
Descriptive Statistics
min(x)
Find the minimum value in x
max(x)
Find the maximum value in x
sum(x)
The sum of x
mean(x)
The mean of x
sd(x)
The standard deviation of x
t.test(x)
A one-sample t-test of mu=0; also gives a confidence interval for the mean
t.test(x,y)
A two-sample t-test of mu1=mu2; also gives a confidence interval for the difference
t.test(x,y,paired=TRUE)
A matched-pairs t-test (with a confidence interval for the mean difference)
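A minimal sketch of the summaries and tests above on simulated data. The "before" and "after" framing and all numbers are made up. The result of t.test is a list, so the p-value and confidence interval can be pulled out by name.

```r
set.seed(1)                              # make the random numbers repeatable
x <- rnorm(30, mean = 3, sd = 2)         # simulated "before" measurements
y <- x + rnorm(30, mean = 0.5, sd = 1)   # simulated "after" measurements

c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))

one <- t.test(x)                     # one-sample test of mu = 0
one$p.value                          # p-value of the test
one$conf.int                         # 95% confidence interval for the mean

two <- t.test(x, y)                  # two-sample test (Welch, unequal variances, by default)
two$method

pair <- t.test(x, y, paired = TRUE)  # matched pairs
pair$conf.int                        # interval for the mean difference
```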
Plots
boxplot(x)
Make a boxplot of x
hist(x)
Draw a histogram of x
plot(x)
Plot the values of x in order (not actually that useful in this class)
plot(y~x)
Draw a scatterplot of y based on x
plot(y~x,xlim=c(0,100))
Plot y on x, but make the x axis go from 0 to 100
plot(y~x,ylim=c(0,100))
Plot y on x, but make the y axis go from 0 to 100
plot(y~x,col="red")
Plot y on x with red dots
plot(y~x,xlab="Time")
Plot y on x and label the x axis Time
plot(y~x,ylab="Height")
Plot y on x and label the y axis Height
plot(y~x,main="Height based on Time")
Plot y on x and write Height based on Time at the top
lines(y~x)
Add the line for y on x on top of whatever plot is already there
points(y~x)
Add the dots for y on x on top of whatever plot is already there
legend("topright",col=c("red","yellow","blue"),legend=c("high","medium","low"),lty=1)
Put a legend in the top right corner. Have the red line say high, etc.
par(mfrow=c(2,2))
Start putting 4 plots (2 rows, 2 columns) on one picture
x<-seq(0,10,length=1000)
y<-5+2*x
plot(y~x,type="l")
Plot the line y=5+2*x
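Several of the plotting options above can be combined in one call. This is a sketch on simulated data; the noise level, the colors, and the name truth are arbitrary choices for illustration.

```r
set.seed(2)
x <- seq(0, 10, length = 50)
truth <- 5 + 2 * x                    # the line y = 5 + 2x
y <- truth + rnorm(50, sd = 2)        # noisy points around that line

par(mfrow = c(1, 2))                  # two plots side by side
hist(y, main = "Histogram of y")

plot(y ~ x, col = "red", xlab = "Time", ylab = "Height",
     main = "Height based on Time", xlim = c(0, 10), ylim = c(0, 30))
lines(truth ~ x, col = "blue")        # add the line on top of the dots
legend("topleft", col = c("red", "blue"), legend = c("data", "y = 5 + 2x"),
       pch = c(1, NA), lty = c(NA, 1))

par(mfrow = c(1, 1))                  # back to one plot per picture
```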
Regression
fit<-lm(y~x,data=flowers)
fit<-lm(flowers$y~flowers$x)
Predict y based on x, and save the results in a variable (box) called fit
plot(fit)
Draw diagnostic plots of the residuals (4 different plots)
summary(fit)
Get slopes, p-values, R^2, and the standard error
confint(fit)
Compute confidence intervals for one or more parameters in an lm model called fit
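A short sketch of the regression workflow above. The flowers data here are simulated stand-ins (a true intercept of 5 and slope of 2 are assumed), not the real data set.

```r
set.seed(3)
# Made-up data standing in for the flowers data set.
flowers <- data.frame(x = 1:20)
flowers$y <- 5 + 2 * flowers$x + rnorm(20, sd = 1)

fit <- lm(y ~ x, data = flowers)
coef(fit)                      # intercept and slope estimates
summary(fit)$r.squared         # R^2
summary(fit)$coefficients      # estimates, standard errors, t values, p-values
confint(fit, level = 0.95)     # 95% confidence intervals for intercept and slope
```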
More Advanced Stuff
lm(y~I(x^2),data=flowers)
Predict y based on x squared
lm(y~x+I(x^2)+q+w+q*w,data=flowers)
Predict y based on x, x^2, q, w, and the interaction of q and w (q*w already expands to q+w+q:w)
plot(fit$residuals~x)
Plot the residuals against x
predict.lm(fit,newdata=data.frame(x=10,q=2,w=5))
Use the model fit to predict y at new values (here x=10, q=2, w=5)
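The model formulas and prediction call above can be tried on simulated data. Everything about this flowers data frame (sample size, coefficients, ranges) is made up. The sketch calls predict(), which dispatches to predict.lm for an lm fit.

```r
set.seed(4)
n <- 50
# Made-up predictors and response standing in for the flowers data set.
flowers <- data.frame(x = runif(n, 0, 10), q = runif(n, 0, 5), w = runif(n, 0, 5))
flowers$y <- 1 + 2 * flowers$x + 0.5 * flowers$x^2 + flowers$q -
  flowers$w + 0.3 * flowers$q * flowers$w + rnorm(n)

fit <- lm(y ~ x + I(x^2) + q * w, data = flowers)
names(coef(fit))                  # includes "I(x^2)" and the interaction "q:w"

plot(fit$residuals ~ flowers$x)   # residuals against x

new <- data.frame(x = 10, q = 2, w = 5)
predict(fit, newdata = new)                                 # point prediction
p <- predict(fit, newdata = new, interval = "prediction")   # with a prediction interval
p
```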
log(2)
The natural log of 2 (log uses base e)
log10(2)
The log of 2 in base 10
exp(2)
e raised to the power 2
lm(y~log(x),data=flowers)
Predict y based on log(x); log() does not need the I() notation
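A quick check of the log functions and a log-transformed regression. The flowers data are simulated for illustration (a true slope of 4 on log(x) is assumed).

```r
log(exp(2))          # 2: log undoes exp
log10(1000)          # 3
log(8, base = 2)     # 3: other bases through the base argument

set.seed(5)
# Made-up data standing in for the flowers data set.
flowers <- data.frame(x = seq(1, 100, length = 40))
flowers$y <- 3 + 4 * log(flowers$x) + rnorm(40, sd = 0.5)

fit <- lm(y ~ log(x), data = flowers)   # no I() needed around log()
coef(fit)                               # intercept and the slope on log(x)
```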