A Glossary of important R commands

Basic usage

The following list contains important R commands for basic usage of the language.

  • Assign values to a variable: <- (e.g., x <- 1)
  • Compute several expressions at once: ; (e.g., x <- 1; 2 + 2; 3 * 8)
  • Create vectors by concatenating numbers: c (e.g., c(1, 2, -1))
  • Create sequential integer vectors: : (e.g., 1:10)
  • Create a matrix by columns: cbind (e.g., cbind(1:3, c(0, 2, 0)))
  • Create a matrix by rows: rbind (e.g., rbind(1:3, c(0, 2, 0)))
  • Create a data frame: data.frame (e.g., data.frame(name1 = c(-1, 3), name2 = c(0.4, 1)))
  • Create a list: list (e.g., list(obj1 = c(-1, 3), obj2 = -1:5, obj3 = rbind(1:2, 3:2)))
  • Access elements of a…
    • … vector: [] (e.g., c(0.5, 2)[1]; c(0.5, 2)[-1]; c(0.5, 2)[2:1])
    • … matrix: [, ] (e.g., cbind(1:2, 3:4)[1, 2]; cbind(1:2, 3:4)[1, ])
    • … data frame: [, ] and $ (e.g., data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))$name1; data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))[2, 1])
    • … list: $ (e.g., list(x = 2, y = 7:0)$y)
  • Summarize any object: summary (e.g., summary(1:10))
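
For instance, the commands above can be combined in a short session like the following sketch (the object names x, m, df, and l are arbitrary choices made only for this example):

x <- c(1, 2, -1)                                       # a vector
m <- rbind(1:3, c(0, 2, 0))                            # a matrix built by rows
df <- data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))  # a data frame
l <- list(obj1 = x, obj2 = m)                          # a list
x[2:1]; m[1, ]; df$name1; l$obj2                       # access elements of each object
summary(df)                                            # summarize the data frame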

Linear regression

Some useful commands for performing simple and multiple linear regression are given in the next list. We assume that:

  • dataset is an imported dataset such that
    • resp is the response variable
    • pred1 is the first predictor
    • pred2 is the second predictor
    • predk is the last predictor
  • model is the result of applying lm
  • newPreds is a data.frame with variables named as the predictors
  • num is 1, 2, or 3
  • level is a number between 0 and 1
  • Fit a simple linear model: lm(resp ~ pred1, data = dataset)
  • Fit a multiple linear model…
    • … on two predictors: lm(resp ~ pred1 + pred2, data = dataset)
    • … on all predictors: lm(resp ~ ., data = dataset)
    • … on all predictors except pred1: lm(resp ~ . - pred1, data = dataset)
  • Summarize the linear model (coefficient estimates, standard errors, \(t\)-values, \(p\)-values for \(H_0:\beta_j=0\), \(\hat\sigma\) ('Residual standard error'), degrees of freedom, \(R^2\), Adjusted \(R^2\), \(F\)-test, \(p\)-value for \(H_0:\beta_1=\ldots=\beta_k=0\)): summary(model)
  • ANOVA decomposition: anova(model)
  • CIs for the coefficients: confint(model, level = level)
  • Prediction: predict(model, newdata = newPreds)
  • CIs for the predicted mean: predict(model, newdata = newPreds, interval = "confidence", level = level)
  • CIs for the predicted response: predict(model, newdata = newPreds, interval = "prediction", level = level)
  • Variable selection: stepwise(model)
  • Multicollinearity detection: vif(model)
  • Compare the coefficients of two fitted models: compareCoefs(model1, model2)
  • Diagnostic plots: plot(model, num)
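
As an illustration of the workflow, the following sketch simulates a small dataset (the data and the variable names resp, pred1, and pred2 are assumptions made only for this example) and applies the commands above:

# Simulated data standing in for an imported dataset
set.seed(1)
dataset <- data.frame(pred1 = rnorm(100), pred2 = rnorm(100))
dataset$resp <- 1 + 2 * dataset$pred1 - dataset$pred2 + rnorm(100)

model <- lm(resp ~ ., data = dataset)  # fit on all predictors
summary(model)                         # estimates, t-tests, R^2, F-test
confint(model, level = 0.95)           # CIs for the coefficients
newPreds <- data.frame(pred1 = 0, pred2 = 1)
predict(model, newdata = newPreds, interval = "confidence", level = 0.95)
predict(model, newdata = newPreds, interval = "prediction", level = 0.95)
plot(model, 1)                         # first diagnostic plot: residuals vs. fitted values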

More basic usage

The following list contains further important R commands for basic usage. We assume the following dataset is available:

data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))
  • Data frame management
    • variable names: names (e.g., names(data))
    • structure: str (e.g., str(data))
    • dimensions: dim (e.g., dim(data))
    • beginning: head (e.g., head(data))
  • Vector related functions
    • create sequences: seq (e.g., seq(0, 1, l = 10); seq(0, 1, by = 0.25))
    • reverse a vector: rev (e.g., rev(1:5))
    • length of a vector: length (e.g., length(1:5))
    • count repetitions in a vector: table (e.g., table(c(1:5, 4:2)))
  • Logical conditions
    • relational operators: <, <=, >, >=, ==, != (e.g., 1 < 0; 1 <= 1; 2 > 1; 3 >= 4; 1 == 0; 1 != 0)
    • “and”: & (e.g., TRUE & FALSE)
    • “or”: | (e.g., TRUE | FALSE)
  • Subsetting
    • a vector: [] (e.g., data$x[data$x > 0]; data$x[data$x > 2 & data$x < 8])
    • a data frame: [, ] (e.g., data[data$x > 0, ]; data[data$x < 2 | data$x > 8, ])
  • Distributions
    • sampling: rxxxx (e.g., rnorm(n = 10, mean = 0, sd = 1))
    • density: dxxxx (e.g., x <- seq(-4, 4, l = 20); dnorm(x = x, mean = 0, sd = 1))
    • distribution: pxxxx (e.g., x <- seq(-4, 4, l = 20); pnorm(q = x, mean = 0, sd = 1))
    • quantiles: qxxxx (e.g., p <- seq(0.1, 0.9, l = 10); qnorm(p = p, mean = 0, sd = 1))
  • Plotting
    • scatterplot: plot (e.g., plot(rnorm(100), rnorm(100)))
    • plot a curve: plot, seq (e.g., x <- seq(0, 1, l = 100); plot(x, x^2, type = "l"))
    • add lines: lines (e.g., x <- seq(0, 1, l = 100); plot(x, x^2 + rnorm(100, sd = 0.1)); lines(x, x^2, col = 2, lwd = 2))
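
A minimal sketch combining subsetting, the normal distribution functions, and base plotting (the data object is the one defined above, restated so the snippet is self-contained):

data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))
data$y[data$y > 0]                # elements of y that are positive
data[data$x > 2 & data$x < 8, ]   # rows with x strictly between 2 and 8
table(data$y)                     # count repetitions in y
x <- seq(-4, 4, l = 100)
plot(x, dnorm(x), type = "l")                           # density of a N(0, 1)
lines(x, dnorm(x, mean = 0, sd = 2), col = 2, lwd = 2)  # add the density of a normal with sd = 2
pnorm(1.96) - pnorm(-1.96)   # probability that a N(0, 1) lies in (-1.96, 1.96), approx. 0.95
qnorm(0.975)                 # 0.975-quantile of a N(0, 1), approx. 1.96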

Logistic regression

Some useful commands for performing logistic regression are given in the next list. We assume that:

  • dataset is an imported dataset such that
    • resp is the binary response variable
    • pred1 is the first predictor
    • pred2 is the second predictor
    • predk is the last predictor
  • model is the result of applying glm
  • newPreds is a data.frame with variables named as the predictors
  • level is a number between 0 and 1
  • Fit a simple logistic model: glm(resp ~ pred1, data = dataset, family = "binomial")
  • Fit a multiple logistic model…
    • … on two predictors: glm(resp ~ pred1 + pred2, data = dataset, family = "binomial")
    • … on all predictors: glm(resp ~ ., data = dataset, family = "binomial")
    • … on all predictors except pred1: glm(resp ~ . - pred1, data = dataset, family = "binomial")
  • Summarize the logistic model (coefficient estimates, standard errors, Wald statistics ('z value'), \(p\)-values for \(H_0:\beta_j=0\), null deviance, deviance ('Residual deviance'), AIC, number of iterations): summary(model)
  • CIs for the coefficients: confint(model, level = level); confint.default(model, level = level)
  • CIs for the exponentiated coefficients: exp(confint(model, level = level)); exp(confint.default(model, level = level))
  • Prediction: predict(model, newdata = newPreds, type = "response")
  • CIs for the predicted probability: not immediate; use predictCIsLogistic(model, newdata = newPreds, level = level), as seen in Section 4.6
  • Variable selection: stepwise(model)
  • Multicollinearity detection: vif(model)
  • \(R^2\): not immediate; use r2Log(model = model), as seen in Section 4.8
  • Hit matrix: table(dataset$resp, model$fitted.values > 0.5)
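
A sketch of the workflow, again on simulated data (the data and the names resp, pred1, and pred2 are assumptions made only for this example):

# Simulated binary data standing in for an imported dataset
set.seed(1)
dataset <- data.frame(pred1 = rnorm(200), pred2 = rnorm(200))
p <- 1 / (1 + exp(-(-0.5 + 1.5 * dataset$pred1 - dataset$pred2)))  # true probabilities
dataset$resp <- rbinom(200, size = 1, prob = p)

model <- glm(resp ~ ., data = dataset, family = "binomial")  # fit on all predictors
summary(model)                              # Wald tests, deviances, AIC
exp(confint.default(model, level = 0.95))   # CIs for the exponentiated coefficients
newPreds <- data.frame(pred1 = 0, pred2 = 1)
predict(model, newdata = newPreds, type = "response")   # predicted probability
table(dataset$resp, model$fitted.values > 0.5)          # hit matrix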

Principal component analysis

Some useful commands for performing principal component analysis are given in the next list. We assume that:

  • dataset is an imported dataset with several non-categorical variables (the variables must be continuous or discrete).
  • pca is a PCA object, that is, the output of princomp.
  • Compute a PCA…
    • … unnormalized (if the variables have the same scale): princomp(dataset)
    • … normalized (if the variables have different scales): princomp(dataset, cor = TRUE)
  • Summarize the PCA (standard deviation of each PC, proportion of variance explained by each PC, cumulative proportion of variance explained up to a given PC): summary(pca)
  • Weights: pca$loadings
  • Scores: pca$scores
  • Standard deviations of the PCs: pca$sdev
  • Means of the original variables: pca$center
  • Screeplot: plot(pca); plot(pca, type = "l")
  • Biplot: biplot(pca)
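
For instance, on the USArrests dataset that ships with R (chosen here only because it is readily available and its variables have different scales):

pca <- princomp(USArrests, cor = TRUE)  # normalized PCA, since the variables have different scales
summary(pca)            # variance explained by each PC
pca$loadings            # weights
head(pca$scores)        # scores of the first observations
plot(pca, type = "l")   # screeplot
biplot(pca)             # biplot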