A Glossary of important R commands
Basic usage
The following table lists the most important R commands for basic usage.
Description | R | Example |
---|---|---|
Assign values to a variable | <- | x <- 1 |
Compute several expressions at once | ; | x <- 1; 2 + 2; 3 * 8 |
Create vectors by concatenating numbers | c | c(1, 2, -1) |
Create sequential integer vectors | : | 1:10 |
Create a matrix by columns | cbind | cbind(1:3, c(0, 2, 0)) |
Create a matrix by rows | rbind | rbind(1:3, c(0, 2, 0)) |
Create a data frame | data.frame | data.frame(name1 = c(-1, 3), name2 = c(0.4, 1)) |
Create a list | list | list(obj1 = c(-1, 3), obj2 = -1:5, obj3 = rbind(1:2, 3:2)) |
Access elements of a… | | |
… vector | [] | c(0.5, 2)[1]; c(0.5, 2)[-1]; c(0.5, 2)[2:1] |
… matrix | [, ] | cbind(1:2, 3:4)[1, 2]; cbind(1:2, 3:4)[1, ] |
… data frame | [, ] and $ | data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))$name1; data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))[2, 1] |
… list | $ | list(x = 2, y = 7:0)$y |
Summarize any object | summary | summary(1:10) |
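
The access operators in the table are easier to compare when applied to stored objects rather than to inline expressions. The following is a minimal sketch; the object names `v`, `df` and `lst` are illustrative and not part of the original table, and `[[ ]]` is shown as an additional base R accessor.

```r
# Illustrative objects (names are assumptions, not from the table)
v <- c(0.5, 2)
df <- data.frame(name1 = c(-1, 3), name2 = c(0.4, 1))
lst <- list(obj1 = c(-1, 3), obj2 = -1:5, obj3 = rbind(1:2, 3:2))

# Positive indices select elements, negative indices drop them
v[1]    # first element
v[-1]   # everything except the first element
v[2:1]  # both elements in reverse order

# Three ways of extracting the first column of a data frame
df$name1
df[, 1]
df[["name1"]]

# Objects stored in a list can be indexed right after extraction
lst$obj3[2, 1]  # row 2, column 1 of the matrix stored in obj3
```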
Linear regression
Some useful commands for performing simple and multiple linear regression are given in the next table. We assume that:
- dataset is an imported dataset such that
  - resp is the response variable
  - pred1 is the first predictor
  - pred2 is the second predictor
  - …
  - predk is the last predictor
- model is the result of applying lm
- newPreds is a data.frame with variables named as the predictors
- num is 1, 2, or 3
- level is a number between 0 and 1
Description | R |
---|---|
Fit a simple linear model | lm(resp ~ pred1, data = dataset) |
Fit a multiple linear model… | |
… on two predictors | lm(resp ~ pred1 + pred2, data = dataset) |
… on all predictors | lm(resp ~ ., data = dataset) |
… on all predictors except pred1 | lm(resp ~ . - pred1, data = dataset) |
Summarize linear model: coefficient estimates, standard errors, \(t\)-values, \(p\)-values for \(H_0:\beta_j=0\), \(\hat\sigma\) (Residual standard error), degrees of freedom, \(R^2\), Adjusted \(R^2\), \(F\)-test, \(p\)-value for \(H_0:\beta_1=\ldots=\beta_k=0\) | summary(model) |
ANOVA decomposition | anova(model) |
CIs for the coefficients | confint(model, level = level) |
Prediction | predict(model, newdata = newPreds) |
CIs for the predicted mean | predict(model, newdata = newPreds, interval = "confidence", level = level) |
CIs for the predicted response | predict(model, newdata = newPreds, interval = "prediction", level = level) |
Variable selection (requires the RcmdrMisc package) | stepwise(model) |
Multicollinearity detection (requires the car package) | vif(model) |
Compare model coefficients (requires the car package) | compareCoefs(model1, model2) |
Diagnostic plots | plot(model, num) |
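
As an illustration of how the base R commands in the table fit together, here is a minimal sketch on simulated data. The data-generating coefficients, the sample size and the values in `newPreds` are arbitrary assumptions chosen for the example; only the object names follow the conventions stated before the table.

```r
set.seed(1)  # makes the simulated data reproducible

# Simulated dataset following the naming conventions above (assumed values)
n <- 100
dataset <- data.frame(pred1 = rnorm(n), pred2 = rnorm(n))
dataset$resp <- 1 + 2 * dataset$pred1 - 0.5 * dataset$pred2 + rnorm(n, sd = 0.5)

# Multiple linear model on all predictors
model <- lm(resp ~ ., data = dataset)
summary(model)
anova(model)

# Confidence intervals for the coefficients
level <- 0.95
confint(model, level = level)

# New observations must use the same variable names as the predictors
newPreds <- data.frame(pred1 = c(0, 1), pred2 = c(0, -1))

# Point predictions, CIs for the mean and CIs for the response
predict(model, newdata = newPreds)
ciMean <- predict(model, newdata = newPreds, interval = "confidence", level = level)
ciResp <- predict(model, newdata = newPreds, interval = "prediction", level = level)

# Intervals for the response are wider than intervals for the mean
widthMean <- ciMean[, "upr"] - ciMean[, "lwr"]
widthResp <- ciResp[, "upr"] - ciResp[, "lwr"]
widthResp > widthMean
```

The last comparison reflects that a prediction interval accounts for the variability of a new observation in addition to the uncertainty of the estimated mean.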
More basic usage
The following table contains further R commands for basic usage. We assume the following dataset is available:
data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))
Description | R | Example |
---|---|---|
Data frame management | | |
variable names | names | names(data) |
structure | str | str(data) |
dimensions | dim | dim(data) |
beginning | head | head(data) |
Vector related functions | | |
create sequences | seq | seq(0, 1, length.out = 10); seq(0, 1, by = 0.25) |
reverse a vector | rev | rev(1:5) |
length of a vector | length | length(1:5) |
count repetitions in a vector | table | table(c(1:5, 4:2)) |
Logical conditions | | |
relational operators | <, <=, >, >=, ==, != | 1 < 0; 1 <= 1; 2 > 1; 3 >= 4; 1 == 0; 1 != 0 |
“and” | & | TRUE & FALSE |
“or” | \| | TRUE \| FALSE |
Subsetting | | |
vector | [] | data$x[data$x > 0]; data$x[data$x > 2 & data$x < 8] |
data frame | [, ] | data[data$x > 0, ]; data[data$x < 2 \| data$x > 8, ] |
Distributions | | |
sampling | rxxxx | rnorm(n = 10, mean = 0, sd = 1) |
density | dxxxx | x <- seq(-4, 4, length.out = 20); dnorm(x = x, mean = 0, sd = 1) |
distribution function | pxxxx | x <- seq(-4, 4, length.out = 20); pnorm(q = x, mean = 0, sd = 1) |
quantiles | qxxxx | p <- seq(0.1, 0.9, length.out = 10); qnorm(p = p, mean = 0, sd = 1) |
Plotting | | |
scatterplot | plot | plot(rnorm(100), rnorm(100)) |
plot a curve | plot, seq | x <- seq(0, 1, length.out = 100); plot(x, x^2, type = "l") |
add lines | lines | x <- seq(0, 1, length.out = 100); plot(x, x^2 + rnorm(100, sd = 0.1)); lines(x, x^2, col = 2, lwd = 2) |
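
Two ideas from the table may benefit from a short worked sketch: subsetting through logical conditions, and the relationship between the `d`, `p` and `q` functions of a distribution. The helper names `keep`, `h` and `fd` are illustrative assumptions, and the finite-difference check is only a numerical illustration.

```r
data <- data.frame(x = 1:10, y = c(-1, 2, 3, 0, 3, 1, -1, 3, 0, -1))

# A logical condition gives a TRUE/FALSE vector, which then selects rows
keep <- data$x > 2 & data$y > 0
data[keep, ]
sum(keep)  # number of rows satisfying the condition

# qnorm is the inverse of pnorm
p <- seq(0.1, 0.9, length.out = 9)
all.equal(pnorm(qnorm(p)), p)

# dnorm is the derivative of pnorm: compare with a central finite difference
h <- 1e-6
fd <- (pnorm(1 + h) - pnorm(1 - h)) / (2 * h)
c(finiteDifference = fd, density = dnorm(1))
```

The same `r`, `d`, `p`, `q` prefixes apply to other distributions available in R, for example `runif`, `dunif`, `punif` and `qunif` for the uniform distribution.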
Logistic regression
Some useful commands for performing logistic regression are given in the next table. We assume that:
- dataset is an imported dataset such that
  - resp is the binary response variable
  - pred1 is the first predictor
  - pred2 is the second predictor
  - …
  - predk is the last predictor
- model is the result of applying glm
- newPreds is a data.frame with variables named as the predictors
- level is a number between 0 and 1
Description | R |
---|---|
Fit a simple logistic model | glm(resp ~ pred1, data = dataset, family = "binomial") |
Fit a multiple logistic model… | |
… on two predictors | glm(resp ~ pred1 + pred2, data = dataset, family = "binomial") |
… on all predictors | glm(resp ~ ., data = dataset, family = "binomial") |
… on all predictors except pred1 | glm(resp ~ . - pred1, data = dataset, family = "binomial") |
Summarize logistic model: coefficient estimates, standard errors, Wald statistics ('z value'), \(p\)-values for \(H_0:\beta_j=0\), Null deviance, deviance ('Residual deviance'), AIC, number of iterations | summary(model) |
CIs for the coefficients | confint(model, level = level); confint.default(model, level = level) |
CIs for the exp-coefficients | exp(confint(model, level = level)); exp(confint.default(model, level = level)) |
Prediction | predict(model, newdata = newPreds, type = "response") |
CIs for the predicted probability | Not immediate. Use predictCIsLogistic(model, newdata = newPreds, level = level) as seen in Section 4.6 |
Variable selection (requires the RcmdrMisc package) | stepwise(model) |
Multicollinearity detection (requires the car package) | vif(model) |
\(R^2\) | Not immediate. Use r2Log(model = model) as seen in Section 4.8 |
Hit matrix | table(dataset$resp, model$fitted.values > 0.5) |
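
The base R commands in the table can be chained as in the following minimal sketch on simulated data. The data-generating coefficients, the sample size and the values in `newPreds` are arbitrary assumptions. Only `confint.default` (Wald intervals) is used here, since `confint` computes profile-likelihood intervals and may need additional packages in older R versions.

```r
set.seed(1)  # makes the simulated data reproducible

# Simulated dataset with a binary response (assumed coefficients)
n <- 200
dataset <- data.frame(pred1 = rnorm(n), pred2 = rnorm(n))
prob <- 1 / (1 + exp(-(0.5 + 1.5 * dataset$pred1 - dataset$pred2)))
dataset$resp <- rbinom(n, size = 1, prob = prob)

# Multiple logistic model on two predictors
model <- glm(resp ~ pred1 + pred2, data = dataset, family = "binomial")
summary(model)

# Wald CIs for the coefficients and for the exp-coefficients (odds ratios)
level <- 0.95
confint.default(model, level = level)
exp(confint.default(model, level = level))

# Predicted probabilities for new observations
newPreds <- data.frame(pred1 = c(0, 1), pred2 = c(0, 0))
probs <- predict(model, newdata = newPreds, type = "response")
probs

# Hit matrix: observed response against classification with threshold 0.5
hit <- table(dataset$resp, model$fitted.values > 0.5)
hit

# Proportion of correctly classified observations
accuracy <- mean((model$fitted.values > 0.5) == dataset$resp)
accuracy
```

Omitting `type = "response"` in `predict` returns predictions on the scale of the linear predictor (log-odds) rather than probabilities.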
Principal component analysis
Some useful commands for performing principal component analysis are given in the next table. We assume that:
- dataset is an imported dataset with several non-categorical variables (the variables must be continuous or discrete).
- pca is a PCA object, that is, the output of princomp.
Description | R |
---|---|
Compute a PCA… | |
… unnormalized (if variables have the same scale) | princomp(dataset) |
… normalized (if variables have different scales) | princomp(dataset, cor = TRUE) |
Summarize PCA: standard deviation of each PC, proportion of variance explained by each PC, cumulative proportion of variance explained up to a given component | summary(pca) |
Weights | pca$loadings |
Scores | pca$scores |
Standard deviations of the PCs | pca$sdev |
Means of the original variables | pca$center |
Screeplot | plot(pca); plot(pca, type = "l") |
Biplot | biplot(pca) |
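
The following minimal sketch runs the commands in the table on `USArrests`, a dataset that ships with R. The choice of dataset and the name `propVar` are assumptions made for illustration only.

```r
# USArrests has four numeric variables measured on different scales
dataset <- USArrests

# Normalized PCA, since the variables have different scales
pca <- princomp(dataset, cor = TRUE)
summary(pca)

# Weights of the original variables in each PC
pca$loadings

# Proportion of variance explained by each PC, computed from the sdev's
propVar <- pca$sdev^2 / sum(pca$sdev^2)
propVar
cumsum(propVar)

# Scores: one row per observation, one column per PC
dim(pca$scores)
head(pca$scores)

# With cor = TRUE the variances of the PCs add up to the number of variables
sum(pca$sdev^2)
```

The values in `propVar` and `cumsum(propVar)` correspond to the 'Proportion of Variance' and 'Cumulative Proportion' rows printed by `summary(pca)`.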