library(ggplot2)
library(scales)
fileURL <- "http://lib.stat.cmu.edu/datasets/boston_corrected.txt"
boston <- read.delim(
  file = fileURL,
  sep = "\t",
  skip = 9  # skip the preamble lines above the column header
)
Boston <- subset(boston, select = -c(OBS., TOWN, TOWN., TRACT, LON, LAT, MEDV))
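Convert the column names to lowercase with tolower().
colnames(Boston) <- tolower(colnames(Boston))
# The corrected median value (cmedv) takes the place of the dropped MEDV column
if ('cmedv' %in% colnames(Boston)) {
  names(Boston)[names(Boston) == 'cmedv'] <- 'medv'
}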
By attaching a data frame to the search path, it is possible to refer to its column variables by name alone, rather than as components of the data frame (e.g., value rather than house$value).
attach(Boston)
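As a small illustration of what attaching does, the sketch below reuses the house$value example from the text with made-up numbers (the house data frame and its values are hypothetical, not part of the Boston data). It also shows with(), a base R alternative that evaluates an expression inside a data frame without changing the search path.

```r
# Hypothetical toy data frame, for illustration only
house <- data.frame(value = c(21.5, 34.0, 18.2))

attach(house)                  # put the columns of `house` on the search path
mean_attached <- mean(value)   # `value` now resolves to house$value
detach(house)                  # remove it again to avoid name clashes later

# with() evaluates the expression inside the data frame without attaching it
mean_with <- with(house, mean(value))

# Both routes should give the same result
all.equal(mean_attached, mean_with)
```

Detaching once the work is done keeps later code from silently picking up columns of a data frame that is no longer intended.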
For our linear model \(f(x) = \alpha + \beta{x}\), we will define our predictor variable \(x\) and response variable \(y\) as: \[ \begin{aligned} x &= lstat \\ y &= medv \end{aligned} \] The syntax for building the linear model is written as follows:
lm.fit <- lm(formula = medv ~ lstat, data = Boston)
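# The predictor (lstat) goes on the x-axis so the fitted line from abline() matches the model
plot(x = lstat, y = medv)
abline(lm.fit, col = "red")
plot(x = lstat, y = medv)
abline(lm.fit, col = "blue")
plot(x = lstat, y = medv)
abline(lm.fit, col = "green")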
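For a single predictor, the least-squares estimates have a closed form, \(\hat\beta = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(\hat\alpha = \bar{y} - \hat\beta\,\bar{x}\), which gives a quick sanity check on what lm() returns. The sketch below does this on simulated data; the names sim, fit, alpha_hat, beta_hat and the coefficients used to generate the data are assumptions chosen for illustration, not values from the Boston data.

```r
# Simulated data standing in for a predictor/response pair; illustrative only
set.seed(1)
sim <- data.frame(x = runif(100, min = 2, max = 38))
sim$y <- 34 - 0.95 * sim$x + rnorm(100, sd = 6)

fit <- lm(y ~ x, data = sim)

# Closed-form least-squares estimates for simple linear regression
beta_hat  <- cov(sim$x, sim$y) / var(sim$x)
alpha_hat <- mean(sim$y) - beta_hat * mean(sim$x)

# Both approaches should agree up to floating-point error
all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))
```

The same comparison should carry over to a fit on real data by swapping in the relevant columns for sim$x and sim$y.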
TODO:
#4. Perhaps plot the dataset
#5. Add the fitted line
#5a. Enable showing more than one plot at a time
#6. Assess the assumptions
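One possible way to approach items 5a and 6 with base graphics is par(mfrow = ...), which splits the plotting device into a grid, combined with plot() on a fitted lm object, which by default draws four diagnostic plots (residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage). The sketch below uses simulated data so that it runs on its own; sim and fit are placeholder names, and applying the same calls to the Boston fit is an assumption, not something shown in the steps so far.

```r
# Simulated data, for illustration only
set.seed(1)
sim <- data.frame(x = runif(100, min = 2, max = 38))
sim$y <- 34 - 0.95 * sim$x + rnorm(100, sd = 6)
fit <- lm(y ~ x, data = sim)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the previous settings
plot(fit)                        # the four default diagnostic plots for an lm fit
par(old_par)                     # restore the previous layout
```

Saving and restoring the previous par() settings keeps the grid layout from leaking into later plots.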