1 Setup

1.1 Load required packages

library(ggplot2)
library(scales)

1.2 Load data

fileURL <- "http://lib.stat.cmu.edu/datasets/boston_corrected.txt"

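boston <- read.delim(
  file = fileURL,
  sep = "\t",  # tab-separated (already the read.delim() default)
  skip = 9     # skip the first 9 lines so reading starts at the header row
)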
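The skip argument makes read.delim() discard that many lines before it reads the header row. Because check.names = TRUE by default, header entries that are not syntactically valid R names are also passed through make.names(). The sketch below illustrates both on a made-up miniature file, so it runs offline. The text, column subset and values are hypothetical and are not taken from the real StatLib file. It also assumes the real header contains a column such as TOWN#, which would explain why section 2.1 drops a column called TOWN.

```r
# Hypothetical miniature of a tab-delimited file: two free-text lines,
# a header row, then two data rows (all values are made up).
raw <- paste(
  "Free-text description, line 1",
  "Free-text description, line 2",
  "OBS.\tTOWN\tTOWN#\tMEDV\tCMEDV\tLSTAT",
  "1\tTownA\t0\t20.0\t20.5\t5.0",
  "2\tTownB\t1\t25.0\t25.0\t7.5",
  sep = "\n"
)

# skip = 2 discards the two description lines; header = TRUE (the default)
# then reads the next line as column names.
demo <- read.delim(text = raw, skip = 2)

# check.names = TRUE (default) turns the invalid name "TOWN#" into "TOWN."
print(names(demo))
```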

2 Clean the data

2.1 Remove columns not used in the analysis

# Drop identifier/location columns and the uncorrected MEDV (the corrected CMEDV is kept)
Boston <- subset(boston, select = -c(OBS., TOWN, TOWN., TRACT, LON, LAT, MEDV))
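In subset(), a minus sign in front of c(...) in the select argument drops the named columns, and the names are written without quotes. Here is a minimal sketch on a hypothetical toy data frame standing in for boston, together with an equivalent base-R indexing form that takes the names as a character vector:

```r
# Toy data frame standing in for `boston` (hypothetical columns and values)
toy <- data.frame(OBS. = 1:3, TOWN = c("a", "b", "c"), MEDV = c(20, 21, 22),
                  CMEDV = c(20.5, 21, 22.5), LSTAT = c(5, 6, 7))

# Negative select: keep every column except the listed ones
kept <- subset(toy, select = -c(OBS., TOWN, MEDV))
print(names(kept))

# Same result with ordinary indexing and a character vector of names to drop
drop <- c("OBS.", "TOWN", "MEDV")
kept2 <- toy[, !(names(toy) %in% drop)]
print(names(kept2))
```

The character-vector form is handy when the list of columns to drop is built programmatically.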

2.2 Convert all variable names to lowercase with tolower()

colnames(Boston) <- tolower(colnames(Boston))
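tolower() is vectorised, so one call lowercases every column name at once. Here is a minimal sketch on a hypothetical toy data frame:

```r
# Toy data frame with upper-case names (hypothetical values)
toy <- data.frame(CMEDV = c(20.5, 21), LSTAT = c(5, 6), PTRATIO = c(15, 17))

# Replace the whole name vector with its lower-case version
colnames(toy) <- tolower(colnames(toy))
print(colnames(toy))
```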

2.3 Change the name of the variable “cmedv” to “medv”

if ('cmedv' %in% colnames(Boston)) {
  names(Boston)[names(Boston) == 'cmedv'] <- 'medv'
}
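The if guard makes the rename safe to re-run: once cmedv has been renamed, the condition is FALSE and nothing changes. The sketch below wraps the same pattern in a hypothetical helper, rename_col(), which is not part of base R, and applies it twice to a toy data frame:

```r
# Hypothetical helper wrapping the guarded rename from 2.3
rename_col <- function(df, old, new) {
  if (old %in% names(df)) {
    names(df)[names(df) == old] <- new
  }
  df  # returned unchanged when `old` is absent
}

toy <- data.frame(cmedv = c(20.5, 21), lstat = c(5, 6))
toy <- rename_col(toy, "cmedv", "medv")
toy <- rename_col(toy, "cmedv", "medv")  # second call is a no-op
print(names(toy))
```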

2.4 Attach the data frame to the R search path

By attaching a data frame to the search path, you can refer to its columns by name alone rather than as components of the data frame (e.g., medv rather than Boston$medv).

attach(Boston)
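attach() places a copy of the data frame on the search path. Later changes to the data frame are therefore not seen, and an object with the same name in the global environment takes precedence. The minimal sketch below, on a hypothetical toy data frame, pairs attach() with detach(). It also shows with(), a swapped-in base-R alternative that evaluates an expression inside the data frame without modifying the search path.

```r
# Toy data frame (hypothetical values)
toy <- data.frame(medv = c(20, 25, 30), lstat = c(10, 6, 2))

# attach() exposes the columns by name; detach() removes them again
attach(toy)
m1 <- mean(medv)
detach(toy)

# with() evaluates the expression inside `toy`; nothing to clean up afterwards
m2 <- with(toy, mean(medv))
print(c(m1, m2))
```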

3 Fun stuff

3.1 Building the linear model

For our linear model \(y = \alpha + \beta x + \varepsilon\), we define the predictor variable \(x\) and the response variable \(y\) as: \[ \begin{aligned} x &= lstat \\ y &= medv \end{aligned} \] In R's formula syntax the response goes on the left of ~ and the predictor on the right, so the model is built as follows:

# Use the cleaned data frame: lower-case names, medv = corrected median value
lm.fit <- lm(formula = medv ~ lstat, data = Boston)
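For a single predictor, the least-squares estimates have a closed form, \[ \hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta \bar x, \] and lm() should reproduce them. The sketch below checks this on synthetic data so that it runs without downloading the data set. The simulated x and y, the seed and the true coefficients are arbitrary assumptions that stand in for lstat and medv.

```r
set.seed(1)
# Synthetic stand-in for (lstat, medv); true intercept and slope are arbitrary
x <- runif(50, min = 2, max = 35)
y <- 30 - 0.8 * x + rnorm(50, sd = 3)

fit <- lm(y ~ x)

# Closed-form least-squares estimates
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

print(coef(fit))
print(c(alpha_hat, beta_hat))
```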

3.2 Plotting the data set

3.2.1 Red line

plot(x = lstat, y = medv)  # predictor on the x-axis, matching medv ~ lstat
abline(lm.fit, col = "red")

3.2.2 Blue line

plot(x = lstat, y = medv)  # predictor on the x-axis, matching medv ~ lstat
abline(lm.fit, col = "blue")

3.2.3 Green line

plot(x = lstat, y = medv)  # predictor on the x-axis, matching medv ~ lstat
abline(lm.fit, col = "green")
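The three plots above can also share one figure by setting par(mfrow = c(rows, cols)) before plotting. Here is a minimal sketch on synthetic data standing in for lstat and medv (the simulated values are arbitrary assumptions). pdf(NULL) opens a device that writes no file, so the sketch runs non-interactively. In an interactive session, you can drop that line and the final dev.off().

```r
set.seed(1)
# Synthetic stand-in for (lstat, medv)
x <- runif(50, min = 2, max = 35)
y <- 30 - 0.8 * x + rnorm(50, sd = 3)
fit <- lm(y ~ x)

pdf(NULL)                        # null device: draws nothing to disk
old_par <- par(mfrow = c(1, 3))  # one row, three panels
for (col in c("red", "blue", "green")) {
  plot(x, y, xlab = "lstat", ylab = "medv", main = paste(col, "line"))
  abline(fit, col = col)
}
par(old_par)                     # restore the previous layout
invisible(dev.off())
```

Saving the value returned by par() and passing it back afterwards keeps the layout change from leaking into later plots.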

3.3 Doing whatever more

4 More fun stuff

4.1 Hurray

TODO:
#4. Perhaps plot the dataset
#5. Add the fitted line
#5a. Show more than one plot in a single figure
#6. Assess the assumptions
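For TODO item #6, one common starting point is the standard set of diagnostic plots that plot() draws for an lm object: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. The sketch below draws them in a 2 x 2 layout and adds a Shapiro-Wilk test on the residuals as a numeric complement to the Q-Q plot. It uses synthetic data as an assumed stand-in for lstat and medv. On the real data you would pass lm.fit instead of fit.

```r
set.seed(1)
# Synthetic stand-in for (lstat, medv)
x <- runif(50, min = 2, max = 35)
y <- 30 - 0.8 * x + rnorm(50, sd = 3)
fit <- lm(y ~ x)

pdf(NULL)                        # null device so the sketch runs non-interactively
old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid for the four default diagnostic plots
plot(fit)
par(old_par)
invisible(dev.off())

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(residuals(fit))
print(sw$p.value)
```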