3.5 Prediction
As in the simple linear model, the forecast of \(Y\) from \(\mathbf{X}=\mathbf{x}\) (this is, \(X_1=x_1,\ldots,X_k=x_k\)) is approached by two different ways:
- Inference on the conditional mean of \(Y\) given \(\mathbf{X}=\mathbf{x}\), \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\). This is a deterministic quantity, which equals \(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k\).
- Prediction of the conditional response \(Y|\mathbf{X}=\mathbf{x}\). This is a random variable distributed as \(\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k,\sigma^2)\).
The prediction and computation of CIs can be done with the R function predict
(unfortunately, there is no R Commander shortcut for this one). The objects required for predict
are: first, the output of lm
; second, a data.frame
containing the locations \(\mathbf{x}=(x_1,\ldots,x_k)\) where we want to predict \(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k\). The prediction is \(\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_{k}x_k\)
It is mandatory to name the columns of the data frame with the same names of the predictors used in lm
. Otherwise predict
will generate an error, see below.
To illustrate the use of predict
, we return to the wine
dataset.
# Fit a linear model for the price on WinterRain, HarvestRain and AGST
<- lm(Price ~ WinterRain + HarvestRain + AGST, data = wine)
modelW summary(modelW)
##
## Call:
## lm(formula = Price ~ WinterRain + HarvestRain + AGST, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62816 -0.17923 0.02274 0.21990 0.62859
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.9506001 1.9694011 -2.514 0.01940 *
## WinterRain 0.0012820 0.0005765 2.224 0.03628 *
## HarvestRain -0.0036242 0.0009646 -3.757 0.00103 **
## AGST 0.7123192 0.1087676 6.549 1.11e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3436 on 23 degrees of freedom
## Multiple R-squared: 0.7407, Adjusted R-squared: 0.7069
## F-statistic: 21.9 on 3 and 23 DF, p-value: 6.246e-07
# Data for which we want a prediction
# Important! You have to name the column with the predictor name!
<- data.frame(WinterRain = 500, HarvestRain = 123,
weather AGST = 18)
## Prediction of the mean
# Prediction of the mean at 95% - the defaults
predict(modelW, newdata = weather)
## 1
## 8.066342
# Prediction of the mean with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modelW, newdata = weather, interval = "confidence")
## fit lwr upr
## 1 8.066342 7.714178 8.418507
predict(modelW, newdata = weather, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 8.066342 7.714178 8.418507
# Other levels
predict(modelW, newdata = weather, interval = "confidence", level = 0.90)
## fit lwr upr
## 1 8.066342 7.774576 8.358108
predict(modelW, newdata = weather, interval = "confidence", level = 0.99)
## fit lwr upr
## 1 8.066342 7.588427 8.544258
## Prediction of the response
# Prediction of the mean at 95% - the defaults
predict(modelW, newdata = weather)
## 1
## 8.066342
# Prediction of the response with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modelW, newdata = weather, interval = "prediction")
## fit lwr upr
## 1 8.066342 7.273176 8.859508
predict(modelW, newdata = weather, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 8.066342 7.273176 8.859508
# Other levels
predict(modelW, newdata = weather, interval = "prediction", level = 0.90)
## fit lwr upr
## 1 8.066342 7.409208 8.723476
predict(modelW, newdata = weather, interval = "prediction", level = 0.99)
## fit lwr upr
## 1 8.066342 6.989951 9.142733
# Predictions for several values
<- data.frame(WinterRain = c(500, 200), HarvestRain = c(123, 200),
weather2 AGST = c(17, 18))
predict(modelW, newdata = weather2, interval = "prediction")
## fit lwr upr
## 1 7.354023 6.613835 8.094211
## 2 7.402691 6.533945 8.271437
For the wine
dataset, do the following:
- Regress
WinterRain
onHarvestRain
andAGST
. Name the fitted modelmodExercise
. - Compute the estimate for the conditional mean of
WinterRain
forHarvestRain
\(=123.0\) andAGST
\(=16.15\). What is the CI at \(\alpha=0.01\)? - Compute the estimate for the conditional response for
HarvestRain
\(=125.0\) andAGST
\(=15\). What is the CI at \(\alpha=0.10\)? - Check that
modExercise$fitted.values
is the same aspredict(modExercise, newdata = data.frame(HarvestRain = wine$HarvestRain, AGST = wine$AGST))
. Why is so?
Similarities and differences in the prediction of the conditional mean \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) and conditional response \(Y|\mathbf{X}=\mathbf{x}\):
- Similarities. The estimate is the same, \(\hat y=\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_kx_k\). Both CI are centered in \(\hat y\).
- Differences. \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) is deterministic and \(Y|\mathbf{X}=\mathbf{x}\) is random. Therefore, the variance is larger for the prediction of \(Y|\mathbf{X}=\mathbf{x}\) than for the prediction of \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\).