Example 2

Context

The admissions committee of a comprehensive state university selected at random the records of 200 second-semester freshmen. The results, first-semester college GPA and SAT scores, are stored in the data frame GRADES.

The GRADES data frame is part of the PASWR2 package, so loading the package makes the data available:

library(PASWR2)

Use the GRADES data set to model gpa as a function of sat, assuming that the requirements for the model are satisfied.


Questions

Question 1)

Compute the expected GPA (gpa) for an SAT score (sat) of 1300.

Question 2)

Construct a 90% confidence interval for the mean GPA for students scoring 1300 on the SAT.

Question 3)

Find the 90% prediction limits on GPA for a future student who scores 1300 on the SAT.


Solutions

The code below creates a linear model object for the regression of gpa on sat.

mod.lm <- lm(gpa ~ sat, data = GRADES)
betahat <- coef(mod.lm)
betahat
## (Intercept)         sat 
## -1.19206381  0.00309427

Question 1)

The expected GPA for an SAT score of 1300 is \(\hat Y_h = X_h \hat \beta = 2.8305\), where \(X_h = (1, 1300)\) and \(\hat \beta = [-1.1921, 0.0031]'\).

Xh <- matrix(c(1, 1300), nrow = 1)
Yhath <- Xh%*%betahat
Yhath
##          [,1]
## [1,] 2.830488

A method that requires less typing is

predict(mod.lm, newdata=data.frame(sat = 1300))
##        1 
## 2.830488
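
As a quick sanity check, the same prediction can be reproduced by hand from the printed coefficients. The sketch below copies the rounded estimates from the output above instead of refitting the model, so the names `b0`, `b1`, and `sat_new` are illustrative and the result agrees with `predict()` only up to rounding.

```r
# Coefficients copied from the printed output of coef(mod.lm) (rounded).
b0 <- -1.19206381   # intercept
b1 <- 0.00309427    # change in GPA per SAT point

sat_new <- 1300

# Fitted mean GPA: intercept + slope * SAT score.
yhat <- b0 + b1 * sat_new
yhat

# The slope is easier to read per 100 SAT points.
b1 * 100
```

Under this fit, each additional 100 SAT points corresponds to an estimated increase of roughly 0.31 in mean first-semester GPA.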

Question 2)

A 90% confidence interval for the mean GPA of students scoring 1300 on the SAT, using (12.81), is \(CI_{0.90}[E(Y_h)] = [2.7598, 2.9012]\). The R code below computes \(s^2_{\hat Y_h}\) using (12.80) and the confidence interval using (12.81).

MSE <- anova(mod.lm)[2, 3]
MSE
## [1] 0.1595551
XTXI <- summary(mod.lm)$cov.unscaled
XTXI
##              (Intercept)           sat
## (Intercept)  0.310137964 -2.689270e-04
## sat         -0.000268927  2.370131e-07
var.cov.b <- MSE*XTXI
var.cov.b
##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08
s2yhath <- Xh%*%var.cov.b%*%t(Xh)
s2yhath
##             [,1]
## [1,] 0.001831706
syhath <- sqrt(s2yhath)
syhath
##            [,1]
## [1,] 0.04279843
crit.t <- qt(0.95, 198)
CI.EYh <- Yhath + c(-1, 1)*crit.t*syhath
CI.EYh
## [1] 2.759760 2.901216

The function predict() may also be used to compute the requested interval.

predict(mod.lm, newdata = data.frame(sat = 1300),
 interval="conf", level = 0.90)
##        fit     lwr      upr
## 1 2.830488 2.75976 2.901216
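
To see how the confidence level enters the calculation, the interval can be wrapped in a small function. This is a minimal sketch: `t_interval()` is a hypothetical helper, not part of PASWR2 or base R, and the fitted value and standard error are copied (rounded) from the output above, so results match only up to rounding.

```r
# Values copied from the printed output above (rounded).
yhat     <- 2.830488    # fitted mean GPA at sat = 1300
se_mean  <- 0.04279843  # standard error of the fitted mean
df_resid <- 198         # residual degrees of freedom, n - 2 = 200 - 2

# Hypothetical helper: two-sided t-based interval around a fitted value.
t_interval <- function(fit, se, df, level = 0.90) {
  alpha <- 1 - level
  fit + c(-1, 1) * qt(1 - alpha / 2, df) * se
}

ci90 <- t_interval(yhat, se_mean, df_resid, level = 0.90)
ci95 <- t_interval(yhat, se_mean, df_resid, level = 0.95)
ci90
ci95
```

Raising the level from 0.90 to 0.95 uses a larger t quantile, so the 95% interval is wider than the 90% interval around the same fitted value.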

Question 3)

The 90% prediction limits on GPA for a future student who scores 1300 on the SAT are \(PI_{0.90} = [2.1666, 3.4944]\), using (12.83).

s2yhathnew <- MSE + s2yhath
syhathnew <- sqrt(s2yhathnew)
syhathnew
##           [,1]
## [1,] 0.4017297
PI <- Yhath + c(-1, 1)*crit.t*syhathnew
PI
## [1] 2.166595 3.494380

Using the predict() function with the argument interval = "pred" also returns the requested prediction limits.

PI <- predict(mod.lm, newdata = data.frame(sat = 1300),
 interval = "pred", level = 0.90)
PI
##        fit      lwr     upr
## 1 2.830488 2.166595 3.49438
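
The prediction interval is much wider than the confidence interval because it adds the variability of an individual observation (MSE) to the uncertainty in the estimated mean. The sketch below compares the two half-widths using values copied (rounded) from the output above; the variable names are illustrative.

```r
# Values copied from the printed output above (rounded).
mse     <- 0.1595551     # residual mean square
s2_mean <- 0.001831706   # estimated variance of the fitted mean at sat = 1300

se_mean <- sqrt(s2_mean)        # standard error for the mean response
se_pred <- sqrt(mse + s2_mean)  # standard error for a new observation

crit_t  <- qt(0.95, 198)        # t quantile for a 90% two-sided interval
half_ci <- crit_t * se_mean     # half-width of the confidence interval
half_pi <- crit_t * se_pred     # half-width of the prediction interval
ratio   <- half_pi / half_ci

c(half_ci = half_ci, half_pi = half_pi, ratio = ratio)
```

Here MSE dominates the variance of the fitted mean, so the prediction interval is roughly nine times as wide as the confidence interval.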

Therefore, we conclude that for students who score 1300 on the SAT, the estimated mean GPA is about 2.83. We are 90% confident that the mean GPA for such students lies between 2.76 and 2.90, and that the GPA of an individual future student with this score lies between 2.17 and 3.49.