Example 2
Context
The admissions committee of a comprehensive state university selected at random the records of 200 second-semester freshmen. The results, first-semester college GPA and SAT scores, are stored in the data frame GRADES.
The GRADES data frame is part of the PASWR2 package, so loading the package makes the data available:
library(PASWR2)
Use the GRADES data set and model gpa as a function of sat, assuming that the requirements for the model are satisfied.
Question 1)
Compute the expected GPA (gpa) for an SAT score (sat) of 1300.
Question 2)
Construct a 90% confidence interval for the mean GPA for students scoring 1300 on the SAT.
Question 3)
Find the prediction limits on GPA for a future student who scores 1300 on the SAT.
The code below creates a linear model object of gpa against sat and extracts the estimated coefficients.
mod.lm <- lm(gpa ~ sat, data = GRADES)
betahat <- coef(mod.lm)
betahat
## (Intercept)         sat
## -1.19206381  0.00309427
Question 1)
The expected GPA for an SAT score of 1300 is \(\hat Y_h = X_h \hat\beta = 2.8305\), where \(X_h = (1, 1300)\) and \(\hat\beta = [-1.1921, 0.0031]'\).
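Written out with the unrounded coefficients reported above, the product reduces to a single line of arithmetic. It is shown here only as a check on the matrix computation:

\[
\hat Y_h = X_h \hat\beta = (1)(-1.19206381) + (1300)(0.00309427) = -1.19206381 + 4.02255100 \approx 2.8305
\]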
Xh <- matrix(c(1, 1300), nrow = 1)
Yhath <- Xh%*%betahat
Yhath
## [,1]
## [1,] 2.830488
A method that requires less typing is
predict(mod.lm, newdata=data.frame(sat = 1300))
## 1
## 2.830488
Question 2)
A 90% confidence interval for the mean GPA for students scoring 1300 on the SAT, using (12.81), is \(CI_{0.90}[E(Y_h)] = [2.7598, 2.9012]\). The R code below computes \(s^2_{\hat Y_h}\) using (12.80) and the confidence interval using (12.81).
MSE <- anova(mod.lm)[2, 3]
MSE
## [1] 0.1595551
XTXI <- summary(mod.lm)$cov.unscaled
XTXI
##              (Intercept)           sat
## (Intercept)  0.310137964 -2.689270e-04
## sat         -0.000268927  2.370131e-07
var.cov.b <- MSE*XTXI
var.cov.b
##               (Intercept)           sat
## (Intercept)  4.948408e-02 -4.290866e-05
## sat         -4.290866e-05  3.781665e-08
s2yhath <- Xh%*%var.cov.b%*%t(Xh)
s2yhath
##             [,1]
## [1,] 0.001831706
syhath <- sqrt(s2yhath)
syhath
##            [,1]
## [1,] 0.04279843
crit.t <- qt(0.95, 198)
CI.EYh <- Yhath + c(-1, 1)*crit.t*syhath
CI.EYh
## [1] 2.759760 2.901216
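The steps above follow the usual matrix form of the interval for a mean response, which (12.80) and (12.81) are assumed here to express. The critical value \(t_{0.95;\,198} \approx 1.6526\) is what qt(0.95, 198) returns, and the 198 degrees of freedom are \(n - p = 200 - 2\), where \(n\) and \(p\) are notation introduced for this sketch:

\[
s^2_{\hat Y_h} = MSE \cdot X_h (X'X)^{-1} X_h' = 0.001831706, \qquad s_{\hat Y_h} = 0.04279843
\]

\[
CI_{0.90}[E(Y_h)] = \hat Y_h \pm t_{0.95;\,198}\, s_{\hat Y_h} \approx 2.830488 \pm (1.6526)(0.04279843) = [2.7598, 2.9012]
\]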
The function predict() may also be used to compute the requested interval.
predict(mod.lm, newdata = data.frame(sat = 1300),
interval="conf", level = 0.90)
## fit lwr upr
## 1 2.830488 2.75976 2.901216
Question 3)
The 90% prediction limits on GPA for a future student who scores 1300 on the SAT are \(PI_{0.90} = [2.1666, 3.4944]\), using (12.83).
s2yhathnew <- MSE + s2yhath
syhathnew <- sqrt(s2yhathnew)
syhathnew
##           [,1]
## [1,] 0.4017297
PI <- Yhath + c(-1, 1)*crit.t*syhathnew
PI
## [1] 2.166595 3.494380
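The only change from the confidence interval is that the variance of a new observation adds MSE to the variance of the fitted mean, which is why the prediction limits are so much wider. As a worked sketch, with \(s_{\hat Y_{h(new)}}\) as assumed notation mirroring the variable syhathnew and (12.83) assumed to have this usual form:

\[
s^2_{\hat Y_{h(new)}} = MSE + s^2_{\hat Y_h} = 0.1595551 + 0.001831706 = 0.161386806, \qquad s_{\hat Y_{h(new)}} = 0.4017297
\]

\[
PI_{0.90} = \hat Y_h \pm t_{0.95;\,198}\, s_{\hat Y_{h(new)}} \approx 2.830488 \pm (1.6526)(0.4017297) = [2.1666, 3.4944]
\]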
Using the predict() function with the argument interval = "pred" also returns the requested prediction limits.
PI <- predict(mod.lm, newdata = data.frame(sat = 1300),
              interval = "pred", level = 0.90)
PI
##        fit      lwr     upr
## 1 2.830488 2.166595 3.49438
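The agreement between the matrix computations and predict() is not specific to GRADES. The following is a minimal sketch that repeats both intervals on simulated data, so it runs without PASWR2. The data frame sim, its coefficients, and all object names are hypothetical stand-ins, not the GRADES values, and vcov() is used as a shortcut for MSE times the unscaled covariance matrix.

```r
# Simulated stand-in for GRADES (hypothetical values, NOT the PASWR2 data)
set.seed(1)
n <- 200
sat <- round(runif(n, 800, 1500))
gpa <- -1.2 + 0.003 * sat + rnorm(n, sd = 0.4)
sim <- data.frame(sat = sat, gpa = gpa)

fit <- lm(gpa ~ sat, data = sim)

# Manual computation at sat = 1300
Xh <- matrix(c(1, 1300), nrow = 1)
yhat <- drop(Xh %*% coef(fit))                 # point estimate
MSE <- sum(residuals(fit)^2) / df.residual(fit)
s2_mean <- drop(Xh %*% vcov(fit) %*% t(Xh))    # vcov(fit) equals MSE * (X'X)^{-1}
s2_new <- MSE + s2_mean                        # variance for a new observation
tcrit <- qt(0.95, df.residual(fit))            # 90% two-sided critical value

ci_manual <- yhat + c(-1, 1) * tcrit * sqrt(s2_mean)
pi_manual <- yhat + c(-1, 1) * tcrit * sqrt(s2_new)

# Same intervals from predict()
new <- data.frame(sat = 1300)
ci_pred <- predict(fit, newdata = new, interval = "confidence", level = 0.90)
pi_pred <- predict(fit, newdata = new, interval = "prediction", level = 0.90)

# Both comparisons should report TRUE
all.equal(ci_manual, unname(ci_pred[1, c("lwr", "upr")]))
all.equal(pi_manual, unname(pi_pred[1, c("lwr", "upr")]))
```

Wrapping the 1 x 1 matrix products in drop() keeps the results as plain numbers, which avoids relying on how R recycles a length-one matrix against a longer vector.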
Therefore, for second-semester freshmen who scored 1300 on the SAT, the estimated mean GPA is about 2.83. With 90% confidence, the mean GPA of such students lies between 2.76 and 2.90, while the GPA of an individual future student with that score is predicted to lie between 2.17 and 3.49.