4.10 Exercises

  1. True or false? Simple linear regression describes the setting where there is more than one predictor.

  2. True or false? When the predictor is continuous, a simple linear regression estimates the best fitting line.

  3. True or false? When the predictor is categorical, a simple linear regression estimates the mean outcome at each level of the predictor.

  4. What is the interpretation of the intercept \(\beta_0\) in a simple linear regression with an uncentered continuous predictor? With a centered continuous predictor? With a categorical predictor?

  5. What is the interpretation of the slope \(\beta_1\) in a simple linear regression with a continuous predictor? Does centering affect this interpretation?

  6. In a simple linear regression with a categorical predictor with \(L\) levels, what is the interpretation of the coefficients \(\beta_1, ..., \beta_{L-1}\)?

  7. For a simple linear regression with a continuous predictor, write out the null and alternative hypotheses that are tested by the p-value for that predictor. Express both hypotheses as statements about the slope of the regression line.

  8. What quantities should always be reported along with the p-value for a regression coefficient?

  9. Suppose you have a large sample size and a small p-value. Does this imply you have a meaningfully large regression coefficient? Why or why not?

  10. Suppose you have a small sample size and a large p-value. Does this imply you have a negligibly small regression coefficient? Why or why not?

For Exercises 11-18, use the Digitalis teaching dataset (dig_rmph.rData, see Appendix A.6) to answer questions related to the following research question: Is there a significant association between the predictor body mass index (BMI) and the outcome heart rate (HEARTRTE)? Fit the appropriate regression model and then answer the questions.

  11. What kind of predictor is BMI, continuous or categorical?

  12. Visualize the relationship (plot the outcome vs. the predictor with the regression line included).

  13. When you fit the regression model, how many observations were removed due to missing values?

  14. What proportion of the variation in heart rate is explained by BMI?

  15. Provide the effect estimate (regression slope), its 95% CI, p-value, and a statement regarding whether the association is statistically significant.

  16. Example 4.1 concluded “On average, for every 1-cm difference in waist circumference, adults differ in mean fasting glucose by 0.0278 mmol/L”. Provide the corresponding interpretation of the regression slope in the regression of heart rate on BMI. Be careful not to imply causality, as these are cross-sectional data.

  17. What is the predicted mean heart rate (and 95% CI) for those with a BMI of 20 kg/m\(^2\)? For those with a BMI of 35 kg/m\(^2\)?

  18. Among those with a BMI of 20 kg/m\(^2\), within what interval do we expect 95% of heart rate values to fall? Among those with a BMI of 35 kg/m\(^2\)?

For Exercises 19-23, use the Digitalis teaching dataset (dig_rmph.rData, see Appendix A.6) to answer questions related to the following research question: Does heart rate (HEARTRTE) differ significantly between treatment groups (TRTMT = Placebo, Digoxin)? Fit the appropriate regression model and then answer the questions.

  19. Visualize the relationship (plot the outcome vs. the predictor, along with the mean at each level and a line connecting the means).

  20. Provide the p-value for the test of this association and state whether the association is statistically significant.

  21. What is the estimated mean difference in heart rate between those in the Digoxin group and those in the Placebo group, and what is its 95% CI? Is this difference statistically significant?

  22. What is the estimated mean heart rate (and 95% CI) for each of the treatment groups?

  23. For each treatment group, within what interval do we expect 95% of heart rates to fall?

For Exercises 24-25, use the UN Human Development Data (unhdd2020.rmph.rData, see Appendix A.2) and the methods in this chapter to answer each research question.

  24. What is the association between the outcome life expectancy at birth (years) (life) and expected years of schooling (years) (educ_expected)?

  25. How does the Gender Inequality Index (gii) (GII) differ between Human Development Index (HDI) groups (hdi_group)?

For Exercises 26-27, use the CAMP teaching dataset camp_0_48_rmph.rData (see Appendix A.6).

  26. Does post-bronchodilator Forced Expiratory Volume in 1 second (FEV1) (POSFEV0) differ significantly between treatment groups (Budesonide, Nedocromil, Placebo) (TG)?

  27. Following up on the previous exercise, estimate the two treatment effects (Budesonide vs. Placebo, Nedocromil vs. Placebo) along with their 95% confidence intervals and p-values.