Chapter 4 Social Policy Analysis III

Problem 1: How may going on a diet affect the BMI of people?

Difference-in-Differences can be used to estimate the causal effect of diet on BMI.
If we want to know the unit-level effect of calorie consumption on BMI, that is, not just whether consuming fewer calories lowers BMI but how much each calorie consumed is associated with BMI, we can use a fixed-effects model. See the following code:

library(plm)  # provides plm() for panel-data models
plm(BMI ~ Diet, data=long.df, index=c("ID", "Year"), model="within")
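
To make concrete what model="within" estimates, here is a minimal sketch that reproduces the within estimator by hand in base R: each variable is demeaned within person, and OLS is run on the demeaned data. The simulated data, the binary coding of Diet, and the true effect of -0.5 are assumptions made up for illustration; they are not from the course dataset.

```r
# Simulated panel (hypothetical data): 50 people observed in 4 years.
set.seed(1)
n <- 50; n.years <- 4
long.df <- data.frame(ID = rep(1:n, each = n.years),
                      Year = rep(1:n.years, times = n))

# Each person has a stable baseline BMI that also makes dieting more likely.
# This is the kind of time-invariant confounding that fixed effects removes.
baseline <- rnorm(n, mean = 26, sd = 3)
long.df$Diet <- rbinom(n * n.years, 1, plogis((baseline[long.df$ID] - 26) / 2))
long.df$BMI  <- baseline[long.df$ID] - 0.5 * long.df$Diet +
  rnorm(n * n.years, sd = 0.3)

# Within transformation: subtract each person's own mean.
long.df$BMI.dm  <- long.df$BMI  - ave(long.df$BMI,  long.df$ID)
long.df$Diet.dm <- long.df$Diet - ave(long.df$Diet, long.df$ID)

# OLS on the demeaned data gives the within (fixed effects) slope.
fe.slope <- coef(lm(BMI.dm ~ Diet.dm - 1, data = long.df))[["Diet.dm"]]

# Same slope from a regression with one dummy variable per person.
lsdv.slope <- coef(lm(BMI ~ Diet + factor(ID), data = long.df))[["Diet"]]

# Pooled OLS ignores the person effect and is pulled away from -0.5.
pooled.slope <- coef(lm(BMI ~ Diet, data = long.df))[["Diet"]]
```

In this setup fe.slope and lsdv.slope agree and sit near the assumed effect, while pooled.slope is distorted because people with a high baseline BMI diet more often.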

Question:

Suppose we are interested in how exercising in a gym affects a person’s self-esteem. We try to gather a random sample of individuals at multiple time points, and examine whether individuals who increased their frequency of exercise in a gym had higher self-esteem. However, when assessing the data, we realize that we oversampled white men, who are known to exercise at gyms at high rates, and who tend to have relatively high self-esteem. If we estimate a fixed effects model with this data, will the fact that we oversampled white men confound our results?

Answer:

No. Oversampling white men would clearly confound our estimates in a typical OLS regression model, but since a fixed effects model looks only at within-person changes, the fact that our sample is biased towards a group who tend to exercise and have high self-esteem should not confound our results. As long as the effect of exercise on self-esteem is the same among white men and all other people, and white men change their rate of exercise as often as the rest of the population, our fixed effects models with this data should accurately reflect the effect of exercise on self-esteem among all people. What matters most is how the rate of exercise among our sample changed over time. If few individuals in our sample increased or decreased their rate of exercise, our fixed effects models would have limited statistical power; if only particular groups tended to change their rates of exercise, the estimates would be biased toward the effect in those groups. Individuals whose behavior does not change between waves cannot add to our understanding of how a change in that behavior affects their outcomes.
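
The argument above can be checked with a small simulation. This is a sketch under stated assumptions, not real survey data: 120 of 200 respondents belong to the oversampled group, that group has higher baseline self-esteem and exercises more, the effect of exercise (0.3) is the same for everyone, and half of all respondents never change how often they exercise. All names and numbers are hypothetical.

```r
set.seed(2)
n <- 200; waves <- 3
panel <- data.frame(id = rep(1:n, each = waves), wave = rep(1:waves, times = n))

# Oversampled group: higher baseline self-esteem and more exercise,
# but the same effect of exercise on self-esteem as everyone else.
group     <- as.integer(panel$id <= 120)
person.fx <- rnorm(n)[panel$id] + 1.5 * group

# "Stayers" (even ids) keep the same exercise frequency in every wave.
stayer  <- panel$id %% 2 == 0
base.ex <- rpois(n, 2)[panel$id] + 2 * group
panel$exercise <- base.ex + ifelse(stayer, 0, rpois(n * waves, 1))
panel$esteem   <- person.fx + 0.3 * panel$exercise + rnorm(n * waves, sd = 0.5)

# Within (fixed effects) slope computed from person-demeaned data.
demean <- function(x, g) x - ave(x, g)
within.slope <- function(d) {
  y <- demean(d$esteem, d$id)
  x <- demean(d$exercise, d$id)
  sum(x * y) / sum(x^2)
}

fe.all    <- within.slope(panel)             # everyone
fe.movers <- within.slope(panel[!stayer, ])  # only people whose exercise changed
pooled    <- coef(lm(esteem ~ exercise, data = panel))[["exercise"]]
```

Here fe.all and fe.movers are identical because stayers have zero within-person variation in exercise, so they contribute nothing to the estimate. The pooled slope is inflated by the oversampled group's higher baseline, while the within slope stays close to the assumed 0.3.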

Problem 2:

Imagine that you are the principal of a high school in Guelph, Ontario. Each student in your school uses gel pens to write their paragraphs in literature class. The school librarian has proposed that students could write faster if they were required to use ballpoint pens.

Data:

Suppose that at the end of each school year, students are required to write a three-paragraph answer to a short question. When students submit their exams, teachers note the following information:

How long the students took to submit their exams,
How many words were written in their exams,
How many hours the students had slept the previous night,
Whether the exams were written with a gel or ballpoint pen.

Exams of this style are given to students three years in a row. The librarian was given permission to look at all three of these exams for a single class of students.

Pre-processing:

Create another variable in the dataset to measure the speed of writing: the number of words that each student wrote per minute.
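
A minimal sketch of this step, assuming the raw records hold a word count and a time in minutes. The column names words and minutes and the values are hypothetical.

```r
# Hypothetical raw exam records for two students over three waves.
Exams <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2),
  wave    = c(1, 2, 3, 1, 2, 3),
  words   = c(300, 330, 360, 250, 240, 280),
  minutes = c(30, 30, 32, 25, 20, 28)
)

# Writing speed in words per minute.
Exams$speed <- Exams$words / Exams$minutes
```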

The resulting dataset includes age, gender, sleep hours, pen type used, and speed.

Method:

Run an OLS regression of speed on pen type used, gender, age, and sleep hours.
- This estimation method treats each repeated observation as if it came from an independent person (i.e., it pools the data).
- This may introduce bias in our parameter estimates, especially if students differ in their underlying writing ability and these differences are correlated with whether they use gel pens.
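
The pooled regression can be sketched in base R as follows. The data are simulated under an assumed scenario in which naturally faster writers prefer gel pens while the pen itself has no effect on speed; every value, and the 0/1 coding of pen and gender, is hypothetical.

```r
set.seed(3)
n <- 300; waves <- 3
Exams <- data.frame(id = rep(1:n, each = waves), wave = rep(1:waves, times = n))

# Latent writing ability differs across students.
ability      <- rnorm(n, mean = 15, sd = 3)
Exams$gender <- rbinom(n, 1, 0.5)[Exams$id]
Exams$age    <- sample(14:16, n, replace = TRUE)[Exams$id] + Exams$wave - 1
Exams$sleep  <- round(rnorm(n * waves, mean = 7, sd = 1), 1)

# Faster writers are more likely to pick a gel pen (pen = 1).
Exams$pen <- rbinom(n * waves, 1, plogis(ability[Exams$id] - 15))

# True effect of pen on speed is zero in this simulation.
Exams$speed <- ability[Exams$id] + 0.2 * Exams$sleep + rnorm(n * waves)

# Pooled OLS: every row is treated as a separate, independent person.
pooled     <- lm(speed ~ pen + gender + age + sleep, data = Exams)
pooled.pen <- coef(pooled)[["pen"]]
```

Although the pen has no effect by construction, pooled.pen comes out clearly positive, because the regression attributes the ability of gel-pen users to the pen.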
Estimate a fixed effects model. In this model, we control for students' different latent writing speeds, and only look at how switching between a gel and ballpoint pen affected their writing speeds. This is an example of examining within-person changes.
- Estimation of this model gives interpretable parameter estimates only for sleep and pen type. Gender did not change for any student across years, so it drops out (again, we are only looking at how changes within individuals affected their writing speeds). Age changed uniformly for all respondents (everyone grew one year older each wave), so there is no variation across individuals in how students' ages changed, and an age coefficient cannot be distinguished from a general year-to-year trend.
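- The results from a fixed effects model are robust to confounding by stable individual characteristics; however, because they rely only on within-person variation, they require large sample sizes to find statistically significant results.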

plm(speed ~ age + gender + sleep + pen, data=Exams, index=c("id", "wave"), model="within")
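
To see why gender and age carry no usable within-person information, the following sketch applies the within transformation (subtracting each student's own mean) to a tiny made-up panel. The values and the 0/1 coding of gender and pen are hypothetical.

```r
# Hypothetical mini panel: 3 students observed in 3 waves.
Exams <- data.frame(
  id     = rep(1:3, each = 3),
  wave   = rep(1:3, times = 3),
  gender = rep(c(0, 1, 1), each = 3),
  age    = rep(c(14, 15, 14), each = 3) + rep(0:2, times = 3),
  sleep  = c(7, 6, 8, 8, 8, 5, 6, 7, 7),
  pen    = c(1, 0, 0, 0, 1, 1, 1, 1, 0)
)

# Within transformation: subtract each student's own mean.
demean <- function(x, g) x - ave(x, g)
within.df <- data.frame(
  id     = Exams$id,
  gender = demean(Exams$gender, Exams$id),  # all zeros: nothing left to estimate
  age    = demean(Exams$age,    Exams$id),  # -1, 0, 1 for every student
  sleep  = demean(Exams$sleep,  Exams$id),  # differs from student to student
  pen    = demean(Exams$pen,    Exams$id)   # differs from student to student
)
```

After demeaning, gender is identically zero and age follows the same pattern for every student, so it is indistinguishable from the passage of time. Only sleep and pen keep person-specific variation.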

Estimate a random effects model.
- Unlike the fixed effects model, it allows us to estimate the effect of gender and age on speed of writing.
- This estimation method imposes a somewhat strong assumption: that the individual-specific effects are uncorrelated with the other explanatory variables. In this example, a random effects model assumes that every individual has a latent writing speed that is uncorrelated with anything else in the model. Factors like age and gender simply shift that writing speed up or down.

plm(speed ~ age + gender + sleep + pen, data=Exams, index=c("id","wave"), model="random")
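
One way to see why a random effects model can estimate gender is that it only partially demeans the data. The sketch below uses the textbook quasi-demeaning form of the random effects estimator with variance components that are assumed known. In practice they have to be estimated from the data, so this is an illustration rather than a reproduction of what plm computes. The simulated data satisfy the random effects assumption by construction, and all names and values are hypothetical.

```r
set.seed(4)
n <- 400; waves <- 3
sigma.u <- 2   # sd of the latent student effect (assumed known here)
sigma.e <- 1   # sd of the idiosyncratic error (assumed known here)
id <- rep(1:n, each = waves)

# Random effects assumption: latent speed is independent of gender and pen.
u      <- rnorm(n, sd = sigma.u)[id]
gender <- rbinom(n, 1, 0.5)[id]
pen    <- rbinom(n * waves, 1, 0.5)
speed  <- 15 + 1.0 * gender + 0.5 * pen + u + rnorm(n * waves, sd = sigma.e)

# Quasi-demeaning weight: 1 would be full demeaning (fixed effects),
# 0 would leave the data untouched (pooled OLS).
theta <- 1 - sqrt(sigma.e^2 / (sigma.e^2 + waves * sigma.u^2))
quasi <- function(x) x - theta * ave(x, id)

# OLS on the quasi-demeaned data; the intercept absorbs the (1 - theta) scaling.
re      <- lm(quasi(speed) ~ quasi(gender) + quasi(pen))
re.coef <- unname(coef(re)[-1])   # estimates for gender and pen, in that order
```

Because theta is below 1, the quasi-demeaned gender variable is not zero, so its coefficient remains estimable. The price is the assumption built into the simulation: if the latent speed were correlated with pen type, these estimates would be biased.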