9 Regression extensions: Polynomial functions and interaction terms

Abstract

This chapter covers some important extensions of multiple regression that capture nonlinear effects: polynomial specifications and interaction terms.

Keywords: Polynomial specifications, nonlinear, quadratic, interaction terms

9.1 Introduction

The multiple regression model we have examined so far has usually assumed that the relationships between the regressor (X) variables and the outcome (Y) are linear. In a linear specification, the magnitude of the effect of a change in X on the outcome does not depend on the level of X, nor does it depend on the level of any other explanatory variable. If, on the other hand, the effect of a change in X depended on its level, or on the level of another variable, the relationship would be nonlinear.

We have already explored one simple way to take account of nonlinear effects using log-transformed variables (chapter 6). In the current chapter, we show how to estimate two additional common kinds of non-linear relationships that can be estimated in a multiple regression setting: polynomial specifications and interaction terms. More complex non-linear relationships may be estimated with extensions of these basic ideas.

9.2 Polynomial regression specifications

A polynomial function can be written as

\[Y = β_0 + β_1X + β_2X^2 + \cdots + β_mX^m\]

The polynomial is said to be of degree m, where m is the highest exponent. If \(m = 2\) the function is said to be quadratic, and if \(m = 3\) the function is called a cubic. In practice, social science researchers rarely consider polynomials of degree greater than 3.

To illustrate how this function is non-linear, consider the quadratic function \(Y = 10 − 5X + .20X^2\). If X were equal to 10, then Y would be equal to -20. If X were equal to 20, then Y would be equal to -10. If X were equal to 30, then Y would be equal to 40. It is easy to see that as X continues to increase, the value of Y increases very rapidly. If the function had a cubic term with a positive coefficient, the increases in the outcome would be even larger for each unit increase in X. In order to develop intuition for quadratic or cubic functions, it is advisable to make up a few functions and plot them. Depending on the signs and magnitudes of the coefficients, the graph of a quadratic function may be ∪-shaped or ∩-shaped. A cubic function may have a more complicated shape.

By the way, displaying curves is easy in R; simply run a command like:

curve(10 - 5*x + .20*x^2, from = 0, to = 50)

Although a polynomial of degree 2 or greater is nonlinear, it is built as a linear combination (sum) of terms, which means it can be estimated using a linear regression such as OLS. For example, to run a regression of Y on a quadratic in X, we simply create a new variable \(Xsq = X^2\) and add it as a regressor: \(Y = β_0 + β_1X + β_2X_{sq}\).
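For example, here is a minimal sketch with simulated data (all variable names are made up for illustration):

set.seed(123)
X <- runif(100, 0, 30)
Y <- 10 - 5*X + 0.20*X^2 + rnorm(100)   # quadratic relationship plus noise
Xsq <- X^2                              # create the squared regressor
lm(Y ~ X + Xsq)                         # OLS recovers the quadratic coefficients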

9.2.1 Accounting for non-linear return to work experience in the Mincer earnings equation

Let’s return to the Mincer earnings equation. So far, we have examined the relationship between the outcome variable earnings (or log of earnings) and the two explanatory variables (regressors) years of education and years of work experience. Experience affects earnings because workers learn on the job and thereby continue to increase their skills (and earnings potential) even after they have completed their formal education and entered the workforce. In addition, it might be the case that employers reward workers’ experience or “seniority” for other reasons.

Mincer argued that the average worker’s earnings would increase with their work experience, but at a decreasing rate. This could be because of “diminishing returns” to on-the-job skills, or because workers invest less time and effort in learning new skills as they approach retirement age and will not have much time to benefit from the returns to new skills. Regardless of the reason, the earnings-experience “profile” might not be linear, but might have a concave shape, with the slope positive but decreasing, and even turning negative if workers’ productivity declines in old age.

We can model this shape using a quadratic function of experience \((exper)\): \[ earnings = \beta_0 + \beta_1exper + \beta_2exper^2\]

Once we have estimated \(β_1\) and \(β_2\), the question arises as to how we should interpret these coefficients. No simple interpretation is possible. To the standard question of how much \(earnings\) change with a one-unit change in \(exper\), we must answer, “It depends.” In particular, the effect of a change in \(exper\) depends on the level of \(exper\). We can see this by taking the derivative:

\[\frac{∂earnings}{∂exper} = β_1 + 2β_2exper\]

The effect depends on the level of \(exper\), and on the signs and magnitudes of the two coefficients. For example, suppose estimation yields the following:

\[earnings = 110 + 2.4exper − 0.08exper^2\]

The coefficients suggest that as work experience increases, earnings initially increase, on average, but at a diminishing rate, just as Mincer hypothesized. The slope of the relationship between \(earnings\) and \(exper\) here is given by the derivative \(2.4−2 \cdot .08 exper = 2.4−.16exper\): when \(exper = 0\) (a worker just starting out after leaving school), the slope is 2.4, implying that an additional year of experience will increase earnings by about 2.4; when \(exper = 10\) (a worker ten years beyond school), the slope is \(2.4−.16(10) = 0.8\), a much lower return to experience. We can easily determine where the function levels out and turns downwards (where the slope becomes negative) by setting the derivative equal to zero and solving for \(exper\). The derivative equals 0 when \(exper = 2.4/.16 = 15\) years of experience since completion of education. Beyond 15 years of experience, in this case, earnings would actually be decreasing in experience—evidence, perhaps, of a “tired old worker” effect.
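A quick R sketch of this earnings profile and its turning point, using the made-up coefficients above:

curve(110 + 2.4*x - 0.08*x^2, from = 0, to = 40,
      xlab = "exper", ylab = "earnings")
2.4 - 0.16*10    # slope at 10 years of experience: 0.8
2.4 / 0.16       # experience level where the slope hits zero: 15 years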

This regression differs from a standard Mincer specification in two ways. First, we have left out the impact of education itself! So we should put the years of education variable back in as a regressor. Second, as we have seen, the standard Mincer equation uses a log-linear specification, so the log of earnings is regressed on years of schooling and experience. Putting it all together, we have:

\[log(earnings) = \beta_0 + \beta_1educ + \beta_2exper + \beta_3 exper^2\]

Let’s implement this using the Kenya DHS 2022 data. First, we need to measure years of work experience. As we have seen earlier, in data sets such as the Kenya DHS 2022 data, years of work experience are often approximated as years of “potential work experience”—namely, the number of years since the individual left formal schooling. If individuals enter their formal education at roughly age 6, their age at leaving school is about \(educ\_yrs + 6\), and thus \(exper = age − educ\_yrs − 6\).

To estimate the above regression in R, then, we would use the following code:

library(modelsummary)   # for the regression table (if not already loaded)

kenya$exper <- kenya$age - kenya$educ_yrs - 6
kenya$exper2 <- kenya$exper^2
kenya$lnearn <- log(kenya$earnings_usd)
reg1 <- lm(lnearn ~ educ_yrs + exper + exper2,
           data = subset(kenya, earnings_usd <= 1000))
modelsummary(reg1,
             fmt = fmt_decimal(digits = 3, pdigits = 3),
             stars = TRUE,
             vcov = "robust",
             gof_omit = "IC|Adj|F|Log|std")
Table 1: Log of earnings regressed on education and a quadratic in experience, Kenya DHS 2022

                   (1)
(Intercept)     3.203***
               (0.041)
educ_yrs        0.123***
               (0.003)
exper           0.039***
               (0.003)
exper2         −0.001***
               (0.000)
Num.Obs.       20829
R2              0.112
RMSE            1.20
Std.Errors      HC3
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Running this regression yields the results in Table 1. Let’s interpret the results. First, the coefficient on \(educ\_yrs\) is 0.123. Since the regression is log-linear, we can say that an additional year of schooling is associated with 0.12 = 12% greater monthly earnings, controlling for the level of experience. The coefficients on experience and experience squared need to be considered together. The slope with respect to experience here is \(0.039−2(0.001)exper = 0.039 − 0.002exper\). For a person just starting out \((exper = 0)\), the slope is 0.039, implying that an additional year of work experience is associated with 3.9% greater monthly earnings. Earnings increase with experience at a decreasing rate until the maximum is reached at .039/.002 = 19.5 years of experience. Beyond 19.5 years of experience, additional experience would be associated with declining earnings.
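As a quick check, we can recover the turning point directly from the estimated coefficients; a minimal sketch, assuming reg1 from the code above:

b <- coef(reg1)
# slope of log earnings in experience: b["exper"] + 2*b["exper2"]*exper
-b["exper"] / (2 * b["exper2"])   # turning point, roughly 19.5 years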

9.2.2 Taking care with polynomials

Using calculus, the derivative of a polynomial function gives the effect of a change in the explanatory variable on the outcome. But this only applies for (very) small changes. For larger changes in the explanatory variable, multiplying the derivative by the change in X gives an effect that can be quite different from the effect calculated as the difference between the two predicted values.

Consider the following function: \(\text{Grade} = 80 + 10 ∗ \text{Coffee} − 2 ∗ \text{Coffee}^2\)

The derivative, the effect on the test grade of a small change in the cups of coffee a person drinks before an exam, is \(10 − 4 \cdot \text{Coffee}\). If a person wants to predict the effect of drinking one more cup of coffee, above their usual 1 cup, the derivative predicts the effect to be an increase in score of \(10 − 4 = 6\). The derivative likewise predicts the effect of drinking two more cups of coffee to be \(2 \cdot (10 − 4) = 12\). But if we calculate the predicted grade going from 1 to 3 cups of coffee, we see that the difference is \((80 + 10 \cdot 3 − 2 \cdot 3^2) − (80 + 10 \cdot 1 − 2 \cdot 1^2) = 4\). The discrepancy is even larger for large changes: if coffee increased by 4 (to 5 cups), the derivative predicts an increase in grade of 24, but the actual difference would be \((80 + 10 \cdot 5 − 2 \cdot 5^2) − (80 + 10 \cdot 1 − 2 \cdot 1^2) = −8\). Drinking that much more coffee causes the predicted score to decline rather than to increase.
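We can verify these calculations in R; a small sketch using the made-up grade function above:

grade <- function(coffee) 80 + 10*coffee - 2*coffee^2
deriv <- function(coffee) 10 - 4*coffee
deriv(1) * 2          # derivative-based prediction for two more cups: +12
grade(3) - grade(1)   # actual change from 1 to 3 cups: +4
deriv(1) * 4          # derivative-based prediction for four more cups: +24
grade(5) - grade(1)   # actual change from 1 to 5 cups: -8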

These calculations illustrate that one must be careful when interpreting coefficients in non-linear specifications. The usual intuitions about how to interpret the coefficients apply for small changes in the explanatory variables, not for large changes.

How many terms should the polynomial specification have? Just the quadratic? Always the cubic? The choice over how many terms and variables to include in a specification is part of the art of econometrics, and we shall discuss it in a subsequent chapter. For now, we state a useful rule of good practice: however many terms we include, we must always include the lower order (degree) terms, in order to properly interpret the coefficients. That is, if we have an explanatory variable raised to the power 3, the squared term and the linear term should also be included. In terms of the equation, always have this:

\[ Y = β_0 + β_1X + β_2X^2 + β_3X^3\]

and never this:

\[Y = β_0 + β_3X^3\]

One might also consider whether the highest-order coefficient is adding to the explanatory power of the regression. One way to think about this is to test the null hypothesis that the highest-order coefficient is zero. For example, suppose we estimate the above cubic relationship and find that we cannot reject that the coefficient \(β_3\) on the cubed term is zero; in the interest of “parsimony” we might consider dropping that cubic term.
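In R, this amounts to examining the t-statistic on the highest-order term. A minimal sketch, assuming a data frame df with variables y and x (the names are hypothetical):

df$x2 <- df$x^2
df$x3 <- df$x^3
summary(lm(y ~ x + x2 + x3, data = df))
# if we cannot reject that the coefficient on x3 is zero,
# we might drop the cubic term in the interest of parsimony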

9.3 Interaction terms

It is often relevant to determine whether the effect on the outcome Y from a change in one explanatory variable depends on the value of another explanatory variable. Is the effect of education on earnings greater for women or for men? Might the effect of education on earnings be greater for Muslims than for Christians, in a multi-religious society like Kenya? To address these questions in the context of a multiple regression model, we include interaction terms. An interaction term is a regressor that is the product of two regressors that are also included in the regression.

We shall consider three cases (and of course there may be mixes of these cases in the same regression model). Let us denote \(D_1\) and \(D_2\) as dummy variables, and \(X_3\) and \(X_4\) as continuous variables. We assume that the outcome Y is a continuous variable.

9.3.1 Interaction of two binary regressors

The first case is an interaction term that is the product of two binary variables:

\[Y = β_0 + β_1D_1 + β_2D_2 + β_3D_1D_2\]

As an illustration, consider the question of whether there is an earnings differential associated with marital status that differs by gender. Maybe married men earn more than unmarried men, whereas the reverse is true for women—or vice versa. To explore this, we could use data on individuals, where the outcome Y is earnings, \(D_1\) is a dummy variable indicating whether the person is female, \(D_2\) is a dummy variable indicating whether the person is married, and the product \(D_1D_2\) is the interaction of female and married. Recall that the coefficient on a binary regressor represents the difference in sample means of the outcome between the 0 and 1 categories. The same is true here, but with the added twist of the interaction term.

The following table shows how the dummy variables and interactions imply different mean earnings, for each of the four combinations of gender and marital status.

Table 2: Mean earnings implied by dummy interactions

                        Unmarried (\(D_2 = 0\))      Married (\(D_2 = 1\))
Male (\(D_1 = 0\))      \(\bar{Y} = β_0\)            \(\bar{Y} = β_0 + β_2\)
Female (\(D_1 = 1\))    \(\bar{Y} = β_0 + β_1\)      \(\bar{Y} = β_0 + β_1 + β_2 + β_3\)

The excluded (reference) category is the category with both dummy variables equal to 0, representing unmarried males: The estimated intercept \(β_0\) (upper left) is the sample mean for this group. The coefficient \(β_1\) indicates the difference in mean earnings between unmarried males and unmarried females, and the coefficient \(β_2\) indicates the difference in mean earnings between unmarried males and married males. When both \(D_1 = 1\) and \(D_2 = 1\), the interaction term \(D_1D_2 = 1 \cdot 1 = 1\), which indicates a married person who is female. Only for this group does the interaction effect \((β_3)\) kick in (lower right). \(β_3\) is the regression coefficient that tells us whether the marriage premium is different for women vs. men.
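A small simulation illustrates how the coefficients map onto the four group means; this is a sketch with made-up data, not the Kenya DHS:

set.seed(42)
n <- 1000
female  <- rbinom(n, 1, 0.5)
married <- rbinom(n, 1, 0.5)
# true group means: unmarried men 100, female gap -10,
# marriage premium 20 for men but 20 - 15 = 5 for women
earn <- 100 - 10*female + 20*married - 15*female*married + rnorm(n, sd = 5)
lm(earn ~ female + married + female:married)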

9.3.2 Interaction of binary and continuous regressors

The second case is an interaction term that is the product of a binary and a continuous variable.

\[Y = β_0 + β_1D_1 + β_2X_3 + β_3D_1X_3\]

Consider an example. Suppose we have estimated the following regression coefficients, using data from a sample of students at various universities. We omit the standard errors:

\[ GPA = 2.7 + .2 ∗ Hours + .2 ∗ Female + .1 ∗ Hours ∗ Female\]

where \(Hours\) is a measure of average hours of study per week. Student grade point average (GPA) depends on hours according to a linear specification. The coefficients suggest that as hours spent studying increase, GPA increases, on average, but at a rate that is higher for women students. The difference in the rate at which \(Hours\) increases GPA for women, compared with men, is 0.1 (the coefficient on the interaction term). That is, an extra hour increases GPA by .2 for men, but by .3 for women. The slope of the relationship between hours and GPA is steeper for women. Moreover, the intercept is higher for women. If hours were equal to 0, the GPA for women would be 2.9, while that for men would be 2.7.

We can see this by using the expectations operator E on our estimated regression equation, conditional on different values for the variable \(Female\):

\[E(GPA|Female = 0) = 2.7 + .2 ∗ Hours\] \[E(GPA|Female = 1) = 2.9 + .3 ∗ Hours\]

We could also have a quadratic relationship with interaction terms. Consider the following regression equation.

\[GPA = 1.7 + .1 ∗ Hours − .004 ∗ Hours^2 + .2 ∗ Female+ .1 ∗ Hours ∗ Female + .001 ∗ Hours^2 ∗ Female\]

The coefficients suggest that as hours spent studying increase, GPA increases, on average, but at a diminishing rate, and that the curves are different for men and for women. Again, we can use the expectations operator to see this.

\[E(GPA|Female = 0) = 1.7 + .1 ∗ Hours − .004 ∗ Hours^2\] \[E(GPA|Female = 1) = 1.9 + .2 ∗ Hours − .003 ∗ Hours^2\]

You can see these curves by running the following two lines in R:
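curve(1.7 + .1*x - .004*x^2, from = 0, to = 40, ylim = c(1.5, 5.5))  # men
curve(1.9 + .2*x - .003*x^2, add = TRUE, lty = 2)                    # women
# (the plotting range and line type are our choices, not part of the model)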

The curve for men lies below the curve for women, and turns downward sooner than the curve for women. We can interpret this as meaning that the “return” to studying, in terms of its effect on GPA, is lower for men than for women.

9.3.3 Interaction of two continuous regressors

The third case is an interaction term that is the product of two continuous variables:

\[Y = β_0 + β_1X_3 + β_2X_4 + β_3X_3X_4\]

In this case the effect of one regressor on the outcome varies continuously with the value of the other regressor. Specifically,

\[\frac{∂Y}{∂X_3} = β_1 + β_3X_4\] and \[\frac{∂Y}{∂X_4} = β_2 + β_3X_3\]
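A minimal sketch with simulated data shows how the marginal effect of one regressor depends on the level of the other (all names here are made up):

set.seed(1)
x3 <- runif(200)
x4 <- runif(200, 0, 10)
y  <- 2 + 0.5*x3 + 0.3*x4 + 0.2*x3*x4 + rnorm(200, sd = 0.1)
fit <- lm(y ~ x3 + x4 + x3:x4)
b <- coef(fit)
b["x3"] + b["x3:x4"] * mean(x4)   # marginal effect of x3 at the mean of x4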

9.3.4 Interaction terms in R

Adding interaction terms to a regression is easy in R. Say we are estimating a log earnings equation with regressors years of education and a female dummy variable, and we want to add an interaction. Obviously, we could simply create a new variable for the product of female and education and add it to the regression. But R has a nice way to add interaction terms within the regression (lm) command using the “colon” symbol (:) between the variables being interacted. For instance:

lm(log_earn~ educ_yrs + female + female:educ_yrs, data=kenya)
## 
## Call:
## lm(formula = log_earn ~ educ_yrs + female + female:educ_yrs, 
##     data = kenya)
## 
## Coefficients:
##     (Intercept)         educ_yrs           female  educ_yrs:female  
##          4.3340           0.1007          -1.1085           0.0493

An alternative “shortcut” approach in R is the following use of the asterisk (*), which does the same thing as the preceding command:

lm(log_earn~ educ_yrs * female, data=kenya)
## 
## Call:
## lm(formula = log_earn ~ educ_yrs * female, data = kenya)
## 
## Coefficients:
##     (Intercept)         educ_yrs           female  educ_yrs:female  
##          4.3340           0.1007          -1.1085           0.0493

Although the shortcut saves a little typing, we recommend the first syntax, including both variables and entering the interaction using the colon. This makes it clear that the regression includes all three regressors.

9.3.5 Rule of good practice

How many interaction terms should the specification have? Should every explanatory variable that is continuous be interacted with every explanatory variable that is a dummy variable? As with the question of how many polynomial terms to include, the choice over how many terms to include is part of the art of econometrics, and we shall discuss that in a subsequent chapter. For now, we state another useful rule of good practice: however many interaction terms we include, we must always include the lower order terms, in order to properly interpret the coefficients. That is, if we have an interaction term \(X_1 \cdot X_2\) we must also always have the terms \(X_1\) and \(X_2\) entered singly in the specification. In terms of the equation, always have this:

\[Y = β_0 + β_1X_1 + β_2X_2 + β_3X_1X_2\]

and never this:

\[Y = β_0 + β_1X_1X_2\]

9.3.6 Interaction terms for dummies

When dealing with binary (dummy) regressors, it’s important to bear in mind that the choice of reference category is ultimately arbitrary, and that the regression results have exactly the same implications whichever way the dummy is specified. For example, in a data set that records marital status as a binary—married \((marr = 1)\) vs. unmarried \((marr = 0)\)—a regression of wage on a dummy variable for \(marr\) has exactly the same implications as a regression of wage on a dummy for \(not\_marr = 1 − marr\). As you know by now, the estimated regression coefficients will be different, but once you work out the implied group means, the predicted (mean) wage conditional on marital status will be the same in either regression, as will the estimated marriage gap in mean wages.

The same logic applies when we interact two dummy variables. Suppose one had the following regression result calculated using data from a sample of adults (where we give the estimated coefficients, and not the standard errors of the coefficients):

\[wage = 10 + 3 marr − 3 kids + 2 marr \cdot kids\]

The dummy variable \(marr\) takes on value 1 if the adult is married, and zero otherwise. Likewise, the dummy variable \(kids\) takes on value 1 if the adult has children, and zero otherwise. The regression adds an interaction term of the two dummy variables. We can predict the mean wages conditional on each of the four possible combinations of \(marr\) and \(kids\) by plugging in the 0’s and 1’s:

\[E(wage|marr = 0, kids = 0) = 10 + 3 ∗ 0 − 3 ∗ 0 + 2 ∗ 0 ∗ 0 = 10\] \[E(wage|marr = 1, kids = 0) = 10 + 3 ∗ 1 − 3 ∗ 0 + 2 ∗ 1 ∗ 0 = 13\] \[E(wage|marr = 0, kids = 1) = 10 + 3 ∗ 0 − 3 ∗ 1 + 2 ∗ 0 ∗ 1 = 7\] \[E(wage|marr = 1, kids = 1) = 10 + 3 ∗ 1 − 3 ∗ 1 + 2 ∗ 1 ∗ 1 = 12\]

Now, let’s redefine the reference categories. Specifically, suppose that instead of the dummy variables being \(marr\) and \(kids\), they were the reverse, \(not\_marr\) and \(no\_kids\). That is, a 1 in \(not\_marr\) means the adult is not married, and a 0 means they are married. Without actually running a new regression, can we deduce what the coefficients of such a regression would be, given what they are in the case where the two dummy variables are \(marr\) and \(kids\)? That is, can we deduce what the coefficients would be in this regression, with the same data as in the first regression?

\[wage = β_0 + β_1 ∗ not\_marr + β_2 ∗ no\_kids + β_3 ∗ not\_marr ∗ no\_kids\]

The answer is yes, because changing the reference categories does not change the ultimate predicted wages. We can proceed as before, and determine the expected value of earnings under different assumptions about the values of the dummy variables.

\[E(wage|not\_marr = 0, no\_kids = 0) = β_0 + β_1 ∗ 0 + β_2 ∗ 0 + β_3 ∗ 0 ∗ 0 = β_0\] \[E(wage|not\_marr = 1, no\_kids = 0) = β_0 + β_1 ∗ 1 + β_2 ∗ 0 + β_3 ∗ 1 ∗ 0 = β_0 + β_1\] \[E(wage|not\_marr = 0, no\_kids = 1) = β_0 + β_1 ∗ 0 + β_2 ∗ 1 + β_3 ∗ 0 ∗ 1 = β_0 + β_2\] \[E(wage|not\_marr = 1, no\_kids = 1) = β_0 + β_1 ∗ 1 + β_2 ∗ 1 + β_3 ∗ 1 ∗ 1 = β_0 + β_1 + β_2 + β_3\]

But we know from the first regression that:

\[E(wage|not\_marr = 0, no\_kids = 0) = E(wage|marr = 1, kids = 1) = 12,\] so \(\beta_0 = 12\). Similarly, we have: \[E(wage|not\_marr = 1, no\_kids = 0) = E(wage|marr = 0, kids = 1) = 7,\] so that \(\beta_1 = -5\). We can now calculate \(\beta_2\). Since \[E(wage|not\_marr = 0, no\_kids = 1) = E(wage|marr = 1, kids = 0) = 13,\] it follows that \(β_2 = 1\). Finally, we can calculate \(β_3\). We have \[E(wage|not\_marr = 1, no\_kids = 1) = E(wage|marr = 0, kids = 0) = 10,\] so \(β_0 + β_1 + β_2 + β_3 = 10\). That means that \(12 − 5 + 1 + β_3 = 10\), so \(β_3 = 2\). Our estimated equation with the reversed dummy variables would be:

\[wage = 12 − 5 ∗ not\_marr + 1 ∗ no\_kids + 2 ∗ not\_marr ∗ no\_kids\]
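We can confirm this equivalence in R with a quick sketch using made-up data (the variable names mirror the example):

set.seed(7)
n <- 500
marr <- rbinom(n, 1, 0.5)
kids <- rbinom(n, 1, 0.5)
wage <- 10 + 3*marr - 3*kids + 2*marr*kids + rnorm(n)
not_marr <- 1 - marr
no_kids  <- 1 - kids
coef(lm(wage ~ marr + kids + marr:kids))
coef(lm(wage ~ not_marr + no_kids + not_marr:no_kids))
# the coefficients differ, but both sets imply the same four group means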

9.3.7 Estimating nonlinear models

In the cases of nonlinear functions that we have considered—using logs, polynomials, and interactions—although some of the relationships are assumed to be non-linear, the estimation of parameters is quite simple, because we can still use ordinary least squares (OLS). To assure unbiased estimates of the variable(s) of interest, we continue to assume that the explanatory variables X are independent of, or uncorrelated with, any other factors influencing the outcome, or with systematic measurement error of the outcome. This assumption continues to be expressed as \(E(ϵ_i|X) = 0\); the expected error term is assumed to be equal to zero, conditional on the level of the explanatory variable or variables, including any nonlinear terms such as \(X^2\) or \(X_1X_2\).

There are, however, some practical questions that arise. As we noted earlier, and repeat here: if we are estimating a polynomial function, how high should the degree of the polynomial be? Note that we can conduct hypothesis tests on the higher order coefficients. This suggests a strategy of first estimating the quadratic. If the coefficient on the squared term is significant, then estimate a cubic. If the coefficient on the cubed term is significant, then ask whether there is some compelling theoretical reason why the relationship should require higher order exponents. If there is not, then three terms is almost certainly enough; social science applications rarely call for higher order terms. One can also look at the adjusted \(R^2\) as one adds higher order terms, to see if they increase the overall explanatory power of the regression.

Another practical question is whether log specifications are preferred to polynomial specifications. Here, again, the theory that you have in mind may be a useful guide. If the relationship you are estimating is one where the language used to describe the relationship is of a growth rate or a proportional change, then the log specification is preferred. Likewise, if the theory is about elasticities, then the log-log specification is preferred. If, on the other hand, the theory is about diminishing marginal product, or some similar relationship, then perhaps a quadratic relationship will be easier to interpret.

9.4 Deploying nonlinear specifications: An example using the Mincer equation

To close out this chapter, we examine several different nonlinear specifications of the Mincer earnings equation, using the Kenya DHS 2022 data. Here is the code that generates the regressions and regression table:

# Load the package for regression tables
library(modelsummary)

# Read Kenya DHS 2022 data from a website
url <- "https://github.com/mkevane/econ42/raw/main/kenya_earnings.csv"
kenya <- read.csv(url)

# Create new variables
kenya$exper <- kenya$age - 6 - kenya$educ_yrs
kenya$expersq100 <- (kenya$exper)^2 / 100
kenya$log_earn <- log(kenya$earnings_usd)

# Run several regressions
reg1 <- lm(log_earn ~ educ_yrs + exper + expersq100,
           data = subset(kenya, earnings_usd <= 1000))
reg2 <- lm(log_earn ~ educ_yrs + exper + expersq100 + female + muslim + kikuyu,
           data = subset(kenya, earnings_usd <= 1000))
reg3 <- lm(log_earn ~ educ_yrs + exper + expersq100 +
             educ_yrs:female + educ_yrs:muslim + educ_yrs:kikuyu,
           data = subset(kenya, earnings_usd <= 1000))

# Put regression results in a list
models <- list(reg1, reg2, reg3)

# Make the table with all regressions as separate columns
modelsummary(models,
             title = "Table: Alternative nonlinear specifications of the Mincer
                      equation using Kenya DHS 2022 data. Dependent variable
                      is log of earnings.",
             stars = TRUE,
             gof_map = c("nobs", "r.squared", "adj.r.squared"),
             fmt = 2)
Table 3: Alternative nonlinear specifications of the Mincer equation using Kenya DHS 2022 data. Dependent variable is log of earnings.
                      (1)       (2)       (3)
(Intercept)         3.20***   3.32***   3.15***
                   (0.04)    (0.04)    (0.04)
educ_yrs            0.12***   0.13***   0.14***
                   (0.00)    (0.00)    (0.00)
exper               0.04***   0.05***   0.05***
                   (0.00)    (0.00)    (0.00)
expersq100         −0.05***  −0.09***  −0.07***
                   (0.01)    (0.01)    (0.01)
female                       −0.58***
                             (0.02)
muslim                        0.59***
                             (0.03)
kikuyu                        0.18***
                             (0.02)
educ_yrs × female                      −0.04***
                                       (0.00)
educ_yrs × muslim                       0.04***
                                       (0.00)
educ_yrs × kikuyu                       0.02***
                                       (0.00)
Num.Obs.            20829     20829     20829
R2                  0.112     0.184     0.152
R2 Adj.             0.112     0.184     0.152
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

In Table 3, the dependent variable in each regression is log of earnings. Regression (1) is a “garden variety” Mincer equation, with regressors years of education and a quadratic in years of experience. Note that we have re-scaled the square of experience by dividing the experience squared variable by 100, which makes the coefficients larger and easier to fit on the table! The results should be familiar by now: an additional year of education is associated with 12% greater earnings, while earnings increase with experience initially, but at a decreasing rate.

In regression (2), we add dummy variables for female, Muslim, and Kikuyu ethnicity. Controlling for education, experience, and the two other dummies, women earn substantially less than men, while Muslims and, to a smaller degree, people who identify as Kikuyu earn more than people who do not belong to those groups.

Finally, regression (3) adds interactions between the dummy variables and education. These interactions allow us to see whether the return to education is different for members of these groups. The coefficient on the interaction \(educ\_yrs × female\) is -0.04 and statistically significant. It implies that the return to a year of education is 4 percentage points lower for females than for males; that is, the return for males (the reference category) is 14%, whereas for females it is 14 − 4 = 10%. Using the same logic, see if you can interpret the estimated interaction coefficients \(educ\_yrs×muslim\) and \(educ\_yrs×kikuyu\).

Review terms and concepts:

• non-linear specification
• polynomial
• quadratic and cubic
• interaction term