
Estimating Games
Introduction
In Chapter 7 we used game theory to revisit demand estimation. This chapter takes game theory to a broader set of problems in microeconometrics. Chapter 7 considered games with pure strategies; this chapter introduces the idea of mixed strategies. It also introduces a standard estimator for game-theoretic models, the generalized method of moments (GMM). The chapter uses the GMM estimator to revisit returns to schooling and to ask whether NFL coaches punt too often on 4th down.
Mixed Strategy Nash Equilibrium
In his 2006 paper, Berkeley macroeconomist David Romer argues that coaches in the NFL do not behave rationally. In particular, they choose to punt too often on fourth down. The goal in American football is to score points. The primary method for scoring is to run the ball across your opponent's goal line. The ball can be moved toward the goal line either by running with it or by throwing it to a teammate. In American football the offense has 4 tries ("downs") to move the ball ten yards.
After the third try, the coach has a decision to make. They can “go for it.” That is, try a standard play which involves running with the ball or throwing the ball. If they make the distance, the number of tries resets, which is good. If they don’t make the distance, the other team gets the ball, which is bad. Most importantly, the other team gets the ball right at the same spot on the field, which is very bad. This means the other team may have a particularly easy chance to score. As I said, very bad. Punting the ball also gives the ball to the other team. But, and this is the kicker, pun intended, the other team gets the ball where the punt lands. This means that the other team may have a particularly hard chance to score, which is not that bad.
This idea that coaches punt too often has become pervasive; there is even an episode of the CBS TV show Young Sheldon discussing the issue. But is it correct? We do not observe the information required to evaluate the coaches' decision. The fact that coaches almost always punt means that we don't actually observe what happens when they don't punt. We don't observe the counterfactual outcome. However, we can use game theory to predict it.
Coaches’ Decision Problem
The decision to punt or not on fourth down can be thought of as maximizing expected points rather than maximizing the probability of winning. Romer (2006) argues that, at least early on in the game, this is a reasonable way to model the coaches’ problem.
Here the expected points are calculated by Ron Yurko. For more information go here: https://github.com/ryurko/nflscrapR-data. The data is scraped from NFL play-by-play information (https://www.nfl.com/scores).1

Figure 1 presents the average expected points for the first quarter. Expected points are calculated using dynamic programming. Each yard line and possession combination is a "state." We can then calculate the probability of moving from any state to any other state. For some states one of the teams scores. Thus we can determine the expected point differential for each yard line. Note that when a team is within their own 20, the expected points are actually negative. Even though one team may have the ball, it is actually more likely that the other team will get it and score.
Consider a fourth down with 5 yards to go. The ball is currently on the offense's own 45 yard line (55 yards to the goal line). There are two important numbers: the "to-go" distance of 5 yards and the position on the field, which is 55 yards to the goal line. The longer the to-go distance, the harder it is to get the distance. The further from the goal line, the easier it is for the other team to score if they get the ball.
Using Figure 1 we can calculate the expected points for the defense from each position. For illustrative purposes, assume the offense punts the ball from their own 45 to the other team's 20, where the defense expects to earn 0 points. Punting thus earns the offense \(-0 = 0\) in expected points. Alternatively, the offense could try for the 5 yards. If they get it, they are at the 50 and will earn 2 points in expectation. But what if they don't get it? If they turn the ball over on downs, the defense starts at the offense's 45, with 45 yards until the goal line. This gives the offense about -2.3 in expected points. So whether to punt depends on the probability of getting the 5 yards.
Letting \(p\) denote the probability of getting the 5 yards, the offense should go for it if the left-hand side below is bigger than zero.
\[ \begin{array}{c} 2 p - 2.3 (1 - p) > 0\\ \mbox{or}\\ p > 0.54 \end{array} \tag{1}\]
That is, the coach should go for it when \(p > 0.54\). So what is \(p\)?
This seems straightforward. We just calculate the probability of getting 5 yards from the data. Just take all the fourth down attempts to go 5 yards and calculate the probabilities. The problem is that teams don’t actually go for it on fourth down. We can’t calculate the probability of outcomes for events that do not occur.
Romer’s solution is to use the estimated probability of going 5 yards on third down. A concern with Romer’s approach is third down and fourth down are different. Instead, the solution here is to use the third down data to estimate the game played on third down. Then take the estimated parameters and use them to predict what would occur in the game played on fourth down.
Zero-Sum Game
We can model the third-down game as a **zero-sum game**. These games date to the origins of game theory in the 1920s, when the mathematician John von Neumann was interested in modeling parlor games like cards or chess. In these games, there is a winner and a loser. What one person wins, another person loses.
Consider a relatively simple zero-sum game presented in Table 1. Remember, a game has three elements: players, strategies and payoffs. The players are called Offense and Defense. Each has the choice between two strategies, Run and Pass. Payoffs are expected points (for simplicity). In the game, a pass play will get more yards and thus more expected points, but is generally higher risk. For the Defense, to play "Pass" means to implement a pass-oriented defense, such as one in which the linebackers drop back into zones. This is a zero-sum game, so the payoff to one player is the negative of the payoff to the other player.
{Off,Def} | Run | Pass |
---|---|---|
Run | {0, 0} | {1, -1} |
Pass | {2, -2} | {0, 0} |
The game represented in Table 1 has "low" payoffs on the diagonal and "high" payoffs on the off-diagonal. That is, the Offense prefers to choose Run when the Defense chooses Pass. Similarly, they prefer Pass when the Defense chooses Run. Note also that the payoff is higher from a successful pass than a successful run. This captures the fact that a pass tends to gain more yardage.
What is the solution to this game?
What is the Nash equilibrium? One possibility is that both teams play \(\{\mathrm{Run},\mathrm{Run}\}\). But if the Defense plays Run, it is optimal for Offense to play Pass. So that is not a Nash equilibrium. Go through the different cases. Can you find the Nash equilibrium? Is there a Nash equilibrium?
Nash Equilibrium
The Nobel prize-winning mathematician John Nash proved that, for relatively general games, there always exists at least one Nash equilibrium. However, there may not exist a pure strategy Nash equilibrium, as discussed above. The only Nash equilibrium may be in mixed strategies.
A mixed strategy is one in which the player places probability weights on the available actions. Instead of choosing Run, the player places some probability of Run and some probability of Pass. It may not be that the player literally uses a coin or some other randomization device. Rather, the assumption is that the other players in the game do not know exactly what the player will do. The other players know only the probability weights that are placed on Pass and Run.
In our problem neither team should telegraph its play choice to the other team. They should keep the other team guessing. Think about the child’s game, Rock/Paper/Scissors. You want to keep switching between the three options. It is not optimal to always choose Rock. If the other player knows you will always choose Rock, they will always choose Paper. In football, both teams try to have fake plays and formations in order to keep the other team guessing.
Determining the mixed strategy Nash equilibrium is tricky. You must show that the player is indifferent between her actions. In a mixed strategy Nash equilibrium, each player is indifferent between Run and Pass. If this is not true, say a player prefers Run, then mixing cannot be optimal. It is optimal to choose Run.
The Nash equilibrium in mixed strategies is where the Offense chooses a probability of Pass (\(q_o\)) such that the Defense is indifferent between Run and Pass. Similarly for \(q_d\).
\[ \begin{array}{l} \mbox{Def: } -2q_o = -(1 - q_o)\\ \mbox{Off: } q_d = 2(1 - q_d) \end{array} \tag{2}\]
The equilibrium is the solution to the two equations in Equation 2. The mixed strategy Nash equilibrium is \(\{q_o = \frac{1}{3}, q_d = \frac{2}{3}\}\). In equilibrium, the Offense will tend to play Run, while Defense keys on Pass. Why is this? Does it make sense for the Offense to play something with a lower payoff?
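The two indifference conditions in Equation 2 can also be solved numerically. A minimal sketch in base R, where each anonymous function is just the difference between the two sides of the corresponding condition:

# Defense indifference: -2*qo = -(1 - qo)
qo <- uniroot(function(q) -2*q + (1 - q), c(0, 1))$root
# Offense indifference: qd = 2*(1 - qd)
qd <- uniroot(function(q) q - 2*(1 - q), c(0, 1))$root
c(qo, qd) # approximately 1/3 and 2/3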
Third and Fourth Down Game
{Off,Def} | Run | Pass |
---|---|---|
Run | \(-(1 - p_{rr})\mathbb{E}(P | Y_{k}) + p_{rr}\mathbb{E}(P | Y_r)\) | \(-(1 - p_{rp})\mathbb{E}(P | Y_{k}) + p_{rp} \mathbb{E}(P | Y_r)\) |
Pass | \(-(1 - p_{pr})\mathbb{E}(P | Y_{k}) + p_{pr}\mathbb{E}(P | Y_{p})\) | \(-(1 - p_{pp})\mathbb{E}(P | Y_{k}) + p_{pp}\mathbb{E}(P | Y_{p})\) |
Consider the Third Down game represented by Table 2. This is a zero-sum game, so the payoff presented is the payoff to the Offense. It is assumed that if the Offense does not make the first down, then they punt. The expected points depend on both the current location, \(Y\), and the expected distance of the punt: the position after the punt is denoted \(Y_k\), and the associated payoff is \(\mathbb{E}(P | Y_k)\). If the Offense is successful in getting the first down, then the expected points depend on whether the play was a pass or a run, with positions \(Y_p\) and \(Y_r\), respectively.
The Third Down game and the Fourth Down game are similar. In particular, we assume that the probabilities of success conditional on the strategies of the Offense and Defense remain the same. What changes between downs is the expected points associated with failure. In the Third Down game, if the Offense fails, it gets \(-\mathbb{E}(P | Y_k)\). The negative is because the ball goes to the other team. But the ball moves to the location where it finishes after the punt. In the Fourth Down game it gets \(-\mathbb{E}(P | 100 - Y)\). That is, the ball is given to the opponent at the exact same location. Note that they have to go in the other direction.
The objective is to estimate the parameters of the game that will be used to model the policy of going for it on fourth down. Those parameters are four conditional probabilities. These are the probability of successfully getting first down conditional on the type of play, Run or Pass, chosen by both the Offense and the Defense. This set of parameters is denoted \(\theta = \{p_{rr},p_{rp},p_{pr},p_{pp}\}\), where \(p_{rp}\) is the probability of success when the Offense chooses Run and the Defense chooses Pass. The conditional probability, \(p_{rp} = \Pr(\mathrm{Success} | O=\mathrm{Run}, D=\mathrm{Pass})\).
Equilibrium Strategies
If we could observe the actions of both the Offense and the Defense, then it would be straightforward to estimate the conditional probabilities of interest. While we have very good data on the action chosen by the Offense, we don't know what action the Defense chose. The available play-by-play data doesn't provide a lot of information on the type of defense that was used in each play.
To identify the conditional probabilities, we combine the observed probabilities with the constraints from the Nash equilibrium.
\[ \begin{array}{l} \Pr(\mathrm{Pass}) = q_o\\ \Pr(\mathrm{Success} | \mathrm{Pass}) = p_{pr} (1 - q_d) + p_{pp} q_d\\ \Pr(\mathrm{Success} | \mathrm{Run}) = p_{rr} (1 - q_d) + p_{rp} q_d \end{array} \tag{3}\]
In equilibrium the following equalities must hold. The probability that the Defense plays Pass (\(q_d\)) must set the expected value of Offense playing Run to the expected value of the Offense playing Pass. See the first equality. Similarly, the probability that the Offense plays Pass (\(q_o\)) must set the expected value of Defense playing Run to the expected value of the Defense playing Pass.
\[ \begin{array}{l} (1 - q_d) V_{3rr} + q_d V_{3rp} = (1 - q_d) V_{3pr} + q_d V_{3pp}\\ (1 - q_o) V_{3rr} + q_o V_{3pr} = (1 - q_o) V_{3rp} + q_o V_{3pp} \end{array} \tag{4}\]
where the value functions are as follows.
\[ \begin{array}{l} V_{3rr} = - (1 - p_{rr})\mathbb{E}(P | Y_k) + p_{rr}\mathbb{E}(P | Y_r) \\ V_{3rp} = - (1 - p_{rp})\mathbb{E}(P | Y_k) + p_{rp}\mathbb{E}(P | Y_r) \\ V_{3pr} = - (1 - p_{pr})\mathbb{E}(P | Y_k) + p_{pr}\mathbb{E}(P | Y_p) \\ V_{3pp} = - (1 - p_{pp})\mathbb{E}(P | Y_k) + p_{pp}\mathbb{E}(P | Y_p) \end{array} \tag{5}\]
Assuming these equalities hold in the data allows us to estimate the parameters.
Rearranging we have the equilibrium strategies for the Offense and Defense.
\[ \begin{array}{l} q_o = \frac{V_{3rp} - V_{3rr}}{(V_{3rp} - V_{3rr}) + (V_{3pr} - V_{3pp})}\\ \\ q_d = \frac{V_{3pr} - V_{3rr}}{(V_{3pr} - V_{3rr}) + (V_{3rp} - V_{3pp})} \end{array} \tag{6}\]
In general, the expected points are larger when there is a mismatch in the actions of the two teams. This suggests that the strategies will always lie between 0 and 1. Can you provide an intuitive explanation of these probabilities? It is not at all obvious. For example, if the value to the Offense of playing Pass versus Run increases, then the probability the Offense plays Pass decreases.
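As a quick numeric check, we can plug the payoffs from Table 1 into the \(q_o\) formula in Equation 6, so \(V_{3rr} = 0\), \(V_{3rp} = 1\), \(V_{3pr} = 2\) and \(V_{3pp} = 0\). The small helper function qo_eq6 is defined here just for this check:

# Offense's equilibrium probability of Pass from Equation 6.
qo_eq6 <- function(V_rr, V_rp, V_pr, V_pp) {
  (V_rp - V_rr)/((V_rp - V_rr) + (V_pr - V_pp))
}
qo_eq6(0, 1, 2, 0) # Table 1 payoffs give 1/3, as above.
qo_eq6(0, 1, 3, 0) # a better Pass payoff against Run lowers qo to 1/4.

The Offense's mixing must keep the Defense indifferent. When Pass becomes more dangerous against a Run defense, the Defense would abandon Run unless the Offense passes less often.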
Equilibrium Strategies in R
The estimator involves solving for the equilibrium strategies of the Offense and Defense. The function takes in the parameters (the conditional probabilities) and the expected points for each situation. It then calculates the equilibrium probability of playing Pass for the Offense and the Defense based on Equation 6. Note that the expected points associated with failure depend on the down. For third down it is assumed to be determined by the expected points from a punt, while for fourth down it is the expected points from turning the ball over to the other team at that position.
# a function for determining the equilibrium strategies.
q_fun <- function(theta, Ep) {
  Ep_run <- Ep[,1]
  Ep_pass <- Ep[,2]
  Ep_fail <- Ep[,3]
  V_rr <- - (1 - theta[1])*Ep_fail + theta[1]*Ep_run
  V_rp <- - (1 - theta[2])*Ep_fail + theta[2]*Ep_run
  V_pr <- - (1 - theta[3])*Ep_fail + theta[3]*Ep_pass
  V_pp <- - (1 - theta[4])*Ep_fail + theta[4]*Ep_pass
  qo <- (V_rp - V_rr)/((V_rp - V_rr) + (V_pr - V_pp))
  qd <- (V_pr - V_rr)/((V_pr - V_rr) + (V_rp - V_pp))
  # forcing results to be probabilities.
  for (i in 1:length(qo)) { qo[i] <- min(max(qo[i],0),1) }
  for (i in 1:length(qd)) { qd[i] <- min(max(qd[i],0),1) }
  return(list(qo=qo,qd=qd))
}
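As a quick sanity check, we can evaluate the function for a single situation, using the parameter values from the simulation below and illustrative expected points (the columns of Ep are the expected points after a successful run, a successful pass, and a failure; the values 2.2, 2.8 and 1.3 are made up for this example):

q_fun(c(0.2, 0.8, 0.5, 0.1), cbind(2.2, 2.8, 1.3))
# gives qo of about 0.56 and qd of about 0.36,
# in line with the simulation summarized in Table 3 below.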
Simulation of Third and Fourth Down Game
To illustrate the estimator, consider a simulation of the Third Down game presented above. We observe a similar game played 2,000 times.
set.seed(123456789)
N <- 2000 # number of plays
theta <- c(0.2, 0.8, 0.5, 0.1)
# parameters of the model (conditional probabilities)
Y <- 50 + 20*runif(N)
# current yards to go.
Yk <- Y - rnorm(N,mean=35)
Yk <- ifelse(Yk > 0, Yk, 20)
# yards to go after punt.
Yp <- Y - rnorm(N,mean=15)
Yp <- ifelse(Yp > 0, Yp, 0)
# yards to go after pass
Yr <- Y - rnorm(N,mean=3)
Yr <- ifelse(Yr > 0, Yr, 0)
# yards to go after run
EP <- function(x) 5 - 5*x/100
# an expected points function to approximate the figure above.
# equilibrium strategies
q3 <- q_fun(theta, cbind(EP(Yr), EP(Yp), EP(100 - Yk)))
q4 <- q_fun(theta, cbind(EP(Yr), EP(Yp), EP(100 - Y)))
3rd O Pass | 3rd D Pass | 4th O Pass | 4th D Pass |
---|---|---|---|
Min. :0.5444 | Min. :0.3399 | Min. :0.5629 | Min. :0.3269 |
1st Qu.:0.5572 | 1st Qu.:0.3577 | 1st Qu.:0.5711 | 1st Qu.:0.3390 |
Median :0.5604 | Median :0.3627 | Median :0.5733 | Median :0.3423 |
Mean :0.5604 | Mean :0.3627 | Mean :0.5733 | Mean :0.3423 |
3rd Qu.:0.5636 | 3rd Qu.:0.3678 | 3rd Qu.:0.5754 | 3rd Qu.:0.3457 |
Max. :0.5748 | Max. :0.3881 | Max. :0.5830 | Max. :0.3588 |
Note the difference between third down and fourth down. Table 3 shows that the change in payoffs leads to a subtle change in the strategies for the simulated teams. For the Offense there is a 1 percentage point increase in the probability of Pass, while for the Defense there is a 2 to 3 percentage point decrease in the probability of Pass. Do you see why that would be? Why does the Defense key on Run when moving to 4th down?
Pass <- runif(N) < q3$qo
success3 <- runif(N) < Pass*(theta[3]*(1 - q3$qd) + theta[4]*q3$qd) +
  (!Pass)*(theta[1]*(1 - q3$qd) + theta[2]*q3$qd)
play <- ifelse(Pass,"pass","run")
# this is created to simulate the estimator below.
# fourth down
Pass <- runif(N) < q4$qo
success4 <- runif(N) < Pass*(theta[3]*(1 - q4$qd) + theta[4]*q4$qd) +
  (!Pass)*(theta[1]*(1 - q4$qd) + theta[2]*q4$qd)
The Defense moves to become more focused on stopping the Run. This change in the strategies leads to a change in the success rates. In particular, the probability of the Offense successfully completing a pass or run falls by more than a percentage point. These simulated results suggest that using third down success rates leads to biased estimates and makes “going for it” look more valuable than it actually is.
mean(success3)
[1] 0.3865
mean(success4)
[1] 0.369
Generalized Method of Moments
This section introduces a new estimation algorithm, called the generalized method of moments (GMM). This algorithm was developed by the Nobel prize-winning economist Lars Peter Hansen. While its initial applications were in macroeconomics, GMM has become standard in microeconometric estimation problems involving game theory. However, in order to understand the algorithm, we will take a detour and return to the question of estimating OLS.
Moments of OLS
The algorithm is a generalization of the least squares algorithm presented in Chapter 1. OLS can be estimated by noting that the mean of the error term is equal to zero. That is, the first **moment** of the error term's distribution is zero.
In the model introduced in Chapter 1, we have some outcome that is a function of observables and unobservables.
\[ y_{i} = \mathbf{X}_i' \beta + \upsilon_i \tag{7}\]
where \(y_i\) is the observed outcome of interest for unit \(i\), \(\mathbf{X}_i\) is the vector of observed **explanatory variables**, \(\beta\) is the vector of parameters of interest and \(\upsilon_i\) represents unobserved characteristics of unit \(i\).
Let the unobserved term be normally distributed, \(\upsilon_i \sim \mathcal{N}(0, \sigma^2)\). The first moment of the unobserved characteristic is equal to 0.
\[ \mathbb{E}(\upsilon_i) = \mathbb{E}(y_i - \mathbf{X}_i' \beta) = 0 \tag{8}\]
The second moment is the expected square of the unobserved term, which equals the variance (\(\sigma^2\)).
\[ \mathbb{E}(\upsilon_i^2) - \sigma^2 = \mathbb{E}((y_i - \mathbf{X}_i' \beta)^2) - \sigma^2 = 0 \tag{9}\]
Note that in this case \((\mathbb{E}(\upsilon_i))^2 = 0\).2
In Chapter 1 we saw that we can use least squares to estimate \(\beta\) using the first moment. This suggests that we can also use least squares to estimate the variance \(\sigma^2\).
\[ \min_{\beta, \sigma} \sum_{i=1}^N ((y_i - \mathbf{X}_i' \beta)^2 - \sigma^2)^2 \tag{10}\]
Hansen noted that in various problems there may be more than one **moment** that must equal zero. Having multiple moment conditions seems great. But it may be too much of a good thing. Multiple moments imply multiple solutions. While in theory there may exist only one set of parameters that satisfies all the moment conditions, in data there may exist many different parameters that satisfy the conditions. Below, we discuss Hansen's solution to this problem.
Simulated Moments OLS
Consider the data we simulated in Chapter 1. This time there is a slight difference: in this data, \(\sigma = 2\) rather than 1 as in Chapter 1. The two functions below are based on the equations presented above. The first estimates OLS using the first moment of the unobserved distribution and the second estimates OLS using the second moment. Using the second moment allows all three parameters to be estimated.
set.seed(123456789)
N <- 500
a <- 2
b <- 3
x <- runif(N)
u <- rnorm(N, mean=0, sd=2)
y <- a + b*x + u

f_1mom <- function(beta, y, X) {
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  sos <- mean((y - X%*%beta)^2)
  return(sos)
}
f_2mom <- function(par, y, X) {
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  sigma <- exp(par[1]) # use to keep positive
  beta <- par[-1]
  sos <- mean(((y - X%*%beta)^2 - sigma^2)^2)
  return(sos)
}
a <- optim(par=c(2,3), fn = f_1mom, y=y, X=x)
# beta
a$par
[1] 1.993527 3.027305
b <- optim(par=c(log(2),2,3), fn = f_2mom, y=y, X=x)
# sigma
exp(b$par[1])
[1] 2.032743
# beta
b$par[-1]
[1] 1.556888 3.385584
We can estimate OLS using two different moment conditions, but that also gives us two different answers, neither of which is particularly accurate. The first moment estimator gives estimates of the intercept and slope that are pretty close to the true values of 2 and 3. The second moment estimator also estimates the variance. The estimate is close to the true value of 2, but it does a poor job estimating the intercept and slope.
Can we improve on these estimates by combining the two moment estimators? The most obvious way to do this is to add them together.
f_gmm_simple <- function(par, y, X) {
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  sigma <- exp(par[1])
  beta <- par[-1]
  sos <- mean((y - X%*%beta)^2) +
    mean(((y - X%*%beta)^2 - sigma^2)^2)
  return(sos)
}
c <- optim(par=c(log(2),2,3), fn = f_gmm_simple, y=y, X=x)
# sigma
exp(c$par[1])
[1] 2.014251
# beta
c$par[-1]
[1] 1.902530 3.151387
That is, we could equally weight the two conditions. This gives an estimate of the variance and estimates of the intercept and slope that average over the previous two results. The variance estimate is pretty good but the intercept and slope estimates are not particularly close to the true values.
Why use equal weights? Why not use some other weights? Which weights should we use?
GMM of OLS
Let \(\theta = \{\beta, \sigma\}\) represent the parameters we are trying to estimate. Let each moment condition be denoted \(g_{ki}(\theta, y_{i}, \mathbf{X}_{i})\) and \(g_i(\theta, y_i, \mathbf{X}_i)\) denote the vector of moment conditions. So by definition we have \(\mathbb{E}(g_{ki}(\theta, y_i, \mathbf{X}_i)) = 0\) for all \(k \in \{1, 2\}\) and \(i \in \{1,...N\}\).
The analog estimator is then one that finds the smallest values for the vector of moment conditions. The estimator minimizes the following analog:
\[ \hat{\theta} = \arg \min_{\theta} \left(\frac{1}{N}\sum_{i=1}^N g_i(\theta, y_i,\mathbf{X}_i) \right)' \mathbf{W} \left(\frac{1}{N}\sum_{i=1}^N g_i(\theta, y_i,\mathbf{X}_i) \right) \tag{11}\]
where \(\mathbf{W}\) is a \(2 \times 2\) positive semi-definite matrix. This matrix provides the weights. Hansen (1982) shows that an optimal weighting matrix is a function of the true parameter values, which are unknown. The estimate of the weighting function is then the appropriate sample analog.
\[ \hat{\mathbf{W}} = \left(\frac{1}{N}\sum_{i=1}^N g_i(\hat{\theta}, y_i, \mathbf{X}_i) g_i(\hat{\theta}, y_i, \mathbf{X}_i)' \right)^{-1} \tag{12}\]
Below the estimation procedure will determine \(\hat{\theta}\) and \(\hat{\mathbf{W}}\) simultaneously.
Note that the notation here is pretty confusing. In particular, it is hard to keep track of the different summations and what exactly is going on with the vectors. Part of the confusion is that the ordering is different between Equation 11 and Equation 12. In Equation 11 we take the mean first so we have a vector of means; then we multiply those together. In Equation 12 we multiply vectors together at the observation level and take the mean of those.
It is easier to see the difference by re-writing Equation 12 using matrices.
\[ \hat{\mathbf{W}} = \left(\frac{1}{N} \mathbf{G} \mathbf{G}'\right)^{-1} \tag{13}\]
where \(\mathbf{G}\) is a \(K \times N\) matrix where each column is the vector \(g_i(\theta, y_i, \mathbf{X}_i)\).
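A tiny numeric example may help fix the notation; the numbers here are arbitrary and only illustrate the dimensions:

# K = 2 moments and N = 5 observations; each column of G is g_i.
G <- matrix(1:10, nrow = 2)
g_bar <- rowMeans(G) # the K-vector of means inside Equation 11
W_hat <- solve(G %*% t(G) / ncol(G)) # the weighting matrix in Equation 13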
GMM OLS Estimator in R
The GMM estimator is in two parts. There is a general GMM function that takes in the \(\mathbf{G}\) matrix. This is a matrix where each row is a particular moment. Note that the function is written with a check on the number of moments. The second part is the particular GMM example. In this case it is the GMM estimator for OLS using the first and second moments of the unobserved characteristic distribution.
f_gmm <- function(G, K) {
  G <- as.matrix(G)
  N <- dim(G)[2]
  if (K==dim(G)[1]) {
    # a check that the matrix G has K rows
    g <- rowMeans(G, na.rm = TRUE)
    W <- try(solve(G%*%t(G)/N), silent = TRUE)
    # try() lets the function work even if there is an error
    if (is.matrix(W)) {
      # if there is no error, W is a matrix.
      return(t(g)%*%W%*%g)
    } else {
      # allow estimation assuming W is identity matrix
      return(t(g)%*%g)
    }
  } else {
    return("ERROR: incorrect dimension")
  }
}
f_ols_gmm <- function(par, y, X) {
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  sigma <- exp(par[1])
  beta <- par[2:length(par)]
  g1 <- y - X%*%beta
  g2 <- (y - X%*%beta)^2 - sigma^2
  return(f_gmm(t(cbind(g1,g2)),K=2))
}
d <- optim(par=c(log(2),2,3), fn = f_ols_gmm, y=y, X=x)
exp(d$par[1])
[1] 2.013754
d$par[2:3]
[1] 1.969362 3.075517
The GMM estimator does a pretty good job. It estimates the variance parameter relatively well and does a better job at estimating the intercept and slope. It is not quite as good as least squares for \(\beta\), but it is the only estimator that is able to estimate both \(\beta\) and \(\sigma\) with reasonable accuracy.
GMM of Returns to Schooling
A standard use of GMM is as an instrumental variable estimator. In particular, GMM can be used when we have multiple instruments for the same variable. We saw this in Chapter 3. We have two potential instruments for the level of education, distance to college and parents at home. In Chapter 3 we used these to conduct an over-identification test. Which they passed! More accurately, which they didn’t fail!
Above we used the first and second moment of the distribution of the unobserved term to create our GMM estimator. Here, we use a moment of the joint distribution between the instrument and the unobserved term. Recall an important assumption of an instrument. It is independent of the unobserved term. In the graph, there is no arrow from the unobserved term to the instrument.
One implication of this assumption is that the unobserved term and the instrument are not correlated.
\[ \mathbb{E}(z_i \upsilon_i) = 0 \tag{14}\]
where \(z_i\) is a proposed instrument. Note that we can re-write this in the following way.
\[ \mathbb{E}(z_i (y_i - \mathbf{X}_i' \beta)) = 0 \tag{15}\]
Further, if we replace \(z_i\) with one of the explanatory variables \(x_i\) then we have an alternative OLS estimator.
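To see why, set \(z_i = \mathbf{X}_i\). The moment condition becomes \(\mathbb{E}(\mathbf{X}_i (y_i - \mathbf{X}_i' \beta)) = 0\), and solving for \(\beta\) gives the familiar least squares formula.

\[ \beta = \mathbb{E}(\mathbf{X}_i \mathbf{X}_i')^{-1} \mathbb{E}(\mathbf{X}_i y_i) \]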
To see how it works we can return to returns to schooling and the data we used in the first three chapters from Card (1995). The code is identical to the code in Chapter 3. The difference is that instead of using one instrument for level of education, we can use two. Note that for simplicity I don’t instrument for experience.
y <- x1$lwage76
X <- cbind(x1$ed76, x1$exp, x1$exp2, x1$black, x1$reg76r,
           x1$smsa76r, x1$smsa66r, x1$reg662, x1$reg663,
           x1$reg664, x1$reg665, x1$reg666, x1$reg667,
           x1$reg668, x1$reg669)
x1$age2 <- x1$age76^2
f_iv_gmm <- function(beta, y, X, Z) {
  y <- as.matrix(y)
  X <- as.matrix(cbind(1,X))
  Z <- as.matrix(Z) # matrix of instruments of schooling
  g1 <- Z[,1]*(y - X%*%beta)
  g2 <- Z[,2]*(y - X%*%beta)
  return(f_gmm(t(cbind(g1,g2)),K=2))
}
As a reminder, we can compare the IV estimator to the OLS estimator presented in Chapters 2 and 3. The estimated parameters from the OLS estimator are used as starting values for the GMM estimator.
X1 <- cbind(1,X)
beta <- solve(t(X1)%*%X1)%*%t(X1)%*%y
beta[2]
[1] 0.07469326
a <- optim(par=beta, fn = f_iv_gmm, y=y, X=X,
           Z=cbind(x1$nearc4,x1$momdad14))
a$par[2]
[1] 0.07350604
The GMM estimator allows both distance to college and parents at home to instrument for education. Interestingly, this estimator gives *lower* values for returns to schooling than OLS. This is in contrast to the IV results presented in Chapter 3 and the Heckman estimates presented in Chapter 6.
Estimating the Third Down Game
After a detour to learn more about GMM, we can now use the estimator for estimating our third down game. Remember, we are using the mixed strategy Nash equilibrium to generate moments that we can use to estimate the parameters.
Moment Conditions
Equation 3 and Equation 4 suggest a method for estimating the parameters of interest. The mixed strategy Nash equilibrium provides the Offense and Defense strategies conditional on the expected points for each option and the conditional probabilities.
\[ \begin{array}{l} \mathbb{E}(s_i | O=\mathrm{Pass}) = p_{pr}(1 - q_d(\theta, V(\mathbf{X}_i))) + p_{pp}\,q_d(\theta, V(\mathbf{X}_i))\\ \mathbb{E}(s_i | O=\mathrm{Run}) = p_{rr}(1 - q_d(\theta, V(\mathbf{X}_i))) + p_{rp}\,q_d(\theta, V(\mathbf{X}_i)) \\ \Pr(O = \mathrm{Pass}) = q_o(\theta, V(\mathbf{X}_i)) \end{array} \tag{16}\]
where \(s_i \in \{0, 1\}\) is an indicator of success, \(\theta = \{p_{rr},p_{rp},p_{pr},p_{pp}\}\) is the vector representing the parameters of interest and \(\mathbf{X}_i\) represents observable characteristics of the situation. The functions \(q_o(.)\) and \(q_d(.)\) represent the equilibrium strategies of the Offense and Defense given the conditional probabilities and expected points.
The first condition states that conditional on the Offense playing Pass, the predicted success rate must be the same as the observed rate in expectation. The second condition is similar but for when the Offense plays Run. The third condition states that the observed probability of the Offense playing pass on third down must be equal to the predicted probability from the third down game, on average.
Equation 17 presents the sample analogs of the moment conditions in Equation 16.
\[ \begin{array}{l} 0 = \frac{1}{N} \sum_{i=1}^N (\mathbb{1}(\mathrm{Pass}_i=1)(s_i - \theta_3(1 - q_d(\theta)) - \theta_4 q_d(\theta)))\\ \\ 0 = \frac{1}{N} \sum_{i=1}^N (\mathbb{1}(\mathrm{Pass}_i=0)(s_i - \theta_1(1 - q_d(\theta)) - \theta_2 q_d(\theta))) \\ \\ 0 = q_o(\theta) - \frac{1}{N} \sum_{i=1}^N \mathbb{1}(\mathrm{Pass}_i=1) \end{array} \tag{17}\]
The GMM estimator finds the vector \(\hat{\theta}\) that minimizes the weighted average of these three moments.
Third Down GMM Estimator in R
Assumption 1. For all \(|Y - Y'| < \epsilon\), \(\Pr(\mathrm{Success} \mid O, D, \mathbf{X}, Y) = \Pr(\mathrm{Success} \mid O, D, \mathbf{X}, Y')\).
Assumption 1 allows the parameters to be estimated with variation in the situation, while holding the parameters constant. It states that for small changes in the yardage, the success probabilities are unchanged conditional on the yardage, the actions of the Offense and Defense, and the observed characteristics, \(\mathbf{X}\).
The GMM estimator has three parts. The first part of the estimator assumes that Equation 4 holds. It uses the Nash equilibrium to determine mixed strategies of the Offense and Defense given the parameters and the expected outcomes from each of the choices. The second part is the analog of the moment condition. The last part determines the estimated weighting matrix conditional on the observed probabilities, the expected points and the estimated parameter values.
# the GMM estimator, which calls the general GMM function above.
p3_fun <- function(par,Ep,s,play) {
  theta <- exp(par)/(1 + exp(par))
  # using sigmoid function to keep values between 0 and 1
  q3 <- q_fun(theta,Ep)
  # determine the equilibrium strategies.
  # moments
  Pass <- play=="pass"
  g1 <- Pass*(s - theta[3]*(1 - q3$qd) - theta[4]*q3$qd)
  g2 <- (!Pass)*(s - theta[1]*(1 - q3$qd) - theta[2]*q3$qd)
  g3 <- Pass - q3$qo
  G <- t(cbind(g1,g2,g3))
  # note the transpose.
  return(f_gmm(G,3))
}
EP1 <- cbind(EP(Yr), EP(Yp), EP(100 - Yk))
a1 <- optim(par=log(2*theta),fn=p3_fun,Ep=EP1,
            s=success3,play=play,control = list(maxit=10000))
exp(a1$par)/(1 + exp(a1$par))
[1] 0.2740818 0.6630758 0.4570891 0.2010548
The estimator does an OK job. The true values are \(\theta = \{0.2, 0.8, 0.5, 0.1\}\). None of the parameters are estimated particularly accurately. What happens if you use a larger data set?
Are NFL Coaches Rational?
The assumption of decision maker rationality underpins many of the models in macroeconomics and microeconomics, including estimators presented in this book. Romer (2006) argues that NFL coaches are not behaving rationally, and that this has implications for the foundations of economics.
The difficulty with testing the rationality of these decisions is that NFL coaches rarely go for it on fourth down. Therefore, we cannot actually measure what happens. We use game theory to model third and fourth downs. In the model, observed success rates depend on the success rates conditional on the strategies of the Offense and Defense. We used the observed third down information to estimate these conditional success rates. We then use these estimates and the model to determine what the success rates would have been if the coach had decided to go for it on fourth down.
We can determine the "rationality" of NFL coaches by comparing the predicted rate of going for it against the actual rate. If coaches are rational, then the predicted rate of going for it should not be terribly different from the actual rate. Of course, this is based on the enormous assumption that the econometrician knows more about the NFL than an NFL coach! Or at least, that the model used here is a reasonable facsimile of third down and fourth down situations.
NFL Data
In order to estimate the parameters of the game we need to estimate the success rates, the Offense’s strategy, and the value functions. The next play listed in the data is assumed to be the next play that occurs in the game. We will use the next play to determine the “result” of the play.
<- read.csv("NFLQ1.csv", as.is = TRUE)
x $id <- as.numeric(row.names(x))
x$res_pos <- c(x$posteam[2:dim(x)[1]],NA)
x$res_ep <- c(x$ep[2:dim(x)[1]],NA)
x$res_ep <- ifelse(x$posteam==x$res_pos,x$res_ep,-x$res_ep)
x$res_game <- c(x$game_id[2:dim(x)[1]],NA)
x$res_down <- c(x$down[2:dim(x)[1]],NA)
x$diff_ep <- x$res_ep - x$ep
x$diff_ep <- ifelse(x$game_id==x$res_game,x$diff_ep,NA)
x$year <- sub("-.*","",x$game_date)
x# this uses "real expressions."
# it subs out everything after the "-".
$succ <- NA
x$succ <- ifelse(x$res_down==1 & x$posteam==x$res_pos &
x$down==3,1,x$succ)
x$succ <- ifelse(x$res_down==4,0,x$succ)
x$pct_field <- x$yardline_100/100
x$year <- as.numeric(x$year) x
Estimating Third Down Game in R
The expected points estimators below are used by the Nash equilibrium condition (Equation 4) to determine the Defense's strategy.
lm_run <- lm(diff_ep ~ ydstogo + pct_field + year,
             data=x[x$play_type=="run" &
                    x$down==3 & x$succ==1,])
lm_pass <- lm(diff_ep ~ ydstogo + pct_field + year,
              data=x[x$play_type=="pass" &
                     x$down==3 & x$succ==1,])
lm_punt <- lm(diff_ep ~ ydstogo + pct_field + year,
              data=x[x$play_type=="punt",])
The next step is to determine the expected points after each action that the Offense could take: pass, run or punt.3 The first regression is on the change in expected points after a third down run that gets a first down. Table 4 presents the results from the three OLS regressions on the difference in expected points. The results show that successful pass plays get much larger increases in expected points than successful run plays. As expected, these effects are larger the longer the to-go distance and the further the team is from the goal line.
 | (1) Run | (2) Pass | (3) Punt |
---|---|---|---|
(Intercept) | -8.146 | 10.331 | -63.722 |
 | (10.095) | (7.675) | (10.314) |
ydstogo | 0.220 | 0.153 | 0.019 |
 | (0.005) | (0.003) | (0.003) |
pct_field | 0.493 | 1.808 | 1.874 |
 | (0.069) | (0.053) | (0.102) |
year | 0.004 | -0.005 | 0.031 |
 | (0.005) | (0.004) | (0.005) |
Num.Obs. | 1411 | 3896 | 5860 |
R2 | 0.604 | 0.503 | 0.076 |
R2 Adj. | 0.603 | 0.503 | 0.075 |
AIC | 2240.5 | 7783.5 | 17861.1 |
BIC | 2266.8 | 7814.9 | 17894.4 |
Log.Lik. | -1115.269 | -3886.762 | -8925.532 |
RMSE | 0.53 | 0.66 | 1.11 |
Given these results we can determine the predicted expected points for a run, a pass, and a punt. Note that these are calculated for every play in the data. Later, we will only use this information on the relevant subset.
X <- as.matrix(cbind(1,x$ydstogo,x$pct_field,x$year))
x$pass_ep <- x$ep + X%*%lm_pass$coefficients
x$run_ep <- x$ep + X%*%lm_run$coefficients
x$punt_ep <- x$ep + X%*%lm_punt$coefficients
We can use GMM to estimate the parameters from the Third Down game. This is done for "to go" distances of 1 to 4 yards and positions from the team's own 30 yard line to the opponent's 40 yard line (30 yards). Given the sparsity of data we include data from ten yards either side of the position of interest and 1 yard either side of the to-go distance.4
theta_hat <- matrix(NA,4*30,4)
k <- 1
for (i in 1:4) {
  tg <- i # to go distance.
  for (j in 1:30) {
    yl <- 29 + j # yardline
    # create subset for analysis
    index <- x$down==3 &
      x$yardline_100 > yl & x$yardline_100 < (yl+20) &
      x$ydstogo > tg - 2 & x$ydstogo < tg + 2 &
      (x$play_type=="pass" | x$play_type=="run")
    y <- x[index,]
    index_na <-
      is.na(rowSums(cbind(y$run_ep,y$pass_ep,y$punt_ep,
                          y$succ,
                          (y$play_type=="pass" |
                           y$play_type=="run"))))==0
    # GMM to determine parameters (conditional probabilities)
    y_na <- y[index_na,]
    a1 <- optim(par=rnorm(4,mean=0,sd=2),fn=p3_fun,
                Ep=cbind(y_na$run_ep,y_na$pass_ep,
                         y_na$punt_ep),
                s=y_na$succ,play=y_na$play_type,
                control = list(maxit=10000))
    theta_hat[k,] <- exp(a1$par)/(1 + exp(a1$par))
    k <- k + 1
    #print(k)
  }
}
Predicting the Fourth Down Game
We can use the parameter estimates from the third down game to model the fourth down game. The purpose is to model the policy of going for it at various positions on the field and to go distances. The model assumes that both Offense and Defense play a mixed strategy Nash equilibrium.
The game is almost identical to the third down game (Table 2); the difference is the payoffs from the unsuccessful play. In the third down game, the payoff is the expected points from punting the ball away. Here it is the expected points from turning the ball over on downs. That is, the expected points if the ball goes to the other team and that team gets to start where the ball is currently located. The question is whether this one change to the payoffs substantially changes the Nash equilibrium and the predicted outcome of the game. Note that other expected point calculations and the conditional probabilities are all the same as for third down, with the conditional probabilities determined from the GMM procedure.
lm_over <- lm(ep ~ pct_field + I(pct_field^2),
              data=x[x$down==1,])
x$over_ep <- -cbind(1,(1-x$pct_field),
                    (1-x$pct_field)^2)%*%lm_over$coefficients
To estimate the expected points from a turnover on downs, we use the expected points at first down from various distances to goal. The OLS regression assumes that there is a non-linear relationship.
p4_fun <- function(theta,Ep) {
  q4 <- q_fun(theta,Ep)
  p4p <- q4$qo*(theta[3]*(1 - q4$qd) + theta[4]*q4$qd)
  p4r <- (1 - q4$qo)*(theta[1]*(1 - q4$qd) + theta[2]*q4$qd)
  p4 <- p4r + p4p
  Epgfi <- p4r*Ep[,1] + p4p*Ep[,2] + (1 - p4)*Ep[,3]
  # expected points going for it.
  return(list(p4=p4,Epgfi=Epgfi))
}
tab_res <- prob_actual <- prob_pred <- matrix(NA,4,30)
k <- 1
for (i in 1:4) {
  tg <- i # to go distance.
  for (j in 1:30) {
    yl <- 29 + j # yardline
    # create subset for analysis
    index3 <- x$down==3 &
      x$yardline_100 > yl & x$yardline_100 < (yl+20) &
      x$ydstogo > tg - 2 & x$ydstogo < tg + 2 &
      (x$play_type=="pass" | x$play_type=="run")
    y <- x[index3,]
    # determine predicted success on 4th down.
    succ4 <- p4_fun(theta_hat[k,],
                    cbind(y$run_ep,y$pass_ep,y$over_ep))
    # Actual frequency of going for it on 4th down.
    index4 <- x$down==4 &
      x$yardline_100 > yl & x$yardline_100 < (yl+20) &
      x$ydstogo > tg - 2 & x$ydstogo < tg + 2
    z <- x[index4,]
    z$go <- ifelse(z$play_type=="run" |
                   z$play_type=="pass",1,NA)
    z$go <- ifelse(z$play_type=="punt",0,z$go)
    # relative value of punting
    tab_res[i,j] <- mean(y$punt_ep - succ4$Epgfi,
                         na.rm = TRUE)
    # predicted probability of going for it
    prob_pred[i,j] <- mean(y$punt_ep - succ4$Epgfi < 0,
                           na.rm = TRUE)
    # actual probability of going for it.
    prob_actual[i,j] <- mean(z$go,na.rm = TRUE)
    k <- k + 1
    #print(k)
  }
}
We can estimate the model at different distances to goal and yards to go to determine whether it is better to punt or “go for it” on fourth down.
1 To Go | 2 To Go | 3 To Go | 4 To Go |
---|---|---|---|
Min. :0.1991 | Min. :0.1166 | Min. :0.4313 | Min. :1.475 |
1st Qu.:0.5708 | 1st Qu.:0.5928 | 1st Qu.:1.0888 | 1st Qu.:1.551 |
Median :0.7780 | Median :0.8960 | Median :1.3094 | Median :1.636 |
Mean :1.0160 | Mean :1.1147 | Mean :1.7300 | Mean :1.969 |
3rd Qu.:1.0475 | 3rd Qu.:1.4337 | 3rd Qu.:1.6666 | 3rd Qu.:1.740 |
Max. :3.0715 | Max. :3.4444 | Max. :4.4630 | Max. :4.065 |
Table 5 presents summary statistics on the difference between expected points from punting over going for it, at each to-go distance from 1 yard to 4 yards. The table suggests that it is almost always better to punt the ball away on fourth down.
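A summary of this form can be generated from the objects created in the loop above; for example, since the rows of tab_res correspond to to-go distances of 1 to 4 yards:

# distribution of the punting value minus the going-for-it value,
# by to-go distance.
summary(t(tab_res))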
Testing Rationality of NFL Coaches

Figure 2 presents a histogram of the difference between the actual and predicted probability of going for it at each yard line and to-go distance. It does not provide strong evidence of irrationality. In most cases the probabilities are the same or similar. However, there are cases where the model makes a strong prediction to go for it, but no NFL coach does. There are also cases where many NFL coaches do in fact go for it when the model does not predict that they should.
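A figure like Figure 2 can be drawn directly from the two probability matrices computed above; a minimal sketch:

# actual minus predicted probability of going for it,
# across all yardline and to-go combinations.
hist(prob_actual - prob_pred,
     xlab = "actual - predicted Pr(go for it)", main = "")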
Discussion and Further Reading
The chapter uses game theory to solve the problem of estimating the value of punting the ball on 4th Down in American football. The estimation problem follows from the fact that coaches rarely go for it on 4th Down, and so there is almost no data. Using game theory and the GMM estimator, we can estimate the policy parameters using data from third downs. We can then use these estimates and the mixed strategy Nash equilibrium to simulate 4th Down. Our analysis suggests that NFL coaches punt the ball the appropriate number of times on 4th Down.
While GMM has become a standard technique, recent work suggests using moment inequalities generated from decision problems or games (Pakes et al. 2015).
References
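Card, David. 1995. "Using Geographic Variation in College Proximity to Estimate the Return to Schooling." In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, edited by Louis N. Christofides, E. Kenneth Grant, and Robert Swidinsky. Toronto: University of Toronto Press.
Hansen, Lars Peter. 1982. "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica 50 (4): 1029–54.
Pakes, Ariel, Jack Porter, Kate Ho, and Joy Ishii. 2015. "Moment Inequalities and Their Application." Econometrica 83 (1): 315–34.
Romer, David. 2006. "Do Firms Maximize? Evidence from Professional Football." Journal of Political Economy 114 (2): 340–65.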
Footnotes
1. The version of the data used here is available at https://sites.google.com/view/microeconometricswithr/table-of-contents
2. The variance of a random variable \(x\) is \(\mathbb{E}((x - \mu)^2)\), where \(\mathbb{E}(x) = \mu\).
3. For simplicity we won't consider options to kick a field goal.
4. You should try different values.