Introduction

The Intern

You have been hired as a summer intern for a right-of-center think tank in Washington DC. It is going to be a great summer! You will play softball on the Mall. Go to Nats games. Hang out with friends interning on the Hill. And melt onto the sidewalk when you commute to work.

First Day

The think tank is arguing against a federal increase in the minimum wage. You have been asked to predict what will happen if the minimum wage increases from $7.25 to $15.00.

You have a vague memory of a discussion of the minimum wage in your Econ 101 class. To refresh your memory you google “minimum wage Khan Academy.” You listen to Sal explain that the equilibrium wage is $6 per hour and workers work 22 million hours per month. Sal shows that a minimum wage of $7 leads to 2 million hours of unemployment and $1 million of output per month lost in the economy. This seems straightforward.

But what actually happens in the real world? Your supervisor suggests looking up each state’s minimum wage and state-level unemployment rate from the Bureau of Labor Statistics (https://bls.gov). She says that different states have changed their minimum wage over time and a number of states have minimum wages above $7.25, although none as high as $15.

You download the data on each state’s current minimum wage and unemployment rate. You put everything in a spreadsheet. A fellow intern shows you how to save it as a csv file. He says this will allow importing the data into R, which is the statistical language of choice at your think tank.

You then download R and RStudio (the IDE, you are told, whatever that is). Your colleague shows you how to get set up. He shows you how to open up RStudio and then create a new script file. You call the file minwage.R and save it to the minimum wage folder where you have the data set. He then tells you to go to Session > Set Working Directory > To Source File Location. “Trust me. It makes coding easier,” he says.

Now you are ready to write your first line of code.

x <- read.csv("minimum wage.csv", as.is = TRUE)

Your colleague explains that read.csv will import the data set that you created. The data set is simply called x. He explains that you must use the assignment arrow, <-. You ask why. He shrugs, “that was what I was told when I started.” Also, he says you should probably write as.is = TRUE because R has a habit of changing numbers to characters and other, even stranger, things.

You click Run. It worked! The letter x appears in the Global Environment. You click on it. A tab with your data appears.

You want to calculate the relationship between the minimum wage and unemployment. You want to run a regression.1 You ask your cubicle neighbor how to do that. She tells you to write the following.

lm1 <- lm(x$Unemployment.Rate ~ x$Minimum.Wage)

You ask about the whole thing with <-. Your neighbor says that you must do it that way but refuses to explain why.

You write out the code and hit Run. Nothing happens. Actually, lm1 appears in the box in the upper right of the screen. Apparently it is a List of 12. You were hoping to see a table with regression results and t-statistics. But nothing. You ask for help from your neighbor. She rolls her eyes. “No. You just created an object called lm1. To look at it, use summary.”

summary(lm1)[4]
$coefficients
                 Estimate Std. Error    t value    Pr(>|t|)
(Intercept)    3.49960275 0.32494078 10.7699709 1.62017e-14
x$Minimum.Wage 0.01743224 0.03733321  0.4669365 6.42615e-01

Cool. You got what you were looking for. The minimum wage increases unemployment! It increases it by 0.01743. You wonder what that means. Another intern comes by, looks at what you did, and then types the following code on your computer. He leaves with a “you’re welcome.” Boy, is that guy annoying.

a1 <- (15-7.25)*lm1$coefficients[2]
a1/mean(x$Unemployment.Rate)
x$Minimum.Wage 
    0.03710335 

Your neighbor explains it all. You want to know what happens to unemployment when the minimum wage increases from $7.25 to $15. The slope coefficient gives the change in the unemployment rate for each dollar increase in the minimum wage, so multiplying it by the size of the increase gives the predicted change: 7.75 × 0.0174 ≈ 0.135 percentage points. Dividing by the average unemployment rate of about 3.64 puts it in percentage terms: roughly 3.7 percent.

You go back to your supervisor. You say you found that a minimum wage increase to $15 would increase the unemployment rate by about four percent. “The unemployment rate would go to 8%!” she exclaims. No, no, no. You clarify that it would increase by 4 percent, not 4 percentage points. From, say, 4% to 4.15%. “Oh. Still that is a big increase.” Then she says, “But are you sure? How can you tell what will happen in states that haven’t changed their minimum wage?” You respond, accurately, “I don’t know.” As you ponder this, you notice everyone is getting dressed for softball.

Second Day

On your way into the building you run into your supervisor. You explain how you were able to beat CBO’s Dynamic Scorers by just one run. She congratulates you and says, “You should plot the relationship between the minimum wage and unemployment.”

After some coffee and some googling you find code to do what your supervisor suggested.

plot(x$Minimum.Wage, x$Unemployment.Rate)

The annoying guy from yesterday breezes by and says, “Oh no. Looks like it is censored. You will need to use a Tobit.” At this, someone a few cubicles away pops up like a meerkat. He says, “No. Don’t use a Tobit, use a Heckit. The data is probably selected.” Then he is gone.

What the heck is a Heckit? What the heck is a Tobit? What the heck is a meerkat?

The Book

The book is designed to help our fictitious intern hero survive the summer in DC.

What Does it Cover?

The book is based on a course I have taught at Johns Hopkins University as part of its Master’s in Applied Economics program. The book and the course aim to provide an introduction to applied microeconometrics. The goal is for the reader to have competence in using standard tools of microeconometrics including ordinary least squares (OLS), instrumental variables (IV), probits and logits. Not only should you be able to understand how these models work, but more importantly, you should be able to understand when they don’t.

In addition to these standard models, the book and the course introduce important models commonly used in microeconometrics. These include the Tobit and the Heckman selection model (Heckit). The book introduces approaches that have become common in my subfield of empirical industrial organization. These approaches center around the analysis of games. That is, situations where a small number of individuals or firms interact strategically.

Lastly, the book introduces some new techniques that have been developed to analyze panel data models and other situations where an object of interest is measured repeatedly. It discusses difference in difference as well as synthetic controls. It touches on how machine learning techniques can be applied in microeconometrics. The book also introduces to a broader economic audience the ideas of Columbia statistician, Herbert Robbins, known as empirical Bayesian analysis.

What is the Approach?

The book teaches microeconometrics through R. It is not primarily aimed at teaching R. Rather, it is primarily aimed at teaching microeconometrics. This idea of using computer programming as a tool of instruction goes back to at least Seymour Papert and MIT’s AI lab in the 1970s.2 South African-born Papert helped develop a programming language called Logo. The goal of Logo was to teach mathematics by programming how a turtle moves around the screen. You may have used one of the offspring of Logo, such as Scratch or Lego Mindstorms.

I learned math through Logo. When I was a pre-teen, Logo became available on the personal computer, the Apple II. My parents taught me to program in Logo and I learned a number of geometric concepts such as Euclidean distance and the Pythagorean theorem by programming up a model of how a moth circled a light source.

The book uses Papert’s ideas to teach microeconometrics. You will learn the math of the estimator and then how to program up that estimator.3 The book makes particular use of the computer’s ability to simulate data. This allows us to compare our estimates to what we know to be the true values. In some cases these simulations illustrate that our estimator is correct; in others, the simulations help us to understand why our estimator is incorrect. Testing models on simulated data has the additional benefit of allowing you to check your programming.
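
To see the idea in action, here is a minimal sketch of the simulation approach (the parameter values and variable names are invented for illustration): generate data from a model in which the true slope is known, estimate it with OLS, and check how close the estimate comes to the truth.

# Simulate data where the true relationship is known
set.seed(123456789)
N <- 1000            # number of simulated observations
beta0 <- 2           # true intercept
beta1 <- 3           # true slope
x <- runif(N)        # observed characteristic
u <- rnorm(N)        # unobserved term
y <- beta0 + beta1*x + u
# Estimate the model and compare the estimates to the values we chose
lm_sim <- lm(y ~ x)
summary(lm_sim)$coefficients   # the estimated slope should be close to 3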

The book is written in RStudio using Sweave. Sweave allows LaTeX to be integrated into R. LaTeX is a free type-setting language that is designed for writing math. Almost all the code that is used in the book is actually presented in the book. On occasion it is more practical to create a data set outside the book. In those cases, the data and the code that created the data are available here: https://sites.google.com/view/microeconometricswithr/table-of-contents. In a couple of other cases, the preferred code does not produce nice output for the book. I have highlighted those cases in the text. I also generally hide repetitive code. For the most part, the coding in the book is in base R. The book makes little use of packages. This shows you the underlying code and illustrates the econometrics. That said, there are a few packages that I really like, including stargazer and xtable, which both make nice tables in LaTeX.

What are POs, DAGs, and Do Operators?

POs, DAGs and Do Operators sound like school-yard put-downs, but they form the core of the book’s approach to econometrics. This approach is heavily influenced by Northwestern econometrician, Charles Manski. I was lucky enough to have Chuck teach me econometrics in my first year of graduate school. It was a mind-altering experience. I had taken a number of econometrics classes as an undergraduate. I thought much of it was bunk. Manski said much of what you learned in standard econometrics classes was bunk. Manski gave meat to the bones of my queasiness with econometrics.

The book focuses on the question of identification. Does the algorithm estimate the parameter we want to know? Other books spend an inordinate amount of time on the accuracy of the parameter estimate or the best procedure for calculating the estimate. This book steps back and asks whether the procedure works at all. Can the data even answer the question? This seems to be fundamental to econometrics, yet it is given short shrift in many presentations.

The book focuses on identifying the causal effect. What happens to the outcome of interest when the policy changes? What happens if college becomes free? What happens if prices are increased? What happens if the federal minimum wage is increased?

To answer these causal questions, the book uses directed acyclic graphs (DAGs) and do operators. The book, particularly the early chapters, relies on the ideas of Israeli-American UCLA computer scientist, Judea Pearl. DAGs help us to understand whether the parameter of interest can be identified from the data we have available. These diagrams can be very useful models of the data generating process. Hopefully, it will be clear that DAGs are models, and as such, they highlight some important issues while suppressing others.

Pearl’s do operator helps illustrate the old statistical chestnut, “correlation is not causality.” Observing the unemployment rate for two different minimum wage laws in two different states is quite different from changing the minimum wage law for one state. In the first instance we observe a statistical quantity, the unemployment rate conditional on the minimum wage law. In the second case we are making a prediction, what will happen to the unemployment rate if the law is changed?
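
As a rough illustration of the difference, here is a toy simulation (the numbers and variable names are made up for this sketch and are not from the minimum wage data). An unobserved state characteristic drives both the wage law and unemployment, so regressing one on the other recovers a conditional association even though intervening on the law would change nothing.

# A toy simulation: correlation without causation
set.seed(987654321)
N <- 10000
u <- rnorm(N)                 # unobserved state characteristic
w <- 7.25 + 2*(u > 0.5)       # states with high u adopt a higher minimum wage
y <- 4 + 0.5*u + rnorm(N)     # unemployment depends on u, NOT on the wage law
lm(y ~ w)$coefficients[2]     # positive, even though the causal effect of w is zero

Observing y for different values of w mixes in the effect of u. The do operator asks what happens to y if we set w ourselves, which here, by construction, is nothing.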

In some cases it is useful to illustrate these issues using the potential outcome (PO) model of former Harvard statistician, Donald Rubin. This model highlights the fundamental identification problem of statistics. We can never observe the difference in state unemployment rates for two different minimum wage laws. Sure, we can observe unemployment rates for two different states with different minimum wage laws. We can even observe the difference in unemployment rates for the same state before and after a change in the minimum wage law. However, we cannot observe the unemployment rate for the same state at the same time with two different minimum wage laws.

In addition, the PO model illustrates that causal effect is not single valued. A policy that encourages more people to attend college may allow many people to earn higher incomes, but it may not help all people. It is even possible that some people are made worse off by the policy. There is a distribution of causal effects.

What About the Real World?

The course I have taught at Hopkins is for a Master’s in Applied Economics. I take the applied part of this seriously. The course and this book aim to show how to do microeconometrics. I have spent my career using data to answer actual policy questions. Did a realtor group’s policies lead to higher prices for housing transactions?4 Did Google’s changes to the search results page harm competitors or help consumers?

The book presents interesting and important questions. One of the most important is measuring “returns to schooling.” What is the causal effect on income of having one more year of school? It is easy to see that people with college degrees earn more than those with high school diplomas. It is much harder to determine if a policy that encourages someone to finish college actually leads that person to earn more money. I throw lots of data, economic theory and statistical techniques at this question. Hopefully, by the end you will see how analysis of survey data with OLS, IV, Heckman selection and GMM models helps us answer this question. You will also see how mixture models can be used to analyze comparisons of twins.

The book discusses important questions beyond returns to schooling. It discusses racism in mortgage lending. It discusses gender bias in labor market earnings. It discusses increasing the federal minimum wage. It discusses the effect of guns on crime. It even discusses punting on fourth down. I hope the book points you to new questions and new data to answer existing questions.

The book does not recommend policies. The government economist and founding Director of the Congressional Budget Office, Alice Rivlin, argued that it is extremely important to provide policy makers with objective analysis. In a memo to staff she said the following.5

We are not to be advocates. As private citizens, we are entitled to our own views on the issues of the day, but as members of CBO, we are not to make recommendations, or characterize, even by implication, particular policy questions as good or bad, wise or unwise.

Economists in government, the private sector and the academy work on important policy questions. I believe that economists are most effective when they do not advocate for policy positions, but present objective analysis of the economics and the data. I hope that this book presents objective analysis of interesting policy questions and you have no idea whether I think particular policy positions are good or bad, wise or unwise.6

The Outline

The book’s twelve chapters are broken into three parts based on the main approach to identification. The first part presents methods that rely on the existence of an experiment. This part includes chapters covering ordinary least squares (OLS), instrumental variables (IV), randomized controlled trials (RCTs) and Manski bounds. The second part presents methods that rely on economic theory to identify parameters of interest. This is often referred to as a structural approach. These chapters discuss demand models and discrete choice estimators such as logits and probits, censored and selection models, non-parametric auction models and generalized method of moments (GMM). The third part presents methods that rely on the existence of repeated measurement in the data. These methods include difference in difference, fixed effects, synthetic controls and factor models.

Experiments

The first four chapters rely on experiments, broadly construed, to identify the causal effects of policies.

Chapter 1 introduces the work-horse algorithm of economics, ordinary least squares (OLS). This model is simple and quick to estimate and often produces reasonable results. The chapter illustrates how the model is able to disentangle the effects of different observed variables on the outcome of interest. OLS relies on strong assumptions. In particular, the model assumes that the policy variable of interest affects the outcome independently of any unobserved term.

Chapter 2 considers how additional observed characteristics improve our estimates. It shows when adding more control variables improves the estimation and when it produces garbage. The chapter discusses the problem of multicollinearity. It discusses an alternative to the standard approach based on the work of Judea Pearl. The chapter replicates the OLS model used in Card (1995) to estimate returns to schooling. The chapter uses a DAG and Pearl’s approach to help determine whether there exists evidence of systematic racism in mortgage lending.

Chapter 3 introduces the instrumental variables model. This model allows the independence assumption to be weakened. The model allows the policy variable to be affected by unobserved characteristics that also determine the outcome. The chapter presents IV estimates of returns to schooling by replicating Card (1995). DAGs are used to illustrate and test the assumptions. The Local Average Treatment Effect (LATE) is proposed as an estimator when the researcher is unwilling to assume that the treatment affects each person identically.

Chapter 4 considers formal experiments. The ideal randomized controlled trial allows the researcher to estimate the average effect of the policy variable. It also allows the researcher to bound the distribution of effects using Kolmogorov bounds. The method is used to bound the effect of commitment savings devices on increasing or decreasing savings. The chapter presents Manski’s natural bounds and discusses inference when the data does not come from ideal randomized controlled trials. It considers the problem of estimating the causal effect of guns on crime using variations in state gun laws.

Structural Estimation

The first four chapters consider questions and issues relevant to economics, but describe standard estimation methods. Chapters 5 to 9 use economic theory directly in the estimation methods.

Chapter 5 introduces revealed preference. The chapter shows how this idea is used to infer unobserved characteristics of individual economic actors. Berkeley econometrician, Dan McFadden, pioneered the idea of using economic theory in his analysis of how people would use the new (at the time) Bay Area Rapid Transit (BART) system. This chapter introduces standard tools of demand analysis including the logit and probit models. It takes these tools to the question of whether smaller US cities should invest in urban rail infrastructure.

Chapter 6 also uses revealed preference, but this time to analyze labor markets. Chicago’s Jim Heckman shared the Nobel prize with Dan McFadden for their work on revealed preference. In McFadden’s model you, the econometrician, do not observe the outcome from any choice, just the choice that was made. In Heckman’s model you observe the outcome from the choice that was made, but not the outcome from the alternative. The chapter describes the related concepts of censoring and selection, as well as their model counterparts, the Tobit and the Heckit. The chapter uses these tools to analyze gender differences in wages and returns to schooling.

Chapter 7 returns to the question of estimating demand. This time it allows the price to be determined as the outcome of market interactions by a small number of firms. This chapter considers the modern approach to demand analysis developed by Yale economist, Steve Berry. This approach combines game theory with IV estimation. The estimator is used to determine the value of Apple Cinnamon Cheerios.

Chapter 8 uses game theory and the concept of a mixed strategy Nash equilibrium to reanalyze the work of Berkeley macroeconomist, David Romer. Romer used data on decision making in American football to argue that American football coaches are not rational. In particular, coaches may choose to punt too often on fourth down. Reanalysis finds the choice to punt to be generally in line with the predictions of economic theory. The chapter introduces the generalized method of moments (GMM) estimator developed by the University of Chicago’s Lars Peter Hansen.

Chapter 9 considers the application of game theory to auction models. The book considers the GPV model of first price (sealed-bid) auctions and the Athey-Haile model of second price (English) auctions. GPV refers to the paper by Emmanuel Guerre, Isabelle Perrigne and Quang Vuong, “Optimal Nonparametric Estimation of First-Price Auctions,” published in 2000. The paper promoted the idea that auctions, and structural models more generally, can be estimated in two steps. In the first step, standard statistical methods are used to estimate statistical parameters of the auction. In the second step, economic theory is used to back out the underlying policy parameters. For second-price auctions, the chapter presents Athey and Haile (2002). Stanford’s Susan Athey and Yale’s Phil Haile provide a method for analyzing auctions when only some of the information is available. In particular, they assume that the econometrician only knows the price and the number of bidders. These methods are used to analyze timber auctions and determine whether the US Forest Service had legitimate concerns about collusion in the 1970s logging industry.

Repeated Measurement

Chapters 10 to 12 consider data with repeated measurement. Repeated measurement has two advantages. First, it allows the same individual to be observed facing two different policies. This suggests that we can measure the effect of the policy as the difference in observed outcomes. Second, repeated measurement allows the econometrician to infer unobserved differences between individuals. We can measure the value of a policy that affects different individuals differently.

Chapter 10 considers panel data models. Over the last 25 years, the difference in difference estimator has become one of the most used techniques in microeconometrics. The chapter covers difference in difference and the standard fixed effects model. These methods are used to analyze the impact of increasing the minimum wage. The chapter replicates David Card and Alan Krueger’s famous work on the impact of increasing the minimum wage in New Jersey on restaurant employment. The chapter also measures the impact of the federal increase in the minimum wage that occurred in the late 2000s. The chapter follows Currie and Fallick (1996) and uses fixed effects and panel data from the National Longitudinal Survey of Youth 1997.
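
To preview the logic with a toy example (these numbers are invented and are not the Card and Krueger data): with outcomes observed before and after a policy change for a treated group and a comparison group, the difference in difference nets out both the fixed difference between groups and the common change over time.

# Toy difference in difference with invented numbers
before <- c(treated = 5.0, control = 4.0)   # outcomes before the policy change
after  <- c(treated = 5.4, control = 4.7)   # outcomes after the policy change
change <- after - before                    # treated: 0.4, control: 0.7
did <- unname(change["treated"] - change["control"])
did                                         # -0.3, the estimated policy effect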

Chapter 11 considers a more modern approach to panel data analysis. Instead of assuming that time has the same effect on everyone, the chapter considers various methods for creating synthetic controls. It introduces the approach of Abadie, Diamond, and Hainmueller (2010) as well as alternative approaches based on regression regularization and convex factor models. It discusses the benefits and costs of these approaches and compares them using NLSY97 to measure the impact of the federal increase in the minimum wage in the late 2000s.

Chapter 12 introduces mixture models. These models are used throughout microeconometrics, but they are particularly popular as a way to solve measurement error issues. The chapter explains how these models work. It shows that they can be identified when the econometrician observes at least two signals of the underlying data process of interest. The idea is illustrated by estimating returns to schooling with data on twins. The chapter returns to the question of the effect of New Jersey’s minimum wage increase on restaurant employment. The mixture model is used to suggest that the minimum wage increase reduced employment for small restaurants, consistent with economic theory.

Technical Appendices

The book has two technical appendices designed to help the reader to go into more depth on some issues that are not the focus of the book.

Appendix A presents statistical issues, including assessing the value of estimators using measures of bias, consistency and accuracy. It presents a discussion of the two main approaches to finding estimators, the classical method and the Bayesian method. It discusses standard classical ideas based on the Central Limit Theorem and a more recent innovation known as bootstrapping. The Bayesian discussion includes both standard ideas and Herbert Robbins’ empirical Bayesian approach. Like the rest of the book, this appendix shows how you can use these ideas but also gives the reader some insight into why you would want to use them. The appendix uses the various approaches to ask whether John Paciorek was better than Babe Ruth.
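
To give a flavor of one of these ideas, here is a minimal bootstrap sketch on simulated data (the data and values are invented for illustration and are not the appendix’s examples): resample the observations with replacement, re-estimate the slope each time, and use the spread of the estimates as a measure of accuracy.

# A minimal bootstrap of an OLS slope on simulated data
set.seed(20200101)
N <- 200
x <- rnorm(N)
y <- 1 + 2*x + rnorm(N)
dat <- data.frame(x = x, y = y)
B <- 1000                                  # number of bootstrap draws
slopes <- numeric(B)
for (b in 1:B) {
  rows <- sample(1:N, N, replace = TRUE)   # resample rows with replacement
  slopes[b] <- lm(y ~ x, data = dat[rows, ])$coefficients[2]
}
sd(slopes)                                 # bootstrap standard error of the slope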

Appendix B provides more discussion of R and various programming techniques. The appendix discusses how R is optimized for analysis of vectors, and the implications for using loops and optimization. The chapter discusses various objects that are used in R, basic syntax and commands as well as basic programming ideas including if () else, for () and while () loops. The appendix discusses how matrices are handled in R. It also provides a brief introduction to optimization in R.
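
As a taste of the point about vectors and loops, here is a toy comparison (not code from the appendix): the same sum of squares can be computed with an explicit loop or with vectorized arithmetic, and the vectorized version is shorter and typically much faster.

# Loop version versus vectorized version of a sum of squares
x <- rnorm(1e6)
total <- 0
for (i in 1:length(x)) {
  total <- total + x[i]^2      # add one squared element at a time
}
total_vec <- sum(x^2)          # square and sum the whole vector at once
all.equal(total, total_vec)    # TRUE; system.time() will show the speed difference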

Notation

As you have seen above, the book uses particular fonts and symbols for various important things. It uses the symbol R to refer to the scripting language. It uses typewriter font to represent code in R. Initial mentions of an important term are in bold face font.

In discussing the data analysis it uses \(X\) to refer to some observed characteristic. In general, it uses \(X\) to refer to the policy variable of interest, \(Y\) to refer to the outcome, and \(U\) to refer to the unobserved characteristic. When discussing actual data, it uses \(x_i\) to refer to the observed characteristic for some individual \(i\). It uses \(\mathbf{x}\) to denote a vector of the \(x_i\)’s. For matrices it uses \(\mathbf{X}\) for a matrix and \(\mathbf{X}'\) for the matrix transpose. A row of that matrix is \(\mathbf{X}_i\) or \(\mathbf{X}_i'\) to highlight that it is a row vector. Lastly for parameters of interest it uses Greek letters. For example, \(\beta\) generally refers to a vector of parameters, although in some cases it is a single parameter of interest, while \(\hat{\beta}\) refers to the estimate of the parameter. An individual parameter is \(b\).
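
To see the notation in use, the linear model that runs through the book can be written as \(y_i = \mathbf{X}_i'\beta + u_i\), and the familiar OLS estimate of the parameter vector (introduced in Chapter 1) is

\[
\hat{\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.
\]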

Hello R World

To use this book you need to download R and RStudio on your computer. Both are free.

Download R and RStudio

First, download the appropriate version of RStudio here: https://www.rstudio.com/products/rstudio/download/#download. Then you can download the appropriate version of R here: https://cran.rstudio.com/.

Once you have the two programs downloaded and installed, open up RStudio. To open up a script go to File > New File > R Script. You should have four windows: a script window, a console window, a global environment window, and a window with help, plots and other things.

Using the Console

Go to the console window and click on the >. Then type print("Hello R World") and hit enter. Remember to use the quotes. In general, R functions have the same basic syntax: a function name followed by parentheses, with some input inside the parentheses. Inputs in quotes are treated as text, while inputs without quotes are treated as variables.

print("Hello R World")
[1] "Hello R World"

Try something a little more complicated.

a <- "Chris"  # or write your own name
print(paste("Welcome",a,"to R World",sep=" "))
[1] "Welcome Chris to R World"

Here we are creating a variable called a. To define this variable we use the <- symbol, which means “assign.” It is possible to use = but that is generally frowned upon. I really don’t know why it is done this way. However, when writing this out it is important to include the appropriate spaces. It should be a <- "Chris" rather than a<-"Chris". Sloppy spacing makes code hard to read and can cause real confusion; for example, a < -3 (is a less than negative 3?) is not the same as a <- 3 (assign 3 to a). Note that # is used in R to “comment out” code. R ignores everything on the line after the hash.

In R we can place one function inside another function. The function paste is used to join text and variables together. The input sep = " " is used to place a space between the elements that are being joined together. When placing one function inside another make sure to keep track of all of the parentheses. A common error is to have more or fewer closing parentheses than opening parentheses.

A Basic Script

In the script window name your script. I usually name the file something obvious like Chris.R. You can use your own name unless it is also Chris.

# Chris.R   

Note that this line is commented out, so it does nothing. To actually name your file you need to go to File > Save As and save it to a folder. When I work with data, I save the script file to the same folder as the data. I then go to Session > Set Working Directory > To Source File Location. This sets the working directory to the folder containing your data. It means that you can read and write to the folder without complex path names.

Now you have the script set up. You can write into it.

# Chris.R   
 
# Import data
x <- read.csv("minimum wage.csv", as.is = TRUE)
# the data can be imported from here: 
# https://sites.google.com/view/microeconometricswithr/
# table-of-contents
 
# Summarize the data
summary(x)
    State            Minimum.Wage    Unemployment.Rate
 Length:51          Min.   : 0.000   Min.   :2.100    
 Class :character   1st Qu.: 7.250   1st Qu.:3.100    
 Mode  :character   Median : 8.500   Median :3.500    
                    Mean   : 8.121   Mean   :3.641    
                    3rd Qu.:10.100   3rd Qu.:4.100    
                    Max.   :14.000   Max.   :6.400    
# Run OLS
lm1 <- lm(Unemployment.Rate ~ Minimum.Wage, data = x)
summary(lm1)[4]
$coefficients
               Estimate Std. Error    t value    Pr(>|t|)
(Intercept)  3.49960275 0.32494078 10.7699709 1.62017e-14
Minimum.Wage 0.01743224 0.03733321  0.4669365 6.42615e-01
# the 4th element provides a nice table.

To run this script, you can go to Run > Run All. The first line imports the data. You can find the data at the associated website for the book. The data has three variables: the state (State), its June 2019 minimum wage (Minimum.Wage), and its June 2019 unemployment rate (Unemployment.Rate). To see these variables, you can run summary(x).

To run a standard regression use the lm() function. I call the object lm1. On the left-hand side of the tilde (\(\sim\)) is the variable we are trying to explain, Unemployment.Rate, and on the right-hand side is the Minimum.Wage variable. If we use the option data = x, we can just use the variable names in the formula for the lm() function.

The object, lm1, contains all the information about the regression. You can run summary(lm1) to get a standard regression table.7

Discussion and Further Reading

The book is a practical guide to microeconometrics. It is not a replacement for a good textbook such as Cameron and Trivedi (2005) or classics like Goldberger (1991) or Greene (2000). The book uses R to teach microeconometrics. It is a complement to other books on teaching R, particularly in the context of econometrics, such as Kleiber and Zeileis (2008). For more discussion of DAGs and do operators, I highly recommend Judea Pearl’s Book of Why (Pearl 2018). To learn about programming in R, I highly recommend Kabacoff (2011). The book is written by a statistician working in a computer science department. Paarsch and Golyaev (2016) is an excellent companion to anyone starting out doing computer intensive economic research.

References

Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105 (490): 493–505.
Athey, Susan, and Philip A. Haile. 2002. “Identification of Standard Auction Models.” Econometrica 70 (6): 2107–40.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge University Press.
Card, David. 1995. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” In Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, edited by Louis N. Christofides, E. Kenneth Grant, and Robert Swidinsky, 201–22. University of Toronto Press.
Currie, Janet, and Bruce C. Fallick. 1996. “The Minimum Wage and the Employment of Youth: Evidence from the NLSY.” Journal of Human Resources 31 (2): 404–24.
Goldberger, Arthur. 1991. A Course in Econometrics. Harvard University Press.
Greene, William. 2000. Econometric Analysis. 4th ed. Prentice Hall.
Kabacoff, Robert I. 2011. R in Action. Manning.
Kleiber, Christian, and Achim Zeileis. 2008. Applied Econometrics with R (Use R!). Springer.
Paarsch, Harry J., and Konstantin Golyaev. 2016. A Gentle Introduction to Effective Computing in Quantitative Research: What Every Research Assistant Should Know. MIT Press.

Footnotes

  1. We use the term regression to refer to a function that summarizes the relationship in the data.↩︎

  2. https://el.media.mit.edu/logo-foundation/↩︎

  3. An estimator is the mathematical method by which the relationship in the data is determined.↩︎

  4. http://www.opn.ca6.uscourts.gov/opinions.pdf/11a0084p-06.pdf↩︎

  5. https://www.cbo.gov/sites/default/files/Public_Policy_Issues_Memo_Rivlin_1976.pdf↩︎

  6. In almost all cases, you should punt on fourth down.↩︎

  7. Unfortunately, there are some issues with characters, so instead I just present the table of regression results, which is the 4th element of the list created by summary().↩︎