R Code Sample

This is a sample of the kind of code I wrote regularly as part of my MicroMasters in Data, Economics, and Development Policy, a program offered online by the Massachusetts Institute of Technology and the Abdul Latif Jameel Poverty Action Lab (J-PAL).

This sample was developed in R Markdown and uses a partial, modified dataset provided during the course. That dataset was adapted from the one used in the following paper:

Acemoglu, Daron, Simon Johnson, and James A. Robinson. “The Colonial Origins of Comparative Development: An Empirical Investigation.” The American Economic Review 91, no. 5 (2001): 1369-401. Accessed July 17, 2021. http://www.jstor.org/stable/2677930.

Getting and Loading Data

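# Download the course dataset; mode = "wb" keeps the binary file intact on Windows
download.file("https://courses.edx.org/assets/courseware/v1/11ed1a4b6ae7a89062c2e510d8c42997/asset-v1:MITx+14.750x+2T2020+type@asset+block/AJRData.RData", destfile = "AJRData.RData", mode = "wb")

load("AJRData.RData") # .RData files are read with base R's load(), so readr is not required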

library(tidyverse)
AJRData <- as_tibble(AJRData)

# Check the class of the data, then preview the first 10 rows
class(AJRData)
print(AJRData, n = 10)
## # A tibble: 163 x 10
##    shortnam africa lat_abst rich4 avexpr logpgp95 logem4  asia loghjypl baseco
##    <chr>     <dbl>    <dbl> <dbl>  <dbl>    <dbl>  <dbl> <dbl>    <dbl>  <dbl>
##  1 AFG           0   0.367      0  NA       NA      4.54     1   NA         NA
##  2 AGO           1   0.137      0   5.36     7.77   5.63     0   -3.41       1
##  3 ARE           0   0.267      0   7.18     9.80  NA        1   NA         NA
##  4 ARG           0   0.378      0   6.39     9.13   4.23     0   -0.872      1
##  5 ARM           0   0.444      0  NA        7.68  NA        1   NA         NA
##  6 AUS           0   0.300      1   9.32     9.90   2.15     0   -0.171      1
##  7 AUT           0   0.524      0   9.73     9.97  NA        0   -0.344     NA
##  8 AZE           0   0.448      0  NA        7.31  NA        1   NA         NA
##  9 BDI           1   0.0367     0  NA        6.57   5.63     0   -3.51      NA
## 10 BEL           0   0.561      0   9.68     9.99  NA        0   -0.179     NA
## # … with 153 more rows
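
Because load() restores objects under the names they were saved with, it can be useful to see what a file contains before it touches the workspace. The following is a minimal sketch of that check, assuming a toy object and a temporary file (`toy_df`, `tmp`, and `loaded_env` are hypothetical names, not part of the course data):

```r
# Hypothetical toy object saved to a temporary file; this is not the course data.
toy_df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
tmp <- tempfile(fileext = ".RData")
save(toy_df, file = tmp)

# Load into a fresh environment so nothing in the workspace is overwritten.
loaded_env <- new.env()
loaded_names <- load(tmp, envir = loaded_env)

# load() returns the names of the restored objects as a character vector.
print(loaded_names)
str(get(loaded_names[1], envir = loaded_env))
```

The same pattern should work for AJRData.RData by replacing `tmp` with the downloaded file's path.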

Background and Variable Guide

Paper Abstract

“We exploit differences in European mortality rates to estimate the effect of institutions on economic performance. Europeans adopted very different colonization policies in different colonies, with different associated institutions. In places where Europeans faced high mortality rates, they could not settle and were more likely to set up extractive institutions. These institutions persisted to the present. Exploiting differences in European mortality rates as an instrument for current institutions, we estimate large effects of institutions on income per capita. Once the effect of institutions is controlled for, countries in Africa or those closer to the equator do not have lower incomes.”

Variable Guide

  1. shortnam: country shorthand
  2. africa: dummy variable for whether or not country is in Africa
  3. lat_abst: absolute latitude (distance from the equator), scaled to lie between 0 and 1
  4. rich4: dummy for whether country is rich or poor 
  5. avexpr: protection against expropriation risk (the protection against “risk of expropriation” index from Political Risk Services), used as a proxy for institutions
  6. logpgp95: log GDP per capita in 1995
  7. logem4: log of settler mortality rate per thousand
  8. asia: dummy variable for whether or not country is in Asia
  9. loghjypl: unknown
  10. baseco: dummy for whether or not data is part of sample used in paper (“base sample”)

Exclusion restriction: “conditional on the controls included in the regression, the mortality rates of European settlers more than 100 years ago have no effect on GDP per capita today, other than their effect through institutional development.”
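
In equation form, the setup can be sketched as below. The symbols are notation assumed here for illustration rather than taken from the course materials: \(Y_i\) is logpgp95, \(X_i\) is avexpr, and \(Z_i\) is logem4. Controls are left out because the regressions in this sample do not include any.

```latex
\begin{align*}
Y_i &= \alpha + \beta X_i + \varepsilon_i && \text{(structural equation)} \\
X_i &= \gamma + \delta Z_i + \nu_i && \text{(first stage)} \\
\operatorname{Cov}(Z_i, \varepsilon_i) &= 0 && \text{(exclusion restriction)} \\
\delta &\neq 0 && \text{(instrument relevance)}
\end{align*}
```

With a single instrument the exclusion restriction cannot be tested from the data, whereas relevance can be checked from a regression of \(X_i\) on \(Z_i\).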

Subsetting data to match base sample

Keeping only the countries flagged as part of the base sample (baseco equal to 1), as done in the paper.

AJRDataBS <- filter(AJRData, baseco == 1)  # keep base-sample rows only

Summarising and OLS Regression

Summarising

Summarizing the data to see average protection against expropriation risk and average log GDP per capita in the base sample and, separately, in Asia, Africa, and other continents. The regional breakdown requires creating a new dummy variable.

avgs_BS <- AJRDataBS %>% 
  summarize(avgExpR = mean(avexpr), avgGDP = mean(logpgp95)) %>%
  arrange(desc(avgGDP))
print(avgs_BS)
## # A tibble: 1 x 2
##   avgExpR avgGDP
##     <dbl>  <dbl>
## 1    6.52   8.06
avgs_region <- AJRDataBS %>% 
  mutate(othercontinant = ifelse(asia == 1 | africa == 1, 0, 1)) %>%  # 1 if in neither Asia nor Africa
  group_by(africa, asia, othercontinant) %>% 
  summarize(avgExpR = mean(avexpr), avgGDP = mean(logpgp95)) %>%
  arrange(desc(avgGDP))
print(avgs_region)
## # A tibble: 3 x 5
## # Groups:   africa, asia [3]
##   africa  asia othercontinant avgExpR avgGDP
##    <dbl> <dbl>          <dbl>   <dbl>  <dbl>
## 1      0     0              1    6.91   8.72
## 2      0     1              0    7.21   8.19
## 3      1     0              0    5.88   7.34
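
An alternative to grouping by three dummies is to collapse them into a single region label first, which gives one readable grouping column. The following is a sketch on made-up data, assuming dplyr 1.0 or later; the `toy` tibble, its values, and the `region` column are hypothetical and not taken from the AJR dataset:

```r
library(dplyr)

# Toy data: made-up values, not taken from the AJR dataset.
toy <- tibble(
  shortnam = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  africa   = c(1, 1, 0, 0, 0, 0),
  asia     = c(0, 0, 1, 1, 0, 0),
  avexpr   = c(5, 6, 7, 8, 6, 9),
  logpgp95 = c(7, 7.5, 8, 8.5, 8.5, 9.5)
)

region_avgs <- toy %>%
  # Collapse the two dummies into one label; everything else is "Other"
  mutate(region = case_when(
    africa == 1 ~ "Africa",
    asia == 1   ~ "Asia",
    TRUE        ~ "Other"
  )) %>%
  group_by(region) %>%
  summarize(
    avgExpR = mean(avexpr, na.rm = TRUE),    # na.rm guards against missing values
    avgGDP  = mean(logpgp95, na.rm = TRUE),
    .groups = "drop"                         # return an ungrouped tibble
  ) %>%
  arrange(desc(avgGDP))

print(region_avgs)
```

The base sample used above has no missing values in these two columns, so `na.rm = TRUE` matters only if the same summary is run on the full dataset.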

The correlation between institutions and economic performance.

Regressing log GDP per capita in 1995 on protection against expropriation risk for the base sample of 64 countries, then plotting the two against each other with the fitted line.

esta <- lm(logpgp95~avexpr, data = AJRDataBS)
summary(esta)
## 
## Call:
## lm(formula = logpgp95 ~ avexpr, data = AJRDataBS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8715 -0.4644  0.1683  0.4610  1.1413 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.66038    0.40851  11.408  < 2e-16 ***
## avexpr       0.52211    0.06119   8.533 4.72e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7132 on 62 degrees of freedom
## Multiple R-squared:  0.5401, Adjusted R-squared:  0.5327 
## F-statistic: 72.82 on 1 and 62 DF,  p-value: 4.724e-12
ggplot(AJRDataBS, aes(avexpr,logpgp95)) + 
  geom_point(shape=1) + 
  geom_smooth(method=lm,se=FALSE) + 
  labs(x = "Protection from Expropriation Risk", y = "Log GDP per Capita, 1995")
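
As a rough reading of the OLS slope, taking the linear-in-logs specification at face value: a one-point increase in avexpr is associated with log GDP per capita that is higher by about 0.522. Converted to a ratio of GDP levels, that is:

```latex
\[
\frac{\text{GDP per capita at } \texttt{avexpr} = x + 1}{\text{GDP per capita at } \texttt{avexpr} = x}
= e^{0.52211} \approx 1.69
\]
```

That is, roughly 69% higher GDP per capita in 1995 per index point. This describes a correlation in the base sample, not a causal effect.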

IV Regression

\[ \text{Settler Mortality} \to \text{Institutions} \to \text{Economic Performance} \]

library(AER)
estd <- ivreg(logpgp95~avexpr | logem4, data = AJRDataBS)
summary(estd)
## 
## Call:
## ivreg(formula = logpgp95 ~ avexpr | logem4, data = AJRDataBS)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44903 -0.56242  0.07311  0.69564  1.71752 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.9097     1.0267   1.860   0.0676 .  
## avexpr        0.9443     0.1565   6.033  9.8e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9483 on 62 degrees of freedom
## Multiple R-Squared: 0.187,   Adjusted R-squared: 0.1739 
## Wald test: 36.39 on 1 and 62 DF,  p-value: 9.799e-08
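
With one instrument and one endogenous regressor, the IV estimate can be written as a ratio of two OLS slopes. A short derivation, using notation assumed here for illustration (\(Y\) is logpgp95, \(X\) is avexpr, \(Z\) is logem4):

```latex
\[
\hat{\beta}_{IV}
= \frac{\widehat{\operatorname{Cov}}(Z, Y)}{\widehat{\operatorname{Cov}}(Z, X)}
= \frac{\widehat{\operatorname{Cov}}(Z, Y) \,/\, \widehat{\operatorname{Var}}(Z)}
       {\widehat{\operatorname{Cov}}(Z, X) \,/\, \widehat{\operatorname{Var}}(Z)}
= \frac{\text{slope of } Y \text{ on } Z}{\text{slope of } X \text{ on } Z}
\]
```

The numerator is the reduced-form slope and the denominator is the first-stage slope. In this sample the IV slope (0.9443) is larger than the OLS slope (0.52211).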

Cross-checking Results with 2-stage OLS

  1. First Stage

Regressing protection against expropriation risk on the logarithm of settler mortality rates per thousand for a sample of 64 countries.

estb <- lm(avexpr~logem4, data = AJRDataBS)
summary(estb)
## 
## Call:
## lm(formula = avexpr ~ logem4, data = AJRDataBS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6606 -0.9922  0.0280  0.8266  3.3566 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.3414     0.6107   15.30  < 2e-16 ***
## logem4       -0.6068     0.1267   -4.79 1.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.265 on 62 degrees of freedom
## Multiple R-squared:  0.2701, Adjusted R-squared:  0.2584 
## F-statistic: 22.95 on 1 and 62 DF,  p-value: 1.077e-05

  2. Reduced Form

Regressing log GDP per capita in 1995 on the logarithm of settler mortality rates per thousand for a sample of 64 countries.

estc <- lm(logpgp95~logem4, data = AJRDataBS)
summary(estc)
## 
## Call:
## lm(formula = logpgp95 ~ logem4, data = AJRDataBS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7545 -0.5386  0.1412  0.4607  1.4059 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.73057    0.36718  29.224  < 2e-16 ***
## logem4      -0.57297    0.07616  -7.523 2.66e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7604 on 62 degrees of freedom
## Multiple R-squared:  0.4772, Adjusted R-squared:  0.4688 
## F-statistic:  56.6 on 1 and 62 DF,  p-value: 2.659e-10
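
  3. Wald Estimate

# Ratio of the reduced-form slope to the first-stage slope, taken from the fitted models
waldest <- unname(coef(estc)["logem4"] / coef(estb)["logem4"])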
print(waldest, digits = 4)
## [1] 0.9443
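
The agreement between the Wald ratio and the IV coefficient is not specific to this dataset. The sketch below checks it on simulated data using base R only; every variable and coefficient in it is made up for illustration (`z`, `x`, `y` stand in for logem4, avexpr, and logpgp95), and none of the numbers come from the AJR data.

```r
# Simulated data only: these numbers are NOT from the AJR dataset.
set.seed(42)
n <- 500
z <- rnorm(n)                    # instrument (stand-in for logem4)
u <- rnorm(n)                    # unobserved confounder
x <- 1 - 0.6 * z + u + rnorm(n)  # endogenous regressor (stand-in for avexpr)
y <- 2 + 0.9 * x - u + rnorm(n)  # outcome (stand-in for logpgp95)

first_stage  <- lm(x ~ z)
reduced_form <- lm(y ~ z)

# Wald estimate: reduced-form slope divided by first-stage slope
wald <- unname(coef(reduced_form)["z"] / coef(first_stage)["z"])

# Manual 2SLS: regress the outcome on the first-stage fitted values
x_hat <- fitted(first_stage)
tsls  <- unname(coef(lm(y ~ x_hat))["x_hat"])

# Plain OLS for comparison; biased here because u drives both x and y
ols <- unname(coef(lm(y ~ x))["x"])

all.equal(wald, tsls)  # the two IV point estimates agree up to floating-point error
c(ols = ols, wald = wald, tsls = tsls)
```

Only the point estimates match. The standard errors from the manual second-stage `lm()` are not valid, because they ignore that `x_hat` is itself estimated; a dedicated routine such as `ivreg()` computes them correctly.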

Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] AER_1.2-9       survival_3.1-12 sandwich_2.5-1  lmtest_0.9-37  
##  [5] zoo_1.8-8       car_3.0-11      carData_3.0-4   forcats_0.5.1  
##  [9] stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_1.4.0    
## [13] tidyr_1.1.3     tibble_3.1.2    ggplot2_3.3.5   tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.2        jsonlite_1.7.2    splines_4.0.2     modelr_0.1.8     
##  [5] Formula_1.2-3     assertthat_0.2.1  highr_0.9         cellranger_1.1.0 
##  [9] yaml_2.2.1        pillar_1.6.1      backports_1.2.1   lattice_0.20-41  
## [13] glue_1.4.2        digest_0.6.25     rvest_1.0.0       colorspace_2.0-2 
## [17] htmltools_0.5.1.1 Matrix_1.2-18     pkgconfig_2.0.3   broom_0.7.8      
## [21] haven_2.4.1       scales_1.1.1      openxlsx_4.2.4    rio_0.5.27       
## [25] mgcv_1.8-31       generics_0.0.2    farver_2.1.0      ellipsis_0.3.2   
## [29] withr_2.4.1       cli_3.0.0         magrittr_2.0.1    crayon_1.4.1     
## [33] readxl_1.3.1      evaluate_0.14     fs_1.5.0          fansi_0.4.1      
## [37] nlme_3.1-148      xml2_1.3.2        foreign_0.8-80    tools_4.0.2      
## [41] data.table_1.14.0 hms_1.1.0         lifecycle_1.0.0   munsell_0.5.0    
## [45] reprex_2.0.0      zip_2.2.0         compiler_4.0.2    rlang_0.4.10     
## [49] grid_4.0.2        rstudioapi_0.13   labeling_0.4.2    rmarkdown_2.9    
## [53] gtable_0.3.0      abind_1.4-5       DBI_1.1.1         curl_4.3         
## [57] R6_2.4.1          lubridate_1.7.10  knitr_1.33        utf8_1.1.4       
## [61] stringi_1.5.3     Rcpp_1.0.7        vctrs_0.3.8       dbplyr_2.1.1     
## [65] tidyselect_1.1.0  xfun_0.24