This is a sample of the kind of code I wrote regularly as part of my MicroMasters in Data, Economics, and Development Policy, a program offered online by the Massachusetts Institute of Technology (MIT) and the Abdul Latif Jameel Poverty Action Lab (J-PAL).
This sample was developed in R Markdown and uses partial, modified data provided during the course. The dataset was adapted from the one used in the following paper:
Acemoglu, Daron, Simon Johnson, and James A. Robinson. “The Colonial Origins of Comparative Development: An Empirical Investigation.” The American Economic Review 91, no. 5 (2001): 1369-401. Accessed July 17, 2021. http://www.jstor.org/stable/2677930.
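# Download the course copy of the AJR data; mode = "wb" keeps the binary .RData file intact on Windows
download.file("https://courses.edx.org/assets/courseware/v1/11ed1a4b6ae7a89062c2e510d8c42997/asset-v1:MITx+14.750x+2T2020+type@asset+block/AJRData.RData", destfile = "AJRData.RData", mode = "wb")
load("AJRData.RData") # .RData files are read with base R's load(), so readr is not required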
library(tidyverse)
AJRData <- as_tibble(AJRData)
#checking data class
class(AJRData)
print(AJRData, n = 10)
## # A tibble: 163 x 10
## shortnam africa lat_abst rich4 avexpr logpgp95 logem4 asia loghjypl baseco
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AFG 0 0.367 0 NA NA 4.54 1 NA NA
## 2 AGO 1 0.137 0 5.36 7.77 5.63 0 -3.41 1
## 3 ARE 0 0.267 0 7.18 9.80 NA 1 NA NA
## 4 ARG 0 0.378 0 6.39 9.13 4.23 0 -0.872 1
## 5 ARM 0 0.444 0 NA 7.68 NA 1 NA NA
## 6 AUS 0 0.300 1 9.32 9.90 2.15 0 -0.171 1
## 7 AUT 0 0.524 0 9.73 9.97 NA 0 -0.344 NA
## 8 AZE 0 0.448 0 NA 7.31 NA 1 NA NA
## 9 BDI 1 0.0367 0 NA 6.57 5.63 0 -3.51 NA
## 10 BEL 0 0.561 0 9.68 9.99 NA 0 -0.179 NA
## # … with 153 more rows
“We exploit differences in European mortality rates to estimate the effect of institutions on economic performance. Europeans adopted very different colonization policies in different colonies, with different associated institutions. In places where Europeans faced high mortality rates, they could not settle and were more likely to set up extractive institutions. These institutions persisted to the present. Exploiting differences in European mortality rates as an instrument for current institutions, we estimate large effects of institutions on income per capita. Once the effect of institutions is controlled for, countries in Africa or those closer to the equator do not have lower incomes.”
Exclusion restriction: “conditional on the controls included in the regression, the mortality rates of European settlers more than 100 years ago have no effect on GDP per capita today, other than their effect through institutional development.”
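To see why the exclusion restriction matters, here is a minimal simulation sketch. It uses simulated data, not the AJR dataset, and every variable name and parameter value is an illustrative assumption. An unobserved confounder `u` drives both the regressor `x` and the outcome `y`, so OLS is biased. When the instrument `z` affects `y` only through `x`, the IV ratio cov(z, y)/cov(z, x) recovers the true effect of 1; when `z` also affects `y` directly, the IV estimate is biased as well.

```r
# Simulated illustration only: none of these numbers come from the AJR data.
set.seed(42)
n <- 10000

z <- rnorm(n)                   # instrument (plays the role of log settler mortality)
u <- rnorm(n)                   # unobserved confounder
x <- 0.5 * z + u + rnorm(n)     # endogenous regressor (plays the role of institutions)
beta_true <- 1

# Outcome when the exclusion restriction holds: z enters y only through x
y_valid <- beta_true * x + 2 * u + rnorm(n)

# Outcome when the exclusion restriction fails: z also has a direct effect on y
y_invalid <- y_valid + 0.5 * z

# OLS slope: biased because x is correlated with the confounder u
beta_ols <- coef(lm(y_valid ~ x))[["x"]]

# Just-identified IV estimator: cov(z, y) / cov(z, x)
iv_ratio <- function(y, x, z) cov(z, y) / cov(z, x)
beta_iv_valid   <- iv_ratio(y_valid, x, z)
beta_iv_invalid <- iv_ratio(y_invalid, x, z)

round(c(ols = beta_ols, iv_valid = beta_iv_valid, iv_invalid = beta_iv_invalid), 2)
```

Under these assumed parameters, OLS converges to about 1 + 2/2.25 ≈ 1.89, the valid IV estimate to 1, and the invalid IV estimate to 2, so the sample values should sit close to those.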
Subsetting the data to match the base sample used in the paper.
AJRDataBS <- filter(AJRData, baseco == 1)
Summarizing the data to compare average protection against expropriation risk and average log GDP per capita in the base sample as a whole and in Asia, Africa and other continents. The last grouping requires creating a new dummy variable.
avgs_BS <- AJRDataBS %>%
summarize(avgExpR = mean(avexpr), avgGDP = mean(logpgp95)) %>%
arrange(desc(avgGDP))
print(avgs_BS)
## # A tibble: 1 x 2
## avgExpR avgGDP
## <dbl> <dbl>
## 1 6.52 8.06
avgs_region <- AJRDataBS %>%
mutate(othercontinant = ifelse(asia == 1 | africa == 1, 0, 1)) %>%
group_by(africa, asia, othercontinant) %>%
summarize(avgExpR = mean(avexpr), avgGDP = mean(logpgp95)) %>%
arrange(desc(avgGDP))
print(avgs_region)
## # A tibble: 3 x 5
## # Groups: africa, asia [3]
## africa asia othercontinant avgExpR avgGDP
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 1 6.91 8.72
## 2 0 1 0 7.21 8.19
## 3 1 0 0 5.88 7.34
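The `othercontinant` dummy depends on how R parses logical expressions: comparison operators such as `==` bind more tightly than `|`, so `asia | africa == 1` is read as `asia | (africa == 1)`, with the numeric `asia` silently coerced to logical. A minimal sketch on a hypothetical three-row data frame (one row per region) shows that writing out each comparison gives the intended dummy without relying on that coercion:

```r
# Hypothetical toy data: one row per region (other, Africa, Asia)
regions <- data.frame(
  africa = c(0, 1, 0),
  asia   = c(0, 0, 1)
)

# Explicit version: 0 if the country is in Asia or Africa, 1 otherwise
regions$othercontinant <- ifelse(regions$asia == 1 | regions$africa == 1, 0, 1)

# The shorter form is parsed as asia | (africa == 1); it agrees here only
# because asia is a 0/1 numeric that R coerces to FALSE/TRUE
implicit <- ifelse(regions$asia | regions$africa == 1, 0, 1)

print(regions)
identical(regions$othercontinant, implicit)
```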
Regressing the logarithm of GDP per capita today on average protection against expropriation risk by OLS, then plotting the relationship for the base sample of 64 countries.
esta <- lm(logpgp95 ~ avexpr, data = AJRDataBS)
summary(esta)
##
## Call:
## lm(formula = logpgp95 ~ avexpr, data = AJRDataBS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8715 -0.4644 0.1683 0.4610 1.1413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.66038 0.40851 11.408 < 2e-16 ***
## avexpr 0.52211 0.06119 8.533 4.72e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7132 on 62 degrees of freedom
## Multiple R-squared: 0.5401, Adjusted R-squared: 0.5327
## F-statistic: 72.82 on 1 and 62 DF, p-value: 4.724e-12
ggplot(AJRDataBS, aes(avexpr, logpgp95)) +
geom_point(shape = 1) +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
labs(x = "Protection from Expropriation Risk", y = "Log GDP per Capita, 1995")
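With a single regressor, the OLS slope reported by `lm()` is the sample covariance of the regressor and the outcome divided by the regressor's variance, and the intercept makes the fitted line pass through the point of means. A minimal sketch on simulated data (the values are illustrative assumptions, not the AJR data) checks this closed form against `lm()`:

```r
set.seed(1)
# Simulated stand-ins: x plays the role of avexpr, y the role of logpgp95
x <- rnorm(64, mean = 6.5, sd = 1.5)
y <- 4.7 + 0.5 * x + rnorm(64, sd = 0.7)

fit <- lm(y ~ x)

# Closed-form OLS estimates for simple regression
slope_manual     <- cov(x, y) / var(x)
intercept_manual <- mean(y) - slope_manual * mean(x)

# Side-by-side comparison: the two columns should match
cbind(lm = coef(fit), manual = c(intercept_manual, slope_manual))
```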
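The identification strategy rests on the causal chain
\[ \text{Settler mortality} \to \text{Institutions} \to \text{Economic performance}. \]
Estimating this relationship by two-stage least squares, with log settler mortality (logem4) as the instrument for average protection against expropriation risk (avexpr):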
library(AER)
# Regressors go to the left of |, instruments to the right
estd <- ivreg(logpgp95 ~ avexpr | logem4, data = AJRDataBS)
summary(estd)
##
## Call:
## ivreg(formula = logpgp95 ~ avexpr | logem4, data = AJRDataBS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44903 -0.56242 0.07311 0.69564 1.71752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9097 1.0267 1.860 0.0676 .
## avexpr 0.9443 0.1565 6.033 9.8e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9483 on 62 degrees of freedom
## Multiple R-Squared: 0.187, Adjusted R-squared: 0.1739
## Wald test: 36.39 on 1 and 62 DF, p-value: 9.799e-08
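`ivreg()` performs two-stage least squares. A minimal sketch of the two stages done by hand in base R, on simulated data (hypothetical names `z`, `x` and `y` stand in for `logem4`, `avexpr` and `logpgp95`), shows that the second-stage slope equals the IV ratio cov(z, y)/cov(z, x). The residual standard error that `lm()` reports for the manual second stage is not the IV one, because those residuals use the fitted rather than the actual regressor; this is one reason to prefer a dedicated function such as `ivreg()` for inference.

```r
set.seed(7)
n <- 500
z <- rnorm(n)                      # instrument
u <- rnorm(n)                      # unobserved confounder
x <- 1 - 0.6 * z + u + rnorm(n)    # endogenous regressor
y <- 2 + 1 * x + u + rnorm(n)      # outcome; true effect of x is 1

# Stage 1: regress the endogenous regressor on the instrument
stage1 <- lm(x ~ z)
x_hat  <- fitted(stage1)

# Stage 2: regress the outcome on the first-stage fitted values
stage2 <- lm(y ~ x_hat)

# The point estimate equals the just-identified IV ratio
beta_2sls  <- coef(stage2)[["x_hat"]]
beta_ratio <- cov(z, y) / cov(z, x)

# IV residuals use the actual x, not x_hat
resid_iv <- y - coef(stage2)[["(Intercept)"]] - beta_2sls * x
c(beta_2sls   = beta_2sls,
  beta_ratio  = beta_ratio,
  sigma_naive = summary(stage2)$sigma,
  sigma_iv    = sqrt(sum(resid_iv^2) / (n - 2)))
```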
Regressing protection against expropriation risk on the logarithm of settler mortality (the first stage) for the base sample of 64 countries.
estb <- lm(avexpr ~ logem4, data = AJRDataBS)
summary(estb)
##
## Call:
## lm(formula = avexpr ~ logem4, data = AJRDataBS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6606 -0.9922 0.0280 0.8266 3.3566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.3414 0.6107 15.30 < 2e-16 ***
## logem4 -0.6068 0.1267 -4.79 1.08e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.265 on 62 degrees of freedom
## Multiple R-squared: 0.2701, Adjusted R-squared: 0.2584
## F-statistic: 22.95 on 1 and 62 DF, p-value: 1.077e-05
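With a single instrument, the first-stage F statistic reported by `summary()` is the square of the instrument's t statistic; here (-4.79)² ≈ 22.94, matching the reported 22.95 up to rounding. A widely used rule of thumb associated with Staiger and Stock (1997) treats a first-stage F below roughly 10 as a sign of a weak instrument. A minimal sketch, with a hypothetical helper `first_stage_f()` and simulated data, confirms the F = t² identity:

```r
# Hypothetical helper: return the overall F statistic and the squared t statistic
# of the named instrument from a one-instrument first-stage lm() fit
first_stage_f <- function(fit, instrument) {
  s <- summary(fit)
  c(F = unname(s$fstatistic["value"]),
    t_squared = s$coefficients[instrument, "t value"]^2)
}

set.seed(3)
z <- rnorm(64)                     # simulated instrument
x <- 9 - 0.6 * z + rnorm(64)       # simulated endogenous regressor

stats <- first_stage_f(lm(x ~ z), "z")
stats
stats[["F"]] > 10                  # rule-of-thumb check for instrument strength
```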
Regressing the logarithm of GDP per capita today on the logarithm of settler mortality (the reduced form) for the base sample of 64 countries.
estc <- lm(logpgp95 ~ logem4, data = AJRDataBS)
summary(estc)
##
## Call:
## lm(formula = logpgp95 ~ logem4, data = AJRDataBS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7545 -0.5386 0.1412 0.4607 1.4059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.73057 0.36718 29.224 < 2e-16 ***
## logem4 -0.57297 0.07616 -7.523 2.66e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7604 on 62 degrees of freedom
## Multiple R-squared: 0.4772, Adjusted R-squared: 0.4688
## F-statistic: 56.6 on 1 and 62 DF, p-value: 2.659e-10
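Dividing the reduced-form coefficient by the first-stage coefficient reproduces the IV estimate (the Wald, or indirect least squares, estimator):
waldest <- coef(estc)[["logem4"]] / coef(estb)[["logem4"]]
print(waldest, digits = 4)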
## [1] 0.9443
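When the instrument is binary, the same ratio reduces to the classic Wald grouping estimator: the difference in mean outcomes between the two instrument groups divided by the difference in mean regressor values. A minimal sketch on simulated data, with a hypothetical binary instrument `z` (for example, an above-median-mortality indicator; this split is purely an illustrative assumption), checks that the two forms agree:

```r
set.seed(11)
n <- 200
z <- rbinom(n, 1, 0.5)             # hypothetical binary instrument
u <- rnorm(n)                      # unobserved confounder
x <- 7 - 1.2 * z + u + rnorm(n)    # first stage: the instrument shifts x
y <- 2 + 0.9 * x + u + rnorm(n)    # outcome

# Indirect least squares: reduced-form slope over first-stage slope
ils <- coef(lm(y ~ z))[["z"]] / coef(lm(x ~ z))[["z"]]

# Wald grouping estimator: ratio of differences in group means
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(x[z == 1]) - mean(x[z == 0]))

c(ils = ils, wald = wald)
```

The two agree exactly because the OLS slope on a 0/1 regressor is the difference in group means.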
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] AER_1.2-9 survival_3.1-12 sandwich_2.5-1 lmtest_0.9-37
## [5] zoo_1.8-8 car_3.0-11 carData_3.0-4 forcats_0.5.1
## [9] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0
## [13] tidyr_1.1.3 tibble_3.1.2 ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 jsonlite_1.7.2 splines_4.0.2 modelr_0.1.8
## [5] Formula_1.2-3 assertthat_0.2.1 highr_0.9 cellranger_1.1.0
## [9] yaml_2.2.1 pillar_1.6.1 backports_1.2.1 lattice_0.20-41
## [13] glue_1.4.2 digest_0.6.25 rvest_1.0.0 colorspace_2.0-2
## [17] htmltools_0.5.1.1 Matrix_1.2-18 pkgconfig_2.0.3 broom_0.7.8
## [21] haven_2.4.1 scales_1.1.1 openxlsx_4.2.4 rio_0.5.27
## [25] mgcv_1.8-31 generics_0.0.2 farver_2.1.0 ellipsis_0.3.2
## [29] withr_2.4.1 cli_3.0.0 magrittr_2.0.1 crayon_1.4.1
## [33] readxl_1.3.1 evaluate_0.14 fs_1.5.0 fansi_0.4.1
## [37] nlme_3.1-148 xml2_1.3.2 foreign_0.8-80 tools_4.0.2
## [41] data.table_1.14.0 hms_1.1.0 lifecycle_1.0.0 munsell_0.5.0
## [45] reprex_2.0.0 zip_2.2.0 compiler_4.0.2 rlang_0.4.10
## [49] grid_4.0.2 rstudioapi_0.13 labeling_0.4.2 rmarkdown_2.9
## [53] gtable_0.3.0 abind_1.4-5 DBI_1.1.1 curl_4.3
## [57] R6_2.4.1 lubridate_1.7.10 knitr_1.33 utf8_1.1.4
## [61] stringi_1.5.3 Rcpp_1.0.7 vctrs_0.3.8 dbplyr_2.1.1
## [65] tidyselect_1.1.0 xfun_0.24