R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

1.1 Welcome to the Course!

1.2 Types of Data Analysis

| Type of data analysis | Question being asked |
| --- | --- |
| Descriptive data analysis | Seeks to summarize the measurements in a single data set without further interpretation (e.g., the US Census) |
| Exploratory data analysis | Searches for discoveries, trends, correlations, or relationships between the measurements to generate ideas or hypotheses (e.g., the four-star planetary system Tatooine) |
| Inferential data analysis | Quantifies whether an observed pattern is likely to hold beyond the data set in hand, that is, in the population (e.g., a study of whether air pollution correlates with life expectancy in the US) |
| Predictive data analysis | Predicts another measurement (the outcome) for a single person or unit (e.g., predicting how people will vote in an election) |
| Causal data analysis | Seeks to find out what happens to one measurement, on average, if you make another measurement change (e.g., the causal relationship between smoking and cancer) |
| Mechanistic data analysis | Seeks to show that changing one measurement always and exclusively leads to a specific, deterministic behavior in another (causal; e.g., wing design) |

Source: Leek & Peng (2015)
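
The difference between the first three types can be sketched with the built-in `cars` data set summarized earlier. The pairing of each R call with an analysis type is an assumption made for illustration; it is not part of the source table.

```r
# Illustrative sketch: the mapping of calls to analysis types is an assumption,
# using the built-in `cars` data set (columns: speed, dist).

# Descriptive: summarize the measurements without interpreting them.
desc <- summary(cars)

# Exploratory: look for a relationship that could become a hypothesis.
r <- cor(cars$speed, cars$dist)

# Inferential: quantify whether the pattern is likely to hold beyond this sample.
inf <- cor.test(cars$speed, cars$dist)

print(desc)
cat("sample correlation:", round(r, 3), "\n")
cat("95% CI for the population correlation:", round(inf$conf.int, 3), "\n")
```

The same two columns feed every step; what changes is the question. The correlation alone describes this sample, while the confidence interval from `cor.test` is a statement about the population the sample came from.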

| Real question type | Perceived question type | Phrase describing error |
| --- | --- | --- |
| Inferential | Causal | Correlation does not imply causation |
| Exploratory | Inferential | Data dredging (or p-hacking) |
| Exploratory | Predictive | Overfitting |
| Descriptive | Inferential | n of 1 analysis |
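
The overfitting row can be illustrated with a small simulation. This is a minimal sketch on made-up data: the linear true relationship, the sample size, the noise level, and the degree-10 polynomial are all assumptions chosen for illustration.

```r
# Simulated data (an assumption for illustration): the true relationship is linear.
set.seed(1)
n <- 30
train <- data.frame(x = runif(n))
train$y <- 2 * train$x + rnorm(n, sd = 0.5)
test <- data.frame(x = runif(n))
test$y <- 2 * test$x + rnorm(n, sd = 0.5)

# Mean squared error of a fitted model on a given data set.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

# A simple model and a deliberately over-flexible one, both fit to the training data.
simple <- lm(y ~ x, data = train)
flexible <- lm(y ~ poly(x, 10), data = train)

results <- data.frame(
  model = c("linear", "degree-10 polynomial"),
  train_mse = c(mse(simple, train), mse(flexible, train)),
  test_mse = c(mse(simple, test), mse(flexible, test))
)
print(results)
```

The flexible fit can never have a larger training error than the linear one, because it contains the linear model as a special case. A pattern found this way is exploratory; treating it as predictive requires checking it on data that were not used to fit the model, which is what the `test_mse` column does.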

“In nonrandomized experiments, it is usually only possible to determine the existence of a relationship between two measurements, but not the underlying mechanism or the reason for it.”

“Historically, social scientists have sought out explanations of human and social phenomena that provide interpretable causal mechanisms, while often ignoring their predictive accuracy. We argue that the increasingly computational nature of social science is beginning to reverse this traditional bias against prediction.” (Hofman et al., 2017)