Chapter 2 Introduction
Empirical research in economics typically aims to investigate the relationship between the mean of a response variable Y and an explanatory variable X in the following form:
\[\begin{equation} E[Y|X]=f(X) \end{equation}\]
where \(f(·)\) is a function that describes this link. If we assume that \[f(X) = α + Xβ,\] then the mean of Y is said to be linearly related to X. The linear regression model is one of the most popular tools in data analysis: its simple functional form allows a researcher to interpret the coefficient estimates easily for policy-relevant questions. It also has a well-established least-squares theory, so a data analyst can conduct statistical inference based on this linear specification. However, economic theory typically suggests a set of variables relevant to explaining a policy question but rarely dictates a specific functional form for the relationship between those variables. An incorrectly specified parametric model leads to serious misspecification bias, which cannot be removed simply by increasing the sample size and thus results in misleading inference (Scott, 1992, p.33).
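The point that misspecification bias does not vanish in large samples can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the quadratic conditional mean, the uniform design on [-1, 1], the noise level, and the names `true_mean` and `linear_fit_error` are hypothetical choices made for illustration and do not come from the cited references.

```python
import numpy as np


def true_mean(x):
    # Hypothetical nonlinear conditional mean E[Y|X=x] (an assumption for this sketch).
    return x ** 2


def linear_fit_error(n, rng):
    """Fit Y = alpha + X*beta by least squares on n simulated observations and
    return the mean squared gap between the fitted line and the true
    conditional mean, evaluated on a fixed grid."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = true_mean(x) + rng.normal(scale=0.1, size=n)

    # Design matrix with an intercept column, then ordinary least squares.
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    grid = np.linspace(-1.0, 1.0, 201)
    fitted = coef[0] + coef[1] * grid
    return float(np.mean((fitted - true_mean(grid)) ** 2))


rng = np.random.default_rng(0)
errors = {n: linear_fit_error(n, rng) for n in (100, 10_000, 100_000)}
for n, err in errors.items():
    print(f"n = {n:>7,d}  approximation error = {err:.4f}")
```

Under these assumptions the gap between the fitted line and the true conditional mean should stay at roughly the same level as n grows, because the best linear approximation to a curved function remains a poor approximation however precisely it is estimated.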
Nonparametric regression methods avoid this misspecification problem by imposing assumptions not on the structure of the regression relationship but rather on the characteristics of that structure. This modeling approach essentially lets the data show the researcher the appropriate functional form of the regression relationship. Stoker (1992) states intuitively that a nonparametric econometric model specifies a connection between the observed data and a process that the model itself attributes to the economic agents’ “systematic” responses, also interpreted as “predictable” behaviors. The following remarks make a clear distinction between the two approaches.
Remark 1 A set of probability measures \(P_θ\) constitutes a parametric family of models if \(P_θ\) is known up to an unknown parameter \(θ\) that belongs to a finite-dimensional parameter space \(Θ\); i.e., \(θ ∈ Θ ⊆ R^d\), where d is the dimension.
Remark 2 If the probability measures \(P_θ\) are unknown and the unknown parameter \(θ\) belongs to an infinite-dimensional parameter space, then the set constitutes a nonparametric family of models.
In the nonparametric method, the researcher chooses an appropriate function space to which the regression function is believed to belong. Following Eubank (1988, p.3), this choice is motivated by regularity conditions, including smoothness assumptions, imposed on the unknown regression function. Note that these “qualitative” restrictions on the regression function enable a researcher to let the data determine the form of the regression curve. On the other hand, this flexibility comes at a price: a loss of efficiency, problems with higher dimensionality, and slower convergence rates.
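As one concrete way to let the data determine the form of a regression curve under a smoothness assumption, the sketch below uses a Nadaraya-Watson kernel estimator with a Gaussian kernel. This estimator is chosen here only as a familiar example of nonparametric regression. The simulated data, the bandwidth of 0.1, and the function name `nadaraya_watson` are hypothetical choices for illustration.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Estimate E[Y|X=x] at each point of x_eval as a kernel-weighted
    average of y_train, using a Gaussian kernel."""
    # Pairwise scaled distances: one row per evaluation point.
    scaled = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)
    return (weights @ y_train) / weights.sum(axis=1)


rng = np.random.default_rng(1)
n = 2_000
x = rng.uniform(-1.0, 1.0, size=n)
# Hypothetical data-generating process: quadratic mean plus Gaussian noise.
y = x ** 2 + rng.normal(scale=0.1, size=n)

# Evaluate away from the edges of the support to limit boundary effects.
grid = np.linspace(-0.8, 0.8, 81)
estimate = nadaraya_watson(x, y, grid, bandwidth=0.1)
kernel_error = float(np.mean((estimate - grid ** 2) ** 2))
print(f"kernel approximation error = {kernel_error:.5f}")
```

No functional form is specified here; only the bandwidth, which encodes how smooth the curve is assumed to be, has to be chosen. A smaller bandwidth typically reduces bias but increases variance, which is one way the price of flexibility shows up in practice.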
References
David W. Scott (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, Inc.
Randall L. Eubank (1988). Spline Smoothing and Nonparametric Regression. Marcel Dekker, Inc.
Thomas M. Stoker (1992). Lectures on Semiparametric Econometrics. Technical report, CORE Lecture Series.