Chapter 3 Least Squares Kernel Estimators
We start with an introduction to kernel estimators as a building block of nonparametric techniques and continue with a discussion of the assumptions typically made. We will refer back to this set of assumptions, and to their extensions, for the other nonparametric estimators reviewed in the subsequent sections.
Assume that we have a collection of independently and identically distributed (i.i.d.) observations \(\{(Y_i, X_i)\}_{i=1}^{n}\) realized from a joint probability density function (p.d.f.) \(f(y, x)\). We can write the unknown regression relationship between \(Y\) and \(X\) as
\[\begin{equation} Y_i = g(X_i) + \varepsilon_i, \quad i = 1, \ldots, n, \end{equation}\]
where \(g(\cdot)\) is an unspecified function and the \(\varepsilon_i\)'s are observation errors. If we believe that the true regression function \(g(\cdot)\) is smooth, i.e., differentiable up to some order, then we can use a local average of the data near a point \(x\), rather than only the data at the point \(x\), to construct an estimator of \(g(\cdot)\). Formally, we can write the estimator of \(g\) as
\[\begin{equation} \hat{g}(x)=\sum_{j=1}^{n}w_j(x,X_j;h)Y_j \end{equation}\]
which is called a linear smoother. Note that \(w_j\) is the weight assigned to each \(Y_j\), calculated for each \(X_j\) in the \(h\)-neighborhood of \(x\), where \(h\) is called the smoothing parameter (or bandwidth) and determines the size of the local neighborhood.
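As an illustration, here is a minimal sketch of a linear smoother in Python. It assumes a simple boxcar (uniform) weight over the \(h\)-neighborhood of \(x\); the function name and this particular choice of weights are ours for illustration, not choices made in the text.

```python
import numpy as np

def linear_smoother(x, X, Y, h):
    """Evaluate g_hat(x) = sum_j w_j(x, X_j; h) * Y_j at a single point x.

    Assumed boxcar weights: every Y_j whose X_j lies within the
    h-neighborhood of x gets equal weight, and the nonnegative weights
    are normalized to sum to one.
    """
    in_window = np.abs(x - X) <= h      # indicator of the h-neighborhood
    if not in_window.any():             # no observations nearby
        return np.nan
    w = in_window / in_window.sum()     # nonnegative weights summing to 1
    return np.sum(w * Y)
```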
If we assume that \(\{w_j\}_{j=1}^{n}\) is a sequence of nonnegative weights with \(\sum_{j=1}^{n} w_j(x,X_j;h)=1\) for each \(x\), then Equation (2.2) can be obtained as the solution of a locally weighted least squares problem:
\[\begin{equation} \min_{a}\sum_{j=1}^{n}(Y_j-a)^2K((x-X_j)/h), \end{equation}\]
where \(K(\cdot)\) denotes a symmetric kernel weight function.
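Differentiating the criterion with respect to \(a\) and setting the derivative to zero gives the first-order condition
\[ -2\sum_{j=1}^{n}\left(Y_j-a\right)K((x-X_j)/h)=0. \]
Solving for \(a\), the minimizer is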
\[\begin{equation} \hat{a}\equiv\hat{g}(x)=\frac{\sum_{i=1}^{n}Y_iK((x-X_i)/h)}{\sum_{i=1}^{n}K((x-X_i)/h)}, \end{equation}\]
which is known as the Nadaraya-Watson kernel estimator. In terms of \(K((x-X_i)/h)\), the weight \(w_i\) in (2.2) can be written explicitly as
\[\begin{equation} w_i(x,X_i;h)=\frac{K((x-X_i)/h)}{\sum_{j=1}^{n}K((x-X_j)/h)}, \end{equation}\]
which is the weight attached to \(Y_i\), for each \(i = 1, \ldots, n\). Note that \(K(\cdot)\) assigns a specific weight to each \(X_j\) depending on its closeness to the point at which we estimate the unknown function.
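To make the estimator concrete, the following is a minimal sketch of the Nadaraya-Watson estimator in Python, assuming a Gaussian kernel. The simulated data, the true regression function \(\sin(x)\), and the bandwidth \(h=0.4\) are illustrative assumptions, not values from the text.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel, one common symmetric weight function K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate g_hat(x) at a single evaluation point x."""
    K = gaussian_kernel((x - X) / h)    # kernel weight for each X_i
    return np.sum(K * Y) / np.sum(K)    # weighted local average of the Y_i

# Illustration on simulated data: Y_i = g(X_i) + eps_i with g(x) = sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200)
Y = np.sin(X) + rng.normal(scale=0.3, size=200)
grid = np.linspace(0, 2 * np.pi, 50)
g_hat = np.array([nadaraya_watson(x0, X, Y, h=0.4) for x0 in grid])
```

Evaluating the estimator over the grid traces out \(\hat{g}(\cdot)\); smaller values of \(h\) make the fit rougher, while larger values average over wider neighborhoods and smooth it out.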