Here, we briefly cover some properties of RKHS that are well-known and potentially useful.
Kernels are used to produce new (hopefully interesting) features on some existing feature space \(T\).
For concreteness, let’s think \(T=\mathbb{R}^p\).
The key ingredient is a covariance kernel \(K:T \times T \mathbb{R}\)
\(K(t,s) = K(s,t)\)
Non-negative definiteness: for any \((a_1,\dots,a_k) \in \mathbb{R}, (t_1, \dots, t_k) \in T\):
\[ \sum_{i,j=1}^k a_j a_j K(t_i, t_j) \geq 0 \]
\[ K_t(s) = K(t,s) \]
\[ \sum_{i=1}^k a_i K_{t_i} \]
\[ \langle K_t, K_s \rangle_{\cal H} = K(t,s) \]
\[ \sum_{i,j=1}^k a_j a_j K(t_i, t_j) = \langle \|\sum_{i=1}^k a_i K_{t_i} \rangle\|^2_{\cal H} \]
Call all finite linear combinations \({\cal H}_0\).
The RKHS \({\cal H}\) is the completion of \({\cal H}_0\) (i.e. equivalence classes of Cauchy sequences in \({\cal H}_0\)).
\[ \langle h, K_t \rangle = h(t) \]
Take \(T=\mathbb{R}^p\) and any \(\Sigma \geq 0 \in \mathbb{R}^{p \times p}\)
For \(v \in \text{row}(\Sigma)\) define
\[ K_v(x) = v'x \]
\[ u'\Sigma v, \qquad u, v \in \text{row}(\Sigma). \]
\[ {\cal H} = \left\{f:[0,1] \to \mathbb{R}: f(0)=0, \dot{f} \in L^2([0,1]) \right\} \] - I.e.
\[ \|f\|^2_{\cal H} = \|\dot{f}\|^2_{L^2([0,1])} \] - This is the completion of
\[ {\cal H}_0 = \left\{ \sum_i a_i K_{t_i} \right\} \]
with
\[ K_{t}(s) = \min(t, s), \qquad s,t \in [0,1] \]
\[ {\cal H} = \left\{f:[0,1] \to \mathbb{R}: f(0)=0, \dot{f}(0)=0, \ddot{f} \in L^2([0,1]) \right\} \]
\[ \int_{[0,1]} \ddot{f}(s)^2 \; ds \]
The feature maps are piecewise cubic functions…
What is the corresponding kernel?
To see what the corresponding kernel is, it is helpful to consider another object associated to \(K\)
For any covariance kernel \(K\), we can associate a Gaussian process \(Z:T \times \Omega \rightarrow \mathbb{R}\):
\[ \text{Cov}(Z_t, Z_s) = K(t,s) \]
Piecewise linear kernel the process is Brownian motion on \([0,1]\), let’s call it \(B\).
Piecewise cubic kernel
\[ Z_t = \int_{[0,t]} B_s \; ds \]
Let \({\cal T}=(t_1, \dots, t_n)\) denote any finite set in \(T\)
Associated to \({\cal T}\) are:
\[ \begin{aligned} L_{\cal T} &= \text{span}(K_{t_i}, t_i \in {\cal T}) \\ P_{\cal T}(h) &= \text{argmin}_{k \in L_{\cal T}} \|h - k\|^2_{\cal H} \\ \end{aligned} \]
\[ 0 = \langle h - P_{\cal T} h, K_{t_i} \rangle_{\cal H} = h(t_i) - (P_{\cal T} h)(t_i) \]
\[ \ell(h(t_1), \dots, h(t_n)) + \lambda \|h\|^2_{\cal H} \geq \ell((P_{\cal T}h)(t_1), \dots, (P_{\cal T}(h)(t_n)) + \lambda \|P_{\cal T}h\|^2_{\cal H} \]
\[ \text{argmin}_{h \in {\cal H}} {\ell}(h(t_1), \dots, h(t_n)) + \lambda \|h\|^2_{\cal H} \subset L_{\cal T} \]
\[ \text{argmin}_{h \in L_{\cal T}} {\ell}(h(t_1), \dots, h(t_n)) + \lambda \|h\|^2_{\cal H} \]
\[ h = \sum_i \alpha_i K_{t_i} \]
\[ {\cal H} \ni h(\cdot) \overset{\bar{K}}{\to} \int_T K(t, \cdot) h(t) \; \Lambda_T(dt) \]
\[ \bar{K}e_i = \lambda_i e_i \]
\[ K(t,s) = \sum_i \lambda_i e_i(t) e_i(s) \]
\[ {\cal H} \ni h(\cdot) \overset{\bar{K}_F}{\to} \int_T K(t, \cdot) f(t) h(t) \; dt \]
\[ K(t,s) = \sum_i \lambda_{i,F} e_{i,F}(t) e_{i,F}(s) \]
Computationally, one may want to use a low-rank approximation to \(K\)…
We’d expect Lebesgue expansion \(L^2(\Lambda_T \times \Lambda_T)\) while the above would be preferable for approximation in \(L^2(F \times F)\)
Let \(\xi_i(\omega) \overset{IID}{\sim} N(0,1)\)
Given a basis \(e_{i,F}\) from such an expansion (or any other ONB) set
\[ Z_t(\omega) = \sum_i \lambda_{i,F}^{1/2} \xi_i(\omega) e_{i,F}(t) \]
\[ Z(h,\omega) = \sum_i \lambda_{i,F}^{1/2} \xi_i(\omega) \langle e_{i,F}, h \rangle_{\cal H} \]
\[ \begin{aligned} K(s,t) &= \langle K_s, K_t \rangle_{\cal H} \\ &= \sum_{i} \langle K_s, \varphi_i \rangle_{\cal H} \cdot \langle \varphi_i, K_t \rangle_{\cal H} \\ &= \sum_{i} \varphi_i(t) \varphi_i(s). \end{aligned} \] we can realize the law as
\[ Z_t(\omega) = \sum_{i} \xi_i(\omega) \varphi_i(t) \]
\[ K(s,t) = \sum_{i \in {\cal I}} \varphi_i(t) \varphi_i(s) \] Then, \((\varphi_i)_{i \in {\cal I} }\) is an ONB for \({\cal H}\) determined by \(K\). The corresponding process is
\[ Z_t(\omega) = \sum_{i \in {\cal I}} \xi_i(\omega) \varphi_i(t) \]
\[ \hat{Z}(\cdot) = \int_T Z_t \cdot g(t) \; \Lambda_T(dt) \cdot g(\cdot) \]
\[ \mathbb{E}\left[\int_T (Z_t - \hat{Z}_t)^2 \Lambda_T(dt)\right] \]
\[ \text{maximize}_{\{g:\|g\|^2_{L^2(\Lambda_T)}=1\}} \int_T \int_T K(s,t) g(s) g(t) \; \Lambda_T(ds) \; \Lambda_T(ds) \]
Solution is to take top Karhunen-Loeve eigenfunction \(e_{1,\Lambda_T}\)
Generalizes to projection onto a subspace of dimension \(k\).
Swapping \(\Lambda_T\) to \(F\) yields \(e_{1,F}\)

\[ \text{maximize}_{h: \|h\|_{\cal H} \leq 1} \hat{\text{Var}}(h(X)) \]
\[ \left(I - n^{-1}11'\right) G \left(I - n^{-1}11'\right) \]
with \(G_{ij} = K(X_i, X_j)\).
\[ Z_t - \int_T Z_s f(s) \; ds \]