Accompanying Textbook:
Functional Data
From Raw to Functional Data
The simplest data set encountered in functional data analysis is a sample of the form: \[ \begin{align} x_i(t_j),\quad t_j\in[a,b],\quad i=1,\dots,n,\quad j=1,\dots,J. \end{align} \] The theoretical objects we wish to study are smooth curves: \[ \begin{align} X_i(t),\quad t\in[a,b],\quad i=1,\dots,n. \end{align} \]
Basis expansions:
Typically, the first step in working with functional data is to express them by means of a basis expansion \[ X_i(t)\approx\sum_{m=1}^M c_{im} B_m(t),\quad t\in[a,b], \] where \(B_1(t),\dots,B_M(t)\) are some standard collection of basis functions like:
B-spline basis functions \(B_1(t),\dots,B_M(t)\):
library("fda")
## B-spline basis of order 3 with equally spaced breakpoints on [0,1]
bspl_bf <- create.bspline.basis(rangeval=c(0,1), norder=3, breaks=seq(0,1,len=7))
plot(bspl_bf, main="B-spline Basis Functions", xlab="[a,b]=[0,1]")
Fourier basis functions \(B_1(t),\dots,B_M(t)\):
## Fourier basis with 5 basis functions on [0,1]
fourier_bf <- create.fourier.basis(rangeval=c(0,1), nbasis=5)
plot(fourier_bf, main="Fourier Basis Functions", xlab="[a,b]=[0,1]")
Example: Berkeley growth study raw data
matplot(x=growth$age, y=growth$hgtf[,1:5], type="p", lty=1, pch=21, xlab="age", ylab="height (cm)", cex.lab=1.2,col=1:5,bg=1:5, main="5 Girls in Berkeley Growth Study")
Example: Berkeley growth study pre-processed data
SmObj <- smooth.basisPar(growth$age, y=growth$hgtf[,1:5], lambda=0.1)
plot(SmObj$fd, xlab="age", ylab="height (cm)", cex.lab=1.2,
     main="5 Girls in Berkeley Growth Study", lty=1)
## [1] "done"
Example: Berkeley growth study 1st derivative
plot(deriv(SmObj$fd, 1), xlab="age", ylab="growth rate (cm / year)", cex.lab=1.2, main="5 Girls in Berkeley Growth Study (1st derivative)", lty=1)
## [1] "done"
Example: Berkeley growth study 2nd derivative
plot(deriv(SmObj$fd, 2), xlab="age", cex.lab=1.2, ylab="growth acceleration (cm / year^2)", main="5 Girls in Berkeley Growth Study (2nd derivative)", lty=1)
## [1] "done"
Sample Mean and Covariance
Pointwise mean: \[ \bar{X}_n(t)=\frac{1}{n}\sum_{i=1}^n X_i(t) \] Pointwise standard deviation: \[ S_n(t)=\sqrt{\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}_n(t)\Big)^2} \]
Pointwise mean and standard deviation:
BM.Mean <- rowMeans(BM.mat)       # pointwise mean (time points in rows, curves in columns)
BM.SD   <- apply(BM.mat, 1, sd)   # pointwise standard deviation
Pointwise covariance function: \[ \hat{c}_n(t,s)=\frac{1}{n-1}\sum_{i=1}^n\Big(X_i(t)-\bar{X}_n(t)\Big)\Big(X_i(s)-\bar{X}_n(s)\Big) \]
Pointwise covariance function:
BM.Cov <- var(t(BM.mat))   # pointwise covariance across time points (time points in rows of BM.mat)
Principal Component Functions
Idea: Use the Estimated Functional Principal Components (EFPC's) \(\hat{v}_j\) as basis functions for \(X_i\): \[ X_i(t)\approx\bar{X}_n(t) + \sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t) \]
Best basis property
Eigendecomposition of \(\hat{c}_n(t,s)\): \[ \begin{align} \hat{c}_n(t,s)&\approx\sum_{j=1}^p\hat{\lambda}_j\hat{v}_j(t)\hat{v}_j(s), \end{align} \] where \(\hat{\lambda}_1>\dots >\hat{\lambda}_p\) denote the estimated eigenvalues.
\[ \begin{align} X_i(t)&\approx\bar{X}_n(t) + \sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(t) \end{align} \]
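To make this concrete, here is a minimal R sketch of how the EFPC's can be computed with the fda package, reusing the smoothed growth curves SmObj from the example above; the choice nharm = 3 is arbitrary and for illustration only.
## FPCA of the smoothed growth curves from the Berkeley example above
growth_pca <- pca.fd(SmObj$fd, nharm = 3)
growth_pca$varprop                 # proportion of variance explained per component
plot(growth_pca$harmonics,         # estimated eigenfunctions v_1, v_2, v_3
     main = "Estimated Functional Principal Components")
head(growth_pca$scores)            # estimated scores xi_ij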
Case Study:
BOA Stock Returns
We study the daily cumulative log-return functions: \[R_i(t):=\log(P_i(t))-\log(P_i(0))\approx\frac{P_i(t)-P_i(0)}{P_i(0)}\]
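As a rough sketch (not the original preprocessing code), such curves could be built from an intraday price matrix and converted to a functional data object; the object name BOA_prices, the trading-day length of 6.5 hours, and the basis settings below are illustrative assumptions.
## Hypothetical matrix BOA_prices: rows = intraday time points, columns = trading days
log_BOA    <- log(BOA_prices)
log_BOA    <- sweep(log_BOA, 2, log_BOA[1, ])   # R_i(t) = log P_i(t) - log P_i(0)
log_BOA_fd <- Data2fd(argvals = seq(0, 6.5, len = nrow(log_BOA)), y = log_BOA,
                      basisobj = create.bspline.basis(c(0, 6.5), nbasis = 25))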
# Plot functional data
plot(log_BOA_fd, xlab="Trading Hours", ylab="", lwd=1, col=gray(.5),
     main="Cumulative Log-Return Functions")
lines(log_BOA_fd[1:10], lwd=1.5, lty=1)
Mathematical Framework:
1. Square Integrable Functions
What makes the space \(L^2([a,b])\) so convenient is its inner product structure, with \(\langle f,g\rangle=\int_a^b f(t)g(t)\,dt\):
Two functions \(f\) and \(g\) are orthogonal if \[\langle f,g\rangle=0\]
The norm \(||f||=\sqrt{\langle f,f\rangle}\) gives us a notion of distance between two functions: \[d(f,g)=||f-g||\]
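For intuition, the inner product and the distance can be approximated numerically on a fine grid; the two functions below (a sine and a cosine) are just a hypothetical example.
## Riemann-sum approximation of <f,g> and d(f,g) on [0,1]
t_grid <- seq(0, 1, length.out = 1001)
f  <- sin(2 * pi * t_grid)
g  <- cos(2 * pi * t_grid)
dt <- t_grid[2] - t_grid[1]
sum(f * g) * dt                 # <f,g> approximately 0: f and g are orthogonal
sqrt(sum((f - g)^2) * dt)       # d(f,g) = ||f - g||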
As we have already seen, basis expansions play an important role in methodology for functional data.
Mathematical Framework:
2. Random Functions
Let \(X\) denote a random function defined on a probability space, say \(\Omega\).
We assume that all realizations \(X(\omega)\), \(\omega\in\Omega\), are elements of the space \(L^2([a,b])\) of square integrable functions, i.e., \(||X(\omega)||=\sqrt{\int_a^b \big(X(\omega)(t)\big)^2dt} < \infty\) for all \(\omega\in\Omega\).
\(||X||\in\mathbb{R}\) is thus a real random variable.
If \(E(||X||^2)<\infty\), we say that the random function \(X\) is square integrable. Caution: the expectation here is an integral with respect to \(\omega\), not \(t\). It might be more pedagogical to say that the random function \(X\) has a finite second moment.
The population covariance function leads to Functional Principal Component Analysis (FPCA), which allows us to represent a square integrable random function \(X\) as: \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t) \]
The eigenfunctions \(v_j\) are the solutions of the eigen-equation \(\int_a^bc(t,s)v_j(s)ds=\lambda_jv_j(t)\).
\(\lambda_1\geq\lambda_2\geq\dots\) denote the eigenvalues.
The random variables \(\xi_j\) are called the scores \[ \textstyle \xi_j=\langle X-\mu,v_j\rangle=\int_a^b(X(t)-\mu(t))v_j(t)dt \]
FPCA allows us to represent a square integrable random function \(X\) as (Karhunen-Loève expansion): \[ \textstyle X(t)=\mu(t)+\sum_{j=1}^\infty\xi_j v_j(t) \]
It can be shown that: \[ E(\xi_j)=0,\quad V(\xi_j)=\lambda_j,\quad \operatorname{Cov}(\xi_j,\xi_k)=0,\,j\neq k, \] and \(E\int_a^b(X(t)-\mu(t))^2dt=E||X-\mu||^2=\sum_{j=1}^\infty\lambda_j\).
That is, \(\lambda_j\) is the variance of the random function \(X\) in the principal direction \(v_j\) and the sum of these variances is the total variance of \(X\).
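These properties can be checked empirically; the sketch below reuses the growth_pca object from the FPCA sketch above (a toy example with only 5 curves): the sample variances of the scores approximately reproduce the estimated eigenvalues.
## Sample score variances vs. estimated eigenvalues (up to 1/n vs. 1/(n-1) scaling)
round(apply(growth_pca$scores, 2, var), 3)
round(growth_pca$values[1:3], 3)
## Sum of all eigenvalues ~ total variance E||X - mu||^2
sum(growth_pca$values)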
Functional Regression Models
Scalar–on–function regression: \[Y_i=\int_a^b\beta(s)X_i(s)ds+\varepsilon_i\] Function–on–scalar regression: \[Y_i(t)=\sum_{k=1}^p\beta_k(t)x_{ik}+\varepsilon_i(t)\] Function–on–function regression: \[Y_i(t)=\int_a^b\beta(t,s)X_{i}(s)ds+\varepsilon_i(t)\]
Nonparametric scalar–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i\in\mathbb{R}\) and \(X_i\in L^2([a,b])\).
Nonparametric function–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i,\,X_i\in L^2([a,b])\).
The above models are just prototypes illustrating the general idea. The main point is that the parameters of functional regression models are infinite dimensional objects which must be estimated from a finite sample.
Difficulties in Functional Regression
Remember: The usual regression model \[ \textstyle \mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon} \] Normal equations and solutions (population & sample): \[ \begin{align} \textstyle \mathbf{C}_{XY}=\mathbf{C}_{XX}\boldsymbol{\beta}\quad&\Rightarrow\quad \boldsymbol{\beta}=\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\\ \textstyle \widehat{\mathbf{C}}_{XY}=\widehat{\mathbf{C}}_{XX}\widehat{\boldsymbol{\beta}}_n\quad&\Rightarrow\quad \widehat{\boldsymbol{\beta}}_n=\widehat{\mathbf{C}}_{XX}^{-1}\widehat{\mathbf{C}}_{XY} \end{align} \] where
\([\mathbf{C}_{XX}]_{r,\ell}=E(X_rX_\ell)\),
\([\mathbf{C}_{XY}]_r=E(X_rY)\),
\(\big[\widehat{\mathbf{C}}_{XX}\big]_{r,\ell}=\frac{1}{n}\sum_{i=1}^n X_{ir}X_{i\ell}\), and
\(\big[\widehat{\mathbf{C}}_{XY}\big]_r=\frac{1}{n}\sum_{i=1}^n X_{ir}Y_{i}\).
Normal equations and solutions (population & sample): \[ \begin{align} \textstyle \mathbf{C}_{XY}=\mathbf{C}_{XX}\boldsymbol{\beta}\quad&\Rightarrow\quad \boldsymbol{\beta}=\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\\ \textstyle \widehat{\mathbf{C}}_{XY}=\widehat{\mathbf{C}}_{XX}\widehat{\boldsymbol{\beta}}_n\quad&\Rightarrow\quad \widehat{\boldsymbol{\beta}}_n=\widehat{\mathbf{C}}_{XX}^{-1}\widehat{\mathbf{C}}_{XY} \end{align} \]
To obtain \(\boldsymbol{\beta}\) and \(\widehat{\boldsymbol{\beta}}_n\), we need invertibility of the \(p\times p\) matrices \(\mathbf{C}_{XX}\) and \(\widehat{\mathbf{C}}_{XX}\).
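As a quick reminder of the classical case, the following sketch (with arbitrary simulated data) checks that solving the sample normal equations reproduces the lm() estimate; all names are hypothetical.
## Classical case: (X'X)^{-1} X'Y coincides with the lm() fit
set.seed(1)
n0 <- 100
X0 <- cbind(1, matrix(rnorm(n0 * 3), n0, 3))         # design matrix with intercept column
Y0 <- drop(X0 %*% c(1, 2, -1, 0.5)) + rnorm(n0)
beta_hat <- solve(t(X0) %*% X0, t(X0) %*% Y0)        # needs invertibility of X0'X0
cbind(beta_hat, coef(lm(Y0 ~ X0 - 1)))               # identical up to numerical error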
Functional regression \[ \textstyle Y_i=\int_a^b\beta(t)X_i(t)\,dt+\varepsilon_i \] Normal equations (population & sample): \[ \begin{align} \textstyle c_{XY}(t)&=\int_a^b c_{XX}(t,s)\beta(s)\,ds\\ \textstyle \hat{c}_{XY}(t)&=\int_a^b \hat{c}_{XX}(t,s)\beta(s)\,ds \end{align} \] where
\(c_{XX}(t,s)=E(X(t)X(s))\), \(c_{XY}(t)=E(X(t)Y)\),
\(\hat{c}_{XX}(t,s)=\frac{1}{n}\sum_{i=1}^n X_i(t)X_i(s)\), and
\(\hat{c}_{XY}(t)=\frac{1}{n}\sum_{i=1}^n X_i(t)Y_i\).
Functional regression \[ \textstyle Y_i=\int_a^b\beta(t)X_i(t)\,dt+\varepsilon_i \] Normal equations (sample): \[ \textstyle \hat{c}_{XY}(t)=\int_a^b \hat{c}_{XX}(t,s)\beta(s)ds \]
Problem: Unlike in the multivariate case, the normal equation cannot simply be solved by inversion. The covariance operator associated with \(c_{XX}(t,s)\) is a compact operator on the infinite dimensional space \(L^2([a,b])\) and therefore has no bounded inverse, and its empirical counterpart \(\hat{c}_{XX}\) has rank at most \(n\). The estimation problem is ill-posed and needs regularization.
Classical solutions: regularize by reducing the dimension of the problem, e.g., by expanding \(\beta\) in a deterministic basis, by adding a roughness penalty, or by regressing on functional principal components. These three approaches are described next.
Estimation Through Basis Expansion
Let us consider the scalar-on-function model \[ \textstyle Y_i=\alpha + \int_a^b \beta(s)X_i(s)ds+\varepsilon_i,\quad i=1,\dots,n \]
Simplest approach: Expand \(\beta\) using deterministic basis functions (splines, Fourier, etc.) \[ \textstyle \beta(t)=\sum_{k=1}^Kc_kB_k(t) \]
Rewriting: \[ \begin{align} \textstyle\int_a^b\beta(s)X_i(s)ds &\textstyle =\sum_{k=1}^Kc_k\underbrace{\int_a^bB_k(s)X_i(s)ds}_{=x_{ik}} %&\textstyle =\sum_{k=1}^Kc_k \end{align} \]
\[ \begin{align} \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)X_i(s)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle = \alpha + \sum_{k=1}^Kc_k x_{ik} +\varepsilon_i\\ \textstyle \mathbf{Y} &\textstyle = \mathbf{X}\mathbf{c}+\boldsymbol{\varepsilon}, \end{align} \] where the \(i\)-th row of the design matrix \(\mathbf{X}\) is \((1,x_{i1},\dots,x_{iK})\).
The estimator of the parameter vector \(\mathbf{c}=(\alpha,c_1,\dots,c_{K})'\) \[ \textstyle \hat{\mathbf{c}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}, \] leads to an estimator of the parameter function \(\beta(t)\) \[ \textstyle \widehat{\beta}(t)=\sum_{k=1}^K\hat{c}_kB_k(t). \]
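A minimal sketch of this approach with the fda package, assuming a hypothetical fd object X_fd holding the predictor curves on [0,1] and a response vector Y; the choice of 10 B-spline basis functions is arbitrary.
## Scalar-on-function regression via a fixed basis expansion of beta
beta_basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 10)
x_ik <- t(inprod(beta_basis, X_fd))          # n x K matrix of int B_k(s) X_i(s) ds
Xmat <- cbind(1, x_ik)                       # leading column of ones for alpha
c_hat <- solve(t(Xmat) %*% Xmat, t(Xmat) %*% Y)
alpha_hat   <- c_hat[1]
beta_hat_fd <- fd(coef = c_hat[-1], basisobj = beta_basis)   # hat beta(t)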
Biased estimation due to the truncation error: \(\delta_K(t)\) \[ \begin{align} \textstyle\beta(t) &\textstyle =\sum_{k=1}^\infty c_k B_k(t)\\ &\textstyle =\sum_{k=1}^K c_k B_k(t) + \underbrace{\sum_{\ell=K+1}^\infty c_\ell B_\ell(t)}_{=\delta_K(t)} \end{align} \]
It can be shown that (see pp. 55-56 of the accompanying textbook): \[ \textstyle\hat{\mathbf{c}}=\mathbf{c}+(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\delta}_K+(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}, \] where \([\boldsymbol{\delta}_K]_i=\int_a^b\delta_K(s)X_i(s)ds\).
Estimation with a Roughness Penalty
\[ \begin{align} \textstyle P_{\lambda}(\alpha,\beta)= &\textstyle \sum_{i=1}^n\Big(Y_i-\alpha-\int_a^b\beta(s)X_i(s)ds\Big)^2\\ &\textstyle +\lambda\int_a^b\big[(L\beta)(s)\big]^2ds \end{align} \]
Idea: allow a generous number of basis functions \(K\) for \(\beta\) and control the roughness of the estimate through the penalty \(\lambda\int_a^b[(L\beta)(s)]^2ds\), where \(L\) is a linear differential operator (e.g., \(L=D^2\), penalizing curvature) and \(\lambda\geq 0\) steers the trade-off between goodness of fit and smoothness.
It can be shown that (see pp. 56-57 of the accompanying textbook): \[ \textstyle \hat{\mathbf{c}}_{\lambda}=(\mathbf{X}'\mathbf{X}+\lambda \mathbf{R})^{-1}\mathbf{X}'\mathbf{Y}. \]
Continued:
\[ \textstyle \hat{\mathbf{c}}_{\lambda,K}=(\mathbf{X}'\mathbf{X}+\lambda \mathbf{R})^{-1}\mathbf{X}'\mathbf{Y} \] where \[ \mathbf{R}= \left(\begin{matrix} 0 &0 &\dots &0 \\ 0 &R_{11}&\dots &R_{1K}\\ 0 &R_{21}&\dots &R_{2K}\\ \vdots &\vdots&\vdots&\vdots\\ 0 &R_{K1}&\dots &R_{KK}\\ \end{matrix}\right) \] with \(R_{rk}=\int_a^b(LB_r)(s)(LB_k)(s)ds\).
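A sketch of the penalized fit, continuing the hypothetical objects beta_basis, Xmat, and Y from the basis-expansion sketch above; the penalty operator is taken to be \(L=D^2\) and the value of \(\lambda\) is arbitrary.
## R_rk = int (L B_r)(s) (L B_k)(s) ds with L = D^2, via fda::eval.penalty
R0 <- eval.penalty(beta_basis, Lfdobj = int2Lfd(2))
R  <- rbind(0, cbind(0, R0))                          # zero row/column for the intercept
lambda <- 1e-2
c_lambda <- solve(t(Xmat) %*% Xmat + lambda * R, t(Xmat) %*% Y)
beta_hat_pen <- fd(coef = c_lambda[-1], basisobj = beta_basis)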
Regression on FPCs
\[ \begin{align} \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)X_i(s)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle = \alpha + \int_a^b \beta(s)\big(\hat{\mu}(s)+\sum_{j=1}^p\hat{\xi}_{ij}\hat{v}_j(s)\big)ds +\varepsilon_i\\ \textstyle Y_i &\textstyle =\beta_0+\sum_{j=1}^p\beta_j\hat{\xi}_{ij} +\varepsilon_i, \end{align} \] where \[\textstyle\beta_0=\alpha+\int_a^b\beta(s)\hat{\mu}(s)ds\] and \[\textstyle\beta_j=\int_a^b\beta(s)\hat{v}_j(s)ds\] are treated as unknown parameters.
The parameter vector \(\boldsymbol{\beta}=(\beta_0,\beta_1,\dots,\beta_p)'\) can be estimated by (see page 59 of the accompanying textbook) \[ \textstyle\hat{\boldsymbol{\beta}}=(\mathbb{X}'\mathbb{X})^{-1}\mathbb{X}'\mathbf{Y} \]
where \[ \mathbb{X}= \left(\begin{matrix} 1 &\hat{\xi}_{11}&\dots &\hat{\xi}_{1p}\\ \vdots&\vdots & &\vdots \\ 1 &\hat{\xi}_{n1}&\dots &\hat{\xi}_{np}\\ \end{matrix}\right) \]
The parameter vector \(\boldsymbol{\beta}=(\beta_0,\beta_1,\dots,\beta_p)'\) can be estimated by (see page 59 of the accompanying textbook) \[ \textstyle\hat{\boldsymbol{\beta}}=(\mathbb{X}'\mathbb{X})^{-1}\mathbb{X}'\mathbf{Y} \]
The estimators \(\hat\beta_0,\hat\beta_1,\dots,\hat\beta_p\) lead to the estimators \[\textstyle \hat{\beta}(t)=\sum_{j=1}^p\hat\beta_j\hat{v}_j(t)\] and \[\textstyle \hat{\alpha} =\hat\beta_0-\sum_{j=1}^p\hat\beta_j\int_a^b\hat{v}_j(s)\hat\mu(s)ds\] of our actual interest.
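In R this amounts to an ordinary least squares fit on the estimated scores; the sketch below again uses the hypothetical X_fd and Y from above, with p = 4 components chosen arbitrarily.
## Regression on FPC scores with the fda package
pca_X <- pca.fd(X_fd, nharm = 4)
fit   <- lm(Y ~ pca_X$scores)                 # hat beta_0 = coef(fit)[1], then hat beta_1, ..., hat beta_p
## hat beta(t) = sum_j hat beta_j * hat v_j(t), expressed in the basis of the harmonics
beta_fpc_fd <- fd(coef = pca_X$harmonics$coefs %*% coef(fit)[-1],
                  basisobj = pca_X$harmonics$basis)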
Functional Regression:
Hands On
set.seed(9000)
n    <- 1000
grid <- seq(0, 1, length = 101)
##
beta <- sin(grid * 2 * pi)
X <- matrix(0, nrow = n, ncol = length(grid))
for(i2 in 1:n){
  X[i2, ] <- X[i2, ] + rnorm(length(grid), 0, 1)
  X[i2, ] <- X[i2, ] + runif(1, 0, 5)
  X[i2, ] <- X[i2, ] + rnorm(1, 1, 0.2) * grid
  for(j2 in 1:10){
    e <- rnorm(2, 0, 1/j2^(2))
    X[i2, ] <- X[i2, ] + e[1] * sin((2 * pi) * grid * j2)
    X[i2, ] <- X[i2, ] + e[2] * cos((2 * pi) * grid * j2)
  }
}
## Note the Integral-Approximation using '* .01':
Y <- X %*% beta * .01 + rnorm(n, 0, .4)
matplot(x = grid, y=t(X), type = "l", lty=1)
Estimating with the roughness penalty approach and the FPCA approach:
library("refund") ## Roughness Penalty fit.RP <- pfr(Y ~ lf(X, bs = "ps", k = 50)) ## FPCA fit.FPCA <- pfr(Y ~ fpc(X, pve = 0.99))
A more complex \(\beta(t)\) function:
##
beta <- -1 * dnorm(grid, mean=.20, sd=.03) +
         3 * dnorm(grid, mean=.50, sd=.04) +
         1 * dnorm(grid, mean=.75, sd=.05)
## Note the Integral-Approximation using '* .01':
Y <- X %*% beta * .01 + rnorm(n, 0, .4)
Nonparametric Functional Regression
Nonparametric scalar–on–function regression: \[Y_i=m(X_i)+\varepsilon_i,\] where \(Y_i\in\mathbb{R}\) and \(X_i\in L^2([a,b])\).
Functional version of the Nadaraya-Watson estimator:
\[ \widehat{m}(X)=\frac{\sum_{i=1}^n K(h^{-1}d(X,X_i))Y_i}{\sum_{j=1}^nK(h^{-1}d(X,X_j))}, \] where \(d\) is a semi-metric such as, for instance, the FPCA-based semi-metric: \[ \textstyle d(X_i,X_j)=\sqrt{\sum_{k=1}^K(\hat{\xi}_{ik}-\hat{\xi}_{jk})^2} \]
suppressMessages(library("fda.usc")); data(tecator) absorp <- tecator$absorp.fdata x <- absorp[1:129,] y <- tecator$y$Fat[1:129]
res.pca1 <- fregre.np(x, y, Ker=AKer.tri, metric=semimetric.pca, q=1)
summary.fregre.fd(res.pca1, draw = FALSE)
## *** Summary Functional Non-linear Model ***
##
## -Call: fregre.np(fdataobj = x, y = y, Ker = AKer.tri, metric = semimetric.pca,
##     q = 1)
##
## -Bandwidth (h): 2.88226
## -R squared: 0.3282283
## -Residual variance: 115.5629 on 119.2519 degrees of freedom
## -Names of possible atypical curves: No atypical curves
## -Names of possible influence curves: 35 43 44 99 125